Bug#509238: panic backtrace
Quoting: Christian Perrier
>>>So, in short, in regular mode, it crashes (always at the same place)
>>>but in vga=771 mode, it doesn't, right?
>>Correct.
>And I assume that you get no crash as well if you're using the
>graphical installer.

I had not tried the graphical installer, figuring that the more basic the better, but I just did. It appears that just before switching to graphical mode, the installer puts the screen in mode 771: I get the smaller, sharper fonts. So, no, it doesn't crash in the graphical installer. BTW, the graphical installer looks good!

>I'm puzzled about where to reassign this bug report. This is obviously a race
>condition somewhere. Probably a weird kernel issue, but investigating
>further is tricky and I fear that this bug report might remain
>uninvestigated for quite a while until it "magically" gets solved
>in the future with a new kernel (though probably not for Lenny).

That's probably the reasonable thing to do.

>I propose documenting this in the errata file, at the minimum.

Yes. A prominent (easy to find) note saying something like "If your laptop crashes completely during installation, try replacing the boot option vga=normal with vga=771 or use the graphical installer" would have saved some time, but then the problem would not have become known.

Thank you very much for your help!

-- 
To UNSUBSCRIBE, email to debian-boot-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
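For readers unfamiliar with the numeric boot option: vga=771 is simply decimal for 0x303. As a minimal sketch, assuming the mode table in the Linux kernel's vesafb documentation (where a vga= value is the VESA mode number plus 0x200), the hypothetical helper `vga_mode_info` below decodes the common 256-colour values. Whether a given mode actually works depends on the machine's video BIOS, so treat the table as a hint, not a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: decode a numeric Linux "vga=" boot value.
# Assumption: the kernel vesafb table, where vga= value = VESA mode + 0x200.
# Only the four 256-colour modes from that table are covered here.
vga_mode_info() {
    # Accepts decimal (771) or hex (0x303); non-numeric input such as
    # "normal" makes printf fail and the function returns 1.
    hex=$(printf '0x%x' "$1") || return 1
    case "$hex" in
        0x301) res="640x480" ;;
        0x303) res="800x600" ;;
        0x305) res="1024x768" ;;
        0x307) res="1280x1024" ;;
        *)
            echo "vga=$1 ($hex) is not in this partial table" >&2
            return 1
            ;;
    esac
    echo "vga=$1 is mode $hex: $res, 256 colours"
}
```

For example, `vga_mode_info 771` reports the 800x600 mode, which matches the smaller, sharper console font described in the message above.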
Bug#509238: panic backtrace
Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> >Quoting: Christian Perrier
> >>Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> >> First thought: race condition (the panic message contained a backtrace
> >> of different threads), so then I tried multiple times with only one
> >> change at a time: expert mode, regular mode, ethdetect -x, vga=771,
> >> vga normal. It turns out that the culprit was the vga option. With
> >> vga=771, I get no crash/panic in either expert or regular mode, with
> >
> >So, in short, in regular mode, it crashes (always at the same place)
> >but in vga=771 mode, it doesn't, right?
>
> Correct.

And I assume that you get no crash as well if you're using the graphical installer.

I'm puzzled about where to reassign this bug report. This is obviously a race condition somewhere. Probably a weird kernel issue, but investigating further is tricky and I fear that this bug report might remain uninvestigated for quite a while until it "magically" gets solved in the future with a new kernel (though probably not for Lenny).

I propose documenting this in the errata file, at the minimum.
Bug#509238: panic backtrace
>Quoting: Christian Perrier
>>Quoting The Eclectic One (eclec...@sdf.lonestar.org):
>> First thought: race condition (the panic message contained a backtrace
>> of different threads), so then I tried multiple times with only one
>> change at a time: expert mode, regular mode, ethdetect -x, vga=771,
>> vga normal. It turns out that the culprit was the vga option. With
>> vga=771, I get no crash/panic in either expert or regular mode, with
>So, in short, in regular mode, it crashes (always at the same place)
>but in vga=771 mode, it doesn't, right?

Correct.
Bug#509238: panic backtrace
Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> First thought: race condition (the panic message contained a backtrace
> of different threads), so then I tried multiple times with only one
> change at a time: expert mode, regular mode, ethdetect -x, vga=771,
> vga normal. It turns out that the culprit was the vga option. With
> vga=771, I get no crash/panic in either expert or regular mode, with

So, in short, in regular mode, it crashes (always at the same place) but in vga=771 mode, it doesn't, right?
Bug#509238: panic backtrace
Quoting Christian Perrier
>> Ok, tried a few more times. I usually get the same kernel panic screen,
>Did you try in expert mode, i.e. choosing it from the "Advanced options"
>in the boot menu?

Yes, I tried expert as well as regular.

>In expert mode, when you reach the HW detection step, you'll get a
>question about PCMCIA options. They're not necessarily relevant, but
>checking whether the crash happens before or after it would help.

No question about PCMCIA before getting to the network detection screen/status bar.

> ...
>Some drivers in 2.6.18 provided firmware blobs that have been
>extracted from the source and are now provided as separate udebs:
>r...@mykerinos:~> apt-cache search firmware 2100
>firmware-ipw2x00 - Binary firmware for Intel Pro Wireless 2100, 2200 and 2915

That explains the missing firmware.

>If that firmware is needed for ipw2100, you'll be prompted about
>this, but you're not, which means the crash happens before..:-)

Correct.

>> Anything else I can do? More tests?
>Before the network devices screen (for instance, when prompted for
>language), could you switch to VT2 (Alt+F2) and, from there, edit
>/bin/ethdetect.sh?
>Add "-x" to the first line:
>#!/bin/sh -x
>Then go back to VT1, continue to the step where the crash happens and
>switch to VT4 before it happens. As ethdetect.sh will be in debug mode,
>we'll see all its output and could narrow down the exact line where
>the crash happens.

OK, I did that and also booted with vga=771. I was hoping to get a smaller font to see more of the backtrace.
This is what I got:

[ output of ethdetect -x: lsifaces (3 lines), sed (4), grep (4) and sed (3) ]
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth0 up
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth0 down
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth1 up
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth1 down
Dec 22 15:42:48 main-menu[1328]: (process:9628) + check-missing-firmware
Dec 22 15:42:48 main-menu[1328]: (process:9628) + sysfs-update-devnames
Dec 22 15:42:48 main-menu[1328]: (process:9628) + cleanup
Dec 22 15:42:48 main-menu[1328]: (process:9628) + rm -f /tmp/devnames-static.txt
Dec 22 15:42:48 main-menu[1328]: DEBUG: resolver (libslang2-udeb): package doesn't exist (ignored)

It didn't crash! After that I proceeded to console 1 and saw the screen explaining that I needed the missing firmware: ipw2100-1.3.fw. As I had thought, even with the missing firmware, the culprit wasn't the wireless device.

First thought: race condition (the panic message contained a backtrace of different threads), so then I tried multiple times with only one change at a time: expert mode, regular mode, ethdetect -x, vga=771, vga normal. It turns out that the culprit was the vga option. With vga=771, I get no crash/panic in either expert or regular mode, with or without sh -x in ethdetect. It might still be a race condition, but if so it seems to be triggered by something related to the display.

Very strange. The display works fine (with a big font) in normal mode. In 771 mode, the font is smaller and the curses windows sharper.

Even though I now have a workaround, I'm willing to keep debugging if it would be useful.
Bug#509238: panic backtrace
Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> >This is where we would need to really narrow things down and where
> >using the expert mode could help.
>
> Ok, tried a few more times. I usually get the same kernel panic screen,

Did you try in expert mode, i.e. choosing it from the "Advanced options" in the boot menu?

In expert mode, when you reach the HW detection step, you'll get a question about PCMCIA options. They're not necessarily relevant, but checking whether the crash happens before or after it would help.

> This is exactly as the screen froze. I wonder if the only line visible that
> refers to eth0 is when it actually detected the main (wired) ethernet
> interface. As this is the interface I want to use and it appears that it
> generated no errors, it seems to be a good sign. All the ipw2100 lines
> refer to the wireless interface, which works fine on 2.6.18. I never knew
> it required separate firmware. Since it appears not to be included in the

Some drivers in 2.6.18 provided firmware blobs that have been extracted from the source and are now provided as separate udebs:

r...@mykerinos:~> apt-cache search firmware 2100
firmware-ipw2x00 - Binary firmware for Intel Pro Wireless 2100, 2200 and 2915

If that firmware is needed for ipw2100, you'll be prompted about this, but you're not, which means the crash happens before..:-)

> Anything else I can do? More tests?

Before the network devices screen (for instance, when prompted for language), could you switch to VT2 (Alt+F2) and, from there, edit /bin/ethdetect.sh? Add "-x" to the first line:

#!/bin/sh -x

Then go back to VT1, continue to the step where the crash happens and switch to VT4 before it happens. As ethdetect.sh will be in debug mode, we'll see all its output and could narrow down the exact line where the crash happens.

(Doing this in expert mode will give you more control over the moment when you need to switch to VT4.)
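The edit suggested above (turning `#!/bin/sh` into `#!/bin/sh -x` so the script traces every command) could also be made non-interactively. This is a minimal sketch, assuming the sed available on VT2 supports in-place editing (`-i`) and that the script's first line is exactly `#!/bin/sh`; `enable_trace` is a hypothetical helper name, not part of the installer.

```shell
#!/bin/sh
# Hypothetical helper: enable shell tracing in a script by rewriting
# its first line from "#!/bin/sh" to "#!/bin/sh -x".
# Assumptions: sed supports -i, and line 1 is exactly "#!/bin/sh".
enable_trace() {
    script="$1"
    # Only line 1 is touched, and only when it matches exactly, so running
    # this twice does not produce "#!/bin/sh -x -x".
    sed -i '1s|^#!/bin/sh$|#!/bin/sh -x|' "$script"
}
```

Usage on the installer console would then be `enable_trace /bin/ethdetect.sh`; the traced commands appear on VT4 prefixed with `+`, as in the ethdetect output quoted in this thread.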
Bug#509238: panic backtrace
Quoting Christian Perrier
>OK. Are you in a position to test with something other than a USB
>stick boot?

Actually, before I gave up on CDs (I ruined 11 CD-Rs, probably due to marginal media, or drive or wodim problems), I had a CD of the Lenny installer RC1 made on a Windows machine. This is what I just tried again. I get the same panic at the same point in the installation process. Of course the addresses are slightly different, as the underlying kernel code is different.

>The best would be using the "netboot" ISO (called mini.iso) from a CD.
> ...
>> Ok, it happens immediately after the identifying network hardware
>> screen. So soon after in fact that I thought I would have no time to
>"after identifying network HW" means 'after the system displays a
>progress bar saying "Identifying network hardware"', right?

Correct.

>This is where we would need to really narrow things down and where
>using the expert mode could help.

OK, I tried a few more times. I usually get the same kernel panic screen, but on one occasion (in expert mode), it crashed a little differently and I saw this:

Dec 21 17:21:46 kernel: [ 173.096981] ipw2100: Intel(R) PRO/Wireless 2100 Network Driver, git-1.2.2
Dec 21 17:21:46 kernel: [ 173.096987] ipw2100:
Dec 21 17:21:46 kernel: [ 173.096987] ipw2100: Copyright(c) 2003-2006 Intel Corporation
Dec 21 17:21:46 kernel: [ 173.206202] ACPI: PCI Interrupt :02:03:0[A] -> Link [LKNB] -> GSI 5 (level, low) -> IRQ 5
Dec 21 17:21:46 kernel: [ 173.206206] ipw2100: Detected Intel PRO/Wireless 2100 Network Connection
Dec 21 17:21:46 kernel: [ 173.206619] firmware: requesting ipw2100-1.3.fw
Dec 21 17:21:46 net/hw-detect.hotplug: Detected hotpluggable network interface eth0
Dec 21 17:21:46 kernel: [ 173.349107] ipw2100: eth1: Firmware 'ipw2100-1.3.fw' not available or load failed.
Dec 21 17:21:46 kernel: [ 173.349114] ipw2100: eth1: ipw2100_get_firmware failed: -2
Dec 21 17:21:46 kernel: [ 173.349118] ipw2100: eth1: Failed to power on the adapter.
Dec 21 17:21:46 kernel: [ 173.349121] ipw2100: eth1: Failed to start the firmware.
Dec 21 17:21:46 kernel: [ 173.349125] ipw2100Error calling register_netdev.
Dec 21 17:21:46 kernel: [ 173.349443] ACPI: PCI interrupt for device :02:03.0 disabled
Dec 21 17:21:46 kernel: [ 173.349450] ipw2100: probe of :02:03.0 failed with error -5
Dec 21 17:21:46 hw-detect: insmod /lib/modules/2.6.26-1-486/kernel/drivers/ieee1394/sbp2.ko
Dec 21 17:21:46 kernel: [ 173.480359] eth1394: eth1: IPv4 over IEEE 1394 (fw-host0)
Dec 21 17:21:46 net/hw-detect.hotplug: Detected hotpluggable network interface e

This is exactly as the screen froze. I wonder if the only visible line that refers to eth0 is where it actually detected the main (wired) Ethernet interface. As this is the interface I want to use and it appears to have generated no errors, that seems to be a good sign. All the ipw2100 lines refer to the wireless interface, which works fine on 2.6.18. I never knew it required separate firmware. Since it appears not to be included in the installer, I presume I'd have to find it on the additional drivers media. In any case, the errors notwithstanding, it looks like the installer handled the wireless interface correctly.

Am I right that it's the eth1394 driver that is causing the crash? Based on the help example (hw-detect/start_pcmcia=false), I've tried adding hw-detect/start_eth1394=false, but it doesn't seem to have any effect. It still crashes.

BTW, the eth1394 driver is loaded without errors under 2.6.18, although it is not used. I have no FireWire devices to test it with. Early on, I passed the option to protect the FireWire interface addresses (faffd800 - faffdfff), but it still crashed.

I suppose it would help if I could get more lines on the console screens. What option can I pass the installer to get smaller type, or even better, two pages side by side? I have a WUXGA (1920 x 1200) screen, so it should be possible to see a much longer backtrace.

>...
>OK. Thanks for the help trying to narrow things down. As this is obviously
>a kernel problem, we really need to triple-check whether it still happens
>with the latest kernel package from unstable (which is likely, but still...)

If you point me to a boot.img.gz with the latest kernel that I can zcat to the USB memory stick, I'd be happy to test it.

Anything else I can do? More tests?
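Writing an hd-media boot.img.gz to a USB stick is the "zcat it to the stick" step mentioned above. A minimal sketch, assuming a Linux host where `zcat` handles .gz files; `write_boot_image` is a hypothetical helper and `/dev/sdX` is a placeholder, not a real device name. Writing to the wrong device destroys its contents, so check the device name (for example in the kernel log after plugging the stick in) before running it.

```shell
#!/bin/sh
# Hypothetical helper: decompress an installer image onto a target.
# WARNING: if the target is a real block device (placeholder /dev/sdX),
# everything on it is overwritten.
write_boot_image() {
    image="$1"    # path to boot.img.gz
    target="$2"   # whole device such as /dev/sdX, not a partition like /dev/sdX1
    if [ ! -r "$image" ]; then
        echo "cannot read $image" >&2
        return 1
    fi
    # Decompress straight onto the target, then flush buffers so the
    # stick can be unplugged safely.
    zcat "$image" > "$target" && sync
}
```

Usage: `write_boot_image boot.img.gz /dev/sdX`, run as root. The ISO matching the image's kernel version is then copied onto the stick's filesystem as for the earlier tests.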
Bug#509238: panic backtrace
Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> >First of all, it would be nice if you could specify which exact version you
> >tested.
>
> I tried both the rc1 debian-testing-i386-netinst.iso and the daily
> build as of a few days ago. Same result.

OK. Are you in a position to test with something other than a USB stick boot? The best would be using the "netboot" ISO (called mini.iso) from a CD.

> Ok, I downloaded these 4 again. It turns out that the daily build iso
> was the exact same I had tried last. The boot.img.gz was different so

Yes, daily builds have not been rebuilt since Dec 18th. Though this is not the source of your problems, we would rather test with the version we will release.

> Actually, I was able to capture the problem in non-expert mode.
> I figure the fewer changes made to the flow of execution
> of the program, the easier it would be to reproduce the problem.

There are no changes in the flow of execution, but there are more places where the process is stopped, which might help narrow things down.

> Ok, it happens immediately after the identifying network hardware
> screen. So soon after in fact that I thought I would have no time to

"after identifying network HW" means 'after the system displays a progress bar saying "Identifying network hardware"', right?

This is where we would need to really narrow things down and where using the expert mode could help.
> [ 23.523397] EAX: EBX: df6f ECX: EDX: df413440
> [ 23.523397] ESI: df6f EDI: EBP: ESP: df449f84
> [ 23.523397] DS: 007b ES: 007b FS: GS: SS: 0068
> [ 23.523397] Process events/0 (pid: 5, ti=df448000 task=df43e860 task.ti=df448000)
> [ 23.523397] Stack: c025d42d df6f 0002 c025830e df413440 c0356f00 c025833e
> [ 23.523397] c025835b c012716d df413440 c01276e9 df413448 c0127794 df43e860
> [ 23.523397] c012990b df449fc8 df449fc8 df413440 c0129763 c012972d
> [ 23.523397] Call Trace:
> [ 23.523397] [] dev_deactivate+0x1e/0xbd
> [ 23.523397] [] __linkwatch_run_queue+0x118/0x148
> [ 23.523397] [] linkwatch_event+0x0/0x22
> [ 23.523397] [] linkwatch_event+0x1d/0x22
> [ 23.523397] [] run_workqueue+0x75/0xee
> [ 23.523397] [] worker_thread+0x0/0xb5
> [ 23.523397] [] worker_thread+0xab/0xb5
> [ 23.523397] [] autoremove_wake_function+0x0/0x2d
> [ 23.523397] [] kthread+0x36/0x5b
> [ 23.523397] [] kthread+0x0/0x5b
> [ 23.523397] [] kernel_thread_helper+0x7/0x10
> [ 23.523397] ===
> [ 23.523397] Code: ff 42 50 5b c3 89 c1 8d 90 80 00 00 00 89 12 8d 81 a4 00 00 00 89 52 04 c7 42 08 00 00 00 00 83 c2 0c 39 c2 75 e7 31 c0 c3 89 c1 <8b> 40 10 8b 50 30 85 d2 74 04 89 c8 ff d2 c3 53 89 c3 e8 a8 af
> [ 23.523397] EIP: [] qdisc_reset+0x2/0x11 SS:ESP 0068:df449f84
> [ 23.915430] Kernel panic - not syncing: Fatal exception in interrupt

OK. Thanks for the help trying to narrow things down. As this is obviously a kernel problem, we really need to triple-check whether it still happens with the latest kernel package from unstable (which is likely, but still...).
Bug#509238: panic backtrace
>First of all, it would be nice if you could specify which exact version you
>tested.

I tried both the RC1 debian-testing-i386-netinst.iso and the daily build as of a few days ago. Same result.

>The version we would like to see tested at this moment is:
>- RC1, which you can download from:
>hd-media image:
>http://ftp.nl.debian.org/debian/dists/testing/main/installer-i386/current/images/hd-media/
>netinst ISO:
>http://cdimage.debian.org/cdimage/lenny_di_rc1/i386/iso-cd/debian-testing-i386-netinst.iso
>- Daily builds:
>hd-media: http://people.debian.org/~joeyh/d-i/images/daily/hd-media/
>netinst ISO:
>http://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/i386/iso-cd/debian-testing-i386-netinst.iso

OK, I downloaded these 4 again. It turns out that the daily build ISO was exactly the one I had tried last. The boot.img.gz was different, so I re-made the USB stick "disk" with the one from the daily build link above and the daily build ISO. So the backtrace below applies to the daily builds above.

>Then we could probably narrow the problem by using the expert
>mode, which will interrupt some steps more often and make the exact
>moment the problem happens easier to spot.

Actually, I was able to capture the problem in non-expert mode. I figure the fewer changes made to the flow of execution of the program, the easier it would be to reproduce the problem. I chose the defaults in the language selection, locale, etc.

>Also, when you've spotted the moment where the installer hangs, please
>retry the installation and, just before the hang happens, try
>switching to console 4 (Alt+F4) and look at what's displayed there...

OK, it happens immediately after the "Identifying network hardware" screen.
So soon after, in fact, that I thought I would have no time to switch to console 4, but quick fingers did it. Quite a bit scrolled off the screen, ending with this:

[ 23.523397] EAX: EBX: df6f ECX: EDX: df413440
[ 23.523397] ESI: df6f EDI: EBP: ESP: df449f84
[ 23.523397] DS: 007b ES: 007b FS: GS: SS: 0068
[ 23.523397] Process events/0 (pid: 5, ti=df448000 task=df43e860 task.ti=df448000)
[ 23.523397] Stack: c025d42d df6f 0002 c025830e df413440 c0356f00 c025833e
[ 23.523397] c025835b c012716d df413440 c01276e9 df413448 c0127794 df43e860
[ 23.523397] c012990b df449fc8 df449fc8 df413440 c0129763 c012972d
[ 23.523397] Call Trace:
[ 23.523397] [] dev_deactivate+0x1e/0xbd
[ 23.523397] [] __linkwatch_run_queue+0x118/0x148
[ 23.523397] [] linkwatch_event+0x0/0x22
[ 23.523397] [] linkwatch_event+0x1d/0x22
[ 23.523397] [] run_workqueue+0x75/0xee
[ 23.523397] [] worker_thread+0x0/0xb5
[ 23.523397] [] worker_thread+0xab/0xb5
[ 23.523397] [] autoremove_wake_function+0x0/0x2d
[ 23.523397] [] kthread+0x36/0x5b
[ 23.523397] [] kthread+0x0/0x5b
[ 23.523397] [] kernel_thread_helper+0x7/0x10
[ 23.523397] ===
[ 23.523397] Code: ff 42 50 5b c3 89 c1 8d 90 80 00 00 00 89 12 8d 81 a4 00 00 00 89 52 04 c7 42 08 00 00 00 00 83 c2 0c 39 c2 75 e7 31 c0 c3 89 c1 <8b> 40 10 8b 50 30 85 d2 74 04 89 c8 ff d2 c3 53 89 c3 e8 a8 af
[ 23.523397] EIP: [] qdisc_reset+0x2/0x11 SS:ESP 0068:df449f84
[ 23.915430] Kernel panic - not syncing: Fatal exception in interrupt

Of course, the machine is now totally frozen, so no scrolling back.

Any other tests? Let me know.
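Each Call Trace entry above has the form `function+0xOFFSET/0xSIZE`: the position inside the function and the function's total size in bytes. So `dev_deactivate+0x1e/0xbd` is 0x1e bytes into a 0xbd-byte function. The fault itself is on the EIP line (`qdisc_reset+0x2/0x11`), not in the Call Trace. As a minimal sketch, assuming the screen contents were saved to a text file (`panic.txt` is a hypothetical name), the hypothetical `parse_call_trace` filter below lists those fields frame by frame, innermost first, which makes traces from different crashes easier to compare.

```shell
#!/bin/sh
# Hypothetical filter: read an oops from stdin and print
# "function offset size" for each frame between "Call Trace:" and "===".
# Assumes frames look like "... name+0xOFF/0xLEN", as in the trace above.
parse_call_trace() {
    awk '
        /Call Trace:/ { in_trace = 1; next }
        in_trace && /===/ { in_trace = 0; next }
        in_trace {
            for (i = 1; i <= NF; i++) {
                if ($i ~ "^[A-Za-z_][A-Za-z0-9_.]*[+]0x[0-9a-f]+/0x[0-9a-f]+$") {
                    # Split "name+0xOFF/0xLEN" on "+" and "/".
                    split($i, parts, "[+/]")
                    print parts[1], parts[2], parts[3]
                }
            }
        }
    '
}
```

Usage: `parse_call_trace < panic.txt`. The first line printed for the trace above would be `dev_deactivate 0x1e 0xbd`.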