Bug#509238: panic backtrace

2008-12-24 Thread The Eclectic One

Quoting: Christian Perrier 
>>>So, in short, in regular mode, it crashes (always at the same place)
>>>but in vga=771 mode, it doesn't, right?

>>Correct.


>And I assume that you get no crash as well if you're using the
>graphical installer.

I had not tried the graphical installer, figuring that the more
basic the better, but I just did.  It appears that just before
going to graphical, the installer puts the screen in mode 771.
I get the smaller, sharper fonts.  So, no. It doesn't crash in
the graphical installer.

BTW, the graphical installer looks good!

>I'm puzzled about where to reassign this bug report. This is obviously a race
>condition somewhere. Probably a weird kernel issue, but investigating
>further is tricky and I fear that this bug report might remain
>uninvestigated for quite a while until it is "magically" solved
>in the future by a new kernel (though probably not for Lenny).

That's probably the reasonable thing to do.

>I propose documenting this in the errata file, at the minimum.

Yes.  A prominent (easy to find) note that would say something
like "If your laptop crashes completely during installation,
try replacing the boot option vga=normal with vga=771 or use
the graphical installer" would have saved some time, but then
the problem would not have become known.  
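
The substitution suggested above could be scripted roughly as follows. This is only a sketch: set_vga_mode is a made-up helper name, and the command line in the usage comment is an assumed example, not the installer's real boot line.

```shell
#!/bin/sh
# set_vga_mode CMDLINE MODE (hypothetical helper, not part of the installer)
# Prints CMDLINE with its vga= option set to MODE; if CMDLINE has no
# vga= option, vga=MODE is appended at the end.
set_vga_mode() {
    cmdline=$1
    mode=$2
    case " $cmdline " in
        *" vga="*)
            # Replace the value of the existing vga= option.
            printf '%s\n' "$cmdline" | sed "s/vga=[^ ]*/vga=$mode/"
            ;;
        *)
            # No vga= option yet: add one.
            printf '%s\n' "$cmdline vga=$mode"
            ;;
    esac
}

# Example with an assumed command line:
#   set_vga_mode "install vga=normal" 771
```

At the installer boot menu the same change is made by hand, by editing the boot options before starting.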

Thank you very much for your help!




-- 
To UNSUBSCRIBE, email to debian-boot-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#509238: panic backtrace

2008-12-23 Thread Christian Perrier
Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> 
> >Quoting: Christian Perrier 
> >>Quoting The Eclectic One (eclec...@sdf.lonestar.org):
> 
> >> First thought: race condition (the panic message contained a backtrace
> >> of different threads), so then I tried multiple times with only one
> >> change at a time: expert mode, regular mode, ethdetect -x, vga=771,
> >> vga=normal.  It turns out that the culprit was the vga option.  With
> >> vga=771, I get no crash/panic in either expert or regular mode, with
> 
> >So, in short, in regular mode, it crashes (always at the same place)
> >but in vga=771 mode, it doesn't, right?
> 
> Correct.


And I assume that you get no crash as well if you're using the
graphical installer.

I'm puzzled about where to reassign this bug report. This is obviously a race
condition somewhere. Probably a weird kernel issue, but investigating
further is tricky and I fear that this bug report might remain
uninvestigated for quite a while until it is "magically" solved
in the future by a new kernel (though probably not for Lenny).

I propose documenting this in the errata file, at the minimum.





Bug#509238: panic backtrace

2008-12-23 Thread The Eclectic One

>Quoting: Christian Perrier 
>>Quoting The Eclectic One (eclec...@sdf.lonestar.org):

>> First thought: race condition (the panic message contained a backtrace
>> of different threads), so then I tried multiple times with only one
>> change at a time: expert mode, regular mode, ethdetect -x, vga=771,
>> vga=normal.  It turns out that the culprit was the vga option.  With
>> vga=771, I get no crash/panic in either expert or regular mode, with

>So, in short, in regular mode, it crashes (always at the same place)
>but in vga=771 mode, it doesn't, right?

Correct.






Bug#509238: panic backtrace

2008-12-22 Thread Christian Perrier
Quoting The Eclectic One (eclec...@sdf.lonestar.org):

> First thought: race condition (the panic message contained a backtrace
> of different threads), so then I tried multiple times with only one
> change at a time: expert mode, regular mode, ethdetect -x, vga=771,
> vga=normal.  It turns out that the culprit was the vga option.  With
> vga=771, I get no crash/panic in either expert or regular mode, with

So, in short, in regular mode, it crashes (always at the same place)
but in vga=771 mode, it doesn't, right?






Bug#509238: panic backtrace

2008-12-22 Thread The Eclectic One

Quoting Christian Perrier 

>> Ok, tried a few more times.  I usually get the same kernel panic screen,

>Did you try in expert mode, i.e. choosing it from the "Advanced options"
>in the boot menu?

Yes, I tried expert as well as regular.

>In expert mode, when you reach the HW detection step, you'll get a
>question about PCMCIA options. They're not necessarily relevant but
>checking if the crash happens before or after it would help.

No question about PCMCIA before getting to the network detection
screen/status bar.

> ...

>Some drivers in 2.6.18 provided firmware blobs that have been
>extracted from the source and are now provided as separate udebs:

>r...@mykerinos:~> apt-cache search firmware 2100
>firmware-ipw2x00 - Binary firmware for Intel Pro Wireless 2100, 2200 and 2915

That explains the missing firmware.

>If that firmware is needed for ipw2100, you'll be prompted about
>this, but you're not, which means the crash happens before..:-)

Correct.

>> Anything else I can do? More tests?

>Before the network devices screen (for instance, when prompted for
>language), could you switch to VT2 (Alt+F2) and, from there edit
>/bin/ethdetect.sh

>Add "-x" to the first line:

>#!/bin/sh -x

>Then go back to VT1, continue to the step where the crash happens and
>switch to VT4 before it happens. As ethdetect.sh will be in debug mode,
>we'll see all its output and could narrow down the exact line where
>the crash happens.

Ok, did that and also vga=771.  I was hoping to get a smaller font to
see more of the backtrace.  This is what I got:


[ output of ethdetect -x - lsifaces (3 lines) sed (4), grep (4) and sed (3) ]
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth0 up
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth0 down
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth1 up
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth1 down
Dec 22 15:42:48 main-menu[1328]: (process:9628) + check-missing-firmware
Dec 22 15:42:48 main-menu[1328]: (process:9628) + sysfs-update-devnames
Dec 22 15:42:48 main-menu[1328]: (process:9628) + cleanup
Dec 22 15:42:48 main-menu[1328]: (process:9628) + rm -f /tmp/devnames-static.txt
Dec 22 15:42:48 main-menu[1328]: DEBUG: resolver (libslang2-udeb): package doesn't exist (ignored)

It didn't crash!  After that I proceeded to console 1 and did see the
screen explaining that I needed the missing firmware - ipw2100-1.3.fw.
As I had thought, even with the missing firmware, the culprit wasn't
the wireless device.

First thought: race condition (the panic message contained a backtrace
of different threads), so then I tried multiple times with only one
change at a time: expert mode, regular mode, ethdetect -x, vga=771,
vga=normal.  It turns out that the culprit was the vga option.  With
vga=771, I get no crash/panic in either expert or regular mode, with
sh -x in ethdetect or not.  It might still be a race condition but if
so it seems to be triggered by something related to the display.  Very
strange.  The display works fine (with a big font) in normal mode.  In
771 mode, the font is smaller and the curses windows sharper.

Even though I now have a work-around, I'm willing to keep debugging
if it would be useful.







Bug#509238: panic backtrace

2008-12-21 Thread Christian Perrier
Quoting The Eclectic One (eclec...@sdf.lonestar.org):

> >This is where we would need to really narrow things down and where
> >using the expert mode could help.
> 
> Ok, tried a few more times.  I usually get the same kernel panic screen,

Did you try in expert mode, i.e. choosing it from the "Advanced options"
in the boot menu?

In expert mode, when you reach the HW detection step, you'll get a
question about PCMCIA options. They're not necessarily relevant but
checking if the crash happens before or after it would help.



> This is exactly as the screen froze.  I wonder if the only line visible that
> refers to eth0 is when it actually detected the main (wired) ethernet
> interface.  As this is the interface I want to use and it appears that it
> generated no errors, it seems to be a good sign.  All the ipw2100 lines
> refer to the wireless interface, which works fine on 2.6.18.  I never knew
> it required separate firmware.  Since it appears not to be included in the

Some drivers in 2.6.18 provided firmware blobs that have been
extracted from the source and are now provided as separate udebs:

r...@mykerinos:~> apt-cache search firmware 2100
firmware-ipw2x00 - Binary firmware for Intel Pro Wireless 2100, 2200 and 2915

If that firmware is needed for ipw2100, you'll be prompted about
this, but you're not, which means the crash happens before..:-)


> Anything else I can do? More tests?


Before the network devices screen (for instance, when prompted for
language), could you switch to VT2 (Alt+F2) and, from there edit
/bin/ethdetect.sh ?

Add "-x" to the first line:

#!/bin/sh -x


Then go back to VT1, continue to the step where the crash happens and
switch to VT4 before it happens. As ethdetect.sh will be in debug mode,
we'll see all its output and could narrow down the exact line where
the crash happens.

(doing this in expert mode will give you more control over the moment
when you need to switch to VT4)
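
The shebang edit described above could also be done non-interactively from VT2. What follows is a rough sketch, assuming the script's first line is exactly "#!/bin/sh" and that head and sed are usable in the installer environment; enable_trace is a made-up helper name.

```shell
#!/bin/sh
# enable_trace FILE (hypothetical helper)
# Appends " -x" to a "#!/bin/sh" first line so the shell prints each
# command before running it. Does nothing if -x is already there.
enable_trace() {
    script=$1
    first=$(head -n 1 "$script") || return 1
    case "$first" in
        '#!/bin/sh -x')
            return 0 ;;
        '#!/bin/sh')
            tmp="$script.tmp.$$"
            # Write the file back with cat so its permissions are kept.
            sed '1s/$/ -x/' "$script" > "$tmp" && cat "$tmp" > "$script"
            status=$?
            rm -f "$tmp"
            return $status ;;
        *)
            echo "unexpected first line: $first" >&2
            return 1 ;;
    esac
}

# Intended use on VT2, before the network hardware step:
#   enable_trace /bin/ethdetect.sh
```

Editing the file by hand gives the same result; the helper only guards against touching a script whose first line is not the expected one.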





Bug#509238: panic backtrace

2008-12-21 Thread The Eclectic One


Quoting Christian Perrier 
>OK. Are you in a position to test with something other than a USB
>stick boot?

Actually, before I gave up on CDs (ruined 11 CD-Rs, probably marginal
media, drive or wodim problems) I had a CD made on a Windows machine
of the lenny installer RC1.  This is what I just tried again.

I get the same panic at the same point in the installation process.
Of course the addresses are slightly different as the underlying
kernel code is different.

>The best would be using the "netboot" ISO (called mini.iso) from a CD.

> ...
>> Ok, it happens immediately after the identifying network hardware
>> screen.  So soon after in fact that I thought I would have no time to

>"after identifying network HW" means 'after the system displays a
>progress bar saying "Identifying network hardware"', right?

Correct.

>This is where we would need to really narrow things down and where
>using the expert mode could help.

Ok, tried a few more times.  I usually get the same kernel panic screen,
but on one occasion (in expert mode), it crashed a little differently
and I saw this:


Dec 21 17:21:46 kernel: [  173.096981] ipw2100: Intel(R) PRO/Wireless 2100 Network Driver, git-1.2.2
Dec 21 17:21:46 kernel: [  173.096987] ipw2100:
Dec 21 17:21:46 kernel: [  173.096987] ipw2100: Copyright(c) 2003-2006 Intel Corporation
Dec 21 17:21:46 kernel: [  173.206202] ACPI: PCI Interrupt :02:03:0[A] -> Link [LKNB] -> GSI 5 (level, low) -> IRQ 5
Dec 21 17:21:46 kernel: [  173.206206] ipw2100: Detected Intel PRO/Wireless 2100 Network Connection
Dec 21 17:21:46 kernel: [  173.206619] firmware: requesting ipw2100-1.3.fw
Dec 21 17:21:46 net/hw-detect.hotplug: Detected hotpluggable network interface eth0
Dec 21 17:21:46 kernel: [  173.349107] ipw2100: eth1: Firmware 'ipw2100-1.3.fw' not available or load failed.
Dec 21 17:21:46 kernel: [  173.349114] ipw2100: eth1: ipw2100_get_firmware failed: -2
Dec 21 17:21:46 kernel: [  173.349118] ipw2100: eth1: Failed to power on the adapter.
Dec 21 17:21:46 kernel: [  173.349121] ipw2100: eth1: Failed to start the firmware.
Dec 21 17:21:46 kernel: [  173.349125] ipw2100Error calling register_netdev.
Dec 21 17:21:46 kernel: [  173.349443] ACPI: PCI interrupt for device :02:03.0 disabled
Dec 21 17:21:46 kernel: [  173.349450] ipw2100: probe of :02:03.0 failed with error -5
Dec 21 17:21:46 hw-detect: insmod /lib/modules/2.6.26-1-486/kernel/drivers 
ieee1394/sbp2.ko
Dec 21 17:21:46 kernel: [  173.480359] eth1394: eth1: IPv4 over IEEE 1394 (fw-host0)
Dec 21 17:21:46 net/hw-detect.hotplug: Detected hotpluggable network interface e


This is exactly as the screen froze.  I wonder if the only line visible that
refers to eth0 is when it actually detected the main (wired) ethernet
interface.  As this is the interface I want to use and it appears that it
generated no errors, it seems to be a good sign.  All the ipw2100 lines
refer to the wireless interface, which works fine on 2.6.18.  I never knew
it required separate firmware.  Since it appears not to be included in the
installer, I presume I'd have to find it in the additional drivers media.
In any case, the errors notwithstanding, it looks like the installer handled
the wireless interface correctly.  Am I right that it seems it's the
eth1394 driver that is causing the crash?

I've tried (based on the help example: hw-detect/start_pcmcia=false) to add
hw-detect/start_eth1394=false but it doesn't seem to have an effect.  It
still crashes.  BTW, the eth1394 driver is loaded without errors under
2.6.18, although it is not used.  I have no fire-wire devices to test it.
Early on, I passed the option to protect the firewire interface addresses
(faffd800 - faffdfff) but it still crashed.

I suppose it would help if I could get more lines in the console screens.
What option can I pass the installer to have smaller type, or even better
2 side by side pages?  I have a WUXGA (1920 X 1200) screen so it should be
possible to see a much longer backtrace.

>...

>OK. Thanks for the help trying to narrow things down. This being an obvious
>problem with the kernel, we really need to triple check whether it
>happens with the latest kernel package from unstable (which is
>likely, but still...)

If you point me to a boot.img.gz of the latest kernel, that I can
zcat to the usb memory stick, I'd be happy to test it.
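
For reference, the "zcat to the usb memory stick" step mentioned above amounts to something like the sketch below. write_boot_image is a made-up helper name and /dev/sdX is a placeholder, not a real device; writing to the wrong device destroys its contents.

```shell
#!/bin/sh
# write_boot_image IMAGE.gz TARGET (hypothetical helper)
# Decompresses IMAGE.gz straight onto TARGET, then flushes buffers so
# the stick can be removed safely. gzip -dc is equivalent to zcat here.
write_boot_image() {
    image=$1
    target=$2
    gzip -dc "$image" > "$target" && sync
}

# Intended use, as root, with the stick unmounted (placeholder device):
#   write_boot_image boot.img.gz /dev/sdX
```

The target is the whole device rather than a partition, because the image carries its own filesystem.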

Anything else I can do? More tests?






Bug#509238: panic backtrace

2008-12-20 Thread Christian Perrier
Quoting The Eclectic One (eclec...@sdf.lonestar.org):

> >First of all, it would be nice if you could specify exactly which version you
> >tested.
> 
> I tried both the rc1 debian-testing-i386-netinst.iso and the daily
> build as of a few days ago.  Same result.

OK. Are you in a position to test with something other than a USB
stick boot?

The best would be using the "netboot" ISO (called mini.iso) from a CD.


> Ok, I downloaded these 4 again.  It turns out that the daily build iso
> was the exact same I had tried last.  The boot.img.gz was different so

Yes, daily builds have not been rebuilt since Dec 18th. Though this is
not the source of your problems, we might prefer really testing with
the version we will release.

> Actually, I was able to capture the problem in non-expert mode.
> I figure the fewer changes made in the flow of execution
> of the program, the easier it would be to duplicate the problem.

There are no changes in the flow of execution, but more places where
the process is stopped, which might help narrow things down.

> Ok, it happens immediately after the identifying network hardware
> screen.  So soon after in fact that I thought I would have no time to

"after identifying network HW" means 'after the system displays a
progress bar saying "Identifying network hardware"', right?

This is where we would need to really narrow things down and where
using the expert mode could help.

> [   23.523397] EAX:  EBX: df6f ECX:  EDX: df413440
> [   23.523397] ESI: df6f EDI:  EBP:  ESP: df449f84
> [   23.523397]  DS: 007b ES: 007b FS:  GS:  SS: 0068
> [   23.523397] Process events/0 (pid: 5, ti=df448000 task=df43e860 task.ti=df448000)
> [   23.523397] Stack: c025d42d df6f 0002 c025830e df413440 c0356f00 c025833e
> [   23.523397]c025835b c012716d df413440 c01276e9 df413448 c0127794  df43e860
> [   23.523397]c012990b df449fc8 df449fc8 df413440  c0129763 c012972d
> [   23.523397] Call Trace:
> [   23.523397]  [] dev_deactivate+0x1e/0xbd
> [   23.523397]  [] __linkwatch_run_queue+0x118/0x148
> [   23.523397]  [] linkwatch_event+0x0/0x22
> [   23.523397]  [] linkwatch_event+0x1d/0x22
> [   23.523397]  [] run_workqueue+0x75/0xee
> [   23.523397]  [] worker_thread+0x0/0xb5
> [   23.523397]  [] worker_thread+0xab/0xb5
> [   23.523397]  [] autoremove_wake_function+0x0/0x2d
> [   23.523397]  [] kthread+0x36/0x5b
> [   23.523397]  [] kthread+0x0/0x5b
> [   23.523397]  [] kernel_thread_helper+0x7/0x10
> [   23.523397]  ===
> [   23.523397] Code: ff 42 50 5b c3 89 c1 8d 90 80 00 00 00 89 12 8d 81 a4 00 00 00 89 52 04 c7 42 08 00 00 00 00 83 c2 0c 39 c2 75 e7 31 c0 c3 89 c1 <8b> 40 10 8b 50 30 85 d2 74 04 89 c8 ff d2 c3 53 89 c3 e8 a8 af
> [   23.523397] EIP: [] qdisc_reset+0x2/0x11 SS:ESP 0068:df449f84
> [   23.915430] Kernel panic - not syncing: Fatal exception in interrupt

OK. Thanks for the help trying to narrow things down. This being an obvious
problem with the kernel, we really need to triple check whether it
happens with the latest kernel package from unstable (which is
likely, but still...)





Bug#509238: panic backtrace

2008-12-20 Thread The Eclectic One


>First of all, it would be nice if you could specify exactly which version you
>tested.

I tried both the rc1 debian-testing-i386-netinst.iso and the daily
build as of a few days ago.  Same result.

>The version we would like to see tested at this moment is:

>- RC1, which you can download from:
>hd-media image:
>http://ftp.nl.debian.org/debian/dists/testing/main/installer-i386/current/images/hd-media/
>netinst ISO:
>http://cdimage.debian.org/cdimage/lenny_di_rc1/i386/iso-cd/debian-testing-i386-netinst.iso

>- Daily builds:
>hd-media: http://people.debian.org/~joeyh/d-i/images/daily/hd-media/
>netinst ISO:
>http://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/i386/iso-cd/debian-testing-i386-netinst.iso

Ok, I downloaded these 4 again.  It turns out that the daily build iso
was the exact same I had tried last.  The boot.img.gz was different so
I re-made the usb stick "disk" with the one in the daily build link
above and the daily build iso.  So the backtrace below applies to the
daily builds above.

>Then we could probably narrow the problem by using the expert
>mode, which will interrupt some steps more often and make the exact
>moment the problem happens easier to spot.

Actually, I was able to capture the problem in non-expert mode.
I figure the fewer changes made in the flow of execution
of the program, the easier it would be to duplicate the problem.
I chose the defaults in the language selection, locale, etc...

>Also, when you've spotted the moment where the installer hangs, please
>retry the installation and, just before the hang happens, try
>switching to console 4 (Alt+F4) and look at what's displayed there...

Ok, it happens immediately after the identifying network hardware
screen.  So soon after in fact that I thought I would have no time to
switch to console 4, but quick fingers did it and quite a bit scrolled
off the screen, ending in this:


[   23.523397] EAX:  EBX: df6f ECX:  EDX: df413440
[   23.523397] ESI: df6f EDI:  EBP:  ESP: df449f84
[   23.523397]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[   23.523397] Process events/0 (pid: 5, ti=df448000 task=df43e860 task.ti=df448000)
[   23.523397] Stack: c025d42d df6f 0002 c025830e df413440 c0356f00 c025833e
[   23.523397]c025835b c012716d df413440 c01276e9 df413448 c0127794  df43e860
[   23.523397]c012990b df449fc8 df449fc8 df413440  c0129763 c012972d
[   23.523397] Call Trace:
[   23.523397]  [] dev_deactivate+0x1e/0xbd
[   23.523397]  [] __linkwatch_run_queue+0x118/0x148
[   23.523397]  [] linkwatch_event+0x0/0x22
[   23.523397]  [] linkwatch_event+0x1d/0x22
[   23.523397]  [] run_workqueue+0x75/0xee
[   23.523397]  [] worker_thread+0x0/0xb5
[   23.523397]  [] worker_thread+0xab/0xb5
[   23.523397]  [] autoremove_wake_function+0x0/0x2d
[   23.523397]  [] kthread+0x36/0x5b
[   23.523397]  [] kthread+0x0/0x5b
[   23.523397]  [] kernel_thread_helper+0x7/0x10
[   23.523397]  ===
[   23.523397] Code: ff 42 50 5b c3 89 c1 8d 90 80 00 00 00 89 12 8d 81 a4 00 00 00 89 52 04 c7 42 08 00 00 00 00 83 c2 0c 39 c2 75 e7 31 c0 c3 89 c1 <8b> 40 10 8b 50 30 85 d2 74 04 89 c8 ff d2 c3 53 89 c3 e8 a8 af
[   23.523397] EIP: [] qdisc_reset+0x2/0x11 SS:ESP 0068:df449f84
[   23.915430] Kernel panic - not syncing: Fatal exception in interrupt

Of course, the machine is now totally frozen, so no scrolling back.

Any other tests? Let me know.



