Bug#509238: panic backtrace

2008-12-24 Thread The Eclectic One

Quoting: Christian Perrier bubu...@debian.org
So, in short, in regular mode, it crashes (always at the same place)
but in vga=771 mode, it doesn't, right?

Correct.


And I assume that you get no crash as well if you're using the
graphical installer.

I had not tried the graphical installer, figuring that the more
basic the better, but I just did.  It appears that just before
going to graphical, the installer puts the screen in mode 771.
I get the smaller, sharper fonts.  So, no. It doesn't crash in
the graphical installer.

BTW, the graphical installer looks good!

I'm unsure about reassigning this bug report. This is obviously a race
condition somewhere. Probably a weird kernel issue, but investigating
further is tricky and I fear that this bug report might remain
uninvestigated for quite a while until it magically resolves itself
in the future with a new kernel (though probably not for Lenny).

That's probably the reasonable thing to do.

I propose documenting this in the errata file, at the minimum.

Yes.  A prominent (easy to find) note that would say something
like "If your laptop crashes completely during installation,
try replacing the boot option vga=normal with vga=771 or use
the graphical installer" would have saved some time, but then
the problem would not have become known.
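
For readers wondering what the number means: the value given to vga= is a
video mode number written in decimal, while mode tables usually list it in
hexadecimal. A small sketch of the conversion follows (the function name is
made up for illustration). 771 comes out as 0x303, which the commonly cited
Linux VESA mode tables list as 800x600 with 256 colors.

```shell
# Hypothetical helper: convert a decimal vga= value to the hexadecimal
# form used in VESA mode tables.
vga_mode_hex() {
    printf '0x%x\n' "$1"
}

vga_mode_hex 771   # prints 0x303
```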

Thank you very much for your help!




-- 
To UNSUBSCRIBE, email to debian-boot-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#509238: panic backtrace

2008-12-23 Thread The Eclectic One

Quoting: Christian Perrier bubu...@debian.org
Quoting The Eclectic One (eclec...@sdf.lonestar.org):

 First thought: race condition (the panic message contained a backtrace
 of different threads), so then I tried multiple times with only one
 change at a time: expert mode, regular mode, ethdetect -x, vga=771,
 vga normal.  It turns out that the culprit was the vga option.  With
 vga=771, I get no crash/panic in either expert or regular mode, with

So, in short, in regular mode, it crashes (always at the same place)
but in vga=771 mode, it doesn't, right?

Correct.






Bug#509238: panic backtrace

2008-12-22 Thread The Eclectic One

Quoting Christian Perrier bubu...@debian.org

 Ok, tried a few more times.  I usually get the same kernel panic screen,

Did you try in expert mode, i.e. choosing it from the Advanced options
in the boot menu?

Yes, I tried expert as well as regular.

In expert mode, when you reach the HW detection step, you'll get a
question about PCMCIA options. They're not necessarily relevant but
checking if the crash happens before or after it would help.

No question about PCMCIA before getting to the network detection
screen/status bar.

 ...

Some drivers in 2.6.18 provided firmware blobs that have been
extracted from the source and are now provided as separate udebs:

r...@mykerinos:~ apt-cache search firmware 2100
firmware-ipw2x00 - Binary firmware for Intel Pro Wireless 2100, 2200 and 2915

That explains the missing firmware.

If that firmware is needed for ipw2100, you'll be prompted about
this, but you're not, which means the crash happens before... :-)

Correct.

 Anything else I can do? More tests?

Before the network devices screen (for instance, when prompted for
language), could you switch to VT2 (Alt+F2) and, from there edit
/bin/ethdetect.sh

Add -x to the first line:

#!/bin/sh -x

Then go back to VT1, continue to the step where the crash happens and
switch to VT4 before it happens. As ethdetect.sh will be in debug mode,
we'll see all its output and could narrow down the exact line where
the crash happens.
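
A minimal sketch of the same shebang edit done non-interactively, assuming a
POSIX sed is available. The helper name is hypothetical, and inside the
installer an editor may be the only practical way to make the change.

```shell
# Hypothetical helper: make a /bin/sh script trace its commands by adding
# -x to its shebang line. The script path is passed as the only argument.
enable_sh_trace() {
    script=$1
    tmp="$script.tmp.$$"
    # Rewrite line 1 only, and only if it is a plain "#!/bin/sh" shebang.
    # Writing back with cat keeps the original file's permissions.
    sed '1s|^#!/bin/sh[[:space:]]*$|#!/bin/sh -x|' "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&
        rm -f "$tmp"
}

# Example, using the path given above:
#   enable_sh_trace /bin/ethdetect.sh
```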

Ok, did that and also vga=771.  I was hoping to get a smaller font to
see more of the backtrace.  This is what I got:


[ output of ethdetect -x - lsifaces (3 lines) sed (4), grep (4) and sed (3) ]
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth0 up
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth0 down
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth1 up
Dec 22 15:42:48 main-menu[1328]: (process:9628) + ip link set eth1 down
Dec 22 15:42:48 main-menu[1328]: (process:9628) + check-missing-firmware
Dec 22 15:42:48 main-menu[1328]: (process:9628) + sysfs-update-devnames
Dec 22 15:42:48 main-menu[1328]: (process:9628) + cleanup
Dec 22 15:42:48 main-menu[1328]: (process:9628) + rm -f /tmp/devnames-static.txt
Dec 22 15:42:48 main-menu[1328]: DEBUG: resolver (libslang2-udeb): package doesn't exist (ignored)

It didn't crash!  After that I proceeded to console 1 and did see the
screen explaining that I needed the missing firmware - ipw2100-1.3.fw.
As I had thought, even with the missing firmware, the culprit wasn't
the wireless device.

First thought: race condition (the panic message contained a backtrace
of different threads), so then I tried multiple times with only one
change at a time: expert mode, regular mode, ethdetect -x, vga=771,
vga normal.  It turns out that the culprit was the vga option.  With
vga=771, I get no crash/panic in either expert or regular mode, with
sh -x in ethdetect or not.  It might still be a race condition but if
so it seems to be triggered by something related to the display.  Very
strange.  The display works fine (with a big font) in normal mode.  In
771 mode, the font is smaller and the curses windows sharper.

Even though I now have a work-around, I'm willing to keep debugging
if it would be useful.







Bug#509238: panic backtrace

2008-12-21 Thread The Eclectic One


Quoting Christian Perrier bubu...@debian.org
OK. Are you in a position to test with something other than a USB
stick boot?

Actually, before I gave up on CDs (ruined 11 CD-Rs, probably marginal
media, drive or wodim problems) I had a CD made on a Windows machine
of the lenny installer RC1.  This is what I just tried again.

I get the same panic at the same point in the installation process.
Of course the addresses are slightly different as the underlying
kernel code is different.

The best would be using the netboot ISO (called mini.iso) from a CD.

 ...
 Ok, it happens immediately after the identifying network hardware
 screen.  So soon after in fact that I thought I would have no time to

"After identifying network HW" means "after the system displays a
progress bar saying 'Identifying network hardware'", right?

Correct.

This is where we would need to really narrow things down and where
using the expert mode could help.

Ok, tried a few more times.  I usually get the same kernel panic screen,
but on one occasion (in expert mode), it crashed a little differently
and I saw this:


Dec 21 17:21:46 kernel: [  173.096981] ipw2100: Intel(R) PRO/Wireless 2100 Network Driver, git-1.2.2
Dec 21 17:21:46 kernel: [  173.096987] ipw2100:
Dec 21 17:21:46 kernel: [  173.096987] ipw2100: Copyright(c) 2003-2006 Intel Corpration
Dec 21 17:21:46 kernel: [  173.206202] ACPI: PCI Interrupt :02:03:0[A] - Link [LKNB] - GSI 5 (level, low) - IRQ 5
Dec 21 17:21:46 kernel: [  173.206206] ipw2100: Detected Intel PRO/WIreless 2100 Network Connection
Dec 21 17:21:46 kernel: [  173.206619] firmware: requesting ipw2100-1.3.fw
Dec 21 17:21:46 net/hw-detect.hotplug: Detected hotpluggable network interface eth0
Dec 21 17:21:46 kernel: [  173.349107] ipw2100: eth1: Firmware 'ipw2100-1.3.fw' not available or load failed.
Dec 21 17:21:46 kernel: [  173.349114] ipw2100: eth1: ipw2100_get_firmware failed: -2
Dec 21 17:21:46 kernel: [  173.349118] ipw2100: eth1: Failed to power on the adapter.
Dec 21 17:21:46 kernel: [  173.349121] ipw2100: eth1: Failed to start the firmware.
Dec 21 17:21:46 kernel: [  173.349125] ipw2100Error calling register_netdev.
Dec 21 17:21:46 kernel: [  173.349443] ACPI: PCI interrupt for device :02:03.0 disabled
Dec 21 17:21:46 kernel: [  173.349450] ipw2100: probe of :02:03.0 failed with error -5
Dec 21 17:21:46 hw-detect: insmod /lib/modules/2.6.26-1-486/kernel/drivers ieee1394/sbp2.ko
Dec 21 17:21:46 kernel: [  173.480359] eth1394: eth1: IPv4 over IEEE 1394 (fw-host0)
Dec 21 17:21:46 net/hw-detect.hotplug: Detected hotpluggable network interface e


This is exactly as the screen froze.  I wonder if the only line visible that
refers to eth0 is when it actually detected the main (wired) ethernet
interface.  As this is the interface I want to use and it appears that it
generated no errors, it seems to be a good sign.  All the ipw2100 lines
refer to the wireless interface, which works fine on 2.6.18.  I never knew
it required separate firmware.  Since it appears not to be included in the
installer, I presume I'd have to find it in the additional drivers media.
In any case, the errors notwithstanding, it looks like the installer handled
the wireless interface correctly.  Am I right that it seems it's the
eth1394 driver that is causing the crash?

I've tried (based on the help example: hw-detect/start_pcmcia=false) to add
hw-detect/start_eth1394=false but it doesn't seem to have an effect.  It
still crashes.  BTW, the eth1394 driver is loaded without errors under
2.6.18, although it is not used.  I have no fire-wire devices to test it.
Early on, I passed the option to protect the firewire interface addresses
(faffd800 - faffdfff) but it still crashed.

I suppose it would help if I could get more lines in the console screens.
What option can I pass the installer to have smaller type, or even better
2 side by side pages?  I have a WUXGA (1920 X 1200) screen so it should be
possible to see a much longer backtrace.

...

OK. Thanks for the help trying to narrow things down. As this is obviously
a problem with the kernel, we really need to triple-check whether it
also happens with the latest kernel package from unstable (which is
likely, but still...)

If you point me to a boot.img.gz of the latest kernel, that I can
zcat to the usb memory stick, I'd be happy to test it.
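
A minimal sketch of that zcat step, assuming gzip is installed. The function
name is hypothetical and /dev/sdX is a placeholder for the actual USB device;
writing to the wrong device destroys its contents.

```shell
# Hypothetical helper: decompress a gzip-compressed disk image onto a
# target (a block device in real use). Equivalent to "zcat IMAGE > TARGET".
write_image() {
    image=$1
    target=$2
    gzip -dc "$image" > "$target"
}

# Example (destructive; /dev/sdX is a placeholder for the USB stick):
#   write_image boot.img.gz /dev/sdX
```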

Anything else I can do? More tests?






Bug#509238: panic backtrace

2008-12-20 Thread The Eclectic One


First of all, it would be nice if you could specify which exact version you
tested.

I tried both the rc1 debian-testing-i386-netinst.iso and the daily
build as of a few days ago.  Same result.

The version we would like to see tested at this moment is:

- RC1, which you can download from:
hd-media image:
http://ftp.nl.debian.org/debian/dists/testing/main/installer-i386/current/images/hd-media/
netinst ISO:
http://cdimage.debian.org/cdimage/lenny_di_rc1/i386/iso-cd/debian-testing-i386-netinst.iso

- Daily builds:
hd-media: http://people.debian.org/~joeyh/d-i/images/daily/hd-media/
netinst ISO:
http://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/i386/iso-cd/debian-testing-i386-netinst.iso

Ok, I downloaded these 4 again.  It turns out that the daily build iso
was exactly the same one I had tried last.  The boot.img.gz was different so
I re-made the usb stick disk with the one in the daily build link
above and the daily build iso.  So the backtrace below applies to the
daily builds above.

Then we could probably narrow the problem down by using the expert
mode, which will interrupt some steps more often and make the exact
moment the problem happens easier to spot.

Actually, I was able to capture the problem in non-expert mode.
I figure the fewer changes made in the flow of execution
of the program, the easier it would be to duplicate the problem.
I chose the defaults in the language selection, locale, etc...

Also, when you've spotted the moment where the installer hangs, please
retry the installation and, just before the hang happens, try
switching to console 4 (Alt+F4) and look at what's displayed there...

Ok, it happens immediately after the identifying network hardware
screen.  So soon after in fact that I thought I would have no time to
switch to console 4, but quick fingers did it and quite a bit scrolled
off the screen, ending in this:


[   23.523397] EAX:  EBX: df6f ECX:  EDX: df413440
[   23.523397] ESI: df6f EDI:  EBP:  ESP: df449f84
[   23.523397]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[   23.523397] Process events/0 (pid: 5, ti=df448000 task=df43e860 task.ti=df448000)
[   23.523397] Stack: c025d42d df6f 0002 c025830e df413440 c0356f00 c025833e
[   23.523397]c025835b c012716d df413440 c01276e9 df413448 c0127794  df43e860
[   23.523397]c012990b df449fc8 df449fc8 df413440  c0129763 c012972d
[   23.523397] Call Trace:
[   23.523397]  [c025d42d] dev_deactivate+0x1e/0xbd
[   23.523397]  [c025830e] __linkwatch_run_queue+0x118/0x148
[   23.523397]  [c025833e] linkwatch_event+0x0/0x22
[   23.523397]  [c025835b] linkwatch_event+0x1d/0x22
[   23.523397]  [c012716d] run_workqueue+0x75/0xee
[   23.523397]  [c01276e9] worker_thread+0x0/0xb5
[   23.523397]  [c0127794] worker_thread+0xab/0xb5
[   23.523397]  [c012990b] autoremove_wake_function+0x0/0x2d
[   23.523397]  [c0129763] kthread+0x36/0x5b
[   23.523397]  [c012972d] kthread+0x0/0x5b
[   23.523397]  [c0104937] kernel_thread_helper+0x7/0x10
[   23.523397]  ===
[   23.523397] Code: ff 42 50 5b c3 89 c1 8d 90 80 00 00 00 89 12 8d 81 a4 00 00 00 89 52 04 c7 42 08 00 00 00 00 83 c2 0c 39 c2 75 e7 31 c0 c3 89 c1 8b 40 10 8b 50 30 85 d2 74 04 89 c8 ff d2 c3 53 89 c3 e8 a8 af
[   23.523397] EIP: [c025d074] qdisc_reset+0x2/0x11 SS:ESP 0068:df449f84
[   23.915430] Kernel panic - not syncing: Fatal exception in interrupt

Of course, the machine is now totally frozen, so no scrolling back.
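
Since the addresses in a backtrace change from one kernel build to another,
comparing traces by symbol name is easier. The following is only a sketch,
with a hypothetical function name, and it assumes trace lines of the form
"[address] symbol+0xOFFSET/0xLENGTH" as in the capture above. It reads the
trace on standard input and also picks up the symbol on the EIP line.

```shell
# Hypothetical helper: print the symbol name from every line containing
# "[8-hex-digit address] symbol+0x...". Other lines are skipped.
trace_symbols() {
    sed -n 's/^.*\[[0-9a-f]\{8\}\] \([A-Za-z_][A-Za-z0-9_]*\)+0x.*$/\1/p'
}

# Example:
#   trace_symbols < panic.txt
```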

Any other tests? Let me know.







Bug#509238: debian-installer: lenny installer (daily build) locks up after net hw detection screen

2008-12-19 Thread The Eclectic One
Package: debian-installer
Version: lenny installer
Severity: critical
Justification: breaks the whole system



-- System Information:
Debian Release: 4.0  --- Not really.  It's the lenny installer
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)

As the above system information shows, Etch works perfectly on this
system, although it was not easy to install on a blank HD when the OS
was woody.

The installation that triggered this bug was attempted by building
a bootable USB memory stick (boot.img.gz (downloaded today) +
debian-testing-i386-netinst.iso) to install lenny on a new blank HD.

Attempted work-arounds.

As can be seen from the output of lshw (done from Etch and appended
below), this laptop has lots of built-in hardware.  In addition to
the suggested temporary options (noapic, nolapic, acpi=off, irqpoll)
and hw-detect/start_pcmcia=false, I have excluded all the i/o
addresses that are likely to cause a problem: the pcmcia system,
the firewire port, the wireless adapter (ipw2100), the AC97 sound
system and the modem.

Obviously, since this is a net install, the network card (that in
Etch uses the b44 driver just fine) is mandatory.

Immediately after the network hardware detection is apparently
complete, the screen goes dark, the right 2 LEDs next to the power
button flash at 1-second intervals and the system totally freezes.
Holding the power button for about 10 seconds is necessary to shut
down the machine.

I'd be glad to run specific tests if asked.  Any other suggestions
of work-arounds welcome.

Output of lshw follows:

dell-8600
    description: Portable Computer
    product: Inspiron 8600
    vendor: Dell Computer Corporation
    serial: xxx
    width: 32 bits
    capabilities: smbios-2.3 dmi-2.3
    configuration: boot=normal chassis=portable uuid=xxxC----
  *-core
       description: Motherboard
       product: 0X1069
       vendor: Dell Computer Corporation
       physical id: 0
       serial: .xxx.xx.
     *-firmware
          description: BIOS
          vendor: Dell Computer Corporation
          physical id: 0
          version: A11 (10/25/2004)
          size: 64KB
          capacity: 448KB
          capabilities: isa pci pcmcia pnp apm upgrade shadowing cdboot bootselect int13floppy720 int5printscreen int9keyboard int14serial int17printer int10video acpi usb agp smartbattery biosbootspecification netboot
     *-cpu
          description: CPU
          product: Intel(R) Pentium(R) M processor 1400MHz
          vendor: Intel Corp.
          physical id: 400
          bus info: c...@0
          version: 6.9.5
          slot: Microprocessor
          size: 1400MHz
          capacity: 1700MHz
          width: 32 bits
          clock: 133MHz
          capabilities: fpu fpu_exception wp vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 tm pbe est tm2
        *-cache:0
             description: L1 cache
             physical id: 700
             size: 8KB
             capacity: 8KB
             capabilities: internal write-back data
        *-cache:1
             description: L2 cache
             physical id: 701
             size: 1MB
             capacity: 1MB
             clock: 66MHz (15.0ns)
             capabilities: pipeline-burst internal varies unified
     *-memory
          description: System Memory
          physical id: 1000
          slot: System board or motherboard
          size: 512MB
          capacity: 1GB
        *-bank:0
             description: DIMM DDR Synchronous 333 MHz (3.0 ns)
             physical id: 0
             slot: DIMM_A
             size: 256MB
             width: 64 bits
             clock: 333MHz (3.0ns)
        *-bank:1
             description: DIMM DDR Synchronous 333 MHz (3.0 ns)
             physical id: 1
             slot: DIMM_B
             size: 256MB
             width: 64 bits
             clock: 333MHz (3.0ns)
     *-pci
          description: Host bridge
          product: 82855PM Processor to I/O Controller
          vendor: Intel Corporation
          physical id: e800
          bus info: p...@00:00.0
          version: 03
          width: 32 bits
          clock: 33MHz
          resources: iomemory:e800-efff
        *-pci:0
             description: PCI bridge
             product: 82855PM Processor to AGP Controller
             vendor: Intel Corporation
             physical id: 1
             bus info: p...@00:01.0
             version: 03
             width: 32 bits
             clock: 66MHz
             capabilities: pci normal_decode bus_master
           *-display
                description: VGA compatible controller
                product: NV28 [GeForce4 Ti 4200 Go AGP 8x]
                vendor: nVidia Corporation
                physical id: 0
                bus info: