Graa, I reinserted the mailing lists there to the CCs, it may be good for those guys too to be able to share whatever hints they may have :P
On Fri, Jul 08, 2005 at 05:07:51PM +0300, Pauli Borodulin wrote: >>It's a lot better than the VGA console which requires me >>to plug in the 100MBps connection at the office, until >>I get around to looking into a VNC connection which is >>still mostly useless if telnet would work. > >Nice connection you have there. We're using the DRAC4 web management for >graphical remote console access, and it works quite nicely even over As do we. And that's a placebo effect for sure, the wlan should be able to handle this ;) >10Mbps. And hey, anyways, you only need it in some special occasions. I >haven't seen any problems with USB. We are only using 32-bit. I'm just mystified that an identical setup works. I figured I'd upgrade the DRAC 4/I bios, as 1.20 is installed and 1.30 is out, for what it's worth, but Red Hat seems to ship some el mysterioso lockfile thing that I have to find somewhere... >There are some nasty 64-bit problems in Linux. I wouldn't be surprised >if it's the reason for your grief. Some Opteron people are depressed >with the situation. Perfect. >I read the file >http://mjt.nysv.org/kernelbugfest/netconsole_panic_2.6.12.2 thru'. Do >you have some kind of watchdog enabled? I have seen the "NMI Watchdog >detected LOCKUP on CPU0" message during bootup as a symptom for broken >watchdog stuff on 64-bit x86 servers. Try "nmi_watchdog=0" as kernel Maybe I should clean the kernel conf up a bit, but nmi_watchdog=0 if it disables everythin a-ok didn't help :P >parameter. If that doesn't help, try disabling all watchdogs from the >kernel. Also you could try adding "noapic" as kernel parameter on boot. >Collect some output and report back :-) noapic and friends didn't help. http://mjt.nysv.org/kernelbugfest/usb_bug_2_2.6.12.2 http://mjt.nysv.org/kernelbugfest/usb_bug_2_2.6.12.2.ksymoops What does the old poke_blanked_console refer to? I just rebooted the box with console=ttyS0,9600 as well (removed above) and I still didn't see any issues. Am I to assume pci=routeirq is truly so deprecated I shouldn't bother with it even? What's up with nmi_watchdog=0 and testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)! anyway? Any hints on why the telnet interface spews only à at me after grub, that or nothing? I'd _really_ hate to launch these boxes 32-bit or without working remote administration, so I'll still do a bit more work but this is starting to suck :> I'm sure no one has any guarantees this'll just be fixed in the future and I can launch with 64 bits and schedule a reboot later on... Thanks! And here's the ksymoops-decoded output: ksymoops 2.4.9 on x86_64 2.6.12.2-amd64. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.6.12.2-amd64/ (default) -m /boot/System.map-2.6.12.2-amd64 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Error (regular_file): read_ksyms stat /proc/ksyms failed No modules in ksyms, skipping objects No ksyms, skipping lsmod ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) CPU 1: Syncing TSC to CPU 0. CPU 1: synchronized TSC with CPU 0 (last diff -16 cycles, maxerr 1300 cycles) CPU 2: Syncing TSC to CPU 0. CPU 2: synchronized TSC with CPU 0 (last diff 3 cycles, maxerr 867 cycles) CPU 3: Syncing TSC to CPU 0. testing NMI watchdog ... <6>CPU 3: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 1326 cycles) CPU#0: NMI appears to be stuck (0->0)! e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection e100: Intel(R) PRO/100 Network Driver, 3.4.8-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation Kernel BUG at "arch/x86_64/kernel/traps.c":338 invalid operand: 0000 [1] SMP CPU 0 Pid: 1227, comm: khubd Not tainted 2.6.12.2-amd64 RIP: 0010:[<ffffffff8010ff66>] <ffffffff8010ff66>{out_of_line_bug+0} Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64 RSP: 0018:ffff81013b27fbc0 EFLAGS: 00010206 RAX: ffffffff00000000 RBX: ffff81013acbebc0 RCX: 0000000000000040 RDX: 000000013fd3a100 RSI: ffff8100bfdd4870 RDI: ffff81013acbebc0 RBP: 000000013fd3a0c0 R08: 0000000000000000 R09: 00000000000003e8 R10: 0000000000000000 R11: 0000000000000002 R12: ffff81013d8b5000 R13: ffff81013e354468 R14: 0000000000000296 R15: 0000000000000010 FS: 00002aaaaae00640(0000) GS:ffffffff80637e80(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaab040720 CR3: 0000000000101000 CR4: 00000000000006e0 Stack: ffffffff8800fffb 0000000000000296 ffff81013acbebc0 ffff81013e354400 0000000000000002 0000000000000010 0000000000000000 0000000080000080 ffffffff88010fd1 ffffffff8801bd00 Call Trace:<ffffffff8800fffb>{:usbcore:hcd_submit_urb+492} <ffffffff88010fd1>{:usbcore:usb_submit_urb+847} <ffffffff88011219>{:usbcore:usb_start_wait_urb+88} <ffffffff80131a40>{printk+141} <ffffffff880113c6>{:usbcore:usb_internal_control_msg+137} <ffffffff88011484>{:usbcore:usb_control_msg+143} <ffffffff8800d596>{:usbcore:hub_port_init+643} <ffffffff8024ab8d>{kobject_get+18} <ffffffff8800dcbb>{:usbcore:hub_port_connect_change+535} <ffffffff8800e308>{:usbcore:hub_events+1038} <ffffffff8800e443>{:usbcore:hub_thread+37} <ffffffff801461fd>{autoremove_wake_function+0} <ffffffff801461fd>{autoremove_wake_function+0} <ffffffff8010f45b>{child_rip+8} <ffffffff8800e41e>{:usbcore:hub_thread+0} <ffffffff8010f453>{child_rip+0} Code: 0f 0b f9 d7 3f 80 ff ff ff ff 52 01 c3 53 e8 2a 99 00 00 89 >>RIP; ffffffff8010ff66 <out_of_line_bug+0/d> <===== Trace; ffffffff8800fffb <_end+7999ffb/7ef8a000> Trace; ffffffff88011219 <_end+799b219/7ef8a000> Trace; ffffffff880113c6 <_end+799b3c6/7ef8a000> Trace; ffffffff88011484 <_end+799b484/7ef8a000> Trace; ffffffff8024ab8d <kobject_get+12/17> Trace; ffffffff8800e308 <_end+7998308/7ef8a000> Trace; ffffffff801461fd <autoremove_wake_function+0/2e> Trace; ffffffff8010f45b <child_rip+8/11> Trace; ffffffff8010f453 <child_rip+0/11> Code; ffffffff8010ff66 <out_of_line_bug+0/d> 0000000000000000 <_RIP>: Code; ffffffff8010ff66 <out_of_line_bug+0/d> <===== 0: 0f 0b ud2a <===== Code; ffffffff8010ff68 <out_of_line_bug+2/d> 2: f9 stc Code; ffffffff8010ff69 <out_of_line_bug+3/d> 3: d7 xlat %ds:(%ebx) Code; ffffffff8010ff6a <out_of_line_bug+4/d> 4: 3f (bad) Code; ffffffff8010ff6b <out_of_line_bug+5/d> 5: 80 ff ff cmp $0xff,%bh Code; ffffffff8010ff6e <out_of_line_bug+8/d> 8: ff (bad) Code; ffffffff8010ff6f <out_of_line_bug+9/d> 9: ff 52 01 callq *0x1(%rdx) Code; ffffffff8010ff72 <out_of_line_bug+c/d> c: c3 retq Code; ffffffff8010ff73 <oops_begin+0/54> d: 53 push %rbx Code; ffffffff8010ff74 <oops_begin+1/54> e: e8 2a 99 00 00 callq 993d <_RIP+0x993d> Code; ffffffff8010ff79 <oops_begin+6/54> 13: 89 00 mov %eax,(%rax) e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex 1 warning and 1 error issued. Results may not be reliable. -- mjt
signature.asc
Description: Digital signature