Package: linux-image-2.6.26-2-686-bigmem
Version: 2.6.26-17
Severity: important

This problem is repeatable on two of our Sun X2200 servers (two quad-core Opteron 2376 CPUs and 28GB of RAM). I found a couple of similar bug reports (#496917 and #536236), but they were filed against amd64 kernels. Ours is the stock x86 bigmem kernel from Lenny, so I figured I'd file a separate report.

This is unlikely to be a hardware issue, because it shows up on two different systems, and each of them ran memtest86+ for several days before deployment. Right now the machines are running vanilla 2.6.30.1 kernels from kernel.org, compiled with Lenny's config-2.6.26-2-686-bigmem, and the problem is gone.

The problem is that random CPUs intermittently lock up, with the following kernel messages appearing repeatedly:

...
[48420.342829] BUG: soft lockup - CPU#5 stuck for 62s! [swapper:0]
[48420.342829] Modules linked in: tcp_diag inet_diag binfmt_misc nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 serio_raw shpchp psmouse pci_hotplug i2c_nforce2 pcspkr joydev button i2c_core evdev ext3 jbd mbcache sd_mod usbhid hid ff_memless ide_pci_generic amd74xx ide_core sata_nv ata_generic tg3 libata scsi_mod ehci_hcd ohci_hcd dock usbcore thermal processor fan thermal_sys
[48420.342829]
[48420.342829] Pid: 0, comm: swapper Not tainted (2.6.26-2-686-bigmem #1)
[48420.342829] EIP: 0060:[<c011a124>] EFLAGS: 00000246 CPU: 5
[48420.342829] EIP is at native_safe_halt+0x2/0x3
[48420.342829] EAX: f74be000 EBX: c0107656 ECX: 0f07b000 EDX: 00524d4b
[48420.342829] ESI: 00000005 EDI: 00000000 EBP: 00000000 ESP: f74bffa8
[48420.342829]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[48420.342829] CR0: 8005003b CR2: 080f2c58 CR3: 37585000 CR4: 000006f0
[48420.342829] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[48420.342829] DR6: ffff0ff0 DR7: 00000400
[48420.342829]  [<c0107683>] default_idle+0x2d/0x53
[48420.342829]  [<c01075ce>] cpu_idle+0xab/0xcb
[48420.342829]  =======================
...

The CPU#N part of the error message can be anything from 0 to 7.
The process name in square brackets can likewise be anything from a system process to a user-run script.

The machines are pretty much stock Sun X2200 servers with two quad-core Opteron 2376 CPUs, 28GB of RAM, and one SATA disk. Below is the output of lspci, free, fdisk -l and lsmod. Please let me know if you require more information.

trunko:~# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ASPEED Technology, Inc. AST2000
05:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev b5)
06:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)
06:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)

trunko:~# free
             total       used       free     shared    buffers     cached
Mem:      29120968     954532   28166436          0     304752     274220
-/+ buffers/cache:     375560   28745408
Swap:      2048248          0    2048248

trunko:~# fdisk -l

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          13      104391   83  Linux
/dev/sda2              14       19457   156183930    5  Extended
/dev/sda5   *          14          64      409626   83  Linux
/dev/sda6              65         319     2048256   82  Linux swap / Solaris
/dev/sda7             320        1212     7172991   83  Linux
/dev/sda8            1213        1340     1028128+  83  Linux
/dev/sda9            1341        1468     1028128+  83  Linux
/dev/sda10           1469        2361     7172991   83  Linux
/dev/sda11           2362       19457   137323588+  83  Linux

trunko:~# lsmod
Module                  Size  Used by
binfmt_misc             7020  1
nfsd                  207552  9
exportfs                3712  1 nfsd
nfs                   220756  10
lockd                  57984  2 nfsd,nfs
nfs_acl                 2632  2 nfsd,nfs
auth_rpcgss            32752  2 nfsd,nfs
sunrpc                164612  34 nfsd,nfs,lockd,nfs_acl,auth_rpcgss
ipv6                  232800  38
ipmi_si                34828  0
ipmi_msghandler        30676  1 ipmi_si
i2c_nforce2             6248  0
joydev                  8800  0
serio_raw               4696  0
psmouse                37468  0
shpchp                 27108  0
pci_hotplug            24628  1 shpchp
button                  5120  0
processor              34600  0
i2c_core               20880  1 i2c_nforce2
pcspkr                  2096  0
evdev                   8220  3
ext3                  107448  7
jbd                    41072  1 ext3
mbcache                 6984  1 ext3
sd_mod                 23924  9
ide_pci_generic         3624  0
amd74xx                 5420  0
ide_core               87756  2 ide_pci_generic,amd74xx
usbhid                 31452  0
hid                    36068  1 usbhid
sata_nv                19636  8
ata_generic             4332  0
tg3                    94696  0
libphy                 19512  1 tg3
libata                151032  2 sata_nv,ata_generic
scsi_mod              135076  2 sd_mod,libata
ehci_hcd               30492  0
ohci_hcd               19880  0
usbcore               125860  4 usbhid,ehci_hcd,ohci_hcd
thermal                12664  0
fan                     4032  0
thermal_sys            13424  3 processor,thermal,fan

-- Package-specific info:

-- System Information:
Debian Release: 5.0.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.30.1-i686-bigmem-cdf (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-2-686-bigmem depends on:
ii  debconf [debconf-2.0]         1.5.24        Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o         tools for generating an initramfs
ii  module-init-tools             3.4-1         tools for managing Linux kernel mo

Versions of packages linux-image-2.6.26-2-686-bigmem recommends:
ii  libc6-i686                    2.7-18        GNU C Library: Shared libraries [i

Versions of packages linux-image-2.6.26-2-686-bigmem suggests:
ii  grub                          0.97-47lenny2 GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26              <none>        (no description available)

-- debconf information:
  linux-image-2.6.26-2-686-bigmem/preinst/overwriting-modules-2.6.26-2-686-bigmem: true
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.26-2-686-bigmem/preinst/lilo-has-ramdisk:
  linux-image-2.6.26-2-686-bigmem/postinst/bootloader-test-error-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/postinst/depmod-error-2.6.26-2-686-bigmem: false
  linux-image-2.6.26-2-686-bigmem/preinst/initrd-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/preinst/abort-overwrite-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/preinst/bootloader-initrd-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/depmod-error-initrd-2.6.26-2-686-bigmem: false
  linux-image-2.6.26-2-686-bigmem/postinst/create-kimage-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/lilo-initrd-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/prerm/would-invalidate-boot-loader-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/failed-to-move-modules-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/prerm/removing-running-kernel-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/old-dir-initrd-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/elilo-initrd-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/preinst/abort-install-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/postinst/old-initrd-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/old-system-map-link-2.6.26-2-686-bigmem: true
  linux-image-2.6.26-2-686-bigmem/postinst/bootloader-error-2.6.26-2-686-bigmem:
  linux-image-2.6.26-2-686-bigmem/postinst/kimage-is-a-directory: