Re: CONFIG_PREEMPT -> crash under load in 2.6.20?
Nix wrote:
>>> I can't tell if magic sysrq dies, because as far as I know there's no
>>> way to get magic sysrq to do much visible when you're in X, and I can't
>>> get anything to go over the network kernel syslog because the network is
>>> dead.
>> You should still be able to use SysRQ even in X. I tested right now.
>> 1. Have X running already and then start X in another VT
>>    $ X :2 vt10
>> 2. Hit Alt+SysRQ+K
>>    ---> X dies, display gets corrupted, and keyboard input ignored
>> 3. ssh in from another machine and switch back to the running X instance
>>    # chvt 7
>
> The network's dead; that's impossible.

I'm sorry, you misunderstand: I meant the above steps as a method of
confirming that SysRQ normally still works while X is running, not as
anything useful to do after your system has hung. Now that I re-read
what you wrote initially, however, I think I somewhat misunderstood
what you wrote anyway, and you probably already knew that SysRQ worked
in X. Anyway...

> 22:58:47 up 10 days, 22:20, 37 users, load average: 12.71, 11.14, 18.22
>
> No problems, and I've been loading the system really rather hard today
> (as that line makes clear). I think the problem I'm seeing really *is*
> tied to _PREEMPT.

Yeah, that's pretty indicative. As for me, I just tried disabling
CONFIG_CC_OPTIMIZE_FOR_SIZE; so far so good, but I don't even have 4
hours uptime yet. We'll see.

>> It might be helpful if you reported your hardware information; I'd be
>> interested in seeing if there's much in common with my own machine.
>
> Athlon 4 (UP), 768Mb RAM. No ACPI (to rule out a large nasty spot as
> soon as possible). Random info starting with loaded modules:

[info cut]

Nothing really in common with mine. Oh well.

-Corey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: CONFIG_PREEMPT -> crash under load in 2.6.20?
On 4 Mar 2007, Corey Hickey told this:
> Nix wrote:
>> I can't tell if magic sysrq dies, because as far as I know there's no
>> way to get magic sysrq to do much visible when you're in X, and I can't
>> get anything to go over the network kernel syslog because the network is
>> dead.
>
> You should still be able to use SysRQ even in X. I tested right now.
> 1. Have X running already and then start X in another VT
>    $ X :2 vt10
> 2. Hit Alt+SysRQ+K
>    ---> X dies, display gets corrupted, and keyboard input ignored
> 3. ssh in from another machine and switch back to the running X instance
>    # chvt 7

The network's dead; that's impossible.

However, I'll try SAK followed by a reg-and-flags dump: I'd entirely
forgotten about SAK. If that doesn't work I'll try to get a kexec dump.

(This is all moot if, as seems likely, the system's too dead to respond
to the keyboard: we shall see.)

>> I could begin a (really laborious, ~1 day per iteration) bisection to
>> try to track this down, but before I start, has anyone seen this before?
>> Is its cause known?
>
> I've been seeing the same behavior under different circumstances; I
> don't know if it's related.
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0703.0/1147.html

That's using _VOLUNTARY, so I sort of doubt that it's the *same*
problem, but I suppose it might be related. (But this is hypothesis in
the absence of data :) )

> For me, though, CONFIG_PREEMPT doesn't seem to have an effect. One thing
> I had noticed, however, is that with the problem I'm experiencing,
> changing some config options inexplicably delayed the onset of the
> lockup, but the lockup occurred nonetheless. Have you been running with
> CONFIG_PREEMPT_VOLUNTARY for a while now and not seen any problems at all?

 22:58:47 up 10 days, 22:20, 37 users, load average: 12.71, 11.14, 18.22

No problems, and I've been loading the system really rather hard today
(as that line makes clear). I think the problem I'm seeing really *is*
tied to _PREEMPT.

> It might be helpful if you reported your hardware information; I'd be
> interested in seeing if there's much in common with my own machine.

Athlon 4 (UP), 768Mb RAM. No ACPI (to rule out a large nasty spot as
soon as possible). Random info starting with loaded modules:

loop 11080 0 - Live 0xf0a56000
radeon 107808 2 - Live 0xf0a5c000
drm 62356 3 radeon, Live 0xf0a2f000

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) Processor
stepping : 2
cpu MHz : 1250.178
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips : 2501.59
clflush size : 32

      CPU0
  0: 944756984 IO-APIC-edge timer
  1: 1397516 IO-APIC-edge i8042
  2: 0 XT-PIC-XT cascade
  6: 5 IO-APIC-edge floppy
  7: 0 IO-APIC-edge parport0
  8: 1 IO-APIC-edge rtc
 12: 2877936 IO-APIC-edge i8042
 14: 20262275 IO-APIC-edge ide2
 16: 330856361 IO-APIC-fasteoi EMU10K1, [EMAIL PROTECTED]::01:00.0
 18: 115648745 IO-APIC-fasteoi gordianet
 19: 11059053 IO-APIC-fasteoi ide0, ide1
 21: 0 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb2
NMI: 0
LOC: 944792893
ERR: 0
MIS: 49

-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : :00:11.1
01f0-01f7 : :00:11.1
01f0-01f7 : ide2
0295-0296 : w83627hf
0376-0376 : :00:11.1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f2-03f5 : floppy
03f6-03f6 : :00:11.1
03f6-03f6 : ide2
03f7-03f7 : floppy DIR
03f8-03ff : serial
0400-0407 : vt596_smbus
0cf8-0cff : PCI conf1
b000-bfff : PCI Bus #01
b800-b8ff : :01:00.0
c800-c81f : :00:11.2
c800-c81f : uhci_hcd
cc00-cc1f : :00:11.3
cc00-cc1f : uhci_hcd
d000-d07f : :00:07.0
d400-d41f : :00:05.0
d400-d41f : EMU10K1
d800-d80f : :00:0c.0
d800-d807 : ide0
d808-d80f : ide1
dc00-dc03 : :00:0c.0
dc02-dc02 : ide1
e000-e007 : :00:0c.0
e000-e007 : ide1
e400-e403 : :00:0c.0
e402-e402 : ide0
e800-e807 : :00:0c.0
e800-e807 : ide0
ec00-ec07 : :00:05.1
ec00-ec07 : emu10k1-gp
fc00-fc0f : :00:11.1
fc00-fc07 : ide2
fc08-fc0f : ide3

Here's some lspci output:

00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
        Subsystem: VIA Technologies, Inc. Unknown device
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort-
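For anyone wanting to post comparable hardware details, the dump above looks like the contents of /proc/modules, /proc/cpuinfo, /proc/interrupts and /proc/ioports followed by lspci output. A minimal sketch of gathering the same files into one report; the helper name collect_info and the output file name are made up for illustration:

```shell
#!/bin/sh
# Concatenate the given files into one report, labelling each section.
# Files that are missing or unreadable are noted instead of aborting.
collect_info() {
    for f in "$@"; do
        echo "==> $f <=="
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(not readable)"
        fi
        echo
    done
}

# Typical use on the affected machine (hwinfo.txt is an arbitrary name):
#   collect_info /proc/modules /proc/cpuinfo /proc/interrupts /proc/ioports > hwinfo.txt
#   lspci -vv >> hwinfo.txt
```

Keeping the section labels makes it easier for someone else on the list to compare two machines side by side.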
Re: CONFIG_PREEMPT -> crash under load in 2.6.20?
Nix wrote:
> The lockups are almost total: network traffic ceases, the keyboard goes
> dead, nothing hits the disk. Once, however, it locked up while I was
> playing an ogg (emu10k1 / SB Live), and the sound did *not* die, but
> instead went into a ~1.5s-long tight loop. (Perhaps this was the card
> running on its own with no CPU assistance: I don't know enough about
> emu10k1 to know if that's plausible.)

I've seen the same thing. As I understand it, that's just the card
looping through its buffer repeatedly, which isn't getting changed when
the software is locked up.

> I can't tell if magic sysrq dies, because as far as I know there's no
> way to get magic sysrq to do much visible when you're in X, and I can't
> get anything to go over the network kernel syslog because the network is
> dead.

You should still be able to use SysRQ even in X. I tested right now.
1. Have X running already and then start X in another VT
   $ X :2 vt10
2. Hit Alt+SysRQ+K
   ---> X dies, display gets corrupted, and keyboard input ignored
3. ssh in from another machine and switch back to the running X instance
   # chvt 7

> I could begin a (really laborious, ~1 day per iteration) bisection to
> try to track this down, but before I start, has anyone seen this before?
> Is its cause known?

I've been seeing the same behavior under different circumstances; I
don't know if it's related.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0703.0/1147.html

For me, though, CONFIG_PREEMPT doesn't seem to have an effect. One thing
I had noticed, however, is that with the problem I'm experiencing,
changing some config options inexplicably delayed the onset of the
lockup, but the lockup occurred nonetheless. Have you been running with
CONFIG_PREEMPT_VOLUNTARY for a while now and not seen any problems at all?

It might be helpful if you reported your hardware information; I'd be
interested in seeing if there's much in common with my own machine.
-Corey
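Before relying on SysRq during a hang, it may be worth confirming that it is enabled at all on the running kernel. A minimal sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ so that /proc/sys/kernel/sysrq exists; the helper name sysrq_status is made up for illustration:

```shell
#!/bin/sh
# Interpret the value found in /proc/sys/kernel/sysrq:
#   0 = SysRq disabled, 1 = all functions enabled,
#   any other number = bitmask restricting which functions are allowed.
sysrq_status() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "restricted (bitmask $1)" ;;
    esac
}

# Report the live setting only if the file is present and readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    sysrq_status "$(cat /proc/sys/kernel/sysrq)"
fi
```

This only shows that the key combination is permitted while the system is healthy; it says nothing about whether the keyboard interrupt is still serviced once the machine has locked up.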
CONFIG_PREEMPT -> crash under load in 2.6.20?
Since upgrading to 2.6.20, my Athlon 4 has been locking up on a
very-roughly-daily basis, generally in periods of some load (I've never
seen it lock up when idle, but have seen it lock up with a load average
of 0.5). I'm fairly sure this didn't happen with 2.6.19 and am certain
that it didn't with 2.6.18.

The lockups are almost total: network traffic ceases, the keyboard goes
dead, nothing hits the disk. Once, however, it locked up while I was
playing an ogg (emu10k1 / SB Live), and the sound did *not* die, but
instead went into a ~1.5s-long tight loop. (Perhaps this was the card
running on its own with no CPU assistance: I don't know enough about
emu10k1 to know if that's plausible.)

I turned off CONFIG_PREEMPT and went to CONFIG_PREEMPT_VOLUNTARY, and
the lockups ceased.

I can't tell if magic sysrq dies, because as far as I know there's no
way to get magic sysrq to do much visible when you're in X, and I can't
get anything to go over the network kernel syslog because the network is
dead.

I could begin a (really laborious, ~1 day per iteration) bisection to
try to track this down, but before I start, has anyone seen this before?
Is its cause known?
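To put a rough number on how laborious that bisection would be: a bisection over N candidate commits needs about log2(N) build-and-test rounds. A minimal sketch of that arithmetic; the helper name bisect_steps is made up, the commit count in the usage note is purely hypothetical, and the git commands in the comments assume a kernel git tree with the usual release tags:

```shell
#!/bin/sh
# Number of test rounds a bisection needs for N candidate commits:
# the smallest k such that 2^k >= N.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# The bisection itself would look roughly like this
# (v2.6.19 is only "fairly sure" good, so v2.6.18 is the safer starting point):
#   git bisect start
#   git bisect bad v2.6.20
#   git bisect good v2.6.18
#   ... build, boot, run under load for about a day, then report:
#   git bisect good      # or: git bisect bad
```

With a hypothetical 5000 commits in the range, `bisect_steps 5000` prints 13, so at roughly one day per iteration the search would take about two weeks.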
Crashing .config follows (non-crashing one identical but for
CONFIG_PREEMPT, as above):

CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_RELAY=y
CONFIG_INITRAMFS_SOURCE="usr/initramfs.hades"
CONFIG_INITRAMFS_ROOT_UID=99
CONFIG_INITRAMFS_ROOT_GID=101
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_X86_PC=y
CONFIG_MK7=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_PREEMPT=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_VM86=y
CONFIG_NOHIGHMEM=y
CONFIG_PAGE_OFFSET=0xC000
CONFIG_PROC_MM=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MTRR=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_PHYSICAL_START=0x10
CONFIG_PHYSICAL_ALIGN=0x10
CONFIG_PM=y
CONFIG_PM_LEGACY=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_ISA_DMA_API=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_FIB_HASH=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_1284=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=16
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_OFFBOARD=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_PDC202XX_NEW=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_AUTO=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_MD=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
CONFIG_DM_ZERO=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
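Since the claim is that the crashing and non-crashing configurations differ only in the preemption setting, it can be worth checking that mechanically rather than by eye. A minimal sketch using only sort, diff and grep; the helper name config_diff and the file names in the usage note are made up for illustration:

```shell
#!/bin/sh
# Print the lines that differ between two kernel .config files.
# Both files are sorted first so that option order does not matter.
# Lines prefixed "<" come from the first file, ">" from the second.
config_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort "$1" > "$a"
    sort "$2" > "$b"
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}

# Example (hypothetical file names):
#   config_diff config-2.6.20-preempt config-2.6.20-voluntary
```

If anything other than the CONFIG_PREEMPT-related lines shows up, the two kernels differ in more than the preemption model and the comparison is less conclusive.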