Re: CONFIG_PREEMPT -> crash under load in 2.6.20?

2007-03-04 Thread Corey Hickey
Nix wrote:
>>> I can't tell if magic sysrq dies, because as far as I know there's no
>>> way to get magic sysrq to do much visible when you're in X, and I can't
>>> get anything to go over the network kernel syslog because the network is
>>> dead.
>> You should still be able to use SysRQ even in X. I tested right now.
>> 1. Have X running already and then start X in another VT
>>    $ X :2 vt10
>> 2. Hit Alt+SysRQ+K
>>    ---> X dies, display gets corrupted, and keyboard input ignored
>> 3. ssh in from another machine and switch back to the running X instance
>>    # chvt 7
> 
> The network's dead; that's impossible.

I'm sorry, you misunderstand: I meant the above steps as a method of
confirming that SysRQ normally still works while X is running, not as
anything useful to do after your system has hung.

Now that I re-read what you wrote initially, however, I think I somewhat
misunderstood what you wrote anyway, and you probably already knew that
SysRQ worked in X.

Anyway...

>  22:58:47 up 10 days, 22:20, 37 users,  load average: 12.71, 11.14, 18.22
> 
> No problems, and I've been loading the system really rather hard today
> (as that line makes clear). I think the problem I'm seeing really *is*
> tied to _PREEMPT.

Yeah, that's pretty indicative. As for me, I just tried disabling
CONFIG_CC_OPTIMIZE_FOR_SIZE; so far so good, but I don't even have 4
hours uptime yet. We'll see.

>> It might be helpful if you reported your hardware information; I'd be
>> interested in seeing if there's much in common with my own machine.
> 
> Athlon 4 (UP), 768Mb RAM. No ACPI (to rule out a large nasty spot as
> soon as possible). Random info starting with loaded modules:

[info cut]

Nothing really in common with mine. Oh well.

-Corey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: CONFIG_PREEMPT -> crash under load in 2.6.20?

2007-03-04 Thread Nix
On 4 Mar 2007, Corey Hickey told this:
> Nix wrote:
>> I can't tell if magic sysrq dies, because as far as I know there's no
>> way to get magic sysrq to do much visible when you're in X, and I can't
>> get anything to go over the network kernel syslog because the network is
>> dead.
>
> You should still be able to use SysRQ even in X. I tested right now.
> 1. Have X running already and then start X in another VT
>    $ X :2 vt10
> 2. Hit Alt+SysRQ+K
>    ---> X dies, display gets corrupted, and keyboard input ignored
> 3. ssh in from another machine and switch back to the running X instance
>    # chvt 7

The network's dead; that's impossible.

However, I'll try SAK followed by a reg-and-flags dump: I'd entirely
forgotten about SAK. If that doesn't work I'll try to get a kexec dump.
(This is all moot if, as seems likely, the system's too dead to respond
to the keyboard: we shall see.)

>> I could begin a (really laborious, ~1 day per iteration) bisection to
>> try to track this down, but before I start, has anyone seen this before? 
>> Is its cause known?
>
> I've been seeing the same behavior under different circumstances; I
> don't know if it's related.
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0703.0/1147.html

That's using _VOLUNTARY, so I sort of doubt that it's the *same*
problem, but I suppose it might be related. (But this is hypothesis in
the absence of data :) )

> For me, though, CONFIG_PREEMPT doesn't seem to have an effect. One thing
> I had noticed, however, is that with the problem I'm experiencing,
> changing some config options inexplicably delayed the onset of the
> lockup, but the lockup occurred nonetheless. Have you been running with
> CONFIG_PREEMPT_VOLUNTARY for a while now and not seen any problems at all?

 22:58:47 up 10 days, 22:20, 37 users,  load average: 12.71, 11.14, 18.22

No problems, and I've been loading the system really rather hard today
(as that line makes clear). I think the problem I'm seeing really *is*
tied to _PREEMPT.

> It might be helpful if you reported your hardware information; I'd be
> interested in seeing if there's much in common with my own machine.

Athlon 4 (UP), 768Mb RAM. No ACPI (to rule out a large nasty spot as
soon as possible). Random info starting with loaded modules:

loop 11080 0 - Live 0xf0a56000
radeon 107808 2 - Live 0xf0a5c000
drm 62356 3 radeon, Live 0xf0a2f000

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) Processor
stepping        : 2
cpu MHz         : 1250.178
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
                  pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips        : 2501.59
clflush size    : 32

           CPU0
  0:  944756984   IO-APIC-edge      timer
  1:    1397516   IO-APIC-edge      i8042
  2:          0   XT-PIC-XT         cascade
  6:          5   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          1   IO-APIC-edge      rtc
 12:    2877936   IO-APIC-edge      i8042
 14:   20262275   IO-APIC-edge      ide2
 16:  330856361   IO-APIC-fasteoi   EMU10K1, [EMAIL PROTECTED]::01:00.0
 18:  115648745   IO-APIC-fasteoi   gordianet
 19:   11059053   IO-APIC-fasteoi   ide0, ide1
 21:          0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb2
NMI:          0
LOC:  944792893
ERR:          0
MIS:         49


-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : :00:11.1
01f0-01f7 : :00:11.1
  01f0-01f7 : ide2
0295-0296 : w83627hf
0376-0376 : :00:11.1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f2-03f5 : floppy
03f6-03f6 : :00:11.1
  03f6-03f6 : ide2
03f7-03f7 : floppy DIR
03f8-03ff : serial
0400-0407 : vt596_smbus
0cf8-0cff : PCI conf1
b000-bfff : PCI Bus #01
  b800-b8ff : :01:00.0
c800-c81f : :00:11.2
  c800-c81f : uhci_hcd
cc00-cc1f : :00:11.3
  cc00-cc1f : uhci_hcd
d000-d07f : :00:07.0
d400-d41f : :00:05.0
  d400-d41f : EMU10K1
d800-d80f : :00:0c.0
  d800-d807 : ide0
  d808-d80f : ide1
dc00-dc03 : :00:0c.0
  dc02-dc02 : ide1
e000-e007 : :00:0c.0
  e000-e007 : ide1
e400-e403 : :00:0c.0
  e402-e402 : ide0
e800-e807 : :00:0c.0
  e800-e807 : ide0
ec00-ec07 : :00:05.1
  ec00-ec07 : emu10k1-gp
fc00-fc0f : :00:11.1
  fc00-fc07 : ide2
  fc08-fc0f : ide3


Here's some lspci output:

00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
        Subsystem: VIA Technologies, Inc. Unknown device
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
                 Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort-
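
The dumps above look like the contents of /proc/modules, /proc/cpuinfo,
/proc/interrupts and /proc/ioports, followed by lspci output. For anyone
wanting to post the same set for comparison, here is a rough sketch that
gathers them into one report. It assumes those procfs files exist on the
machine; the function name dump_files and the output file name are invented
for this example.

```sh
#!/bin/sh
# Print each readable file given on the command line, preceded by a header
# naming it. Unreadable or missing files are skipped silently.
dump_files() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        printf '==> %s <==\n' "$f"
        cat "$f"
        echo
    done
}

# Example (not run here):
#   dump_files /proc/modules /proc/cpuinfo /proc/interrupts /proc/ioports > hwinfo.txt
#   lspci -vv >> hwinfo.txt
```

Keeping the file list as arguments means the same helper can be pointed at
saved copies from another machine when comparing two systems.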

Re: CONFIG_PREEMPT -> crash under load in 2.6.20?

2007-03-04 Thread Corey Hickey
Nix wrote:
> The lockups are almost total: network traffic ceases, the keyboard goes
> dead, nothing hits the disk. Once, however, it locked up while I was
> playing an ogg (emu10k1 / SB Live), and the sound did *not* die, but
> instead went into a ~1.5s-long tight loop. (Perhaps this was the card
> running on its own with no CPU assistance: I don't know enough about
> emu10k1 to know if that's plausible.)

I've seen the same thing. As I understand it, that's just the card
looping through its buffer repeatedly, which isn't getting changed when
the software is locked up.

> I can't tell if magic sysrq dies, because as far as I know there's no
> way to get magic sysrq to do much visible when you're in X, and I can't
> get anything to go over the network kernel syslog because the network is
> dead.

You should still be able to use SysRQ even in X. I tested right now.
1. Have X running already and then start X in another VT
   $ X :2 vt10
2. Hit Alt+SysRQ+K
   ---> X dies, display gets corrupted, and keyboard input ignored
3. ssh in from another machine and switch back to the running X instance
   # chvt 7
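
Before relying on the key combinations above, it may be worth checking on a
healthy system that the kernel has SysRq enabled at all. The sketch below
assumes the usual procfs locations (/proc/sys/kernel/sysrq for the enable
value, /proc/sysrq-trigger for injecting a command without the keyboard).
The function name sysrq_enabled and its optional path argument are made up
for illustration.

```sh
#!/bin/sh
# Succeed if magic SysRq is enabled.
# $1 is an optional path to the control file; it defaults to the usual
# procfs location and is a parameter only so the check can be tried on a copy.
sysrq_enabled() {
    ctl="${1:-/proc/sys/kernel/sysrq}"
    # Not readable: SysRq support not built in, or procfs not mounted
    [ -r "$ctl" ] || return 2
    val=$(cat "$ctl")
    # 0 means disabled; any other value enables at least some functions
    [ "$val" -ne 0 ]
}

# Example (as root, on a working system, not run here):
#   sysrq_enabled && echo p > /proc/sysrq-trigger   # register dump goes to the kernel log
```

This only shows that the mechanism works beforehand; once the machine has
hung, the keyboard combination is the only route left.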

> I could begin a (really laborious, ~1 day per iteration) bisection to
> try to track this down, but before I start, has anyone seen this before? 
> Is its cause known?

I've been seeing the same behavior under different circumstances; I
don't know if it's related.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0703.0/1147.html

For me, though, CONFIG_PREEMPT doesn't seem to have an effect. One thing
I had noticed, however, is that with the problem I'm experiencing,
changing some config options inexplicably delayed the onset of the
lockup, but the lockup occurred nonetheless. Have you been running with
CONFIG_PREEMPT_VOLUNTARY for a while now and not seen any problems at all?

It might be helpful if you reported your hardware information; I'd be
interested in seeing if there's much in common with my own machine.

-Corey


CONFIG_PREEMPT -> crash under load in 2.6.20?

2007-03-03 Thread Nix
Since upgrading to 2.6.20, my Athlon 4 has been locking up on a
very-roughly-daily basis, generally in periods of some load (I've never
seen it lock up when idle, but have seen it lock up with a load average
of 0.5). I'm fairly sure this didn't happen with 2.6.19 and am certain
that it didn't with 2.6.18.

The lockups are almost total: network traffic ceases, the keyboard goes
dead, nothing hits the disk. Once, however, it locked up while I was
playing an ogg (emu10k1 / SB Live), and the sound did *not* die, but
instead went into a ~1.5s-long tight loop. (Perhaps this was the card
running on its own with no CPU assistance: I don't know enough about
emu10k1 to know if that's plausible.)

I turned off CONFIG_PREEMPT and went to CONFIG_PREEMPT_VOLUNTARY, and
the lockups ceased.
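
To double-check which preemption model a given build actually uses, the
.config can be inspected directly. A minimal sketch: the function name
preempt_model is hypothetical, and it assumes the three mutually exclusive
choices CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT.

```sh
#!/bin/sh
# Print the preemption option set to "y" in a kernel .config.
# $1 is the config file to inspect (defaults to ./.config).
preempt_model() {
    cfg="${1:-.config}"
    for opt in CONFIG_PREEMPT_NONE CONFIG_PREEMPT_VOLUNTARY CONFIG_PREEMPT; do
        # -x matches whole lines only, so "CONFIG_PREEMPT=y" cannot be
        # confused with "CONFIG_PREEMPT_VOLUNTARY=y"
        if grep -qx "${opt}=y" "$cfg"; then
            echo "$opt"
            return 0
        fi
    done
    return 1    # none of the three options is enabled in this file
}
```

Run against the crashing config posted below it should report
CONFIG_PREEMPT, and against the stable one CONFIG_PREEMPT_VOLUNTARY.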


I can't tell if magic sysrq dies, because as far as I know there's no
way to get magic sysrq to do much visible when you're in X, and I can't
get anything to go over the network kernel syslog because the network is
dead.

I could begin a (really laborious, ~1 day per iteration) bisection to
try to track this down, but before I start, has anyone seen this before? 
Is its cause known?
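
For a rough idea of how long such a bisection might take: each step halves
the candidate range, so about log2(N) builds are needed for N commits. The
sketch below estimates that number. The commit count passed in is a
placeholder rather than a measured figure, and the function name
bisect_steps is made up. The commented commands show how the search could
be driven with git bisect, assuming a git checkout of the kernel tree.

```sh
#!/bin/sh
# Smallest k with 2^k >= n: the approximate number of bisection steps
# needed to isolate one commit out of n.
bisect_steps() {
    n=$1
    k=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

# Driving the search itself, inside a kernel git tree (not run here):
#   git bisect start
#   git bisect bad v2.6.20
#   git bisect good v2.6.18
#   ...build, boot, run under load for about a day, then mark the result:
#   git bisect good      # or: git bisect bad
```

With a hypothetical 1000 candidate commits, bisect_steps 1000 prints 10,
which at about one day per iteration means on the order of ten days.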


Crashing .config follows (non-crashing one identical but for
CONFIG_PREEMPT, as above):

CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_RELAY=y
CONFIG_INITRAMFS_SOURCE="usr/initramfs.hades"
CONFIG_INITRAMFS_ROOT_UID=99
CONFIG_INITRAMFS_ROOT_GID=101
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_X86_PC=y
CONFIG_MK7=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_PREEMPT=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_VM86=y
CONFIG_NOHIGHMEM=y
CONFIG_PAGE_OFFSET=0xC000
CONFIG_PROC_MM=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MTRR=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_PHYSICAL_START=0x10
CONFIG_PHYSICAL_ALIGN=0x10
CONFIG_PM=y
CONFIG_PM_LEGACY=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_ISA_DMA_API=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_FIB_HASH=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_1284=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=16
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_OFFBOARD=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_PDC202XX_NEW=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_AUTO=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_MD=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
CONFIG_DM_ZERO=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
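
Since the crashing and non-crashing configurations are said to differ only
in the preemption setting, a quick comparison of the two files can confirm
that nothing else changed. A minimal sketch: config_diff and the file names
in the example are hypothetical, and only lines that set an option are
compared ("# CONFIG_FOO is not set" comments are ignored).

```sh
#!/bin/sh
# Show options that are set differently in two kernel config files.
# Lines only in the first file are printed as-is; lines only in the
# second file are printed with a leading tab (standard comm layout).
config_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # LC_ALL=C keeps sort and comm agreeing on the ordering
    grep '^CONFIG_' "$1" | LC_ALL=C sort > "$a"
    grep '^CONFIG_' "$2" | LC_ALL=C sort > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (file names are placeholders, not run here):
#   config_diff config.crashing config.stable
```

If the two builds really differ only as described, the output should be
limited to the CONFIG_PREEMPT and CONFIG_PREEMPT_VOLUNTARY lines.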
