Re: x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2
On Sun, Mar 27, 2005 at 08:05:13PM +0200, Christophe Saout wrote:
> On Sunday, 27.03.2005, at 19:26 +0200, Andi Kleen wrote:
>
> > > preempt_schedule_irq is not an i386 specific function and seems to take
> > > special care of BKL preemption and since reiserfs does use the BKL to do
> > > certain things I think this actually might be the problem...?
> >
> > Hmm, preempt_schedule_irq is not in mainline as far as I can see.
> > My patches are always for mainline; I don't do a special
> > patch kit for -mm*
>
> PREEMPT_BKL has been in mainline since 2.6.11-rc1, preempt_schedule_irq
> made it in 2.6.11-rc3. Please look here:
> http://linux.bkbits.net:8080/linux-2.6/search/?expr=preempt_schedule_irq=ChangeSet+comments

Hmm, true. I must have missed it with the last merge.
Looking at the changeset, your simple patch is probably OK.

> Now that I looked into it I think that it's obviously the correct
> solution.

Agreed.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2
On Sunday, 27.03.2005, at 19:26 +0200, Andi Kleen wrote:

> > preempt_schedule_irq is not an i386 specific function and seems to take
> > special care of BKL preemption and since reiserfs does use the BKL to do
> > certain things I think this actually might be the problem...?
>
> Hmm, preempt_schedule_irq is not in mainline as far as I can see.
> My patches are always for mainline; I don't do a special
> patch kit for -mm*

PREEMPT_BKL has been in mainline since 2.6.11-rc1, preempt_schedule_irq
made it in 2.6.11-rc3. Please look here:
http://linux.bkbits.net:8080/linux-2.6/search/?expr=preempt_schedule_irq=ChangeSet+comments

For i386 the first change was to switch to preempt_schedule in this code
path: http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]

preempt_schedule takes care of setting PREEMPT_ACTIVE and resetting it
afterwards, so I removed that from the assembler code.

Then preempt_schedule_irq was introduced to move the sti/cli back
around the call to schedule:
http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]

So in the end, the only thing the patch I proposed does is
*additionally* handle the PREEMPT_BKL case, so that schedule doesn't
accidentally release the BKL semaphore when it shouldn't: we are
preempting, and nobody explicitly called schedule.

Several other archs have done the same. No bug showed up until the
recent -mm kernel, where the execution of this code path actually became
possible (the "jc -> jnc" fix some lines above).

> It looks like an unfortunate interaction with some other patches
> in mm. Andrew, can you disable CONFIG_PREEMPT on x86-64 in
> mm for now?

These things are in 2.6.11 (except that they never got called because of
the wrong interrupt flag check in the IRQ handler).

> > Unfortunately I don't have an amd64 machine to play with, so can somebody
> > please check this?
>
> How did you generate the crash dumps above then?

Well, nobody minds if I play with a webserver in the middle of the
night, as long as it works during the day. Shoot me. :)

Both servers have been running fine since I applied my patch last night.

Now that I looked into it I think that it's obviously the correct
solution.
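To make the BKL argument above concrete, here is a rough user-space model in C. It is a sketch, not kernel source: struct task_model, model_schedule(), the bkl_releases counter and the PREEMPT_ACTIVE value are all invented for illustration, and the real preempt_schedule_irq differs in detail. It assumes, as described above, that schedule drops the BKL for any task whose lock_depth is >= 0, and that the fix hides lock_depth from schedule for the duration of the preemption.

```c
#include <assert.h>
#include <stdbool.h>

#define PREEMPT_ACTIVE 0x10000000 /* illustrative value only */

/* Hypothetical stand-in for the per-task state that matters here. */
struct task_model {
    int preempt_count;
    int lock_depth;    /* -1: BKL not held, >= 0: BKL held */
    bool irqs_enabled;
    int bkl_releases;  /* times schedule dropped the BKL on our behalf */
};

/* Model of schedule: a task with lock_depth >= 0 gets its BKL released
 * while other tasks run (and reacquired before returning). */
void model_schedule(struct task_model *t)
{
    assert(t->irqs_enabled);
    if (t->lock_depth >= 0)
        t->bkl_releases++;
}

/* Model of the x86-64 entry.S sequence quoted in the thread:
 * set PREEMPT_ACTIVE, sti, call schedule, cli, clear the count. */
void old_x86_64_path(struct task_model *t)
{
    t->preempt_count = PREEMPT_ACTIVE;
    t->irqs_enabled = true;   /* sti */
    model_schedule(t);
    t->irqs_enabled = false;  /* cli */
    t->preempt_count = 0;
}

/* Model of the preempt_schedule_irq approach: same as above, but
 * lock_depth is parked at -1 so schedule leaves the BKL alone. */
void model_preempt_schedule_irq(struct task_model *t)
{
    int saved_lock_depth;

    assert(t->preempt_count == 0 && !t->irqs_enabled);
    t->preempt_count += PREEMPT_ACTIVE;
    saved_lock_depth = t->lock_depth;
    t->lock_depth = -1;
    t->irqs_enabled = true;   /* sti */
    model_schedule(t);
    t->irqs_enabled = false;  /* cli */
    t->lock_depth = saved_lock_depth;
    t->preempt_count -= PREEMPT_ACTIVE;
}
```

In this model, a task that holds the BKL (lock_depth == 0) and goes through old_x86_64_path() ends up with bkl_releases == 1, i.e. the lock was dropped behind its back. The same task going through model_preempt_schedule_irq() keeps bkl_releases at 0 and gets its lock_depth back afterwards.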
Re: x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2
On Fri, Mar 25, 2005 at 08:26:25PM +0100, Christophe Saout wrote:
> Fortunately the kernel locked up and there was no data corruption.
>
> I've got PREEMPT and PREEMPT_BKL enabled under UP.
>
> I just took a look at the change and found this:
>
> x86-64 does this (in entry.S):
>
>         bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
>         jnc  retint_restore_args
>         movl $PREEMPT_ACTIVE,threadinfo_preempt_count(%rcx)
>         sti
>         call schedule
>         cli
>         GET_THREAD_INFO(%rcx)
>         movl $0,threadinfo_preempt_count(%rcx)
>         jmp exit_intr
>
> while i386 does this:
>
>         testl $IF_MASK,EFLAGS(%esp)     # interrupts off (exception path) ?
>         jz restore_all
>         call preempt_schedule_irq
>         jmp need_resched
>
> preempt_schedule_irq is not an i386 specific function and seems to take
> special care of BKL preemption and since reiserfs does use the BKL to do
> certain things I think this actually might be the problem...?

Hmm, preempt_schedule_irq is not in mainline as far as I can see.
My patches are always for mainline; I don't do a special
patch kit for -mm*

It looks like an unfortunate interaction with some other patches
in mm. Andrew, can you disable CONFIG_PREEMPT on x86-64 in
mm for now?

Just calling preempt_schedule_irq may also work, but most likely
the patch that introduces that function needs careful reading to check
whether it requires more support from architectures.

BTW I suspect it will break other archs too...

> Unfortunately I don't have an amd64 machine to play with, so can somebody
> please check this?

How did you generate the crash dumps above then?

-Andi
x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2
Hi,

> +x86_64-fix-config_preempt.patch
>
> x86_64-fix-config_preempt.patch
>   x86_64: Fix CONFIG_PREEMPT

Has this one been stress-tested? I've got the impression that things
have become a lot worse.

I've been seeing things like these:

Mar 25 01:00:48 websrv2 REISERFS: panic (device dm-1): clm-6000: do_balance, fs generation has changed
Mar 25 01:00:48 websrv2
Mar 25 01:00:48 websrv2 --- [cut here ] - [please bite here ] -
Mar 25 01:00:48 websrv2 Kernel BUG at prints:362
Mar 25 01:00:48 websrv2 invalid operand: [1] PREEMPT
Mar 25 01:00:48 websrv2 CPU 0
Mar 25 01:00:48 websrv2 Modules linked in: iptable_nat ipt_MARK iptable_mangle ipt_LOG ipt_multiport ipt_owner ipt_mark ipt_state ipt_REJECT iptable_filter ip_tables twofish serpent blowfish ext3 jbd reiser4 sha256 aes dm_crypt ip_conntrack_irc ip_conntrack_ftp ip_conntrack via_rhine 8139too crc32
Mar 25 01:00:48 websrv2 Pid: 25172, comm: rm Not tainted 2.6.12-rc1-cs1
Mar 25 01:00:48 websrv2 RIP: 0010:[801cfe13] 801cfe13{reiserfs_panic+211}
Mar 25 01:00:48 websrv2 RSP: 0018:81001efe37b8 EFLAGS: 00010292
Mar 25 01:00:48 websrv2 RAX: 0059 RBX: 803fbcac RCX: c100
Mar 25 01:00:48 websrv2 RDX: RSI: 81007d0b31f0 RDI:
Mar 25 01:00:48 websrv2 RBP: 81004f960060 R08: 81001efe2000 R09: 0002
Mar 25 01:00:48 websrv2 R10: R11: 80340ef0 R12: 81007f850230
Mar 25 01:00:48 websrv2 R13: 81007f85 R14: R15: 81004f9565d0
Mar 25 01:00:48 websrv2 FS: 2aabaae0() GS:805be800() knlGS:55563dc0
Mar 25 01:00:48 websrv2 CS: 0010 DS: ES: CR0: 8005003b
Mar 25 01:00:48 websrv2 CR2: 2aaff008 CR3: 1ebbd000 CR4: 06e0
Mar 25 01:00:48 websrv2 Process rm (pid: 25172, threadinfo 81001efe2000, task 81007d0b31f0)
Mar 25 01:00:48 websrv2 Stack: 00300010 81001efe38a8 81001efe37d8 81001c041530
Mar 25 01:00:48 websrv2 81001efe39d8 801d4e42 81007e659a00 0063
Mar 25 01:00:48 websrv2 0063
Mar 25 01:00:48 websrv2 Call Trace:801d4e42{pathrelse_and_restore+66} 8010efe6{retint_kernel+46}
Mar 25 01:00:48 websrv2 801bb847{do_balance+39} 801bd315{do_balance+6901}
Mar 25 01:00:48 websrv2 801cbd90{unfix_nodes+128} 801be15b{do_balance+10555}
Mar 25 01:00:48 websrv2 801d7bf9{reiserfs_cut_from_item+1673} 801bfcfa{reiserfs_unlink+362}
Mar 25 01:00:48 websrv2 801873ae{vfs_unlink+462} 801874f9{sys_unlink+233}
Mar 25 01:00:48 websrv2 8018a268{sys_getdents+232} 8010f221{error_exit+0}
Mar 25 01:00:48 websrv2 8010e906{system_call+126}
Mar 25 01:00:48 websrv2
Mar 25 01:00:48 websrv2 Code: 0f 0b b8 c1 3f 80 ff ff ff ff 6a 01 4d 85 ed 48 c7 c2 40 ba
Mar 25 01:00:48 websrv2 RIP 801cfe13{reiserfs_panic+211} RSP 81001efe37b8

or

Mar 25 16:39:21 websrv2 VFS: brelse: Trying to free free buffer
Mar 25 16:39:21 websrv2 Badness in __brelse at fs/buffer.c:1295
Mar 25 16:39:21 websrv2
Mar 25 16:39:21 websrv2 Call Trace:8017787f{__find_get_block+479} 8017a175{__getblk+37}
Mar 25 16:39:21 websrv2 801de3d5{do_journal_end+2181} 80147d70{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 801cbf50{reiserfs_sync_fs+64} 8017c0b3{sync_supers+211}
Mar 25 16:39:21 websrv2 8015a22a{wb_kupdate+42} 8015ae8f{pdflush+399}
Mar 25 16:39:21 websrv2 8015a200{wb_kupdate+0} 80147d70{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 8015ad00{pdflush+0} 80147d2d{kthread+205}
Mar 25 16:39:21 websrv2 8010f3d7{child_rip+8} 80147d70{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 80147c60{kthread+0} 8010f3cf{child_rip+0}

Fortunately the kernel locked up and there was no data corruption.

I've got PREEMPT and PREEMPT_BKL enabled under UP.

I just took a look at the change and found this:

x86-64 does this (in entry.S):

        bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
        jnc  retint_restore_args
        movl $PREEMPT_ACTIVE,threadinfo_preempt_count(%rcx)
        sti
        call schedule
        cli
        GET_THREAD_INFO(%rcx)
        movl $0,threadinfo_preempt_count(%rcx)
        jmp exit_intr

while i386 does this:

        testl $IF_MASK,EFLAGS(%esp)     # interrupts off (exception path) ?
        jz restore_all
        call preempt_schedule_irq
        jmp need_resched

preempt_schedule_irq is not an i386 specific function and seems to take
special care of BKL preemption and since reiserfs does use the BKL to do
certain things I think this actually might be the problem...?

I'm not saying that this fix is wrong (it is obviously the right fix)
but it causes another problem to show up.

Unfortunately I don't have an amd64 machine to play with, so can somebody
please check this?
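Both assembler fragments above start with the same test: bit 9 of the saved EFLAGS is the interrupt flag (IF), and kernel preemption on IRQ return is only attempted if the interrupted context had interrupts enabled. A minimal C sketch of that check follows. The function names are made up for illustration, and the inverted variant assumes the earlier x86-64 code used jc where the current code uses jnc, as the "jc -> jnc" remark elsewhere in the thread suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF_BIT 9
#define IF_MASK (UINT32_C(1) << EFLAGS_IF_BIT) /* 0x200 */

/* True if the saved EFLAGS image has the interrupt flag set. */
bool saved_irqs_were_on(uint32_t saved_eflags)
{
    return (saved_eflags & IF_MASK) != 0;
}

/* Models "bt $9,EFLAGS...; jnc retint_restore_args" and
 * "testl $IF_MASK,EFLAGS(%esp); jz restore_all":
 * fall through to the preemption code only if IF was set. */
bool may_preempt_on_irq_return(uint32_t saved_eflags)
{
    return saved_irqs_were_on(saved_eflags);
}

/* Models the assumed older "jc" form: the branch sense is inverted,
 * so the preemption code is reached only when IF was clear. */
bool may_preempt_inverted_check(uint32_t saved_eflags)
{
    return !saved_irqs_were_on(saved_eflags);
}
```

As sample input, the EFLAGS value 00010292 from the oops above has bit 9 set, so may_preempt_on_irq_return(0x00010292) is true and the inverted check is false. Since a maskable interrupt can only arrive while IF is set, the inverted form would effectively never reach the preemption code.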