Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
Just re-ran the test 4-5 times, could not reproduce this one, but I'll
keep running this kernel w/patch for a while and see if it happens again.

On Fri, 26 Jan 2007, Andrew Morton wrote:

> On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
> Justin Piszcz <[EMAIL PROTECTED]> wrote:
>
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > > and others have reported starting with 2.6.19: pages mapped with
> > > kmap_atomic() become unmapped during memcpy() or similar operations.
> > > Try disabling preempt -- that seems to be the common factor.
> > >
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > the body of a message to [EMAIL PROTECTED]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > After I run some other tests, I am going to re-run this test and see if it
> > OOPSes again with PREEMPT off.
>
> Strange.  The below debug patch might catch it - please run with this
> applied.
>
> --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
> +++ a/arch/i386/mm/highmem.c
> @@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
>  {
>  	enum fixed_addresses idx;
>  	unsigned long vaddr;
> +	static unsigned warn_count = 10;
>
> +	if (unlikely(warn_count == 0))
> +		goto skip;
> +
> +	if (unlikely(in_interrupt())) {
> +		if (in_irq()) {
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		} else if (!irqs_disabled()) {	/* softirq */
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
> +			    type != KM_SKB_SUNRPC_DATA &&
> +			    type != KM_SKB_DATA_SOFTIRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		}
> +	}
> +
> +	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
> +		if (!irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
> +		if (irq_count() == 0 && !irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	}
> +skip:
>  	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
>  	pagefault_disable();
>  	if (!PageHighMem(page))
> _

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Fri, 26 Jan 2007, Andrew Morton wrote:

> On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
> Justin Piszcz <[EMAIL PROTECTED]> wrote:
>
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > > and others have reported starting with 2.6.19: pages mapped with
> > > kmap_atomic() become unmapped during memcpy() or similar operations.
> > > Try disabling preempt -- that seems to be the common factor.
> >
> > After I run some other tests, I am going to re-run this test and see if it
> > OOPSes again with PREEMPT off.
>
> Strange.  The below debug patch might catch it - please run with this
> applied.
>
> --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
> +++ a/arch/i386/mm/highmem.c
> @@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
>  {
>  	enum fixed_addresses idx;
>  	unsigned long vaddr;
> +	static unsigned warn_count = 10;
>
> +	if (unlikely(warn_count == 0))
> +		goto skip;
> +
> +	if (unlikely(in_interrupt())) {
> +		if (in_irq()) {
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		} else if (!irqs_disabled()) {	/* softirq */
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
> +			    type != KM_SKB_SUNRPC_DATA &&
> +			    type != KM_SKB_DATA_SOFTIRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		}
> +	}
> +
> +	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
> +		if (!irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
> +		if (irq_count() == 0 && !irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	}
> +skip:
>  	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
>  	pagefault_disable();
>  	if (!PageHighMem(page))
> _

The RAID5 bug may be hard to trigger, I have only made it happen once so
far (but only tried it once, don't like locking up the raid :)), I will
re-run the test after applying this patch.

Justin.
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
Justin Piszcz <[EMAIL PROTECTED]> wrote:

> > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > and others have reported starting with 2.6.19: pages mapped with
> > kmap_atomic() become unmapped during memcpy() or similar operations.
> > Try disabling preempt -- that seems to be the common factor.
>
> After I run some other tests, I am going to re-run this test and see if it
> OOPSes again with PREEMPT off.

Strange.  The below debug patch might catch it - please run with this
applied.

--- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
+++ a/arch/i386/mm/highmem.c
@@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
+	static unsigned warn_count = 10;

+	if (unlikely(warn_count == 0))
+		goto skip;
+
+	if (unlikely(in_interrupt())) {
+		if (in_irq()) {
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		} else if (!irqs_disabled()) {	/* softirq */
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
+			    type != KM_SKB_SUNRPC_DATA &&
+			    type != KM_SKB_DATA_SOFTIRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		}
+	}
+
+	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
+		if (!irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
+		if (irq_count() == 0 && !irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	}
+skip:
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 	if (!PageHighMem(page))
_
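The rules that the debug patch enforces can be read as a small table: which kmap slot types are acceptable in which execution context. What follows is only a userspace sketch of that table, not kernel code. The enum `km_slot`, the function `slot_warnings()` and its boolean parameters are hypothetical stand-ins for the kernel's `km_type`, `in_irq()`, `in_interrupt()` and `irqs_disabled()`, and the sketch ignores the patch's `warn_count` limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the km_type slots named in the patch. */
enum km_slot {
	KM_USER0,		/* any slot the patch does not special-case */
	KM_IRQ0, KM_IRQ1,
	KM_SOFTIRQ0, KM_SOFTIRQ1,
	KM_BIO_SRC_IRQ, KM_BIO_DST_IRQ,
	KM_BOUNCE_READ,
	KM_SKB_SUNRPC_DATA, KM_SKB_DATA_SOFTIRQ
};

/*
 * Return how many WARN_ON(1) calls the patch's checks would hit for one
 * kmap_atomic() call, given a modelled execution context.
 */
int slot_warnings(bool in_hardirq, bool in_softirq, bool irqs_off,
		  enum km_slot type)
{
	bool irq_slot = (type == KM_IRQ0 || type == KM_IRQ1);
	bool softirq_slot = (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1);
	bool in_interrupt = in_hardirq || in_softirq;
	int warns = 0;

	assert(type >= KM_USER0 && type <= KM_SKB_DATA_SOFTIRQ);

	/* First check: is this slot allowed in the current interrupt context? */
	if (in_interrupt) {
		if (in_hardirq) {
			if (!irq_slot && type != KM_BIO_SRC_IRQ &&
			    type != KM_BIO_DST_IRQ && type != KM_BOUNCE_READ)
				warns++;
		} else if (!irqs_off) {	/* softirq */
			if (!irq_slot && !softirq_slot &&
			    type != KM_SKB_SUNRPC_DATA &&
			    type != KM_SKB_DATA_SOFTIRQ &&
			    type != KM_BOUNCE_READ)
				warns++;
		}
	}

	/* Second check: is an interrupt-only slot used with too little protection? */
	if (irq_slot || type == KM_BOUNCE_READ) {
		if (!irqs_off)
			warns++;
	} else if (softirq_slot) {
		if (!in_interrupt && !irqs_off)
			warns++;
	}
	return warns;
}
```

For example, `slot_warnings(false, false, false, KM_IRQ0)` models process context with interrupts enabled using an IRQ slot, which is one of the situations the patch is meant to flag.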
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Mon, 22 Jan 2007, Chuck Ebbert wrote:

> Justin Piszcz wrote:
> > My .config is attached, please let me know if any other information is
> > needed and please CC (lkml) as I am not on the list, thanks!
> >
> > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to
> > the RAID5 running XFS.
> >
> > Any idea what happened here?
> >
> > [473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
> > [473795.214715] printing eip:
> > [473795.214718] c0358b14
> > [473795.214721] *pde = 3067
> > [473795.214723] *pte =
> > [473795.214726] Oops: [#1]
> > [473795.214729] PREEMPT SMP
> > [473795.214736] CPU:0
> > [473795.214737] EIP:0060:[<c0358b14>]Not tainted VLI
> > [473795.214738] EFLAGS: 00010286 (2.6.19.2 #1)
> > [473795.214746] EIP is at copy_data+0x6c/0x179
> > [473795.214750] eax: ebx: 1000 ecx: 0354 edx: fffb9000
> > [473795.214754] esi: fffb92b0 edi: da86c2b0 ebp: 1000 esp: f7927dc4
> > [473795.214757] ds: 007b es: 007b ss: 0068
> > [473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
> > [473795.214765] Stack: c1ba7c40 0003 f5538c80 0001 da86c000 0009 006c
> > [473795.214790]1000 da8536a8 aa6fee90 f5538c80 0190 c0358d00 aa6fee88
> > [473795.214863]d7c5794c 0001 da853488 f6fbec70 f6fbebc0 0001 0005 0001
> > [473795.214876] Call Trace:
> > [473795.214880] [<c0358d00>] compute_parity5+0xdf/0x497
> > [473795.214887] [<c035b0dd>] handle_stripe+0x930/0x2986
> > [473795.214892] [<c01146b9>] find_busiest_group+0x124/0x4fd
> > [473795.214898] [<c03580e0>] release_stripe+0x21/0x2e
> > [473795.214902] [<c035d233>] raid5d+0x100/0x161
> > [473795.214907] [<c036b03c>] md_thread+0x40/0x103
> > [473795.214912] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
> > [473795.214917] [<c036affc>] md_thread+0x0/0x103
> > [473795.214922] [<c012da1a>] kthread+0xfc/0x100
> > [473795.214926] [<c012d91e>] kthread+0x0/0x100
> > [473795.214930] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
> > [473795.214935] ===
> > [473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01
> > c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02
> > <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
> > [473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4
>
> Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> become unmapped during memcpy() or similar operations.  Try disabling
> preempt -- that seems to be the common factor.

After I run some other tests, I am going to re-run this test and see if it
OOPSes again with PREEMPT off.

Justin.
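The register dump in the oops above is consistent with a copy that faulted part-way through a page, which is what the "unmapped during memcpy()" diagnosis predicts. The arithmetic can be checked with a small sketch. It rests on assumptions: the register values are read as hexadecimal with leading zeros dropped by the archive, `ebx` (1000) is taken to be the copy length in bytes, and `ecx` (0354) the number of 32-bit words a `rep movsl` still had to copy. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u	/* i386 page size */

/* Start of the page that contains addr. */
uint32_t page_base(uint32_t addr)
{
	return addr & ~(PAGE_SIZE - 1);
}

/* Byte offset of addr within its page. */
uint32_t page_offset(uint32_t addr)
{
	return addr & (PAGE_SIZE - 1);
}

/* 32-bit words still to copy after `done` bytes of a `len`-byte copy. */
uint32_t dwords_left(uint32_t len, uint32_t done)
{
	assert(done <= len);
	return (len - done) / 4;
}
```

Under those assumptions, the faulting address fffb92b0 (`esi`) lies 0x2b0 bytes into the page at fffb9000 (`edx`), the destination da86c2b0 (`edi`) has the same 0x2b0 offset, and `dwords_left(0x1000, 0x2b0)` is 0x354, matching `ecx`. In other words, the first 0x2b0 bytes of the page appear to have been copied before the source address stopped being valid.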
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Tue, 23 Jan 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
> >
> > On Tue, 23 Jan 2007, Michael Tokarev wrote:
> >
> >> Disabling pre-emption on critical and/or server machines seems to be a good
> >> idea in the first place.  IMHO anyway.. ;)
> >
> > So bottom line is make sure not to use preemption on servers or else you
> > will get weird spinlock/deadlocks on RAID devices--GOOD To know!
>
> This is not a reason.  The reason is that preemption usually works worse
> on servers, esp. high-loaded servers - the more often you interrupt a
> (kernel) work, the more needless context switches you'll have, and the
> more slow the whole thing works.
>
> Another point is that with preemption enabled, we have more chances to
> hit one or another bug somewhere.  Those bugs should be found and fixed
> for sure, but important servers/data isn't a place usually for bughunting.
>
> /mjt

Thanks for the update/info.

Justin.
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
Justin Piszcz wrote:
>
> On Tue, 23 Jan 2007, Michael Tokarev wrote:
>
>> Disabling pre-emption on critical and/or server machines seems to be a good
>> idea in the first place.  IMHO anyway.. ;)
>
> So bottom line is make sure not to use preemption on servers or else you
> will get weird spinlock/deadlocks on RAID devices--GOOD To know!

This is not a reason.  The reason is that preemption usually works worse
on servers, esp. high-loaded servers - the more often you interrupt a
(kernel) work, the more needless context switches you'll have, and the
more slow the whole thing works.

Another point is that with preemption enabled, we have more chances to
hit one or another bug somewhere.  Those bugs should be found and fixed
for sure, but important servers/data isn't a place usually for bughunting.

/mjt
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Tue, 23 Jan 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
> []
> > Is this a bug that can or will be fixed or should I disable pre-emption on
> > critical and/or server machines?
>
> Disabling pre-emption on critical and/or server machines seems to be a good
> idea in the first place.  IMHO anyway.. ;)
>
> /mjt

So for a server system, the following options should be as follows:

    Preemption Model (No Forced Preemption (Server))  --->
    [ ] Preempt The Big Kernel Lock

Also, my mobo has HPET timer support in the BIOS, is there any reason to
use this on a server?  I do run X on it via the Intel 965 chipset video.

So bottom line is make sure not to use preemption on servers or else you
will get weird spinlock/deadlocks on RAID devices--GOOD To know!

Thanks!

Justin.
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
Justin Piszcz wrote:
[]
> Is this a bug that can or will be fixed or should I disable pre-emption on
> critical and/or server machines?

Disabling pre-emption on critical and/or server machines seems to be a good
idea in the first place.  IMHO anyway.. ;)

/mjt
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Tue, 23 Jan 2007, Neil Brown wrote:

> On Monday January 22, [EMAIL PROTECTED] wrote:
> > Justin Piszcz wrote:
> > > My .config is attached, please let me know if any other information is
> > > needed and please CC (lkml) as I am not on the list, thanks!
> > >
> > > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba
> > > to the RAID5 running XFS.
> > >
> > > Any idea what happened here?
> >
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > and others have reported starting with 2.6.19: pages mapped with
> > kmap_atomic() become unmapped during memcpy() or similar operations.
> > Try disabling preempt -- that seems to be the common factor.
>
> That is exactly the conclusion I had just come to (a kmap_atomic page
> must be being unmapped during memcpy).  I wasn't aware that others had
> reported it - thanks for that.
>
> Turning off CONFIG_PREEMPT certainly seems like a good idea.
>
> NeilBrown

Is this a bug that can or will be fixed or should I disable pre-emption on
critical and/or server machines?

Justin.
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Monday January 22, [EMAIL PROTECTED] wrote:
> On 1/22/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > On Monday January 22, [EMAIL PROTECTED] wrote:
> > > Justin Piszcz wrote:
> > > > My .config is attached, please let me know if any other information is
> > > > needed and please CC (lkml) as I am not on the list, thanks!
> > > >
> > > > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > > > the RAID5 running XFS.
> > > >
> > > > Any idea what happened here?
> > >
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > > become unmapped during memcpy() or similar operations. Try disabling
> > > preempt -- that seems to be the common factor.
> >
> > That is exactly the conclusion I had just come to (a kmap_atomic page
> > must be being unmapped during memcpy). I wasn't aware that others had
> > reported it - thanks for that.
> >
> > Turning off CONFIG_PREEMPT certainly seems like a good idea.
>
> Coming from an ARM background I am not yet versed in the inner
> workings of kmap_atomic, but if you have time for a question I am
> curious as to why spin_lock(&sh->lock) is not sufficient pre-emption
> protection for copy_data() in this case?

Presumably there is a bug somewhere. kmap_atomic itself calls inc_preempt_count so that preemption should be disabled at least until the kunmap_atomic is called. But apparently not. The symptoms point exactly to the page getting unmapped when it shouldn't.

Until that bug is found and fixed, the workaround of turning off CONFIG_PREEMPT seems to make sense.

Of course it would be great if someone who can easily reproduce this bug could do the 'git bisect' thing to find out where the bug crept in.

NeilBrown
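The expectation Neil describes can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every model_* name is hypothetical and stands in for, rather than reproduces, the kernel's spin_lock(), kmap_atomic() and kunmap_atomic(). The only behaviour modelled is that each call raises or lowers a preemption counter, and that a task may be preempted only while the counter is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of these are the real kernel functions. */
static int model_preempt_count;

/* A task can be preempted only when nothing holds the counter above zero. */
static bool model_preemptible(void)
{
    return model_preempt_count == 0;
}

/* Assumption: with CONFIG_PREEMPT, taking a spinlock disables preemption. */
static void model_spin_lock(void)   { model_preempt_count++; }
static void model_spin_unlock(void) { model_preempt_count--; }

/* Per the message above: kmap_atomic bumps the count until kunmap_atomic. */
static void model_kmap_atomic(void)   { model_preempt_count++; }
static void model_kunmap_atomic(void) { model_preempt_count--; }

/* Mimics the nesting around the copy; returns the depth seen mid-copy. */
static int model_copy_data(void)
{
    int depth;

    model_spin_lock();              /* stands in for spin_lock(&sh->lock) */
    model_kmap_atomic();            /* map the page for the copy */

    assert(!model_preemptible());   /* the copy should not be preemptible */
    depth = model_preempt_count;    /* the memcpy() would happen here */

    model_kunmap_atomic();
    model_spin_unlock();
    return depth;
}
```

Under this model the counter is 2 while the copy runs, so either the spinlock or the atomic mapping alone should already keep the task from being preempted. That is why the symptoms suggest a bug elsewhere rather than missing locking around copy_data().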
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On 1/22/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> On Monday January 22, [EMAIL PROTECTED] wrote:
> > Justin Piszcz wrote:
> > > My .config is attached, please let me know if any other information is
> > > needed and please CC (lkml) as I am not on the list, thanks!
> > >
> > > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > > the RAID5 running XFS.
> > >
> > > Any idea what happened here?
> >
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > become unmapped during memcpy() or similar operations. Try disabling
> > preempt -- that seems to be the common factor.
>
> That is exactly the conclusion I had just come to (a kmap_atomic page
> must be being unmapped during memcpy). I wasn't aware that others had
> reported it - thanks for that.
>
> Turning off CONFIG_PREEMPT certainly seems like a good idea.

Coming from an ARM background I am not yet versed in the inner workings of kmap_atomic, but if you have time for a question I am curious as to why spin_lock(&sh->lock) is not sufficient pre-emption protection for copy_data() in this case?

> NeilBrown

Regards,

Dan
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Monday January 22, [EMAIL PROTECTED] wrote:
> Justin Piszcz wrote:
> > My .config is attached, please let me know if any other information is
> > needed and please CC (lkml) as I am not on the list, thanks!
> >
> > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > the RAID5 running XFS.
> >
> > Any idea what happened here?
>
> Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> become unmapped during memcpy() or similar operations. Try disabling
> preempt -- that seems to be the common factor.

That is exactly the conclusion I had just come to (a kmap_atomic page must be being unmapped during memcpy). I wasn't aware that others had reported it - thanks for that.

Turning off CONFIG_PREEMPT certainly seems like a good idea.

NeilBrown
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
Justin Piszcz wrote:
> My .config is attached, please let me know if any other information is
> needed and please CC (lkml) as I am not on the list, thanks!
>
> Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> the RAID5 running XFS.
>
> Any idea what happened here?
>
> [473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
> [473795.214715] printing eip:
> [473795.214718] c0358b14
> [473795.214721] *pde = 3067
> [473795.214723] *pte =
> [473795.214726] Oops: [#1]
> [473795.214729] PREEMPT SMP
> [473795.214736] CPU: 0
> [473795.214737] EIP: 0060:[<c0358b14>] Not tainted VLI
> [473795.214738] EFLAGS: 00010286 (2.6.19.2 #1)
> [473795.214746] EIP is at copy_data+0x6c/0x179
> [473795.214750] eax: ebx: 1000 ecx: 0354 edx: fffb9000
> [473795.214754] esi: fffb92b0 edi: da86c2b0 ebp: 1000 esp: f7927dc4
> [473795.214757] ds: 007b es: 007b ss: 0068
> [473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
> [473795.214765] Stack: c1ba7c40 0003 f5538c80 0001 da86c000 0009 006c
> [473795.214790] 1000 da8536a8 aa6fee90 f5538c80 0190 c0358d00 aa6fee88
> [473795.214863] d7c5794c 0001 da853488 f6fbec70 f6fbebc0 0001 0005 0001
> [473795.214876] Call Trace:
> [473795.214880] [<c0358d00>] compute_parity5+0xdf/0x497
> [473795.214887] [<c035b0dd>] handle_stripe+0x930/0x2986
> [473795.214892] [<c01146b9>] find_busiest_group+0x124/0x4fd
> [473795.214898] [<c03580e0>] release_stripe+0x21/0x2e
> [473795.214902] [<c035d233>] raid5d+0x100/0x161
> [473795.214907] [<c036b03c>] md_thread+0x40/0x103
> [473795.214912] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
> [473795.214917] [<c036affc>] md_thread+0x0/0x103
> [473795.214922] [<c012da1a>] kthread+0xfc/0x100
> [473795.214926] [<c012d91e>] kthread+0x0/0x100
> [473795.214930] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
> [473795.214935] ===
> [473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
> [473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4

Without digging too deeply, I'd say you've hit the same bug Sami Farin and others have reported starting with 2.6.19: pages mapped with kmap_atomic() become unmapped during memcpy() or similar operations. Try disabling preempt -- that seems to be the common factor.
Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
On Sat, 20 Jan 2007, Justin Piszcz wrote:
> My .config is attached, please let me know if any other information is
> needed and please CC (lkml) as I am not on the list, thanks!
>
> Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> the RAID5 running XFS.
>
> Any idea what happened here?
>

It happened again under heavy read I/O when I was running md5sum -c on some of my files.

[ 551.942958] BUG: unable to handle kernel paging request at virtual address fffb97b0
[ 551.942970] printing eip:
[ 551.942972] c0358bd8
[ 551.942974] *pde = 3067
[ 551.942976] *pte =
[ 551.942980] Oops: 0002 [#1]
[ 551.942982] PREEMPT SMP
[ 551.942989] CPU: 0
[ 551.942990] EIP: 0060:[<c0358bd8>] Not tainted VLI
[ 551.942991] EFLAGS: 00010286 (2.6.19.2 #1)
[ 551.942999] EIP is at copy_data+0x130/0x179
[ 551.943001] eax: ebx: 1000 ecx: 0214 edx: fffb9000
[ 551.943005] esi: dd2007b0 edi: fffb97b0 ebp: 1000 esp: f76ffe1c
[ 551.943007] ds: 007b es: 007b ss: 0068
[ 551.943011] Process md4_raid5 (pid: 1309, ti=f76fe000 task=f7081560 task.ti=f76fe000)
[ 551.943013] Stack: c1d880c0 0003 cd2f0540 dd20 000e 00a8
[ 551.943027] 1000 cd2f0540 dd1f1adc f6435c48 dd1f1ad8 c035a977 34f3db20 c027be16
[ 551.943043] c0553328 0002 0002 c01146b9 f6435c48 c0553328 f6435c48 dd1f193c
[ 551.943056] Call Trace:
[ 551.943059] [<c035a977>] handle_stripe+0x1ca/0x2986
[ 551.943065] [<c027be16>] __next_cpu+0x22/0x33
[ 551.943072] [<c01146b9>] find_busiest_group+0x124/0x4fd
[ 551.943136] [<c01140af>] __wake_up+0x32/0x43
[ 551.943140] [<c03580e0>] release_stripe+0x21/0x2e
[ 551.943145] [<c035d233>] raid5d+0x100/0x161
[ 551.943150] [<c036b03c>] md_thread+0x40/0x103
[ 551.943155] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
[ 551.943160] [<c036affc>] md_thread+0x0/0x103
[ 551.943165] [<c012da1a>] kthread+0xfc/0x100
[ 551.943169] [<c012d91e>] kthread+0x0/0x100
[ 551.943173] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
[ 551.943178] ===
[ 551.943180] Code: 8b 4c 24 08 8b 41 2c 8b 4c 24 1c 03 54 08 08 8b 44 24 0c 85 c0 0f 85 3a ff ff ff 89 d9 c1 e9 02 8b 44 24 18 8d 3c 02 03 74 24 10 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 e9 37 ff ff ff 01 ee 89 74 24
[ 551.943254] EIP: [<c0358bd8>] copy_data+0x130/0x179 SS:ESP 0068:f76ffe1c
[ 551.943262] <6>note: md4_raid5[1309] exited with preempt_count 3

I will run resync/check on this array and then see if that fixes it.

Justin.
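The "Oops: 0002" field in this second trace is the x86 page-fault error code. As a minimal sketch (the struct and function names below are made up for illustration and are not kernel code), its three low bits can be decoded like this: bit 0 is set when the page was present, bit 1 when the access was a write, and bit 2 when the fault came from user mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type: the low three bits of an x86 #PF error code. */
struct pf_info {
    bool present;  /* bit 0: 1 = protection fault, 0 = page not present */
    bool write;    /* bit 1: 1 = write access, 0 = read access */
    bool user;     /* bit 2: 1 = fault in user mode, 0 = kernel mode */
};

/* Decode the error code printed after "Oops:" in an i386 oops report. */
static struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;

    info.present = (code & 0x1) != 0;
    info.write   = (code & 0x2) != 0;
    info.user    = (code & 0x4) != 0;
    return info;
}
```

For 0002 this gives a kernel-mode write to a page that was not present, which matches edi (the destination of the copy) being equal to the faulting address fffb97b0. The error code of the first oops did not survive in the archived text, so it is not decoded here.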
Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
My .config is attached, please let me know if any other information is needed and please CC (lkml) as I am not on the list, thanks!

Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to the RAID5 running XFS.

Any idea what happened here?

[473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
[473795.214715] printing eip:
[473795.214718] c0358b14
[473795.214721] *pde = 3067
[473795.214723] *pte =
[473795.214726] Oops: [#1]
[473795.214729] PREEMPT SMP
[473795.214736] CPU: 0
[473795.214737] EIP: 0060:[<c0358b14>] Not tainted VLI
[473795.214738] EFLAGS: 00010286 (2.6.19.2 #1)
[473795.214746] EIP is at copy_data+0x6c/0x179
[473795.214750] eax: ebx: 1000 ecx: 0354 edx: fffb9000
[473795.214754] esi: fffb92b0 edi: da86c2b0 ebp: 1000 esp: f7927dc4
[473795.214757] ds: 007b es: 007b ss: 0068
[473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
[473795.214765] Stack: c1ba7c40 0003 f5538c80 0001 da86c000 0009 006c
[473795.214790] 1000 da8536a8 aa6fee90 f5538c80 0190 c0358d00 aa6fee88
[473795.214863] d7c5794c 0001 da853488 f6fbec70 f6fbebc0 0001 0005 0001
[473795.214876] Call Trace:
[473795.214880] [<c0358d00>] compute_parity5+0xdf/0x497
[473795.214887] [<c035b0dd>] handle_stripe+0x930/0x2986
[473795.214892] [<c01146b9>] find_busiest_group+0x124/0x4fd
[473795.214898] [<c03580e0>] release_stripe+0x21/0x2e
[473795.214902] [<c035d233>] raid5d+0x100/0x161
[473795.214907] [<c036b03c>] md_thread+0x40/0x103
[473795.214912] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
[473795.214917] [<c036affc>] md_thread+0x0/0x103
[473795.214922] [<c012da1a>] kthread+0xfc/0x100
[473795.214926] [<c012d91e>] kthread+0x0/0x100
[473795.214930] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
[473795.214935] ===
[473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
[473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4
[473795.215024] <6>note: md4_raid5[1305] exited with preempt_count 2

# mdadm -D /dev/md4
/dev/md4:
        Version : 01.00.03
  Creation Time : Wed Jan 10 15:58:52 2007
     Raid Level : raid5
     Array Size : 1562834432 (1490.44 GiB 1600.34 GB)
    Device Size : 781417216 (372.61 GiB 400.09 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 4
    Persistence : Superblock is persistent

    Update Time : Sat Jan 20 07:15:01 2007
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : 4
           UUID : 7f453e18:893e4dd9:6e810372:4c724f49
         Events : 33

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       81        1      active sync   /dev/sdf1
       2       8      113        2      active sync   /dev/sdh1
       3       8       65        3      active sync   /dev/sde1
       5       8       49        4      active sync   /dev/sdd1

Attachment: config.bz2 (binary data)