Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-26 Thread Justin Piszcz
Just re-ran the test 4-5 times, could not reproduce this one, but I'll 
keep running this kernel w/patch for a while and see if it happens again.

On Fri, 26 Jan 2007, Andrew Morton wrote:

> On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
> Justin Piszcz <[EMAIL PROTECTED]> wrote:
> 
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > > and others have reported starting with 2.6.19: pages mapped with
> > > kmap_atomic() become unmapped during memcpy() or similar operations.
> > > Try disabling preempt -- that seems to be the common factor.
> > > 
> > > 
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > the body of a message to [EMAIL PROTECTED]
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > 
> > 
> > After I run some other tests, I am going to re-run this test and see if it 
> > OOPSes again with PREEMPT off.
> 
> Strange.  The below debug patch might catch it - please run with this
> applied.  
> 
> 
> --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
> +++ a/arch/i386/mm/highmem.c
> @@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
>  {
>  	enum fixed_addresses idx;
>  	unsigned long vaddr;
> +	static unsigned warn_count = 10;
>  
> +	if (unlikely(warn_count == 0))
> +		goto skip;
> +
> +	if (unlikely(in_interrupt())) {
> +		if (in_irq()) {
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		} else if (!irqs_disabled()) {	/* softirq */
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
> +			    type != KM_SKB_SUNRPC_DATA &&
> +			    type != KM_SKB_DATA_SOFTIRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		}
> +	}
> +
> +	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
> +		if (!irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
> +		if (irq_count() == 0 && !irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	}
> +skip:
>  	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
>  	pagefault_disable();
>  	if (!PageHighMem(page))
> _
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-26 Thread Justin Piszcz


On Fri, 26 Jan 2007, Andrew Morton wrote:

> On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
> Justin Piszcz <[EMAIL PROTECTED]> wrote:
> 
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > > and others have reported starting with 2.6.19: pages mapped with
> > > kmap_atomic() become unmapped during memcpy() or similar operations.
> > > Try disabling preempt -- that seems to be the common factor.
> > 
> > After I run some other tests, I am going to re-run this test and see if it 
> > OOPSes again with PREEMPT off.
> 
> Strange.  The below debug patch might catch it - please run with this
> applied.  
> 
> 
> --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
> +++ a/arch/i386/mm/highmem.c
> @@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
>  {
>  	enum fixed_addresses idx;
>  	unsigned long vaddr;
> +	static unsigned warn_count = 10;
>  
> +	if (unlikely(warn_count == 0))
> +		goto skip;
> +
> +	if (unlikely(in_interrupt())) {
> +		if (in_irq()) {
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		} else if (!irqs_disabled()) {	/* softirq */
> +			if (type != KM_IRQ0 && type != KM_IRQ1 &&
> +			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
> +			    type != KM_SKB_SUNRPC_DATA &&
> +			    type != KM_SKB_DATA_SOFTIRQ &&
> +			    type != KM_BOUNCE_READ) {
> +				WARN_ON(1);
> +				warn_count--;
> +			}
> +		}
> +	}
> +
> +	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
> +		if (!irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
> +		if (irq_count() == 0 && !irqs_disabled()) {
> +			WARN_ON(1);
> +			warn_count--;
> +		}
> +	}
> +skip:
>  	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
>  	pagefault_disable();
>  	if (!PageHighMem(page))
> _
> 
> 

The RAID5 bug may be hard to trigger; I have only made it happen once so
far (though I have only tried once, as I don't like locking up the RAID :)).
I will re-run the test after applying this patch.

Justin.


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-26 Thread Andrew Morton
On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
Justin Piszcz <[EMAIL PROTECTED]> wrote:

> > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > and others have reported starting with 2.6.19: pages mapped with
> > kmap_atomic() become unmapped during memcpy() or similar operations.
> > Try disabling preempt -- that seems to be the common factor.
> 
> After I run some other tests, I am going to re-run this test and see if it 
> OOPSes again with PREEMPT off.

Strange.  The below debug patch might catch it - please run with this
applied.  


--- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
+++ a/arch/i386/mm/highmem.c
@@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
+	static unsigned warn_count = 10;
 
+	if (unlikely(warn_count == 0))
+		goto skip;
+
+	if (unlikely(in_interrupt())) {
+		if (in_irq()) {
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		} else if (!irqs_disabled()) {	/* softirq */
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
+			    type != KM_SKB_SUNRPC_DATA &&
+			    type != KM_SKB_DATA_SOFTIRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		}
+	}
+
+	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
+		if (!irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
+		if (irq_count() == 0 && !irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	}
+skip:
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 	if (!PageHighMem(page))
_
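
For anyone following the patch, its rules can be modelled outside the kernel. What follows is only a user-space sketch, not kernel code: `struct ctx` and `kmap_atomic_would_warn()` are hypothetical names, the kernel predicates in_irq(), in_interrupt(), irq_count() and irqs_disabled() are replaced by plain booleans, the enum lists just the slot types the patch mentions (plus KM_USER0 as an ordinary process-context slot), and the warn_count limit is left out.

```c
#include <stdbool.h>

/* Subset of the i386 km_type slots; only the ones named in the patch,
 * plus KM_USER0 as an example of a process-context slot. */
enum km_type {
	KM_USER0,
	KM_IRQ0, KM_IRQ1,
	KM_SOFTIRQ0, KM_SOFTIRQ1,
	KM_BIO_SRC_IRQ, KM_BIO_DST_IRQ,
	KM_BOUNCE_READ,
	KM_SKB_SUNRPC_DATA, KM_SKB_DATA_SOFTIRQ
};

/* Hypothetical stand-in for the kernel's execution-context predicates. */
struct ctx {
	bool in_irq;         /* models in_irq(): hard interrupt context */
	bool in_softirq;     /* models softirq context */
	bool irqs_disabled;  /* models irqs_disabled() */
};

/* Returns true if the patch's checks would reach WARN_ON(1). */
bool kmap_atomic_would_warn(enum km_type type, struct ctx c)
{
	/* in_interrupt() is true in hardirq or softirq context,
	 * which is the same as irq_count() != 0. */
	bool in_interrupt = c.in_irq || c.in_softirq;
	bool warn = false;

	if (in_interrupt) {
		if (c.in_irq) {
			/* hardirq: only the IRQ-safe slots are acceptable */
			if (type != KM_IRQ0 && type != KM_IRQ1 &&
			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
			    type != KM_BOUNCE_READ)
				warn = true;
		} else if (!c.irqs_disabled) {
			/* softirq: IRQ slots and softirq slots are acceptable */
			if (type != KM_IRQ0 && type != KM_IRQ1 &&
			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
			    type != KM_SKB_SUNRPC_DATA &&
			    type != KM_SKB_DATA_SOFTIRQ &&
			    type != KM_BOUNCE_READ)
				warn = true;
		}
	}

	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
		/* IRQ slots must only be used with interrupts disabled */
		if (!c.irqs_disabled)
			warn = true;
	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
		/* softirq slots need softirq context or disabled interrupts */
		if (!in_interrupt && !c.irqs_disabled)
			warn = true;
	}
	return warn;
}
```

In this model a KM_USER0 mapping taken from process context never warns, while the same slot used from a hard interrupt does; the IRQ slots warn whenever interrupts are still enabled.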



Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-24 Thread Justin Piszcz


On Mon, 22 Jan 2007, Chuck Ebbert wrote:

> Justin Piszcz wrote:
> > My .config is attached, please let me know if any other information is
> > needed and please CC (lkml) as I am not on the list, thanks!
> >
> > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to
> > the RAID5 running XFS.
> >
> > Any idea what happened here?
> >
> > [473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
> > [473795.214715]  printing eip:
> > [473795.214718] c0358b14
> > [473795.214721] *pde = 3067
> > [473795.214723] *pte = 
> > [473795.214726] Oops:  [#1]
> > [473795.214729] PREEMPT SMP
> > [473795.214736] CPU:0
> > [473795.214737] EIP: 0060:[<c0358b14>] Not tainted VLI
> > [473795.214738] EFLAGS: 00010286   (2.6.19.2 #1)
> > [473795.214746] EIP is at copy_data+0x6c/0x179
> > [473795.214750] eax:    ebx: 1000   ecx: 0354   edx: fffb9000
> > [473795.214754] esi: fffb92b0   edi: da86c2b0   ebp: 1000   esp: f7927dc4
> > [473795.214757] ds: 007b   es: 007b   ss: 0068
> > [473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
> > [473795.214765] Stack: c1ba7c40 0003 f5538c80 0001 da86c000 0009  006c
> > [473795.214790]        1000 da8536a8 aa6fee90 f5538c80 0190 c0358d00 aa6fee88
> > [473795.214863]        d7c5794c 0001 da853488 f6fbec70 f6fbebc0 0001 0005 0001
> > [473795.214876] Call Trace:
> > [473795.214880]  [<c0358d00>] compute_parity5+0xdf/0x497
> > [473795.214887]  [<c035b0dd>] handle_stripe+0x930/0x2986
> > [473795.214892]  [<c01146b9>] find_busiest_group+0x124/0x4fd
> > [473795.214898]  [<c03580e0>] release_stripe+0x21/0x2e
> > [473795.214902]  [<c035d233>] raid5d+0x100/0x161
> > [473795.214907]  [<c036b03c>] md_thread+0x40/0x103
> > [473795.214912]  [<c012dbbe>] autoremove_wake_function+0x0/0x4b
> > [473795.214917]  [<c036affc>] md_thread+0x0/0x103
> > [473795.214922]  [<c012da1a>] kthread+0xfc/0x100
> > [473795.214926]  [<c012d91e>] kthread+0x0/0x100
> > [473795.214930]  [<c0103b4b>] kernel_thread_helper+0x7/0x1c
> > [473795.214935]  ===
> > [473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
> > [473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4
> >   
> Without digging too deeply, I'd say you've hit the same bug Sami Farin
> and others have reported starting with 2.6.19: pages mapped with
> kmap_atomic() become unmapped during memcpy() or similar operations.
> Try disabling preempt -- that seems to be the common factor.

After I run some other tests, I am going to re-run this test and see if it 
OOPSes again with PREEMPT off.

Justin.


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-23 Thread Justin Piszcz


On Tue, 23 Jan 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
> > 
> > On Tue, 23 Jan 2007, Michael Tokarev wrote:
> > 
> >> Disabling pre-emption on critical and/or server machines seems to be a good
> >> idea in the first place.  IMHO anyway.. ;)
> >
> > So bottom line is make sure not to use preemption on servers or else you 
> > will get weird spinlock/deadlocks on RAID devices--GOOD To know!
> 
> This is not a reason.  The reason is that preemption usually works worse
> on servers, esp. heavily loaded servers - the more often you interrupt
> (kernel) work, the more needless context switches you'll have, and the
> slower the whole thing runs.
> 
> Another point is that with preemption enabled, we have more chances to
> hit one bug or another somewhere.  Those bugs should be found and fixed
> for sure, but important servers/data usually aren't the place for bug hunting.
> 
> /mjt

Thanks for the update/info.

Justin.


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-23 Thread Michael Tokarev
Justin Piszcz wrote:
> 
> On Tue, 23 Jan 2007, Michael Tokarev wrote:
> 
>> Disabling pre-emption on critical and/or server machines seems to be a good
>> idea in the first place.  IMHO anyway.. ;)
>
> So bottom line is make sure not to use preemption on servers or else you 
> will get weird spinlock/deadlocks on RAID devices--GOOD To know!

This is not a reason.  The reason is that preemption usually works worse
on servers, esp. heavily loaded servers - the more often you interrupt
(kernel) work, the more needless context switches you'll have, and the
slower the whole thing runs.

Another point is that with preemption enabled, we have more chances to
hit one bug or another somewhere.  Those bugs should be found and fixed
for sure, but important servers/data usually aren't the place for bug hunting.

/mjt


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-23 Thread Justin Piszcz


On Tue, 23 Jan 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
> []
> > Is this a bug that can or will be fixed or should I disable pre-emption on 
> > critical and/or server machines?
> 
> Disabling pre-emption on critical and/or server machines seems to be a good
> idea in the first place.  IMHO anyway.. ;)
> 
> /mjt
> 

So for a server system, the options should be set as follows:

Preemption Model (No Forced Preemption (Server))  --->
[ ] Preempt The Big Kernel Lock

Also, my mobo has HPET timer support in the BIOS, is there any reason to 
use this on a server? I do run X on it via the Intel 965 chipset video.

So bottom line is make sure not to use preemption on servers or else you 
will get weird spinlock/deadlocks on RAID devices--GOOD To know!

Thanks!

Justin.


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-23 Thread Michael Tokarev
Justin Piszcz wrote:
[]
> Is this a bug that can or will be fixed or should I disable pre-emption on 
> critical and/or server machines?

Disabling pre-emption on critical and/or server machines seems to be a good
idea in the first place.  IMHO anyway.. ;)

/mjt


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-23 Thread Justin Piszcz


On Tue, 23 Jan 2007, Neil Brown wrote:

> On Monday January 22, [EMAIL PROTECTED] wrote:
> > Justin Piszcz wrote:
> > > My .config is attached, please let me know if any other information is
> > > needed and please CC (lkml) as I am not on the list, thanks!
> > >
> > > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba
> > > to the RAID5 running XFS.
> > >
> > > Any idea what happened here?
> 
> > >   
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > and others have reported starting with 2.6.19: pages mapped with
> > kmap_atomic() become unmapped during memcpy() or similar operations.
> > Try disabling preempt -- that seems to be the common factor.
> 
> That is exactly the conclusion I had just come to (a kmap_atomic page
> must be being unmapped during memcpy).  I wasn't aware that others had
> reported it - thanks for that.
> 
> Turning off CONFIG_PREEMPT certainly seems like a good idea.
> 
> NeilBrown

Is this a bug that can or will be fixed or should I disable pre-emption on 
critical and/or server machines?

Justin.
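
The failure mode discussed above, a mapping that disappears in the middle of a copy, can be illustrated with a toy model. This is only a sketch under assumptions: the names (`struct cpu_slots`, `toy_kmap()`, `toy_kunmap()`, `toy_copy()`) are invented, pages are plain integers, and the "interrupt" is just a function call at a fixed step. It shows what happens if a second user maps and unmaps the same per-CPU slot while the first user is still copying; it does not claim that this is what actually went wrong in 2.6.19.

```c
enum { NR_SLOTS = 4, UNMAPPED = -1 };

/* One CPU's table of atomic-kmap slots: slot index -> mapped page id. */
struct cpu_slots {
	int page[NR_SLOTS];
};

void slots_init(struct cpu_slots *s)
{
	for (int i = 0; i < NR_SLOTS; i++)
		s->page[i] = UNMAPPED;
}

/* Map a page into a slot; like a fixed slot, there is no ownership check. */
void toy_kmap(struct cpu_slots *s, int slot, int page)
{
	s->page[slot] = page;
}

/* Tear the mapping down again. */
void toy_kunmap(struct cpu_slots *s, int slot)
{
	s->page[slot] = UNMAPPED;
}

/*
 * Copy `steps` words out of `page` through `slot`.  At step `intr_at` a
 * simulated interrupt handler maps and unmaps a different page through
 * `intr_slot`.  Returns the step at which the copier found its mapping
 * gone (the toy equivalent of the oops), or -1 if the copy completed.
 */
int toy_copy(struct cpu_slots *s, int slot, int page,
	     int steps, int intr_at, int intr_slot)
{
	toy_kmap(s, slot, page);
	for (int i = 0; i < steps; i++) {
		if (i == intr_at) {
			toy_kmap(s, intr_slot, page + 1);
			toy_kunmap(s, intr_slot);
		}
		if (s->page[slot] != page)
			return i;	/* mapping vanished mid-copy */
	}
	toy_kunmap(s, slot);
	return -1;
}
```

When the interloper uses a different slot the copy completes; when it reuses the copier's slot, the copier faults at exactly the step where it was interrupted. Keeping slot types separated by execution context is what prevents this kind of collision.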


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba - RAID5)

2007-01-23 Thread Justin Piszcz


On Tue, 23 Jan 2007, Michael Tokarev wrote:

 Justin Piszcz wrote:
 []
  Is this a bug that can or will be fixed or should I disable pre-emption on 
  critical and/or server machines?
 
 Disabling pre-emption on critical and/or server machines seems to be a good
 idea in the first place.  IMHO anyway.. ;)
 
 /mjt
 

So for a server system, the following options should be as follows:

Preemption Model (No Forced Preemption (Server))  ---
[ ] Preempt The Big Kernel Lock

Also, my mobo has HPET timer support in the BIOS, is there any reason to 
use this on a server? I do run X on it via the Intel 965 chipset video.

So bottom line is make sure not to use preemption on servers or else you 
will get weird spinlock/deadlocks on RAID devices--GOOD To know!

Thanks!

Justin.


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-23 Thread Michael Tokarev
Justin Piszcz wrote:
>
> On Tue, 23 Jan 2007, Michael Tokarev wrote:
>
> > Disabling pre-emption on critical and/or server machines seems to be a good
> > idea in the first place.  IMHO anyway.. ;)
>
> So bottom line is make sure not to use preemption on servers or else you
> will get weird spinlock/deadlocks on RAID devices--GOOD To know!

This is not a reason.  The reason is that preemption usually works worse
on servers, esp. heavily loaded servers - the more often you interrupt
(kernel) work, the more needless context switches you'll have, and the
slower the whole thing runs.

Another point is that with preemption enabled, we have more chances to
hit one bug or another somewhere.  Those bugs should be found and fixed
for sure, but important servers and data usually aren't the place for
bug hunting.

/mjt


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-22 Thread Neil Brown
On Monday January 22, [EMAIL PROTECTED] wrote:
> On 1/22/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > On Monday January 22, [EMAIL PROTECTED] wrote:
> > > Justin Piszcz wrote:
> > > > My .config is attached, please let me know if any other information is
> > > > needed and please CC (lkml) as I am not on the list, thanks!
> > > >
> > > > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba 
> > > > to
> > > > the RAID5 running XFS.
> > > >
> > > > Any idea what happened here?
> > 
> > > >
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > > and others
> > > have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > > become unmapped
> > > during memcpy() or similar operations.  Try disabling preempt -- that
> > > seems to be the
> > > common factor.
> >
> > That is exactly the conclusion I had just come to (a kmap_atomic page
> > must be being unmapped during memcpy).  I wasn't aware that others had
> > reported it - thanks for that.
> >
> > Turning off CONFIG_PREEMPT certainly seems like a good idea.
> >
> Coming from an ARM background I am not yet versed in the inner
> workings of kmap_atomic, but if you have time for a question I am
> curious as to why spin_lock(&sh->lock) is not sufficient pre-emption
> protection for copy_data() in this case?
> 

Presumably there is a bug somewhere.
kmap_atomic itself calls inc_preempt_count so that preemption should
be disabled at least until the kunmap_atomic is called.

But apparently not.  The symptoms point exactly to the page getting
unmapped when it shouldn't.  Until that bug is found and fixed, the
workaround of turning off CONFIG_PREEMPT seems to make sense.

Of course it would be great if someone who can easily reproduce this
bug could do the 'git bisect' thing to find out where the bug crept
in.

NeilBrown


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-22 Thread Dan Williams

On 1/22/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> On Monday January 22, [EMAIL PROTECTED] wrote:
> > Justin Piszcz wrote:
> > > My .config is attached, please let me know if any other information is
> > > needed and please CC (lkml) as I am not on the list, thanks!
> > >
> > > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to
> > > the RAID5 running XFS.
> > >
> > > Any idea what happened here?
> >
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > and others have reported starting with 2.6.19: pages mapped with
> > kmap_atomic() become unmapped during memcpy() or similar operations.
> > Try disabling preempt -- that seems to be the common factor.
>
> That is exactly the conclusion I had just come to (a kmap_atomic page
> must be being unmapped during memcpy).  I wasn't aware that others had
> reported it - thanks for that.
>
> Turning off CONFIG_PREEMPT certainly seems like a good idea.

Coming from an ARM background I am not yet versed in the inner
workings of kmap_atomic, but if you have time for a question I am
curious as to why spin_lock(&sh->lock) is not sufficient pre-emption
protection for copy_data() in this case?

> NeilBrown

Regards,
Dan


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-22 Thread Neil Brown
On Monday January 22, [EMAIL PROTECTED] wrote:
> Justin Piszcz wrote:
> > My .config is attached, please let me know if any other information is 
> > needed and please CC (lkml) as I am not on the list, thanks!
> >
> > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to 
> > the RAID5 running XFS.
> >
> > Any idea what happened here?

> >   
> Without digging too deeply, I'd say you've hit the same bug Sami Farin 
> and others
> have reported starting with 2.6.19: pages mapped with kmap_atomic() 
> become unmapped
> during memcpy() or similar operations.  Try disabling preempt -- that 
> seems to be the
> common factor.

That is exactly the conclusion I had just come to (a kmap_atomic page
must be being unmapped during memcpy).  I wasn't aware that others had
reported it - thanks for that.

Turning off CONFIG_PREEMPT certainly seems like a good idea.

NeilBrown


Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-22 Thread Chuck Ebbert

Justin Piszcz wrote:
> My .config is attached, please let me know if any other information is
> needed and please CC (lkml) as I am not on the list, thanks!
>
> Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to
> the RAID5 running XFS.
>
> Any idea what happened here?
>
> [473795.214705] BUG: unable to handle kernel paging request at virtual
> address fffb92b0
> [473795.214715]  printing eip:
> [473795.214718] c0358b14
> [473795.214721] *pde = 3067
> [473795.214723] *pte =
> [473795.214726] Oops:  [#1]
> [473795.214729] PREEMPT SMP
> [473795.214736] CPU:    0
> [473795.214737] EIP:    0060:[<c0358b14>]    Not tainted VLI
> [473795.214738] EFLAGS: 00010286   (2.6.19.2 #1)
> [473795.214746] EIP is at copy_data+0x6c/0x179
> [473795.214750] eax:    ebx: 1000   ecx: 0354   edx: fffb9000
> [473795.214754] esi: fffb92b0   edi: da86c2b0   ebp: 1000   esp: f7927dc4
> [473795.214757] ds: 007b   es: 007b   ss: 0068
> [473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030
> task.ti=f7926000)
> [473795.214765] Stack: c1ba7c40 0003 f5538c80 0001 da86c000 0009  006c
> [473795.214790]1000 da8536a8 aa6fee90 f5538c80 0190 c0358d00 aa6fee88
> [473795.214863]d7c5794c 0001 da853488 f6fbec70 f6fbebc0 0001 0005 0001
> [473795.214876] Call Trace:
> [473795.214880]  [<c0358d00>] compute_parity5+0xdf/0x497
> [473795.214887]  [<c035b0dd>] handle_stripe+0x930/0x2986
> [473795.214892]  [<c01146b9>] find_busiest_group+0x124/0x4fd
> [473795.214898]  [<c03580e0>] release_stripe+0x21/0x2e
> [473795.214902]  [<c035d233>] raid5d+0x100/0x161
> [473795.214907]  [<c036b03c>] md_thread+0x40/0x103
> [473795.214912]  [<c012dbbe>] autoremove_wake_function+0x0/0x4b
> [473795.214917]  [<c036affc>] md_thread+0x0/0x103
> [473795.214922]  [<c012da1a>] kthread+0xfc/0x100
> [473795.214926]  [<c012d91e>] kthread+0x0/0x100
> [473795.214930]  [<c0103b4b>] kernel_thread_helper+0x7/0x1c
> [473795.214935]  ===
> [473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0
> 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34
> 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
> [473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP
> 0068:f7927dc4

Without digging too deeply, I'd say you've hit the same bug Sami Farin and
others have reported starting with 2.6.19: pages mapped with kmap_atomic()
become unmapped during memcpy() or similar operations.  Try disabling
preempt -- that seems to be the common factor.




Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-20 Thread Justin Piszcz


On Sat, 20 Jan 2007, Justin Piszcz wrote:

> My .config is attached, please let me know if any other information is 
> needed and please CC (lkml) as I am not on the list, thanks!
> 
> Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to 
> the RAID5 running XFS.
> 
> Any idea what happened here?
> 

It happened again under heavy read I/O when I was running md5sum -c on 
some of my files.

[  551.942958] BUG: unable to handle kernel paging request at virtual address 
fffb97b0
[  551.942970]  printing eip:
[  551.942972] c0358bd8
[  551.942974] *pde = 3067
[  551.942976] *pte = 
[  551.942980] Oops: 0002 [#1]
[  551.942982] PREEMPT SMP 
[  551.942989] CPU:    0
[  551.942990] EIP:    0060:[<c0358bd8>]    Not tainted VLI
[  551.942991] EFLAGS: 00010286   (2.6.19.2 #1)
[  551.942999] EIP is at copy_data+0x130/0x179
[  551.943001] eax:    ebx: 1000   ecx: 0214   edx: fffb9000
[  551.943005] esi: dd2007b0   edi: fffb97b0   ebp: 1000   esp: f76ffe1c
[  551.943007] ds: 007b   es: 007b   ss: 0068
[  551.943011] Process md4_raid5 (pid: 1309, ti=f76fe000 task=f7081560 
task.ti=f76fe000)
[  551.943013] Stack: c1d880c0 0003 cd2f0540  dd20 000e 
 00a8 
[  551.943027]1000 cd2f0540 dd1f1adc f6435c48 dd1f1ad8 c035a977 
34f3db20 c027be16 
[  551.943043]c0553328 0002 0002 c01146b9 f6435c48 c0553328 
f6435c48 dd1f193c 
[  551.943056] Call Trace:
[  551.943059]  [<c035a977>] handle_stripe+0x1ca/0x2986
[  551.943065]  [<c027be16>] __next_cpu+0x22/0x33
[  551.943072]  [<c01146b9>] find_busiest_group+0x124/0x4fd
[  551.943136]  [<c01140af>] __wake_up+0x32/0x43
[  551.943140]  [<c03580e0>] release_stripe+0x21/0x2e
[  551.943145]  [<c035d233>] raid5d+0x100/0x161
[  551.943150]  [<c036b03c>] md_thread+0x40/0x103
[  551.943155]  [<c012dbbe>] autoremove_wake_function+0x0/0x4b
[  551.943160]  [<c036affc>] md_thread+0x0/0x103
[  551.943165]  [<c012da1a>] kthread+0xfc/0x100
[  551.943169]  [<c012d91e>] kthread+0x0/0x100
[  551.943173]  [<c0103b4b>] kernel_thread_helper+0x7/0x1c
[  551.943178]  ===
[  551.943180] Code: 8b 4c 24 08 8b 41 2c 8b 4c 24 1c 03 54 08 08 8b 44 24 
0c 85 c0 0f 85 3a ff ff ff 89 d9 c1 e9 02 8b 44 24 18 8d 3c 02 03 74 24 10 
<f3> a5 89 d9 83 e1 03 74 02 f3 a4 e9 37 ff ff ff 01 ee 89 74 24 
[  551.943254] EIP: [<c0358bd8>] copy_data+0x130/0x179 SS:ESP 0068:f76ffe1c
[  551.943262]  <6>note: md4_raid5[1309] exited with preempt_count 3

I will run resync/check on this array and then see if that fixes it.

Justin.


Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

2007-01-20 Thread Justin Piszcz
My .config is attached, please let me know if any other information is 
needed and please CC (lkml) as I am not on the list, thanks!

Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba to 
the RAID5 running XFS.

Any idea what happened here?

[473795.214705] BUG: unable to handle kernel paging request at virtual 
address fffb92b0
[473795.214715]  printing eip:
[473795.214718] c0358b14
[473795.214721] *pde = 3067
[473795.214723] *pte = 
[473795.214726] Oops:  [#1]
[473795.214729] PREEMPT SMP 
[473795.214736] CPU:    0
[473795.214737] EIP:    0060:[<c0358b14>]    Not tainted VLI
[473795.214738] EFLAGS: 00010286   (2.6.19.2 #1)
[473795.214746] EIP is at copy_data+0x6c/0x179
[473795.214750] eax:    ebx: 1000   ecx: 0354   edx: fffb9000
[473795.214754] esi: fffb92b0   edi: da86c2b0   ebp: 1000   esp: f7927dc4
[473795.214757] ds: 007b   es: 007b   ss: 0068
[473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 
task.ti=f7926000)
[473795.214765] Stack: c1ba7c40 0003 f5538c80 0001 da86c000 0009 
 006c 
[473795.214790]1000 da8536a8 aa6fee90 f5538c80 0190 c0358d00 
aa6fee88  
[473795.214863]d7c5794c 0001 da853488 f6fbec70 f6fbebc0 0001 
0005 0001 
[473795.214876] Call Trace:
[473795.214880]  [<c0358d00>] compute_parity5+0xdf/0x497
[473795.214887]  [<c035b0dd>] handle_stripe+0x930/0x2986
[473795.214892]  [<c01146b9>] find_busiest_group+0x124/0x4fd
[473795.214898]  [<c03580e0>] release_stripe+0x21/0x2e
[473795.214902]  [<c035d233>] raid5d+0x100/0x161
[473795.214907]  [<c036b03c>] md_thread+0x40/0x103
[473795.214912]  [<c012dbbe>] autoremove_wake_function+0x0/0x4b
[473795.214917]  [<c036affc>] md_thread+0x0/0x103
[473795.214922]  [<c012da1a>] kthread+0xfc/0x100
[473795.214926]  [<c012d91e>] kthread+0x0/0x100
[473795.214930]  [<c0103b4b>] kernel_thread_helper+0x7/0x1c
[473795.214935]  ===
[473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 
01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 
02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14 
[473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 
0068:f7927dc4
[473795.215024]  <6>note: md4_raid5[1305] exited with preempt_count 2

# mdadm -D /dev/md4
/dev/md4:
Version : 01.00.03
  Creation Time : Wed Jan 10 15:58:52 2007
 Raid Level : raid5
 Array Size : 1562834432 (1490.44 GiB 1600.34 GB)
Device Size : 781417216 (372.61 GiB 400.09 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 4
Persistence : Superblock is persistent

Update Time : Sat Jan 20 07:15:01 2007
  State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 128K

   Name : 4
   UUID : 7f453e18:893e4dd9:6e810372:4c724f49
 Events : 33

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       81        1      active sync   /dev/sdf1
       2       8      113        2      active sync   /dev/sdh1
       3       8       65        3      active sync   /dev/sde1
       5       8       49        4      active sync   /dev/sdd1


config.bz2
Description: Binary data

