Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-10 Thread Neil Brown
On Friday April 6, [EMAIL PROTECTED] wrote:
> 
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

The difference is that kzalloc(0, ...) now returns NULL.  Maybe this is
a SLUB/SLAB difference?  (So maybe it was using memory it shouldn't have
before; now it fails, which is the better behaviour.)
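A minimal userspace sketch of that failure path, assuming (as described above) that a zero-byte kzalloc() returns NULL. mock_kzalloc() and mock_init_filemap_attr() are hypothetical stand-ins, not kernel functions; the NULL check mirrors the `if (!bitmap->filemap_attr) goto out;` context in the patch, and returning -ENOMEM there is an assumption consistent with the "status: -12" line in the report.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the allocator behaviour described above:
 * a zero-byte request returns NULL instead of a usable pointer. */
static void *mock_kzalloc(size_t size)
{
	if (size == 0)
		return NULL;
	return calloc(1, size); /* zero-filled, like kzalloc() */
}

/* Hypothetical caller modelled on the
 * "if (!bitmap->filemap_attr) goto out;" check in the patch:
 * a NULL result is reported as -ENOMEM (assumed error code). */
static int mock_init_filemap_attr(size_t size, void **out)
{
	assert(out != NULL);
	*out = mock_kzalloc(size);
	if (!*out)
		return -ENOMEM;
	return 0;
}
```

On Linux ENOMEM is 12, so under these assumptions a zero-sized request surfaces as -12, the same value as "failed to create bitmap (-12)", and mdadm reports it as "Cannot allocate memory".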

This patch fixes the maths and should probably go into the various
'stable' kernels.  The bug is in 2.6.18, but not in 2.6.16.

The patch won't build on 2.6.18 as-is because DIV_ROUND_UP is missing
there, but 2.6.19 and later have it.

Thanks for the bug report.

NeilBrown


-
Fix calculation for size of filemap_attr array in md/bitmap.

If 'num_pages' were ever 1 more than a multiple of 8 (32-bit platforms)
or of 16 (64-bit platforms), filemap_attr would be allocated one
'unsigned long' shorter than required.  We need a round-up in there.
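A userspace sketch for checking the arithmetic. The DIV_ROUND_UP and roundup macros are local re-definitions assumed to match the kernel helpers the patch uses; old_filemap_attr_size(), new_filemap_attr_size() and filemap_attr_shortfall() are hypothetical names for the two expressions in the hunk and for the difference between them.

```c
#include <assert.h>
#include <stddef.h>

/* Local re-definitions of the kernel helpers used by the patch
 * (assumed to match the kernel's own definitions). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Old expression: num_pages*4/8 truncates the odd page's half byte
 * before the result is rounded up to a whole unsigned long. */
static size_t old_filemap_attr_size(size_t num_pages)
{
	return (((num_pages * 4 / 8) + sizeof(unsigned long) - 1)
		/ sizeof(unsigned long))
		* sizeof(unsigned long);
}

/* Patched expression: round 4 bits per page up to whole bytes first,
 * then up to a multiple of sizeof(unsigned long). */
static size_t new_filemap_attr_size(size_t num_pages)
{
	return roundup(DIV_ROUND_UP(num_pages * 4, 8), sizeof(unsigned long));
}

/* Bytes the old expression under-allocates; the patched size is
 * never smaller than the old one, so this cannot underflow. */
static size_t filemap_attr_shortfall(size_t num_pages)
{
	size_t old_size = old_filemap_attr_size(num_pages);
	size_t new_size = new_filemap_attr_size(num_pages);

	assert(new_size >= old_size);
	return new_size - old_size;
}
```

For num_pages = 1 (which appears to match "read 0/1 pages" in the report) the old expression requests 0 bytes, i.e. a kzalloc(0, ...) call, while the patched one requests one unsigned long. The two only disagree when num_pages is 1 more than a multiple of 2 * sizeof(unsigned long), which is 8 on 32-bit and 16 on 64-bit.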


Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c   2007-04-11 13:24:50.0 +1000
+++ ./drivers/md/bitmap.c   2007-04-11 13:24:59.0 +1000
@@ -863,9 +863,7 @@ static int bitmap_init_from_disk(struct 
 
 	/* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
 	bitmap->filemap_attr = kzalloc(
-		(((num_pages*4/8)+sizeof(unsigned long)-1)
-		 /sizeof(unsigned long))
-		*sizeof(unsigned long),
+		roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)),
 		GFP_KERNEL);
 	if (!bitmap->filemap_attr)
 		goto out;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Dan Williams

On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 06 Apr 2007 02:33:03 +1000
> Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > >
> > > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > >   It is huge.
> > >
> > > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > >   Kergon).  It is a quilt tree, living at
> > >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > >
> > > - Added davidel's signalfd stuff.
> >
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> >
> > md1 is the first array on the disk, and it refuses to start up on boot, or
> > after boot.
> >
> > ...
> >
> > tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> > mdadm: device /dev/md1 already active - cannot assemble it
> > tornado ~ # mdadm --run /dev/md1
> > mdadm: failed to run array /dev/md1: Cannot allocate memory
> > tornado ~ #
> >
> > and looking at a dmesg, this is logged:
> >
> > md: bind
> > md: bind
> > raid1: raid set md1 active with 2 out of 2 mirrors
> > md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> > md1: failed to create bitmap (-12)
> > md: pers->run() failed ...


Is this the dmesg from boot or the dmesg after running the mdadm --run command?


> >
> > tornado ~ # uname -a
> > Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> > tornado ~ #
> >
> > The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> > out the -mm releases so much lately.

> OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
> unexpectedly large value.
>
> I don't _think_ there's anything in -mm which would have triggered this.
> Does mainline do the same thing?
>
> I guess it's possible that the code in git-md-accel.patch accidentally
> broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?



git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.

--
Dan


Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Andrew Morton
On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > 
> > - The oops in git-net.patch has been fixed, so that tree has been restored. 
> >   It is huge.
> > 
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon).  It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > 
> > - Added davidel's signalfd stuff.
> 
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> 
> md1 is the first array on the disk, and it refuses to start up on boot, or
> after boot.
> 
> ...
> 
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
> 
> and looking at a dmesg, this is logged:
> 
> md: bind
> md: bind
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...
> 
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
> 
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this. 
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?

> Also, Andrew, can you please restart posting/cc'ing your -mm announcements to 
> the [EMAIL PROTECTED] list?  Seems this stopped around about 
> 2.6.20, it was handy.

hm.  I always Bcc [EMAIL PROTECTED].  I assume that its
filters didn't get updated after the s/osdl/linux-foundation/ thing.  I'll
talk to people, thanks.

> .config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
> 



RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Reuben Farrelly

Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
>   It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon).  It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.


Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after 
boot.


tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
  208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
  20008832 blocks [2/2] [UU]
  bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
  10008384 blocks [2/2] [UU]
  bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
  10008384 blocks [2/2] [UU]
  bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
  1003904 blocks [2/2] [UU]
  bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
  119933120 blocks [2/2] [UU]
  bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
  14544 blocks [2/2] [UU]
  bitmap: 10/191 pages [40KB], 256KB chunk

unused devices: 
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb  2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr  6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668aaa - correct
         Events : 0.368


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb  2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr  6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668acc - correct
         Events : 0.368


      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #


tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind
md: bind
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing 
out the -mm releases so much lately.


Also, Andrew, can you please restart posting/cc'ing your -mm announcements to 
the [EMAIL PROTECTED] list?  Seems this stopped around about 
2.6.20, it was handy.


.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben