Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On Friday April 6, [EMAIL PROTECTED] wrote:
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

Difference is that kzalloc(0, ) now returns NULL. Maybe it is a SLUB/SLAB difference? (So maybe it did use memory it shouldn't have before, but now it fails, which is the better behaviour).

This patch fixes the maths and should probably go in various 'stable' kernels. Bug is in 2.6.18, but not 2.6.16. Patch won't work for 2.6.18 as DIV_ROUND_UP is missing, but 2.6.19 and later have it.

Thanks for the bug report.

NeilBrown

-
Fix calculation for size of filemap_attr array in md/bitmap.

If 'num_pages' were ever 1 more than a multiple of 8 (32bit platforms) or of 16 (64 bit platforms), filemap_attr would be allocated one 'unsigned long' shorter than required. We need a round-up in there.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c 2007-04-11 13:24:50.0 +1000
+++ ./drivers/md/bitmap.c 2007-04-11 13:24:59.0 +1000
@@ -863,9 +863,7 @@ static int bitmap_init_from_disk(struct
         /* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
         bitmap->filemap_attr = kzalloc(
-                (((num_pages*4/8)+sizeof(unsigned long)-1)
-                 /sizeof(unsigned long))
-                *sizeof(unsigned long),
+                roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)),
                 GFP_KERNEL);
         if (!bitmap->filemap_attr)
                 goto out;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
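For anyone wanting to check the arithmetic in the patch above, here is a minimal userspace sketch that compares the old and new size expressions. The function names and the word_size parameter are made up for illustration (the kernel code uses sizeof(unsigned long) directly), and DIV_ROUND_UP/roundup are redefined locally on the assumption that they behave like the kernel macros of the same name for positive values.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel macros (assumed equivalent for positive values). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y)      ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Pre-patch expression: num_pages*4/8 truncates the byte count before
 * the round-up to a whole word, so an odd num_pages loses its last nibble.
 * word_size is a hypothetical stand-in for sizeof(unsigned long).
 */
static size_t filemap_attr_size_old(size_t num_pages, size_t word_size)
{
    assert(word_size > 0);
    return (((num_pages * 4 / 8) + word_size - 1) / word_size) * word_size;
}

/*
 * Patched expression: round the bit count up to whole bytes first,
 * then round the byte count up to a whole word.
 */
static size_t filemap_attr_size_new(size_t num_pages, size_t word_size)
{
    assert(word_size > 0);
    return roundup(DIV_ROUND_UP(num_pages * 4, 8), word_size);
}
```

With num_pages = 1 and an 8-byte unsigned long the old expression gives 0 bytes, which would seem to match the "read 0/1 pages ... status: -12" report on x86_64 once kzalloc(0, ) returns NULL; the new expression gives 8. With num_pages = 9 and a 4-byte unsigned long the old expression gives 4 bytes where 8 are needed.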
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 06 Apr 2007 02:33:03 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > >
> > > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > >   It is huge.
> > >
> > > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > >   Kergon). It is a quilt tree, living at
> > >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > >
> > > - Added davidel's signalfd stuff.
> >
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> >
> > md1 is the first array on the disk, and it refuses to start up on boot, or after
> > boot.
> >
> > ...
> >
> > tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> > mdadm: device /dev/md1 already active - cannot assemble it
> > tornado ~ # mdadm --run /dev/md1
> > mdadm: failed to run array /dev/md1: Cannot allocate memory
> > tornado ~ #
> >
> > and looking at a dmesg, this is logged:
> >
> > md: bind
> > md: bind
> > raid1: raid set md1 active with 2 out of 2 mirrors
> > md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> > md1: failed to create bitmap (-12)
> > md: pers->run() failed ...

Is this the dmesg from boot or the dmesg after running the mdadm --run command?

> > tornado ~ # uname -a
> > Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> > Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> > tornado ~ #
> >
> > The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> > out the -mm releases so much lately.
>
> OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some
> unexpectedly large value.
>
> I don't _think_ there's anything in -mm which would have triggered this.
> Does mainline do the same thing?
>
> I guess it's possible that the code in git-md-accel.patch accidentally
> broke things. Perhaps try disabling CONFIG_DMA_ENGINE?

git-md-accel.patch does not touch anything in the raid1 path, but I guess stranger things have happened.

--
Dan
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On Fri, 06 Apr 2007 02:33:03 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> >   It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon). It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind
> md: bind
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...
>
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this. Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally broke things. Perhaps try disabling CONFIG_DMA_ENGINE?

> Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
> the [EMAIL PROTECTED] list? Seems this stopped around about
> 2.6.20, it was handy.

hm. I always Bcc [EMAIL PROTECTED] I assume that its filters didn't get updated after the s/osdl/linux-foundation/ thing. I'll talk to people, thanks.

> .config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
>   It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon). It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.

Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after boot.

tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
      208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
      20008832 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
      10008384 blocks [2/2] [UU]
      bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
      10008384 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
      1003904 blocks [2/2] [UU]
      bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
      119933120 blocks [2/2] [UU]
      bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
      14544 blocks [2/2] [UU]
      bitmap: 10/191 pages [40KB], 256KB chunk

unused devices:
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668aaa - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1

tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668acc - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #

tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind
md: bind
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing out the -mm releases so much lately.

Also, Andrew, can you please restart posting/cc'ing your -mm announcements to the [EMAIL PROTECTED] list? Seems this stopped around about 2.6.20, it was handy.

.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben
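The "status: -12" in the dmesg output and the "Cannot allocate memory" text from mdadm appear to be the same error seen from two sides: 12 is ENOMEM on Linux, and the kernel reports it negated. A small userspace sketch of that mapping follows; status_to_text is a hypothetical helper written for this note, not something from the kernel or mdadm, and the exact message string depends on the C library.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: convert a kernel-style status (zero or a
 * negated errno value, e.g. -12) into the C library's error text.
 */
static const char *status_to_text(int status)
{
    assert(status <= 0);
    return strerror(-status);
}
```

Calling status_to_text(-12) on a Linux system with glibc should give the same text that mdadm printed, assuming mdadm formats the failure via strerror(errno).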