Re: [patch] new aop loop fix
On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: > Btw: Nick's patches broke LO_CRYPT_XOR mode, It would help him if you could describe how. > but it is ok because > this feature was absolete and not used by anyone, am i right here? I know nothing of this; but so long as its code remains in the driver, I'd say we should be supporting it rather than breaking it. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] new aop loop fix
On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: > loop.c code itself is not perfect. In fact before Nick's patch > partial write was't possible. Assumption what write chunks are > always page aligned is realy weird ( see "index++" line). > > Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> I'm interested, because I'm trying to chase down an -mm bug which occasionally leaves me with 1k of zeroes where it shouldn't (in a 1k bsize ext2 looped over tmpfs). The length of time for this to happen varies a lot, so bisection has been misleading: maybe the problem lies in Nick's patches, maybe it does not. But I don't understand your fix below at all. _If_ you need to change the handling of index, then you need to change the handling of offset too, don't you? But what's wrong with how inded was handled anyway? Yes, it might be being incremented at the bottom of the loop when we haven't reached the end of this page, but in that case we're not going round the loop again anyway: len will now be 0. So no problem. One of us is missing something: please enlighten me - thanks. Hugh > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index 4bab9b1..8726da5 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct > bio_vec *bvec, > int len, ret; > > mutex_lock(&mapping->host->i_mutex); > - index = pos >> PAGE_CACHE_SHIFT; > offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1); > bv_offs = bvec->bv_offset; > len = bvec->bv_len; > @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct > bio_vec *bvec, > struct page *page; > void *fsdata; > > + index = pos >> PAGE_CACHE_SHIFT; > IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9); > size = PAGE_CACHE_SIZE - offset; > if (size > len) > @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct > bio_vec *bvec, > bv_offs += copied; > len -= copied; > offset = 0; > - index++; > pos += copied; > } > ret = 0; - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fix umask when noACL kernel meets extN tuned for ACLs
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a kernel built without ACL support, then umask was ignored when creating inodes - though root or user has umask 022, touch creates files as 0666, and mkdir creates directories as 0777. This appears to have worked right until 2.6.11, when a fix to the default mode on symlinks (always 0777) assumed VFS applies umask: which it does, unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in s_flags according to s_mount_opt set according to def_mount_opts. We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test); but other filesystems only set MS_POSIXACL when ACLs are configured. We could fix this at another level; but it seems most robust to avoid setting the s_mount_opt flag in the first place (at the expense of more ifdefs). Likewise don't set the XATTR_USER flag when built without XATTR support. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/super.c |4 fs/ext3/super.c |4 fs/ext4/super.c |4 3 files changed, 12 insertions(+) --- 2.6.20-rc5/fs/ext2/super.c 2007-01-13 08:46:07.0 + +++ linux/fs/ext2/super.c 2007-01-15 20:48:38.0 + @@ -708,10 +708,14 @@ static int ext2_fill_super(struct super_ set_opt(sbi->s_mount_opt, GRPID); if (def_mount_opts & EXT2_DEFM_UID16) set_opt(sbi->s_mount_opt, NO_UID32); +#ifdef CONFIG_EXT2_FS_XATTR if (def_mount_opts & EXT2_DEFM_XATTR_USER) set_opt(sbi->s_mount_opt, XATTR_USER); +#endif +#ifdef CONFIG_EXT2_FS_POSIX_ACL if (def_mount_opts & EXT2_DEFM_ACL) set_opt(sbi->s_mount_opt, POSIX_ACL); +#endif if (le16_to_cpu(sbi->s_es->s_errors) == EXT2_ERRORS_PANIC) set_opt(sbi->s_mount_opt, ERRORS_PANIC); --- 2.6.20-rc5/fs/ext3/super.c 2007-01-13 08:46:07.0 + +++ linux/fs/ext3/super.c 2007-01-15 20:48:38.0 + @@ -1459,10 +1459,14 @@ static int ext3_fill_super (struct super set_opt(sbi->s_mount_opt, GRPID); if (def_mount_opts & EXT3_DEFM_UID16) set_opt(sbi->s_mount_opt, NO_UID32); +#ifdef CONFIG_EXT3_FS_XATTR if (def_mount_opts & EXT3_DEFM_XATTR_USER) set_opt(sbi->s_mount_opt, XATTR_USER); +#endif +#ifdef CONFIG_EXT3_FS_POSIX_ACL if (def_mount_opts & EXT3_DEFM_ACL) set_opt(sbi->s_mount_opt, POSIX_ACL); +#endif if ((def_mount_opts & EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_DATA) sbi->s_mount_opt |= EXT3_MOUNT_JOURNAL_DATA; else if ((def_mount_opts & EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_ORDERED) --- 2.6.20-rc5/fs/ext4/super.c 2007-01-13 08:46:07.0 + +++ linux/fs/ext4/super.c 2007-01-15 20:48:38.0 + @@ -1518,10 +1518,14 @@ static int ext4_fill_super (struct super set_opt(sbi->s_mount_opt, GRPID); if (def_mount_opts & EXT4_DEFM_UID16) set_opt(sbi->s_mount_opt, NO_UID32); +#ifdef CONFIG_EXT4DEV_FS_XATTR if (def_mount_opts & EXT4_DEFM_XATTR_USER) set_opt(sbi->s_mount_opt, XATTR_USER); +#endif +#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL if (def_mount_opts & EXT4_DEFM_ACL) set_opt(sbi->s_mount_opt, POSIX_ACL); +#endif if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA) sbi->s_mount_opt |= EXT4_MOUNT_JOURNAL_DATA; else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED) - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: umask ignored in mkdir(2)?
[I've rearranged this to avoid a horrid mix of top and bottom posting] On Sun, 14 Jan 2007, Tigran Aivazian wrote: > On Sun, 14 Jan 2007, Tigran Aivazian wrote: > > On Sun, 14 Jan 2007, Tigran Aivazian wrote: > > > I think I may have found a bug --- on one of my machines the umask value > > > is ignored by ext3 (but honoured on tmpfs) for mkdir system call: > > > > > > $ cd /tmp > > > $ df -T . > > > FilesystemType 1K-blocks Used Available Use% Mounted on > > > /dev/hdf1 ext3 189238556 155721568 23749068 87% / > > > $ rmdir ok ; mkdir ok ; ls -ld ok > > > rmdir: ok: No such file or directory > > > drwxrwxrwx 2 tigran tigran 4096 Jan 14 20:36 ok/ > > > $ umask > > > 0022 > > > $ cd /dev/shm > > > $ df -T . > > > FilesystemType 1K-blocks Used Available Use% Mounted on > > > tmpfstmpfs 517988 0517988 0% /dev/shm > > > $ rmdir ok ; mkdir ok ; ls -ld ok > > > rmdir: ok: No such file or directory > > > drwxr-xr-x 2 tigran tigran 40 Jan 14 20:36 ok/ > > > $ uname -a > > > Linux ws 2.6.19.1 #6 SMP Sun Jan 14 20:03:30 GMT 2007 i686 i686 i386 > > > GNU/Linux > > > $ grep -i acl /usr/src/linux/.config > > > # CONFIG_FS_POSIX_ACL is not set > > > # CONFIG_TMPFS_POSIX_ACL is not set > > > # CONFIG_NFS_V3_ACL is not set > > > # CONFIG_NFSD_V3_ACL is not set > > > > > > As you see, ACL is not configured in, and neither are extended attributes: > > > > > > $ grep -i xattr /usr/src/linux/.config > > > # CONFIG_EXT2_FS_XATTR is not set > > > # CONFIG_EXT3_FS_XATTR is not set > > > > > > So, this is something fs-specific. What do you think? > > > > I forgot to mention that on another machine running the same kernel version > > with the same (as close as a UP machine can be to SMP) kernel configuration > > the umask is honoured properly on ext3 filesystem. > > I figured it out! I thought you might be interested --- the reason is the > mismatch between the default mount options stored in the superblock on disk > and the filesystem features compiled into the kernel. > > Namely, dumpe2fs on the offending filesystems showed the following default > mount options: > > user_xattr acl > > but on good filesystems it showed "(none)". So, I used "tune2fs -o ^acl" > (and ^user_xattr) to clear these in the superblock and mounted the filesystem > --- and now mkdir system call works as expected, i.e. honours the umask. > > Maybe the ext3 filesystem should automatically detect this (the mismatch) and > printk a warning so the user is told that his filesystem is mounted in > extremely insecure way, i.e. making directories as root will result in lots of > 0777 places (e.g. try "make modules_install" --- this will create lots of > security holes in /lib/modules). > > I cc'd linux-kernel as someone may wish to fix this. Good find! Though I suppose not much of a worry for distros, whose kernels will always(?) have ACLs configured in. I get sooo confused when there's multiple ways of switching something on and off (at the ifdef level and at the mount opts level and at the tuning level), looks like others do too. Here's my third version of a patch, already wondering if a fourth would be better (at the point where they set s_flags) ... no, I think this one is more robust... [PATCH] fix umask when noACL kernel meets extN tuned for ACLs Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a kernel built without ACL support, then umask was ignored when creating inodes - though root or user has umask 022, touch creates files as 0666, and mkdir creates directories as 0777. This appears to have worked right until 2.6.11, when a fix to the default mode on symlinks (always 0777) assumed VFS applies umask: which it does, unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in s_flags according to s_mount_opt set according to def_mount_opts. We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test); but other filesystems only set MS_POSIXACL when ACLs are configured. We could fix this at another level; but it seems most robust to avoid setting the s_mount_opt flag in the first place (at the expense of more ifdefs). Likewise don't set the XATTR_USER flag when built without XATTR support. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/super.c |4 fs/ext3/super.c |4 fs/ext4/super.c |4 3 files changed, 12 insertions(+) --- 2.6.20-rc5/f
Re: + ext3-balloc-fix-_with_rsv-freeze.patch added to -mm tree
On Wed, 29 Nov 2006, [EMAIL PROTECTED] wrote: > > The patch titled > ext3 balloc: fix _with_rsv freeze > has been added to the -mm tree. Its filename is > ext3-balloc-fix-_with_rsv-freeze.patch > > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find > out what to do about this > > -- > Subject: ext3 balloc: fix _with_rsv freeze > From: Hugh Dickins <[EMAIL PROTECTED]> > > After several days of testing ext3 with reservations, it got caught inside > ext3_try_to_allocate_with_rsv: alloc_new_reservation repeatedly succeeding > on the window [12cff,12d0e], ext3_try_to_allocate repeatedly failing to > find the free block guaranteed to be included (unless there's contention). Thanks a lot for changing the comments on these twelve, substituting ext3 or ext4 for ext2 where appropriate: I much prefer these. But in the case of this ext3-balloc-fix-_with_rsv-freeze.patch and the ext4-balloc-fix-_with_rsv-freeze.patch, the resulting comment goes wrong: there's never been such a freeze in ext3 or ext4. I suggest a better comment for this ext3 one would be something like the below - I trust you to deduce the ext4 comment! Port fix to the off-by-one in find_next_usable_block's memscan from ext2 to ext3; but it didn't cause a serious problem for ext3 because the additional ext3_test_allocatable check rescued it from the error. Hugh > > Fix the range to find_next_usable_block's memscan: the scan from "here" > (0xcfe) up to (but excluding) "maxblocks" (0xd0e) needs to scan 3 bytes not > 2 (the relevant bytes of bitmap in this case being f7 df ff - none 00, but > the premature cutoff implying that the last was found 00). > > Is this a problem for mainline ext3? No, because the "size" in its memscan > is always EXT2_BLOCKS_PER_GROUP(sb), which mkfs.ext3 requires to be a > multiple of 8. Is this a problem for ext3 or ext4? No, because they have > an additional extN_test_allocatable test which rescues them from the error. > > Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> > Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> > Cc: > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > --- > > fs/ext3/balloc.c |2 +- > 1 files changed, 1 insertion(+), 1 deletion(-) > > diff -puN fs/ext3/balloc.c~ext3-balloc-fix-_with_rsv-freeze fs/ext3/balloc.c > --- a/fs/ext3/balloc.c~ext3-balloc-fix-_with_rsv-freeze > +++ a/fs/ext3/balloc.c > @@ -730,7 +730,7 @@ find_next_usable_block(ext3_grpblk_t sta > here = 0; > > p = ((char *)bh->b_data) + (here >> 3); > - r = memscan(p, 0, (maxblocks - here + 7) >> 3); > + r = memscan(p, 0, ((maxblocks + 7) >> 3) - (here >> 3)); > next = (r - ((char *)bh->b_data)) << 3; > > if (next < maxblocks && next >= start && ext3_test_allocatable(next, > bh)) > _ > > Patches currently in -mm which might be from [EMAIL PROTECTED] are > > git-powerpc.patch > __unmap_hugepage_range-add-comment.patch > shared-page-table-for-hugetlb-page-v4.patch > htlb-forget-rss-with-pt-sharing.patch > mlock-cleanup.patch > always-print-out-the-header-line-in-proc-swaps.patch > reject-corrupt-swapfiles-earlier.patch > kill-install_file_ptes-pte_val.patch > honour-mnt_noexec-for-access.patch > ext3-fix-reservation-extension.patch > ext4-fix-reservation-extension.patch > ext3-balloc-reset-windowsz-when-full.patch > ext3-balloc-fix-off-by-one-against-grp_goal.patch > ext3-balloc-fix-off-by-one-against-rsv_end.patch > ext3-balloc-say-rb_entry-not-list_entry.patch > ext3-balloc-use-io_error-label.patch > ext3-balloc-fix-_with_rsv-freeze.patch > ext2-reservations.patch > ext2-balloc-fix-_with_rsv-freeze.patch > ext2-balloc-reset-windowsz-when-full.patch > ext2-balloc-fix-off-by-one-against-rsv_end.patch > ext2-balloc-fix-off-by-one-against-grp_goal.patch > ext2-balloc-say-rb_entry-not-list_entry.patch > ext2-balloc-use-io_error-label.patch > generic-bug-implementation.patch > generic-bug-implementation-handle-bug=n.patch > generic-bug-for-i386.patch > generic-bug-for-x86-64.patch > bug-test-1.patch > tty-switch-to-ktermios-powerpc-fix.patch - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/12] ext3 balloc: reset windowsz when full
On Tue, 28 Nov 2006, Mingming Cao wrote: > Port a series ext2 balloc patches from Hugh to ext3/4. The first 6 > patches are against ext3, and the rest are aginst ext4. Thanks for all that, Mingming: whichever is appropriate, all twelve Acked-by: Hugh Dickins <[EMAIL PROTECTED]> or Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> I'll think about your other mails, those that need further thought, later on: I need to pin down more accurately the repetitious sequence of reservations in the mistaken case - maybe it indicated further issues, maybe not; and I need to consider our different views of the my_rsv find_next_usable_block. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Tue, 21 Nov 2006, Mingming Cao wrote: > On Mon, 2006-11-20 at 16:19 +0000, Hugh Dickins wrote: > > After four days of running, the EM64T has at last reproduced the same > > hang as it did in an hour before: stuck in > > ext2_try_to_allocate_with_rsv, > > repeatedly ext2_rsv_window_add, ext2_try_to_allocate, > > rsv_window_remove > > (perhaps not in that order). > > At last it did happen again, on both x86_64 and ppc64. > > Don't have much clue, still...:( Don't worry, KDB helped me work it out, patch follows in a moment. > > The logic of the while loop in ext2_try_to_allocate_with_rsv() looks > like: Thanks for your clarifications and tips. > > > A bit confused, did the whole system hang or just the "ld" writer hang > in ext2 block allocation? And what other writers were waiting for? Were > they trying to write to the same file? The system was pingable, but couldn't do much else. Only the one "ld" was active on the ext2 filesystem by this time, other tasks of the make just waiting on it, nothing else was trying to write there. 4 cpus, well, 2*HT: why couldn't I ssh or login? I don't know, something else to investigate, but that can be quite separate: very unlikely to be related to the particular ext2 bug which showed it (that filesystem was just for the test, it's not my root). Probably stupidly obvious once I've worked it out. > > It might be helpful to check the return value of ext2_try_to_allocate(), > to see if it fails. It would be nice if you could check if there any > free bit inside the reservation window. After screwing up last time, I was ultra-paranoid this time, and did not dare to set any breakpoints: proceeded by repeatedly breaking in from the keyboard, and the time I happened to catch it on return from memscan() was revealing. > > And could you check the start and end block for every rsv_window_add and > rsv_window_remove, to see if it was keep creating and removing the same > window in the same block group? The same every time it settled on a usable reservation, but a lot of wasted adds and removes as it works across a fully allocated area of the bitmap. Not very efficient, all those rb_tree insertions and removals until it gets to a free area; but I can't judge from this example how common that would be, may not be worth bothering about. > > And it might be useful to dump the whole filesystem block reservation > tree as well. Not necessary, I've put just the relevant numbers in the patch comment, it helped me a lot to see the actual numbers and work it out from there. > > Not sure if it worth the effort to test it on ext3. The ext2 block > reservation code in 2.6.19-rc5-mm2 looks pretty much the same as ext3/4, > except the missing truncate_mutex. I understand this might take a few > days to reproduce, so this might not needed now. Turns out vanilla-ext2 and ext3 and ext4 are safe: ext3 and ext4 slightly wrong in the same way, but safe. I'll continue this thread with the bugfix patch 1/6; and five more (roughly descending order of importance, the latter just cosmetic) little fs/ext2/balloc.c patches to things I noticed on the way. Nothing very worrying. Easier to send patches than ask questions: please check, perhaps my "off-by-one" accusations are wrong, for example. If you're satisfied they're right, please apply the same to ext3 and ext4. Although 1/6 does (I believe: too early for my tests to tell) fix the bug in a minimal way, I rather think that area needs cleanup - further comments below my Signed-off-by in 1/6. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] ext2 balloc: fix _with_rsv freeze
On Tue, 28 Nov 2006, Mingming Cao wrote: > On Tue, 2006-11-28 at 17:40 +0000, Hugh Dickins wrote: > > After several days of testing ext2 with reservations, it got caught inside > > ext2_try_to_allocate_with_rsv: alloc_new_reservation repeatedly succeeding > > on the window [12cff,12d0e], ext2_try_to_allocate repeatedly failing to > > find the free block guaranteed to be included (unless there's contention). > > > > Hmm, I suspect there is other issue: alloc_new_reservation should not > repeatedly allocating the same window, if ext2_try_to_allocate > repeatedly fails to find a free block in that window. > find_next_reservable_window() takes my_rsv (the old window that he > thinks there is no free block) as a guide to find a window "after" the > end block of my_rsv, so how could this happen? Hmmm. I haven't studied that part of the code, but what you say sounds sensible: that would leave more to be explained, yes. I guess it would happen if all the rest of the bitmap were either allocated or reserved, but I don't believe that was the case here: I have noted that the map was all 00s from offset 0x1ae onwards, plenty unallocated; I've not recorded the following reservations, but it seems unlikely they covered the remaining free area (and still covered it even when the remaining tasks got to the point of just waiting for this one). > > > Fix the range to find_next_usable_block's memscan: the scan from "here" > > (0xcfe) up to (but excluding) "maxblocks" (0xd0e) needs to scan 3 bytes > > not 2 (the relevant bytes of bitmap in this case being f7 df ff - none > > 00, but the premature cutoff implying that the last was found 00). > > > > alloc_new_reservation() reserved a window with free block, when come to > the time to claim it, it scans the window again. So it seems that the > range of the the scan is too small: The range of the scan is 1 byte too small in this case, yes. > > p = ((char *)bh->b_data) + (here >> 3); > r = memscan(p, 0, (maxblocks - here + 7) >> 3); > next = (r - ((char *)bh->b_data)) << 3; > > -> next is -1 I don't understand you: next was not -1, it was 0xd08. > if (next < maxblocks && next >= here) > return next; > > --> falls to false branch No, it passed the "next < maxblocks && next >= here" test (maxblocks being 0xd0e and here being 0xcfe), so returned pointing to an allocated block - then the caller finds it cannot set the bit. > > here = bitmap_search_next_usable_block(here, bh, maxblocks); > return here; > > So we failed to find a free byte in the range. That's seems fine to me. > It's only a nice thing to have -- try to allocate a block in a place > where it's neighbors are all free also. If it fails, it will search the > window bit by bit. So I don't understand why it is not being recovered > by bitmap_search_next_usable_block(), which test the bitmap bit by bit? It already returned, it doesn't reach that line. > > > Is this a problem for mainline ext2? No, because the "size" in its memscan > > is always EXT2_BLOCKS_PER_GROUP(sb), which mkfs.ext2 requires to be a > > multiple of 8. Is this a problem for ext3 or ext4? No, because they have > > an additional extN_test_allocatable test which rescues them from the error. > > > Hmm, if the error is it prematurely think there is no free block in the > range (bitmap on disk), then even in ext3/4, it will not bother checking > the jbd copy of the bitmap. I am not sure this is the cause that ext3/4 > may not has the problem. In the ext3/4 case, it indeed won't bother to check the jbd copy (having found this bitmap bit set), it'll fall through to the bitmap_search_next_usable_block you indicated above, and that should do the right thing, finding the first free bit in the area originally reserved. > > > But the bigger question is, why does the my_rsv case come here to > > find_next_usable_block at all? > > Because grp_goal is -1? Well, yes, but my point is that we've got a reservation, and we're hoping to allocate from it (even though we've given up on the "goal"), but find_next_usable_block is not respecting it at all - liable to be allocating out of others' reservations instead. > > > Doesn't its 64-bit boundary limit, and its > > memscan, blithely ignore what the reservation prepared? > > I agree with you that the double check is urgly. But it's necessary:( If > there to prevent contention: other file make steal that f
[PATCH 4/6] ext2 balloc: fix off-by-one against grp_goal
grp_goal 0 is a genuine goal (unlike -1), so ext2_try_to_allocate_with_rsv should treat it as such. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/balloc.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 + +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.0 + @@ -1053,7 +1053,7 @@ ext2_try_to_allocate_with_rsv(struct sup } /* * grp_goal is a group relative block number (if there is a goal) -* 0 < grp_goal < EXT2_BLOCKS_PER_GROUP(sb) +* 0 <= grp_goal < EXT2_BLOCKS_PER_GROUP(sb) * first block is a filesystem wide block number * first block is the block number of the first block in this group */ @@ -1089,7 +1089,7 @@ ext2_try_to_allocate_with_rsv(struct sup if (!goal_in_my_reservation(&my_rsv->rsv_window, grp_goal, group, sb)) grp_goal = -1; - } else if (grp_goal > 0) { + } else if (grp_goal >= 0) { int curr = my_rsv->rsv_end - (grp_goal + group_first_block) + 1; - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] ext2 balloc: say rb_entry not list_entry
The reservations tree is an rb_tree not a list, so it's less confusing to use rb_entry() than list_entry() - though they're both just container_of(). Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/balloc.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 + +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.0 + @@ -156,7 +156,7 @@ restart: printk("Block Allocation Reservation Windows Map (%s):\n", fn); while (n) { - rsv = list_entry(n, struct ext2_reserve_window_node, rsv_node); + rsv = rb_entry(n, struct ext2_reserve_window_node, rsv_node); if (verbose) printk("reservation window 0x%p " "start: %lu, end: %lu\n", @@ -753,7 +753,7 @@ static int find_next_reservable_window( prev = rsv; next = rb_next(&rsv->rsv_node); - rsv = list_entry(next,struct ext2_reserve_window_node,rsv_node); + rsv = rb_entry(next,struct ext2_reserve_window_node,rsv_node); /* * Reached the last reservation, we can just append to the @@ -995,7 +995,7 @@ static void try_to_extend_reservation(st if (!next) my_rsv->rsv_end += size; else { - next_rsv = list_entry(next, struct ext2_reserve_window_node, rsv_node); + next_rsv = rb_entry(next, struct ext2_reserve_window_node, rsv_node); if ((next_rsv->rsv_start - my_rsv->rsv_end - 1) >= size) my_rsv->rsv_end += size; - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] ext2 balloc: fix _with_rsv freeze
After several days of testing ext2 with reservations, it got caught inside ext2_try_to_allocate_with_rsv: alloc_new_reservation repeatedly succeeding on the window [12cff,12d0e], ext2_try_to_allocate repeatedly failing to find the free block guaranteed to be included (unless there's contention). Fix the range to find_next_usable_block's memscan: the scan from "here" (0xcfe) up to (but excluding) "maxblocks" (0xd0e) needs to scan 3 bytes not 2 (the relevant bytes of bitmap in this case being f7 df ff - none 00, but the premature cutoff implying that the last was found 00). Is this a problem for mainline ext2? No, because the "size" in its memscan is always EXT2_BLOCKS_PER_GROUP(sb), which mkfs.ext2 requires to be a multiple of 8. Is this a problem for ext3 or ext4? No, because they have an additional extN_test_allocatable test which rescues them from the error. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- But the bigger question is, why does the my_rsv case come here to find_next_usable_block at all? Doesn't its 64-bit boundary limit, and its memscan, blithely ignore what the reservation prepared? It's messy too, the complement of the memscan being that "i < 7" loop over in ext2_try_to_allocate. I think this ought to be cleaned up, in ext2+reservations and ext3 and ext4. fs/ext2/balloc.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 + +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.0 + @@ -570,7 +570,7 @@ find_next_usable_block(int start, struct here = 0; p = ((char *)bh->b_data) + (here >> 3); - r = memscan(p, 0, (maxblocks - here + 7) >> 3); + r = memscan(p, 0, ((maxblocks + 7) >> 3) - (here >> 3)); next = (r - ((char *)bh->b_data)) << 3; if (next < maxblocks && next >= here) - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] ext2 balloc: fix off-by-one against rsv_end
rsv_end is the last block within the reservation, so alloc_new_reservation should accept start_block == rsv_end as success. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/balloc.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 + +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.0 + @@ -950,7 +950,7 @@ retry: * check if the first free block is within the * free space we just reserved */ - if (start_block >= my_rsv->rsv_start && start_block < my_rsv->rsv_end) + if (start_block >= my_rsv->rsv_start && start_block <= my_rsv->rsv_end) return 0; /* success */ /* * if the first free bit we found is out of the reservable space - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] ext2 balloc: use io_error label
ext2_new_blocks has a nice io_error label for setting -EIO, so goto that in the one place that doesn't already use it. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/balloc.c |7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 + +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.0 + @@ -1258,10 +1258,9 @@ retry_alloc: if (group_no >= ngroups) group_no = 0; gdp = ext2_get_group_desc(sb, group_no, &gdp_bh); - if (!gdp) { - *errp = -EIO; - goto out; - } + if (!gdp) + goto io_error; + free_blocks = le16_to_cpu(gdp->bg_free_blocks_count); /* * skip this group if the number of - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] ext2 balloc: reset windowsz when full
ext2_new_blocks should reset the reservation window size to 0 when squeezing the last blocks out of an almost full filesystem, so the retry doesn't skip any groups with less than half that free, reporting ENOSPC too soon. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- fs/ext2/balloc.c |1 + 1 file changed, 1 insertion(+) --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 + +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.0 + @@ -1292,6 +1292,7 @@ retry_alloc: */ if (my_rsv) { my_rsv = NULL; + windowsz = 0; group_no = goal_group; goto retry_alloc; } - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 16 Nov 2006, Andrew Morton wrote: > On Thu, 16 Nov 2006 00:49:20 -0800 > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > That does not explain the repeated reservation window add and remove > > behavior Huge has reported. > > I spent quite some time comparing with ext3. I'm a bit stumped and I'm > suspecting that the simplistic porting the code is now OK, but something's > just wrong. > > I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has > gone infinite. I don't see why, but more staring is needed. Just to report that similar tests on three machines have each run for 20 hours so far without any such infinite loop reoccurring. Well, I broke off the x86_64 for a couple of hours: wondered if I'd got confused, forgotten to "rmmod ext2" at one stage, and saw that behaviour with my simple "ext2fs_blk_t ret_block" patch, rather than your more extensive patch to ext2_new_blocks() that I'd believed I was testing. I didn't give it long enough to be conclusive, but the problem didn't show up with that either, so I've gone back to testing with yours. I'd have kept the hang around for longer if I'd guessed it would be hard to reproduce, and that it would be puzzling even to you: sorry. kdb is in, if it comes again I can peer at it more closely. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html