Re: [patch] new aop loop fix

2007-06-13 Thread Hugh Dickins
On Wed, 13 Jun 2007, Dmitriy Monakhov wrote:

   loop.c code itself is not perfect. In fact before Nick's patch
   partial write was't possible. Assumption what write chunks are
   always page aligned is realy weird ( see index++ line).
 
 Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]

I'm interested, because I'm trying to chase down an -mm bug which
occasionally leaves me with 1k of zeroes where it shouldn't (in a
1k bsize ext2 looped over tmpfs).  The length of time for this to
happen varies a lot, so bisection has been misleading: maybe the
problem lies in Nick's patches, maybe it does not.

But I don't understand your fix below at all.  _If_ you need to
change the handling of index, then you need to change the handling
of offset too, don't you?

But what's wrong with how inded was handled anyway?  Yes, it might
be being incremented at the bottom of the loop when we haven't
reached the end of this page, but in that case we're not going
round the loop again anyway: len will now be 0.  So no problem.

One of us is missing something: please enlighten me - thanks.

Hugh

 
 diff --git a/drivers/block/loop.c b/drivers/block/loop.c
 index 4bab9b1..8726da5 100644
 --- a/drivers/block/loop.c
 +++ b/drivers/block/loop.c
 @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
 bio_vec *bvec,
   int len, ret;
  
   mutex_lock(mapping-host-i_mutex);
 - index = pos  PAGE_CACHE_SHIFT;
   offset = pos  ((pgoff_t)PAGE_CACHE_SIZE - 1);
   bv_offs = bvec-bv_offset;
   len = bvec-bv_len;
 @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
 bio_vec *bvec,
   struct page *page;
   void *fsdata;
  
 + index = pos  PAGE_CACHE_SHIFT;
   IV = ((sector_t)index  (PAGE_CACHE_SHIFT - 9))+(offset  9);
   size = PAGE_CACHE_SIZE - offset;
   if (size  len)
 @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
 bio_vec *bvec,
   bv_offs += copied;
   len -= copied;
   offset = 0;
 - index++;
   pos += copied;
   }
   ret = 0;
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] new aop loop fix

2007-06-13 Thread Hugh Dickins
On Wed, 13 Jun 2007, Dmitriy Monakhov wrote:
 Btw: Nick's patches broke  LO_CRYPT_XOR mode,

It would help him if you could describe how.

 but it is ok because
 this feature was absolete and not used by anyone, am i right here?

I know nothing of this; but so long as its code remains in the driver,
I'd say we should be supporting it rather than breaking it.

Hugh
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix umask when noACL kernel meets extN tuned for ACLs

2007-01-22 Thread Hugh Dickins
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2
or ext3 or ext4 filesystem is tuned to mount with acl, but mounted by
a kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.

This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.

We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured.  We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).

Likewise don't set the XATTR_USER flag when built without XATTR support.

Signed-off-by: Hugh Dickins [EMAIL PROTECTED]
---

 fs/ext2/super.c |4 
 fs/ext3/super.c |4 
 fs/ext4/super.c |4 
 3 files changed, 12 insertions(+)

--- 2.6.20-rc5/fs/ext2/super.c  2007-01-13 08:46:07.0 +
+++ linux/fs/ext2/super.c   2007-01-15 20:48:38.0 +
@@ -708,10 +708,14 @@ static int ext2_fill_super(struct super_
set_opt(sbi-s_mount_opt, GRPID);
if (def_mount_opts  EXT2_DEFM_UID16)
set_opt(sbi-s_mount_opt, NO_UID32);
+#ifdef CONFIG_EXT2_FS_XATTR
if (def_mount_opts  EXT2_DEFM_XATTR_USER)
set_opt(sbi-s_mount_opt, XATTR_USER);
+#endif
+#ifdef CONFIG_EXT2_FS_POSIX_ACL
if (def_mount_opts  EXT2_DEFM_ACL)
set_opt(sbi-s_mount_opt, POSIX_ACL);
+#endif

if (le16_to_cpu(sbi-s_es-s_errors) == EXT2_ERRORS_PANIC)
set_opt(sbi-s_mount_opt, ERRORS_PANIC);
--- 2.6.20-rc5/fs/ext3/super.c  2007-01-13 08:46:07.0 +
+++ linux/fs/ext3/super.c   2007-01-15 20:48:38.0 +
@@ -1459,10 +1459,14 @@ static int ext3_fill_super (struct super
set_opt(sbi-s_mount_opt, GRPID);
if (def_mount_opts  EXT3_DEFM_UID16)
set_opt(sbi-s_mount_opt, NO_UID32);
+#ifdef CONFIG_EXT3_FS_XATTR
if (def_mount_opts  EXT3_DEFM_XATTR_USER)
set_opt(sbi-s_mount_opt, XATTR_USER);
+#endif
+#ifdef CONFIG_EXT3_FS_POSIX_ACL
if (def_mount_opts  EXT3_DEFM_ACL)
set_opt(sbi-s_mount_opt, POSIX_ACL);
+#endif
if ((def_mount_opts  EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_DATA)
sbi-s_mount_opt |= EXT3_MOUNT_JOURNAL_DATA;
else if ((def_mount_opts  EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_ORDERED)
--- 2.6.20-rc5/fs/ext4/super.c  2007-01-13 08:46:07.0 +
+++ linux/fs/ext4/super.c   2007-01-15 20:48:38.0 +
@@ -1518,10 +1518,14 @@ static int ext4_fill_super (struct super
set_opt(sbi-s_mount_opt, GRPID);
if (def_mount_opts  EXT4_DEFM_UID16)
set_opt(sbi-s_mount_opt, NO_UID32);
+#ifdef CONFIG_EXT4DEV_FS_XATTR
if (def_mount_opts  EXT4_DEFM_XATTR_USER)
set_opt(sbi-s_mount_opt, XATTR_USER);
+#endif
+#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
if (def_mount_opts  EXT4_DEFM_ACL)
set_opt(sbi-s_mount_opt, POSIX_ACL);
+#endif
if ((def_mount_opts  EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA)
sbi-s_mount_opt |= EXT4_MOUNT_JOURNAL_DATA;
else if ((def_mount_opts  EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED)
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: umask ignored in mkdir(2)?

2007-01-15 Thread Hugh Dickins
[I've rearranged this to avoid a horrid mix of top and bottom posting]

On Sun, 14 Jan 2007, Tigran Aivazian wrote:
 On Sun, 14 Jan 2007, Tigran Aivazian wrote:
  On Sun, 14 Jan 2007, Tigran Aivazian wrote:
   I think I may have found a bug --- on one of my machines the umask value
   is ignored by ext3 (but honoured on tmpfs) for mkdir system call:
   
   $ cd /tmp
   $ df -T .
   FilesystemType   1K-blocks  Used Available Use% Mounted on
   /dev/hdf1 ext3   189238556 155721568  23749068  87% /
   $ rmdir ok ; mkdir ok ; ls -ld ok
   rmdir: ok: No such file or directory
   drwxrwxrwx 2 tigran tigran 4096 Jan 14 20:36 ok/
   $ umask
   0022
   $ cd /dev/shm
   $ df -T .
   FilesystemType   1K-blocks  Used Available Use% Mounted on
   tmpfstmpfs  517988 0517988   0% /dev/shm
   $ rmdir ok ; mkdir ok ; ls -ld ok
   rmdir: ok: No such file or directory
   drwxr-xr-x 2 tigran tigran 40 Jan 14 20:36 ok/
   $ uname -a
   Linux ws 2.6.19.1 #6 SMP Sun Jan 14 20:03:30 GMT 2007 i686 i686 i386
   GNU/Linux
   $ grep -i acl /usr/src/linux/.config
   # CONFIG_FS_POSIX_ACL is not set
   # CONFIG_TMPFS_POSIX_ACL is not set
   # CONFIG_NFS_V3_ACL is not set
   # CONFIG_NFSD_V3_ACL is not set
   
   As you see, ACL is not configured in, and neither are extended attributes:
   
   $ grep -i xattr /usr/src/linux/.config
   # CONFIG_EXT2_FS_XATTR is not set
   # CONFIG_EXT3_FS_XATTR is not set
   
   So, this is something fs-specific. What do you think?
 
  I forgot to mention that on another machine running the same kernel version
  with the same (as close as a UP machine can be to SMP) kernel configuration
  the umask is honoured properly on ext3 filesystem.
 
 I figured it out! I thought you might be interested --- the reason is the
 mismatch between the default mount options stored in the superblock on disk
 and the filesystem features compiled into the kernel.
 
 Namely, dumpe2fs on the offending filesystems showed the following default
 mount options:
 
 user_xattr acl
 
 but on good filesystems it showed (none). So, I used tune2fs -o ^acl
 (and ^user_xattr) to clear these in the superblock and mounted the filesystem
 --- and now mkdir system call works as expected, i.e. honours the umask.
 
 Maybe the ext3 filesystem should automatically detect this (the mismatch) and
 printk a warning so the user is told that his filesystem is mounted in
 extremely insecure way, i.e. making directories as root will result in lots of
 0777 places (e.g. try make modules_install --- this will create lots of
 security holes in /lib/modules).
 
 I cc'd linux-kernel as someone may wish to fix this.

Good find!  Though I suppose not much of a worry for distros,
whose kernels will always(?) have ACLs configured in.

I get sooo confused when there's multiple ways of switching something
on and off (at the ifdef level and at the mount opts level and at the
tuning level), looks like others do too.  Here's my third version of
a patch, already wondering if a fourth would be better (at the point
where they set s_flags) ... no, I think this one is more robust...


[PATCH] fix umask when noACL kernel meets extN tuned for ACLs

Fix insecure default behaviour reported by Tigran Aivazian: if an ext2
or ext3 or ext4 filesystem is tuned to mount with acl, but mounted by
a kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.

This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.

We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured.  We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).

Likewise don't set the XATTR_USER flag when built without XATTR support.

Signed-off-by: Hugh Dickins [EMAIL PROTECTED]
---

 fs/ext2/super.c |4 
 fs/ext3/super.c |4 
 fs/ext4/super.c |4 
 3 files changed, 12 insertions(+)

--- 2.6.20-rc5/fs/ext2/super.c  2007-01-13 08:46:07.0 +
+++ linux/fs/ext2/super.c   2007-01-15 20:48:38.0 +
@@ -708,10 +708,14 @@ static int ext2_fill_super(struct super_
set_opt(sbi-s_mount_opt, GRPID);
if (def_mount_opts  EXT2_DEFM_UID16)
set_opt(sbi-s_mount_opt, NO_UID32);
+#ifdef CONFIG_EXT2_FS_XATTR
if (def_mount_opts  EXT2_DEFM_XATTR_USER)
set_opt(sbi-s_mount_opt, XATTR_USER);
+#endif
+#ifdef CONFIG_EXT2_FS_POSIX_ACL
if (def_mount_opts  EXT2_DEFM_ACL)
set_opt(sbi-s_mount_opt, POSIX_ACL);
+#endif

if (le16_to_cpu(sbi

[PATCH 4/6] ext2 balloc: fix off-by-one against grp_goal

2006-11-28 Thread Hugh Dickins
grp_goal 0 is a genuine goal (unlike -1),
so ext2_try_to_allocate_with_rsv should treat it as such.

Signed-off-by: Hugh Dickins [EMAIL PROTECTED]
---

 fs/ext2/balloc.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.0 +
+++ linux/fs/ext2/balloc.c  2006-11-27 19:28:41.0 +
@@ -1053,7 +1053,7 @@ ext2_try_to_allocate_with_rsv(struct sup
}
/*
 * grp_goal is a group relative block number (if there is a goal)
-* 0  grp_goal  EXT2_BLOCKS_PER_GROUP(sb)
+* 0 = grp_goal  EXT2_BLOCKS_PER_GROUP(sb)
 * first block is a filesystem wide block number
 * first block is the block number of the first block in this group
 */
@@ -1089,7 +1089,7 @@ ext2_try_to_allocate_with_rsv(struct sup
if (!goal_in_my_reservation(my_rsv-rsv_window,
grp_goal, group, sb))
grp_goal = -1;
-   } else if (grp_goal  0) {
+   } else if (grp_goal = 0) {
int curr = my_rsv-rsv_end -
(grp_goal + group_first_block) + 1;
 
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] ext2 balloc: fix _with_rsv freeze

2006-11-28 Thread Hugh Dickins
On Tue, 28 Nov 2006, Mingming Cao wrote:
 On Tue, 2006-11-28 at 17:40 +, Hugh Dickins wrote:
  After several days of testing ext2 with reservations, it got caught inside
  ext2_try_to_allocate_with_rsv: alloc_new_reservation repeatedly succeeding
  on the window [12cff,12d0e], ext2_try_to_allocate repeatedly failing to
  find the free block guaranteed to be included (unless there's contention).
  
 
 Hmm, I suspect there is other issue: alloc_new_reservation should not
 repeatedly allocating the same window, if ext2_try_to_allocate
 repeatedly fails to find a free block in that window.
 find_next_reservable_window() takes my_rsv (the old window that he
 thinks there is no free block) as a guide to find a window after the
 end block of my_rsv, so how could this happen?

Hmmm.  I haven't studied that part of the code, but what you say sounds
sensible: that would leave more to be explained, yes.  I guess it would
happen if all the rest of the bitmap were either allocated or reserved,
but I don't believe that was the case here: I have noted that the map
was all 00s from offset 0x1ae onwards, plenty unallocated; I've not
recorded the following reservations, but it seems unlikely they covered
the remaining free area (and still covered it even when the remaining
tasks got to the point of just waiting for this one).

 
  Fix the range to find_next_usable_block's memscan: the scan from here
  (0xcfe) up to (but excluding) maxblocks (0xd0e) needs to scan 3 bytes
  not 2 (the relevant bytes of bitmap in this case being f7 df ff - none
  00, but the premature cutoff implying that the last was found 00).
  
 
 alloc_new_reservation() reserved a window with free block, when come to
 the time to claim it, it scans the window again. So it seems that the
 range of the the scan is too small:

The range of the scan is 1 byte too small in this case, yes.

 
 p = ((char *)bh-b_data) + (here  3);
 r = memscan(p, 0, (maxblocks - here + 7)  3);
 next = (r - ((char *)bh-b_data))  3;
 
   -   next is -1

I don't understand you: next was not -1, it was 0xd08.

 if (next  maxblocks  next = here)
 return next;
 
   -- falls to false branch

No, it passed the next  maxblocks  next = here test
(maxblocks being 0xd0e and here being 0xcfe), so returned
pointing to an allocated block - then the caller finds it
cannot set the bit.

 
 here = bitmap_search_next_usable_block(here, bh, maxblocks);
 return here;
 
 So we failed to find a free byte in the range.  That's seems fine to me.
 It's only a nice thing to have -- try to allocate a block in a place
 where it's neighbors are all free also. If it fails, it will search the
 window bit by bit. So I don't understand why it is not being recovered
 by bitmap_search_next_usable_block(), which test the bitmap bit by bit? 

It already returned, it doesn't reach that line.

 
  Is this a problem for mainline ext2?  No, because the size in its memscan
  is always EXT2_BLOCKS_PER_GROUP(sb), which mkfs.ext2 requires to be a
  multiple of 8.  Is this a problem for ext3 or ext4?  No, because they have
  an additional extN_test_allocatable test which rescues them from the error.
  
 Hmm, if the error is it prematurely think there is no free block in the
 range (bitmap on disk), then even in ext3/4, it will not bother checking
 the jbd copy of the bitmap. I am not sure this is the cause that ext3/4
 may not has the problem.

In the ext3/4 case, it indeed won't bother to check the jbd copy
(having found this bitmap bit set), it'll fall through to the
bitmap_search_next_usable_block you indicated above,
and that should do the right thing, finding the first
free bit in the area originally reserved.

 
  But the bigger question is, why does the my_rsv case come here to
  find_next_usable_block at all? 
 
 Because grp_goal is -1?

Well, yes, but my point is that we've got a reservation, and we're
hoping to allocate from it (even though we've given up on the goal),
but find_next_usable_block is not respecting it at all - liable to be
allocating out of others' reservations instead.

 
   Doesn't its 64-bit boundary limit, and its
  memscan, blithely ignore what the reservation prepared?
 
 I agree with you that the double check is urgly. But it's necessary:( If
 there to prevent contention: other file make steal that free block we
 reserved for this file, in the case filesystem is full of reservation...

I agree it's necessary to recheck the allocation; I disagree that the
64-bit boundary limit and memscan are appropriate when my_rsv is set.

 
It's messy too,
  the complement of the memscan being that i  7 loop over in
  ext2_try_to_allocate.  I think this ought to be cleaned up,
  in ext2+reservations and ext3 and ext4.
  
 The i7 loop there is for non reservation case. Since
 find_next_usable_block() could find a free byte, it's trying to avoid
 filesystem holes by shifting