Ext3 not marking filesystems as with errors

2007-09-20 Thread Jan Kara
  Hi,

  a friend has just pointed me to the following misbehaviour of ext3. If we
stumble on some error in JBD (e.g. in commit code), we call
__journal_abort_hard(). It just marks the journal as aborted but does
nothing else. Later ext3 comes, finds the journal aborted, and calls ext3_abort(),
which remounts the fs read-only and stops (it does not mark the filesystem as
having errors). It calls journal_abort(.., -EIO) but that does nothing
because the journal is already aborted. If you then unmount the filesystem
and mount it again, everything goes on happily as if there was no error -
no suggestion for running fsck, nothing.
  I guess this is a bug, but please correct me if you don't think so. There
are two ways to fix it: either we mark the filesystem as having errors
in ext3_abort(), or we call some less low-level function from JBD to
abort the journal (as soon as j_errno is set, we are safe). Any feeling
about which is less hacky?
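A toy C model of the sequence above may help when weighing the two options. Every type and function in it is hypothetical and heavily simplified (none of it is JBD or ext3 code), and it assumes, following the last sentence of the mail, that either an "fs has errors" flag or a recorded j_errno is what would lead to fsck being suggested at the next mount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model; not the real JBD/ext3 structures. */
struct journal {
	bool aborted;
	int j_errno;		/* 0 means "no error recorded" */
};

struct fs {
	struct journal *journal;
	bool read_only;
	bool error_fs;		/* stands in for the "has errors" state */
};

/* Like __journal_abort_hard(): mark the journal aborted, record nothing. */
static void journal_abort_hard(struct journal *j)
{
	j->aborted = true;
}

/* Like journal_abort(j, err): a no-op if the journal is already
 * aborted, so the errno passed in is dropped in that case. */
static void journal_abort(struct journal *j, int err)
{
	if (j->aborted)
		return;
	j->aborted = true;
	j->j_errno = err;
}

/* Like ext3_abort(); mark_errors = true models the first proposed fix. */
static void fs_abort(struct fs *fs, bool mark_errors)
{
	fs->read_only = true;
	if (mark_errors)
		fs->error_fs = true;
	journal_abort(fs->journal, -EIO);
}

/* Second proposed fix: a JBD abort helper that always records an errno. */
static void journal_abort_with_errno(struct journal *j, int err)
{
	assert(err < 0);
	j->aborted = true;
	if (j->j_errno == 0)
		j->j_errno = err;
}

/* In this model, would the next mount point the admin at fsck? */
static bool suggests_fsck(const struct fs *fs)
{
	return fs->error_fs || fs->journal->j_errno != 0;
}
```

With the current behaviour (journal_abort_hard() followed by fs_abort(fs, false)), suggests_fsck() returns false; either fix makes it return true.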

Honza

-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-20 Thread Theodore Tso
On Thu, Sep 20, 2007 at 03:33:50PM +0200, Jan Kara wrote:
   So for example deleting kernel tree on my computer takes ~14 seconds with
 h-trees and less than 9 without them. Also doing 'cp -lr' of the kernel
 tree takes 8 seconds with h-trees and 6.3s without them... So I think the
 performance difference is quite measurable.

This is in a completely cold cache state?  (i.e. mounting and
unmounting the filesystem before doing the rm -rf?)

On my kernel tree, using the command: lsattr -R | grep -- -I- shows
that only 8 directories are htree indexed, and they're not that big:

12 drwxr-xr-x 12 tytso tytso 12288 2007-09-14 16:25 ./drivers/char
24 drwxr-xr-x 30 tytso tytso 24576 2007-09-14 16:25 ./drivers/net
20 drwxr-xr-x  2 tytso tytso 20480 2007-09-14 16:25 ./drivers/usb/serial
32 drwxr-xr-x 24 tytso tytso 32768 2007-09-14 16:10 ./include/linux
12 drwxr-xr-x  2 tytso tytso 12288 2007-09-14 16:25 ./net/bridge/netfilter
24 drwxr-xr-x  2 tytso tytso 24576 2007-09-14 16:25 ./net/ipv4/netfilter
12 drwxr-xr-x  2 tytso tytso 12288 2007-09-14 16:25 ./net/ipv6/netfilter
32 drwxr-xr-x  2 tytso tytso 32768 2007-09-14 16:25 ./net/netfilter

... which means if the benchmark only focused on deleting these files,
then presumably the percentage increase would be even worse.

  Certainly one of the things that we could consider is for small
  directories to do an in-memory sort of all of the directory entries at
  opendir() time, and keeping that list until it is closed.  We can't do
  this for really big directories, but we could easily do it for
  directories under 32k or 64k.

   Umm, yes. That would be probably feasible. But converting to htrees only
 when directories grow larger would avoid the problem also. It also does not
 seem *that* hard but maybe I miss some nasty details...

The reason why I mentioned the caching idea is we already have code to
manage and return directories stored in an rbtree in the kernel,
albeit for a slightly different purpose.  So hacking it up to cache
all of the directory entries for directories < 64k and to index them
by inode number instead of hash key would be pretty easy.

What's nasty about converting to htrees after the directories become
larger is that we need to reserve extra space in the journal for each
block that we need to modify, and then just the fact that we have to
keep track of the multiple buffers.  Basically, not impossible but
just a pain in the *ss.

- Ted


Re: [PATCH e2fsprogs] - ignore bind mounts in fsck

2007-09-20 Thread Eric Sandeen
Theodore Tso wrote:
 On Wed, Sep 19, 2007 at 03:20:14PM -0500, Eric Sandeen wrote:
 An entry like this in /etc/fstab:

 /foo  /bar  ext3  bind,defaults  1 3

 will stop boot, as fsck.ext3 tries to check it and fails:

 e2fsck 1.40.2 (12-Jul-2007)
 fsck.ext3: Is a directory while trying to open /foo

 The superblock could not be read or does not describe a correct ext2
 filesystem. ...

 Granted, asking for fsck of a bind mount in the fstab is a bit odd, 
 but it doesn't seem like it should stop the boot process if you make
 this mistake.
 
 That's fair, but note that the dump number and fsck pass number really
 should be zero in the fstab entry.  i.e., it really should be 0 0,
 or just plain omitted.

Agreed.  If you think fsck shouldn't silently cope with this mistake,
and instead punish the user for it (it is what they asked for, after
all), I'm ok with that too.  I'm willing to close my end as NOTABUG if
you don't want to take this patch.  :)

(FWIW, this is RH bug #151533)

-Eric


Avoid rec_len overflow with 64KB block size

2007-09-20 Thread Jan Kara
   Hello,

  when converting ext4 directories to pagecache I just came over
Takashi's patch preventing overflowing of rec_len. Looking over the
patch - can't we do it more elegantly by using say 0x instead of 64K
and perform conversion (using some helper) at the moment we read / store
rec_len? That would be IMHO more transparent than current approach (at
least it took me some time to understand what's going on with the
current patch when I was looking at the code)...
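A minimal sketch of the helper-based conversion proposed above, assuming the on-disk sentinel is 0xffff (the largest 16-bit value) and that it stands for a 65536-byte record. The names get_rec_len()/store_rec_len() and the macros are hypothetical; values are host-endian, so byte-order conversion is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: 0xffff on disk stands for a 64KB (65536-byte) record. */
#define REC_LEN_SENTINEL 0xffffu
#define REC_LEN_64K      (1u << 16)

/* Hypothetical read-side helper: on-disk 16-bit value -> real length. */
static unsigned get_rec_len(uint16_t disk)
{
	return disk == REC_LEN_SENTINEL ? REC_LEN_64K : disk;
}

/* Hypothetical store-side helper: real length -> on-disk 16-bit value. */
static uint16_t store_rec_len(unsigned len)
{
	assert(len <= REC_LEN_64K);
	return len == REC_LEN_64K ? (uint16_t)REC_LEN_SENTINEL : (uint16_t)len;
}
```

Every place that reads or writes rec_len then goes through one of the two helpers, so no other code has to special-case 64KB blocks. The sentinel cannot be mistaken for a real length because directory record lengths are multiples of 4.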

Honza

-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs


Re: [PATCH e2fsprogs] - ignore bind mounts in fsck

2007-09-20 Thread Theodore Tso
On Thu, Sep 20, 2007 at 09:31:56AM -0500, Eric Sandeen wrote:
 Agreed.  If you think fsck shouldn't silently cope with this mistake,
 and instead punish the user for it (it is what they asked for, after
 all), I'm ok with that too.  I'm willing to close my end as NOTABUG if
 you don't want to take this patch.  :)

I'm willing to take the patch, although I am thinking that it might be
appropriate for fsck to print a warning message: "Bind mount with
non-zero fsck pass, skipping", or some such.

What do you think?
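A sketch of the check being discussed, written against glibc's <mntent.h> (struct mntent and hasmntopt()) rather than fsck's own fstab parser, which is not shown here. should_skip_for_fsck() is a hypothetical name and the warning text is only illustrative.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>

/* Return 1 if fsck should skip this fstab entry, 0 if it should check it.
 * A bind mount with a nonzero pass number is skipped with a warning. */
static int should_skip_for_fsck(const struct mntent *m)
{
	assert(m != NULL);
	if (m->mnt_passno == 0)
		return 1;		/* no fsck requested at all */
	if (hasmntopt(m, "bind") != NULL) {
		fprintf(stderr,
			"%s: bind mount with non-zero fsck pass, skipping\n",
			m->mnt_dir);
		return 1;
	}
	return 0;
}
```

hasmntopt() matches whole option names in the comma-separated mnt_opts string, so "bind,defaults" is detected while an unrelated option such as "defaults" is not.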

- Ted


Re: Enabling h-trees too early?

2007-09-20 Thread Jan Kara
On Thu 20-09-07 11:14:40, Theodore Tso wrote:
 On Thu, Sep 20, 2007 at 04:58:39PM +0200, Jan Kara wrote:
Hmm, strange - I've just looked at my computer and dir_index is set
  just for 5 directories in my tree.
 
 I looked at a tree that had object files, which is probably why I had
 8 directories; I'm guessing you probably just had kernel sources and
 no build files.
 
  If I try deleting just them, I also
  see some performance decrease but it's less than if I try deleting the
  whole tree (and that result seems to be quite consistent)... There's
  something fishy there. Maybe I could try seekwatcher or something similar
  to see what's really happening.
 
 That is very strange.
  Just a guess: Can't the culprit be the following test in ext3/4_readdir()?
	if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_COMPAT_DIR_INDEX) &&
	    ((EXT4_I(inode)->i_flags & EXT4_INDEX_FL) ||
	     ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
		error = ext4_dx_readdir(filp, dirent, filldir);
		if (error != ERR_BAD_DX_DIR) {
			ret = error;
			goto out;
		}
		/*
		 * We don't set the inode dirty flag since it's not
		 * critical that it get flushed back to the disk.
		 */
		EXT4_I(filp->f_path.dentry->d_inode)->i_flags &= ~EXT4_INDEX_FL;
	}
  It calls ext4_dx_readdir() for *every* directory with 1 block (we have
1326 of them in the kernel tree). Now ext4_dx_readdir() calls
ext4_htree_fill_tree() which finds out the directory is not h-tree
and calls htree_dirblock_to_tree(). So even for 4KB directories we end up
deleting inodes in hash order! And as a bonus we burn some cycles building
trees etc. What is the point of this?
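The condition quoted above can be restated as a pure predicate, which makes the point easy to check: a directory that is exactly one block long takes the ext4_dx_readdir() path even when EXT4_INDEX_FL is not set. The function and parameter names are hypothetical; they only mirror the fields tested in the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the readdir() test above: dir_index feature on the fs, and
 * either the inode is htree-indexed or the directory is one block long. */
static bool takes_dx_readdir(bool fs_has_dir_index, bool inode_index_fl,
			     uint64_t i_size, unsigned blocksize_bits)
{
	assert(blocksize_bits >= 10 && blocksize_bits <= 16);
	return fs_has_dir_index &&
	       (inode_index_fl || (i_size >> blocksize_bits) == 1);
}
```

For example, with 4KB blocks (blocksize_bits = 12) a plain 4096-byte directory returns true, while an unindexed 8192-byte one returns false.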

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR


Re: Avoid rec_len overflow with 64KB block size

2007-09-20 Thread Jan Kara
   when converting ext4 directories to pagecache I just came over
 Takashi's patch preventing overflowing of rec_len. Looking over the
 patch - can't we do it more elegantly by using say 0x instead of 64K
 and perform conversion (using some helper) at the moment we read / store
 rec_len? That would be IMHO more transparent than current approach (at
 least it took me some time to understand what's going on with the
 current patch when I was looking at the code)...
  Attached is a patch that does this for ext4. If you like this
approach, I can cook up a similar patch for ext2 / ext3.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry length. So we store 0x instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.

Signed-off-by: Jan Kara [EMAIL PROTECTED]

diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-rc6/fs/ext4/dir.c linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/dir.c
--- linux-2.6.23-rc6/fs/ext4/dir.c	2007-09-18 19:22:28.0 +0200
+++ linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/dir.c	2007-09-20 18:08:02.0 +0200
@@ -69,7 +69,7 @@ int ext4_check_dir_entry (const char * f
 			  unsigned long offset)
 {
 	const char * error_msg = NULL;
-	const int rlen = le16_to_cpu(de->rec_len);
+	const int rlen = ext4_get_rec_len(le16_to_cpu(de->rec_len));
 
 	if (rlen < EXT4_DIR_REC_LEN(1))
 		error_msg = "rec_len is smaller than minimal";
@@ -176,10 +176,10 @@ revalidate:
  * least that it is non-zero.  A
  * failure will be detected in the
  * dirent test below. */
-if (le16_to_cpu(de->rec_len) <
-		EXT4_DIR_REC_LEN(1))
+if (ext4_get_rec_len(le16_to_cpu(de->rec_len))
+		< EXT4_DIR_REC_LEN(1))
 	break;
-i += le16_to_cpu(de->rec_len);
+i += ext4_get_rec_len(le16_to_cpu(de->rec_len));
 			}
 			offset = i;
 			filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -201,7 +201,7 @@ revalidate:
 ret = stored;
 goto out;
 			}
-			offset += le16_to_cpu(de->rec_len);
+			offset += ext4_get_rec_len(le16_to_cpu(de->rec_len));
 			if (le32_to_cpu(de->inode)) {
 /* We might block in the next section
  * if the data destination is
@@ -223,7 +223,7 @@ revalidate:
 	goto revalidate;
 stored ++;
 			}
-			filp->f_pos += le16_to_cpu(de->rec_len);
+			filp->f_pos += ext4_get_rec_len(le16_to_cpu(de->rec_len));
 		}
 		offset = 0;
 		brelse (bh);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-rc6/fs/ext4/namei.c linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/namei.c
--- linux-2.6.23-rc6/fs/ext4/namei.c	2007-09-18 19:22:28.0 +0200
+++ linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/namei.c	2007-09-20 18:29:29.0 +0200
@@ -280,7 +280,7 @@ static struct stats dx_show_leaf(struct 
 			space += EXT4_DIR_REC_LEN(de->name_len);
 			names++;
 		}
-		de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+		de = ext4_next_entry(de);
 	}
 	printk("(%i)\n", names);
 	return (struct stats) { names, space, 1 };
@@ -525,7 +525,8 @@ static int ext4_htree_next_block(struct 
  */
 static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *p)
 {
-	return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len));
+	return (struct ext4_dir_entry_2 *)((char*)p +
+		ext4_get_rec_len(le16_to_cpu(p->rec_len)));
 }
 
 /*
@@ -689,7 +690,7 @@ static int dx_make_map (struct ext4_dir_
 			cond_resched();
 		}
 		/* XXX: do we need to check rec_len == 0 case? -Chris */
-		de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+		de = ext4_next_entry(de);
 	}
 	return count;
 }
@@ -790,7 +791,7 @@ static inline int search_dirblock(struct
 			return 1;
 		}
 		/* prevent looping on a bad block */
-		de_len = le16_to_cpu(de->rec_len);
+		de_len = ext4_get_rec_len(le16_to_cpu(de->rec_len));
 		if (de_len <= 0)
 			return -1;
 		offset += de_len;
@@ -1099,7 +1100,7 @@ dx_move_dirents(char *from, char *to, st
 		rec_len = EXT4_DIR_REC_LEN(de->name_len);
 		memcpy (to, de, rec_len);
 		((struct ext4_dir_entry_2 *) to)->rec_len =
-cpu_to_le16(rec_len);
+cpu_to_le16(ext4_store_rec_len(rec_len));
 		de-inode = 0;
 		map++;
 		to += rec_len;
@@ -1114,13 +1115,12 @@ static struct ext4_dir_entry_2* dx_pack_
 
 	prev = to = de;
 	while ((char*)de < base + size) {
-		next = (struct ext4_dir_entry_2 *) ((char *) de +
-		le16_to_cpu(de->rec_len));
+		next = ext4_next_entry(de);
 		if (de->inode && de->name_len) {
 			rec_len = EXT4_DIR_REC_LEN(de->name_len);
 			if (de > to)
 memmove(to, de, rec_len);
-			to->rec_len = cpu_to_le16(rec_len);
+			to->rec_len = cpu_to_le16(ext4_store_rec_len(rec_len));
 			prev = to;
 			to = (struct ext4_dir_entry_2 *) (((char *) to) + rec_len);
 		}
@@ -1178,8 +1178,8 @@ static struct ext4_dir_entry_2 *do_split
 	/* Fancy dance 

Re: Enabling h-trees too early?

2007-09-20 Thread Theodore Tso
On Thu, Sep 20, 2007 at 06:19:04PM +0200, Jan Kara wrote:
 	if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_COMPAT_DIR_INDEX) &&
 	    ((EXT4_I(inode)->i_flags & EXT4_INDEX_FL) ||
 	     ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
 		error = ext4_dx_readdir(filp, dirent, filldir);
 		if (error != ERR_BAD_DX_DIR) {
 			ret = error;
 			goto out;
 		}
 		/*
 		 * We don't set the inode dirty flag since it's not
 		 * critical that it get flushed back to the disk.
 		 */
 		EXT4_I(filp->f_path.dentry->d_inode)->i_flags &= ~EXT4_INDEX_FL;
 	}
   It calls ext4_dx_readdir() for *every* directory with 1 block (we have
 1326 of them in the kernel tree). Now ext4_dx_readdir() calls
 ext4_htree_fill_tree() which finds out the directory is not h-tree
 and calls htree_dirblock_to_tree(). So even for 4KB directories we end up
 deleting inodes in hash order! And as a bonus we burn some cycles building
 trees etc. What is the point of this?

That was added so we wouldn't get screwed when a directory that was
previously non-htree became an htree directory while the directory fd
is open.  So the failure case is one where you do opendir(), readdir()
on 25% of the directory, sleep for 2 hours, and in the meantime, 200
files are added to the directory and it gets converted into an htree
index, causing all of the previously returned readdir() results in
directory order to be completely screwed up now that the directory has
been converted into an htree.  (All of the readdir/telldir/seekdir 
POSIX requirements cause filesystem designers to tear their hair out.)

What we would need to do to avoid needing this is to read in the
entire directory leaf page into the rbtree, sorted by inode number,
and then to keep that rbtree for the entire life of the open directory
file descriptor.  We would also have to change telldir/seekdir to use
something else as a telldir cookie, and readdir would have to be set
up to *only* use the rbtree, and never look at the on-disk directory.
This would also mean that all of the files created or deleted after
the initial opendir() would never be reflected in results returned by
readdir(), but that's allowed by POSIX.  And if we do this for a
single block 4k directory, we might as well do it for a 32k or 64k
HTREE directory as well.
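A user-space sketch of the snapshot scheme described above: copy and sort the entries by inode number at "opendir" time, serve reads only from that copy, and use the position in the copy as the telldir cookie. All names are hypothetical, a sorted array stands in for the kernel rbtree, and the "on-disk" directory is just an array the caller passes in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct snap_ent {
	uint32_t inode;
	const char *name;
};

struct dir_snapshot {
	struct snap_ent *ents;	/* private copy, sorted by inode number */
	size_t n;
	size_t pos;		/* next entry to return; doubles as the cookie */
};

static int cmp_inode(const void *a, const void *b)
{
	uint32_t x = ((const struct snap_ent *)a)->inode;
	uint32_t y = ((const struct snap_ent *)b)->inode;
	return (x > y) - (x < y);	/* avoids overflow of x - y */
}

/* "opendir": take a sorted private copy of the current entries. */
static int snap_open(struct dir_snapshot *d, const struct snap_ent *disk, size_t n)
{
	d->ents = malloc(n ? n * sizeof(*disk) : 1);
	if (!d->ents)
		return -1;
	if (n)
		memcpy(d->ents, disk, n * sizeof(*disk));
	qsort(d->ents, n, sizeof(*disk), cmp_inode);
	d->n = n;
	d->pos = 0;
	return 0;
}

/* "readdir": never looks at the on-disk directory again. */
static const struct snap_ent *snap_read(struct dir_snapshot *d)
{
	return d->pos < d->n ? &d->ents[d->pos++] : NULL;
}

static long snap_tell(const struct dir_snapshot *d)
{
	return (long)d->pos;
}

static void snap_seek(struct dir_snapshot *d, long cookie)
{
	assert(cookie >= 0 && (size_t)cookie <= d->n);
	d->pos = (size_t)cookie;
}

static void snap_close(struct dir_snapshot *d)
{
	free(d->ents);
	d->ents = NULL;
}
```

Because the cookie is an index into the private copy, seeking back to it stays meaningful even if the directory on disk is converted to an htree in the meantime; the trade-off, as noted above, is that entries created after opendir() are never returned.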

Does that make sense?

- Ted


[PATCH e2fsprogs] return status from chattr

2007-09-20 Thread Eric Sandeen
This is for RH bug #180596, Chattr command doesn't provide expected 
exit code in case of failure.

(trying to clear out an e2fsprogs bug backlog, can you tell?)  :)

This is a little funky as a result of the man page saying that
links encountered on recursive traversal are (silently?) ignored.

I changed this a bit so that if it's explicitly listed on the
command line, the link itself gets chattr'd.  I'm not quite sure 
what is intended here; that the links are not *followed* or
that they are not chattr'd?  Seems a little odd to me.

I tried to follow the way other recursive commands work, for example
chmod -R, and carry on in the face of any errors.  If any error
was encountered, exit with an error.  If no errors, exit 0.
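A sketch of that exit-status policy: keep processing every argument after a failure, count the failures, and map the count to the exit code at the end. check_one() is a hypothetical stand-in for change_attributes() that only performs chattr's first step (an lstat() of the name); process_all() and exit_status() are also illustrative names, not e2fsprogs functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Stand-in for change_attributes(): only lstat()s the name.
 * Returns 0 on success, -1 on failure (after reporting it). */
static int check_one(const char *name)
{
	struct stat st;

	if (lstat(name, &st) == -1) {
		perror(name);
		return -1;
	}
	return 0;
}

/* Process every name even after a failure; return how many failed. */
static int process_all(const char *const names[], int count,
		       int (*process_one)(const char *))
{
	int failures = 0;

	assert(count >= 0);
	for (int i = 0; i < count; i++)
		if (process_one(names[i]) != 0)
			failures++;
	return failures;
}

/* chmod -R style exit status: 0 only if everything succeeded. */
static int exit_status(int failures)
{
	return failures ? 1 : 0;
}
```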

Also, if both flags and -v (version) are specified, and the flag 
set encounters an error, the version set is not attempted.  Is this 
ok or should both commands be tried?

Finally, I'm curious, the utility ignores anything that's not a link,
regular file, or dir, but the kernel code doesn't have these checks.
Should it?

Comments?

Thanks,
-Eric

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]


Index: e2fsprogs-1.40.2/misc/chattr.c
===
--- e2fsprogs-1.40.2.orig/misc/chattr.c
+++ e2fsprogs-1.40.2/misc/chattr.c
@@ -182,7 +182,7 @@ static int decode_arg (int * i, int argc
 
 static int chattr_dir_proc (const char *, struct dirent *, void *);
 
-static void change_attributes (const char * name)
+static int change_attributes (const char * name, int cmdline)
 {
unsigned long flags;
STRUCT_STAT st;
@@ -190,19 +190,20 @@ static void change_attributes (const cha
if (LSTAT (name, &st) == -1) {
com_err (program_name, errno, _("while trying to stat %s"), 
 name);
-   return;
+   return -1;
}
-   if (S_ISLNK(st.st_mode) && recursive)
-   return;
 
-   /* Don't try to open device files, fifos etc.  We probably
-   ought to display an error if the file was explicitly given
-   on the command line (whether or not recursive was
-   requested).  */
-   if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode) &&
-   !S_ISDIR(st.st_mode))
-   return;
+   /* Just silently ignore links found by recursion;
+  not an error according to the manpage */
+   if (S_ISLNK(st.st_mode) && !cmdline)
+   return 0;
 
+   /* Don't try to open device files, fifos etc. */
+   if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode) &&
+   !S_ISDIR(st.st_mode)) {
+   com_err (program_name, EINVAL, _("for file %s"), name);
+   return -1;
+   }
if (set) {
if (verbose) {
printf (_("Flags of %s set as "), name);
@@ -212,10 +213,11 @@ static void change_attributes (const cha
if (fsetflags (name, sf) == -1)
perror (name);
} else {
-   if (fgetflags (name, &flags) == -1)
+   if (fgetflags (name, &flags) == -1) {
com_err (program_name, errno,
 _("while reading flags on %s"), name);
-   else {
+   return -1;
+   } else {
if (rem)
flags &= ~rf;
if (add)
@@ -227,25 +229,32 @@ static void change_attributes (const cha
}
if (!S_ISDIR(st.st_mode))
flags &= ~EXT2_DIRSYNC_FL;
-   if (fsetflags (name, flags) == -1)
+   if (fsetflags (name, flags) == -1) {
com_err (program_name, errno,
 _("while setting flags on %s"), name);
+   return -1;
+   }
+
}
}
if (set_version) {
if (verbose)
printf (_("Version of %s set as %lu\n"), name, version);
-   if (fsetversion (name, version) == -1)
+   if (fsetversion (name, version) == -1) {
com_err (program_name, errno,
 _("while setting version on %s"), name);
+   return -1;
+   }
}
if (S_ISDIR(st.st_mode) && recursive)
-   iterate_on_dir (name, chattr_dir_proc, NULL);
+   return iterate_on_dir (name, chattr_dir_proc, NULL);
 }
 
 static int chattr_dir_proc (const char * dir_name, struct dirent * de,
void * private EXT2FS_ATTR((unused)))
 {
+   int err;
+
if (strcmp (de->d_name, ".") && strcmp (de->d_name, "..")) {
char *path;
 
@@ -253,11 +262,13 @@ static int chattr_dir_proc (const char *
if (!path) {
fprintf(stderr, 

Re: Avoid rec_len overflow with 64KB block size

2007-09-20 Thread Andreas Dilger
On Sep 20, 2007  18:17 +0200, Jan Kara wrote:
when converting ext4 directories to pagecache I just came over
  Takashi's patch preventing overflowing of rec_len. Looking over the
  patch - can't we do it more elegantly by using say 0x instead of 64K
  and perform conversion (using some helper) at the moment we read / store
  rec_len? That would be IMHO more transparent than current approach (at
  least it took me some time to understand what's going on with the
  current patch when I was looking at the code)...

   Attached is a patch that does this for ext4. If you like this
 approach, I can cook up a similar patch for ext2 / ext3.

Yes, I think this is much cleaner to avoid all the conditionals in the
code.

 With 64KB blocksize, a directory entry can have size 64KB which does not fit
 into 16 bits we have for entry length. So we store 0x instead and convert
 value when read from / written to disk. The patch also converts some places
 to use ext4_next_entry() when we are changing them anyway.
 
   const char * error_msg = NULL;
 - const int rlen = le16_to_cpu(de->rec_len);
 + const int rlen = ext4_get_rec_len(le16_to_cpu(de->rec_len));

Maybe we should wrap the le16_to_cpu() into ext4_get_rec_len() itself,
making the parameter just be __le16 rec_len?  We appear to have
le16_to_cpu() at every callsite.

Likewise for ext4_store_rec_len() it should do the cpu_to_le16() internally
and return an __le16.  It should maybe be called ext4_set_rec_len() to be
a more natural pairing?
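A sketch of the API shape suggested here, where the get/set pair takes and returns the on-disk little-endian value and does the byte-order conversion internally, so callers no longer wrap it in le16_to_cpu()/cpu_to_le16(). disk_le16 is a hypothetical stand-in for the kernel's __le16 so the example runs in user space, and the "0xffff means 64KB" encoding is an assumption carried over from the patch description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __le16: two bytes, least significant first. */
typedef struct {
	uint8_t b[2];
} disk_le16;

/* ext4_get_rec_len()-style helper that also owns the le16 conversion. */
static unsigned get_rec_len(disk_le16 d)
{
	unsigned v = (unsigned)d.b[0] | ((unsigned)d.b[1] << 8);

	return v == 0xffffu ? 65536u : v;
}

/* Matching ext4_set_rec_len()-style helper returning on-disk bytes. */
static disk_le16 set_rec_len(unsigned len)
{
	disk_le16 d;
	unsigned v;

	assert(len <= 65536u);
	v = (len == 65536u) ? 0xffffu : len;
	d.b[0] = (uint8_t)(v & 0xff);
	d.b[1] = (uint8_t)(v >> 8);
	return d;
}
```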

This also needs a patch for e2fsprogs, though I'm not sure the old patch did
(has anyone ever checked this?).  We could still consider making
EXT4_DIR_MAX_REC_LEN as in Takashi's patch, but keep the cleanups here.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [PATCH e2fsprogs] - ignore bind mounts in fsck

2007-09-20 Thread Theodore Tso
It might also be worthwhile to file a documentation bug against the
mount and fstab man pages, since they don't currently seem to specify
(at least on my Ubuntu system; maybe it's been fixed in newer upstream
packages) that you can specify the bind option in the fstab file.  

/src  /dest  ext3  bind,default

It's not clear to me that this should be the preferred form.  Why not?

/src  /dest  bind  defaults

or

/src  /dest  none  bind,defaults

instead?  In any case, how bind mounts are supposed to be specified in
fstab is not documented, and it really should be.

   - Ted


Re: [PATCH e2fsprogs] - ignore bind mounts in fsck

2007-09-20 Thread Theodore Tso
This is what I actually committed into e2fsprogs git, in the maint
branch.  Note the one-line summary at the beginning of the patch
description, and the Addresses-Red-Hat-Bugzilla line before the
Signed-off-by lines.

- Ted

commit ed773a263829493e4e4bf612dbec2380cf09349f
Author: Theodore Ts'o [EMAIL PROTECTED]
Date:   Thu Sep 20 15:06:35 2007 -0400

fsck: Ignore /etc/fstab entries for bind mounts

If a user specifies a bind mount with a non-zero fsck pass number, for
example:

/foo  /bar  ext3  bind,defaults  1 3

print a warning and ignore the fstab entry.

Addresses-Red-Hat-Bugzilla: #151533

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
Signed-off-by: Theodore Ts'o [EMAIL PROTECTED]

diff --git a/misc/fsck.c b/misc/fsck.c
index 1dcac25..108adf6 100644
--- a/misc/fsck.c
+++ b/misc/fsck.c
@@ -867,6 +867,16 @@ static int ignore(struct fs_info *fs)
if (fs->passno == 0)
return 1;
 
+   /*
+* If this is a bind mount, ignore it.
+*/
+   if (opt_in_list("bind", fs->opts)) {
+   fprintf(stderr,
+   _("%s: skipping bad line in /etc/fstab: bind mount with nonzero fsck pass number\n"),
+   fs->mountpt);
+   return 1;
+   }
+
interpret_type(fs);
 
/*


Re: [PATCH] obsolete libcom-err for SuSE e2fsprogs

2007-09-20 Thread Eric Sandeen
Andreas Dilger wrote:
 On Sep 19, 2007  20:41 -0500, Eric Sandeen wrote:
 Andreas Dilger wrote:
 It isn't possible to build an e2fsprogs via make rpm on SuSE and have it
 install cleanly, because they split out some of the libraries into separate
 packages.

 We've got the current patch to the .spec file, but I'm open to discussion
 if it is more desirable to change the .spec to continue to build separate
 RPMs (though that is more of a distribution hassle and might need major
 changes in the .spec file).
 FWIW, I also have an RFE assigned to me for RHEL/Fedora to split up our
 e2fsprogs packages for libcom_err and libuuid... since many
 non-filesystem things now require them.   So, this is sort of going in
 the opposite direction.  :)

 Any idea how many distros already split it out?
 
 I know Debian-based distros have done this for ages...
 
 I'd also welcome someone with rpm-fu split it into separate packages.

I'd do this; my rpm-fu is still reasonably strong.  I'm curious, though:
is there a compelling reason to split out just libcom-err?  What about
libuuid?  libblkid?  e2fsprogs is a bit of a grab bag of things.  What's
the rationale for the split?

-Eric


Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-20 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700
Avantika Mathur [EMAIL PROTECTED] wrote:

 In pass1 of e2fsck, every inode table in the filesystem is scanned and
 checked, regardless of whether it is in use.  This is the most
 time-consuming part of the filesystem check.  The uninitialized block group
 feature can greatly reduce e2fsck time by eliminating checking of
 uninitialized inodes.
 
 With this feature, there is a high water mark of used inodes for each block
 group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
 group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
 of each group descriptor is used to ensure that corruption in the group
 descriptor's bit flags does not cause incorrect operation.

This needed a few fixups due to conflicts with
ext2-ext3-ext4-add-block-bitmap-validation.patch but they were pretty
straightforward.  Please check that the result is OK.




+ ext4-uninitialized-block-groups.patch added to -mm tree

2007-09-20 Thread akpm

The patch titled
 Ext4: Uninitialized Block Groups
has been added to the -mm tree.  Its filename is
 ext4-uninitialized-block-groups.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: Ext4: Uninitialized Block Groups
From: Andreas Dilger [EMAIL PROTECTED]

In pass1 of e2fsck, every inode table in the filesystem is scanned and
checked, regardless of whether it is in use.  This is the most
time-consuming part of the filesystem check.  The uninitialized block group
feature can greatly reduce e2fsck time by eliminating checking of
uninitialized inodes.

With this feature, there is a high water mark of used inodes for each
block group.  Block and inode bitmaps can be uninitialized on disk via a
flag in the group descriptor to avoid reading or scanning them at e2fsck
time.  A checksum of each group descriptor is used to ensure that
corruption in the group descriptor's bit flags does not cause incorrect
operation.

The feature is enabled through a mkfs option:

mke2fs /dev/ -O uninit_groups

A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.

The patches have been stress tested with fsstress and fsx.  In performance
tests measuring e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesystem.  In ext4 with
the uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature
can greatly reduce e2fsck time for users, with a performance improvement of
2-20 times, depending on how full the filesystem is.

The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.

In each group descriptor if we have

EXT4_BG_INODE_UNINIT set in bg_flags:
Inode table is not initialized/used in this group. So we can skip
the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
No block in the group is used. So we can skip the block bitmap
verification for this group.

We also add two new fields to group descriptor as a part of
uninitialized group patch.

__le16  bg_itable_unused;   /* Unused inodes count */
__le16  bg_checksum;/* crc16(sb_uuid+group+desc) */

bg_itable_unused:

If EXT4_BG_INODE_UNINIT is not set in bg_flags,
then bg_itable_unused gives the number of unused inodes
at the end of the inode table, i.e. how far into the table
inodes are in use.  fsck can use this to skip the inodes that are marked unused.
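A sketch of how a checker might turn these fields into work it can skip. It assumes EXT4_BG_INODE_UNINIT = 0x0001 and EXT4_BG_BLOCK_UNINIT = 0x0002 (the values are not shown in this excerpt) and reads bg_itable_unused as a count of unused inodes at the end of the group's inode table, as its "Unused inodes count" comment suggests. The function names are hypothetical, and the bad-checksum fallback is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define EXT4_BG_INODE_UNINIT 0x0001	/* assumed value */
#define EXT4_BG_BLOCK_UNINIT 0x0002	/* assumed value */

/* Number of inode-table entries in one group that pass 1 must scan. */
static uint32_t inodes_to_scan(uint16_t bg_flags, uint16_t bg_itable_unused,
			       uint32_t inodes_per_group)
{
	assert(bg_itable_unused <= inodes_per_group);
	if (bg_flags & EXT4_BG_INODE_UNINIT)
		return 0;				/* table never used */
	return inodes_per_group - bg_itable_unused;	/* skip unused tail */
}

/* Whether the group's block bitmap needs to be verified at all. */
static int need_block_bitmap_check(uint16_t bg_flags)
{
	return !(bg_flags & EXT4_BG_BLOCK_UNINIT);
}
```

For a group with 8192 inodes of which the last 8000 are unused, only the first 192 need scanning.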

bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure the group descriptor
is not corrupt.  We add a checksum to the group descriptor to
detect corruption.  If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group as used.
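A sketch of the checksum side, following the crc16(sb_uuid+group+desc) field comment: seed the CRC with all ones, then feed in the filesystem UUID, the group number as a little-endian 32-bit value, and the descriptor bytes that precede bg_checksum. The bitwise crc16() uses the reflected 0x8005 polynomial (0xA001), which I believe matches the kernel's lib/crc16 helper; treat that, the seed, and group_desc_csum() itself as assumptions rather than the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC16, reflected polynomial 0xA001, no final XOR. */
static uint16_t crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/* Hypothetical descriptor checksum: uuid, then group, then the
 * descriptor bytes before the bg_checksum field (csum_offset bytes). */
static uint16_t group_desc_csum(const uint8_t uuid[16], uint32_t group,
				const uint8_t *desc, size_t csum_offset)
{
	uint8_t le_group[4] = {
		(uint8_t)group, (uint8_t)(group >> 8),
		(uint8_t)(group >> 16), (uint8_t)(group >> 24)
	};
	uint16_t crc = crc16(0xFFFF, uuid, 16);

	crc = crc16(crc, le_group, sizeof(le_group));
	return crc16(crc, desc, csum_offset);
}
```

Mixing the group number into the CRC means that an otherwise identical descriptor copied into the wrong slot no longer verifies.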

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Avantika Mathur [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/ext4/balloc.c|   92 +-
 fs/ext4/group.h |   29 
 fs/ext4/ialloc.c|  132 +++---
 fs/ext4/resize.c|2 
 fs/ext4/super.c |   96 +++
 include/linux/ext4_fs.h |   16 +++-
 6 files changed, 351 insertions(+), 16 deletions(-)

diff -puN fs/ext4/balloc.c~ext4-uninitialized-block-groups fs/ext4/balloc.c
--- a/fs/ext4/balloc.c~ext4-uninitialized-block-groups
+++ a/fs/ext4/balloc.c
@@ -20,6 +20,7 @@
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
 
+#include "group.h"
 /*
  * balloc.c contains the blocks allocation and deallocation routines
  */
@@ -42,6 +43,74 @@ void ext4_get_group_no_and_offset(struct
 
 }
 
+/* Initializes an uninitialized block bitmap if given, and returns the
+ * number of blocks free in the group. */
+unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
+   int block_group, struct ext4_group_desc *gdp)
+{
+   unsigned long start;
+   int bit, bit_max;
+   unsigned free_blocks;
+   struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+   if (bh) {
+   J_ASSERT_BH(bh, buffer_locked(bh));
+
+   /* If checksum is bad mark all blocks used to prevent allocation
+* essentially implementing a per-group read-only flag. */
+   if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
+   ext4_error(sb, __FUNCTION__,
+  

+ ext3-new-export-ops.patch added to -mm tree

2007-09-20 Thread akpm

The patch titled
 ext3: new export ops
has been added to the -mm tree.  Its filename is
 ext3-new-export-ops.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: ext3: new export ops
From: Christoph Hellwig [EMAIL PROTECTED]

Trivial switch over to the new generic helpers.

Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]
Cc: Neil Brown [EMAIL PROTECTED]
Cc: J. Bruce Fields [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/ext3/super.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff -puN fs/ext3/super.c~ext3-new-export-ops fs/ext3/super.c
--- a/fs/ext3/super.c~ext3-new-export-ops
+++ a/fs/ext3/super.c
@@ -631,13 +631,10 @@ static int ext3_show_options(struct seq_
 }
 
 
-static struct dentry *ext3_get_dentry(struct super_block *sb, void *vobjp)
+static struct inode *ext3_nfs_get_inode(struct super_block *sb,
+   u64 ino, u32 generation)
 {
-   __u32 *objp = vobjp;
-   unsigned long ino = objp[0];
-   __u32 generation = objp[1];
struct inode *inode;
-   struct dentry *result;
 
if (ino < EXT3_FIRST_INO(sb) && ino != EXT3_ROOT_INO)
return ERR_PTR(-ESTALE);
@@ -660,15 +657,22 @@ static struct dentry *ext3_get_dentry(st
iput(inode);
return ERR_PTR(-ESTALE);
}
-   /* now to find a dentry.
-* If possible, get a well-connected one
-*/
-   result = d_alloc_anon(inode);
-   if (!result) {
-   iput(inode);
-   return ERR_PTR(-ENOMEM);
-   }
-   return result;
+
+   return inode;
+}
+
+static struct dentry *ext3_fh_to_dentry(struct super_block *sb, struct fid *fid,
+   int fh_len, int fh_type)
+{
+   return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+   ext3_nfs_get_inode);
+}
+
+static struct dentry *ext3_fh_to_parent(struct super_block *sb, struct fid *fid,
+   int fh_len, int fh_type)
+{
+   return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+   ext3_nfs_get_inode);
 }
 
 #ifdef CONFIG_QUOTA
@@ -738,8 +742,9 @@ static const struct super_operations ext
 };
 
 static struct export_operations ext3_export_ops = {
+   .fh_to_dentry = ext3_fh_to_dentry,
+   .fh_to_parent = ext3_fh_to_parent,
.get_parent = ext3_get_parent,
-   .get_dentry = ext3_get_dentry,
 };
 
 enum {
_

Patches currently in -mm which might be from [EMAIL PROTECTED] are

git-nfs.patch
git-nfsd.patch
partially-fix-up-the-lookup_one_noperm-mess.patch
optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft.patch
optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft-fix.patch
git-xfs.patch
sysv-convert-to-new-aops.patch
alpha-convert-to-generic-sys_ptrace.patch
kill-declare_mutex_locked.patch
remove-unneded-lock_kernel-in-driver-block-loopc.patch
ufs-move-non-layout-parts-of-ufs_fsh-to-fs-ufs.patch
fix-execute-checking-in-permission.patch
exec-remove-unnecessary-check-for-mnt_noexec.patch
fix-f_version-type-should-be-u64-instead-of-unsigned-long.patch
unprivileged-mounts-add-user-mounts-to-the-kernel.patch
unprivileged-mounts-allow-unprivileged-umount.patch
unprivileged-mounts-account-user-mounts.patch
unprivileged-mounts-propagate-error-values-from-clone_mnt.patch
unprivileged-mounts-allow-unprivileged-bind-mounts.patch
unprivileged-mounts-put-declaration-of-put_filesystem-in-fsh.patch
unprivileged-mounts-allow-unprivileged-mounts.patch
unprivileged-mounts-allow-unprivileged-fuse-mounts.patch
unprivileged-mounts-propagation-inherit-owner-from-parent.patch
unprivileged-mounts-add-no-submounts-flag.patch
revoke-special-mmap-handling.patch
revoke-core-code.patch
revoke-support-for-ext2-and-ext3.patch
revoke-add-documentation.patch
revoke-wire-up-i386-system-calls.patch
exportfs-add-fid-type.patch
exportfs-add-new-methods.patch
ext2-new-export-ops.patch
ext3-new-export-ops.patch
ext4-new-export-ops.patch
efs-new-export-ops.patch
jfs-new-export-ops.patch
ntfs-new-export-ops.patch
xfs-new-export-ops.patch
fat-new-export-ops.patch
isofs-new-export-ops.patch
shmem-new-export-ops.patch
reiserfs-new-export-ops.patch
gfs2-new-export-ops.patch
ocfs2-new-export-ops.patch
exportfs-remove-old-methods.patch
exportfs-make-struct-export_operations-const.patch
exportfs-update-documentation.patch


+ ext4-new-export-ops.patch added to -mm tree

2007-09-20 Thread akpm

The patch titled
 ext4: new export ops
has been added to the -mm tree.  Its filename is
 ext4-new-export-ops.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: ext4: new export ops
From: Christoph Hellwig [EMAIL PROTECTED]

Trivial switch over to the new generic helpers.

Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]
Cc: Neil Brown [EMAIL PROTECTED]
Cc: J. Bruce Fields [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/ext4/super.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff -puN fs/ext4/super.c~ext4-new-export-ops fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-new-export-ops
+++ a/fs/ext4/super.c
@@ -694,13 +694,10 @@ static int ext4_show_options(struct seq_
 }
 
 
-static struct dentry *ext4_get_dentry(struct super_block *sb, void *vobjp)
+static struct inode *ext4_nfs_get_inode(struct super_block *sb,
+   u64 ino, u32 generation)
 {
-   __u32 *objp = vobjp;
-   unsigned long ino = objp[0];
-   __u32 generation = objp[1];
struct inode *inode;
-   struct dentry *result;
 
if (ino < EXT4_FIRST_INO(sb) && ino != EXT4_ROOT_INO)
return ERR_PTR(-ESTALE);
@@ -723,15 +720,22 @@ static struct dentry *ext4_get_dentry(st
iput(inode);
return ERR_PTR(-ESTALE);
}
-   /* now to find a dentry.
-* If possible, get a well-connected one
-*/
-   result = d_alloc_anon(inode);
-   if (!result) {
-   iput(inode);
-   return ERR_PTR(-ENOMEM);
-   }
-   return result;
+
+   return inode;
+}
+
+static struct dentry *ext4_fh_to_dentry(struct super_block *sb, struct fid *fid,
+   int fh_len, int fh_type)
+{
+   return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+   ext4_nfs_get_inode);
+}
+
+static struct dentry *ext4_fh_to_parent(struct super_block *sb, struct fid *fid,
+   int fh_len, int fh_type)
+{
+   return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+   ext4_nfs_get_inode);
 }
 
 #ifdef CONFIG_QUOTA
@@ -801,8 +805,9 @@ static const struct super_operations ext
 };
 
 static struct export_operations ext4_export_ops = {
+   .fh_to_dentry = ext4_fh_to_dentry,
+   .fh_to_parent = ext4_fh_to_parent,
.get_parent = ext4_get_parent,
-   .get_dentry = ext4_get_dentry,
 };
 
 enum {
_


+ exportfs-add-new-methods.patch added to -mm tree

2007-09-20 Thread akpm

The patch titled
 exportfs: add new methods
has been added to the -mm tree.  Its filename is
 exportfs-add-new-methods.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: exportfs: add new methods
From: Christoph Hellwig [EMAIL PROTECTED]

Add the guts for the new filesystem API to exportfs.

There's now an fh_to_dentry method that returns a dentry for the object being
looked up, given a file handle fragment, and an fh_to_parent operation that
returns the dentry for the encoded parent directory if the file handle contains
one.

There are default implementations for these methods that only take a callback
for an nfs-enhanced iget variant and implement the rest of the semantics.

Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]
Cc: Neil Brown [EMAIL PROTECTED]
Cc: J. Bruce Fields [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Cc: Dave Kleikamp [EMAIL PROTECTED]
Cc: Anton Altaparmakov [EMAIL PROTECTED]
Cc: David Chinner [EMAIL PROTECTED]
Cc: Timothy Shimmin [EMAIL PROTECTED]
Cc: OGAWA Hirofumi [EMAIL PROTECTED]
Cc: Hugh Dickins [EMAIL PROTECTED]
Cc: Chris Mason [EMAIL PROTECTED]
Cc: Jeff Mahoney [EMAIL PROTECTED]
Cc: Vladimir V. Saveliev [EMAIL PROTECTED]
Cc: Steven Whitehouse [EMAIL PROTECTED]
Cc: Mark Fasheh [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/exportfs/expfs.c  |  136 +++--
 fs/libfs.c   |   88 +++
 include/linux/exportfs.h |   30 
 3 files changed, 248 insertions(+), 6 deletions(-)

diff -puN fs/exportfs/expfs.c~exportfs-add-new-methods fs/exportfs/expfs.c
--- a/fs/exportfs/expfs.c~exportfs-add-new-methods
+++ a/fs/exportfs/expfs.c
@@ -514,17 +514,141 @@ struct dentry *exportfs_decode_fh(struct
int (*acceptable)(void *, struct dentry *), void *context)
 {
struct export_operations *nop = mnt->mnt_sb->s_export_op;
-   struct dentry *result;
+   struct dentry *result, *alias;
+   int err;
 
-   if (nop->decode_fh) {
-   result = nop->decode_fh(mnt->mnt_sb, fid->raw, fh_len,
+   /*
+* Old way of doing things.  Will go away soon.
+*/
+   if (!nop->fh_to_dentry) {
+   if (nop->decode_fh) {
+   return nop->decode_fh(mnt->mnt_sb, fid->raw, fh_len,
fileid_type, acceptable, context);
+   } else {
+   return export_decode_fh(mnt->mnt_sb, fid->raw, fh_len,
+   fileid_type, acceptable, context);
+   }
+   }
+
+   /*
+* Try to get any dentry for the given file handle from the filesystem.
+*/
+   result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
+   if (!result)
+   result = ERR_PTR(-ESTALE);
+   if (IS_ERR(result))
+   return result;
+
+   if (S_ISDIR(result->d_inode->i_mode)) {
+   /*
+* This request is for a directory.
+*
+* On the positive side there is only one dentry for each
+* directory inode.  On the negative side this implies that we
+* have to ensure our dentry is connected all the way up to the
+* filesystem root.
+*/
+   if (result->d_flags & DCACHE_DISCONNECTED) {
+   err = reconnect_path(mnt->mnt_sb, result);
+   if (err)
+   goto err_result;
+   }
+
+   if (!acceptable(context, result)) {
+   err = -EACCES;
+   goto err_result;
+   }
+
+   return result;
} else {
-   result = export_decode_fh(mnt->mnt_sb, fid->raw, fh_len,
- fileid_type, acceptable, context);
+   /*
+* It's not a directory.  Life is a little more complicated.
+*/
+   struct dentry *target_dir, *nresult;
+   char nbuf[NAME_MAX+1];
+
+   /*
+* See if either the dentry we just got from the filesystem
+* or any alias for it is acceptable.  This is always true
+* if this filesystem is exported without the subtreecheck
+* option.  If the filesystem is exported with the subtree
+* check option there's a fair chance we need to look at
+* the parent directory in the file handle and make sure
+* it's connected to the filesystem root.
+*/
+   alias = find_acceptable_alias(result, acceptable, context);
+   if (alias)
+   return alias;
+
+   /*
+ 

[PATCH] Introduce ext4_find_next_bit

2007-09-20 Thread Aneesh Kumar K.V
Also add generic_find_next_le_bit

This gets used by the ext4 multi block allocator patches.

Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
---
 include/asm-generic/bitops/ext2-non-atomic.h |    2 +
 include/asm-generic/bitops/le.h              |    4 ++
 include/asm-powerpc/bitops.h                 |    4 ++
 include/linux/ext4_fs.h                      |    1 +
 lib/find_next_bit.c                          |   44 ++
 5 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bitops/ext2-non-atomic.h b/include/asm-generic/bitops/ext2-non-atomic.h
index 1697404..63cf822 100644
--- a/include/asm-generic/bitops/ext2-non-atomic.h
+++ b/include/asm-generic/bitops/ext2-non-atomic.h
@@ -14,5 +14,7 @@
generic_find_first_zero_le_bit((unsigned long *)(addr), (size))
 #define ext2_find_next_zero_bit(addr, size, off) \
generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off))
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #endif /* _ASM_GENERIC_BITOPS_EXT2_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index b9c7e5d..80e3bf1 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -20,6 +20,8 @@
 #define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, addr)
 
 #define generic_find_next_zero_le_bit(addr, size, offset) find_next_zero_bit(addr, size, offset)
+#define generic_find_next_le_bit(addr, size, offset) \
+   find_next_bit(addr, size, offset)
 
 #elif defined(__BIG_ENDIAN)
 
@@ -42,6 +44,8 @@
 
 extern unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset);
+extern unsigned long generic_find_next_le_bit(const unsigned long *addr,
+   unsigned long size, unsigned long offset);
 
 #else
 #error Please fix asm/byteorder.h
diff --git a/include/asm-powerpc/bitops.h b/include/asm-powerpc/bitops.h
index 8144a27..60652a3 100644
--- a/include/asm-powerpc/bitops.h
+++ b/include/asm-powerpc/bitops.h
@@ -310,6 +310,8 @@ static __inline__ int test_le_bit(unsigned long nr,
 unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr,
+   unsigned long size, unsigned long offset);
 /* Bitmap functions for the ext2 filesystem */
 
 #define ext2_set_bit(nr,addr) \
@@ -329,6 +331,8 @@ unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
 #define ext2_find_next_zero_bit(addr, size, off) \
generic_find_next_zero_le_bit((unsigned long*)addr, size, off)
 
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)addr, size, off)
 /* Bitmap functions for the minix filesystem.  */
 
 #define minix_test_and_set_bit(nr,addr) \
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index cdee7aa..c7b9bb2 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -502,6 +502,7 @@ do {                                              \
 #define ext4_test_bit  ext2_test_bit
 #define ext4_find_first_zero_bit   ext2_find_first_zero_bit
 #define ext4_find_next_zero_bit    ext2_find_next_zero_bit
+#define ext4_find_next_bit ext2_find_next_bit
 
 /*
  * Maximal mount counts between two filesystem checks
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index bda0d71..0306c04 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -178,4 +178,48 @@ found_middle_swap:
 
 EXPORT_SYMBOL(generic_find_next_zero_le_bit);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr, unsigned
+   long size, unsigned long offset)
+{
+   const unsigned long *p = addr + BITOP_WORD(offset);
+   unsigned long result = offset & ~(BITS_PER_LONG - 1);
+   unsigned long tmp;
+
+   if (offset >= size)
+   return size;
+   size -= result;
+   offset &= (BITS_PER_LONG - 1UL);
+   if (offset) {
+   tmp = ext2_swabp(p++);
+   tmp &= (~0UL << offset);
+   if (size < BITS_PER_LONG)
+   goto found_first;
+   if (tmp)
+   goto found_middle;
+   size -= BITS_PER_LONG;
+   result += BITS_PER_LONG;
+   }
+
+   while (size & ~(BITS_PER_LONG - 1)) {
+   tmp = *(p++);
+   if (tmp)
+   goto found_middle_swap;
+   result += BITS_PER_LONG;
+   size -= BITS_PER_LONG;
+   }
+   if (!size)
+   return result;
+   tmp = ext2_swabp(p);
+found_first:
+   tmp &= (~0UL >> (BITS_PER_LONG - size));
+   if (tmp == 0UL)   

[PATCH] ext4: Fix sparse warnings

2007-09-20 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
---
 fs/ext4/inode.c         |    6 --
 include/linux/ext4_fs.h |   16 
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a4848e0..307e240 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3177,12 +3177,14 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
  iloc, handle);
if (ret) {
EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
-   if (mnt_count != sbi->s_es->s_mnt_count) {
+   if (mnt_count !=
+   le16_to_cpu(sbi->s_es->s_mnt_count)) {
ext4_warning(inode->i_sb, __FUNCTION__,
"Unable to expand inode %lu. Delete"
" some EAs or run e2fsck.",
inode->i_ino);
-   mnt_count = sbi->s_es->s_mnt_count;
+   mnt_count =
+   le16_to_cpu(sbi->s_es->s_mnt_count);
}
}
}
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index c7b9bb2..ab7edaa 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -129,7 +129,7 @@ struct ext4_group_desc
__le16  bg_free_blocks_count;   /* Free blocks count */
__le16  bg_free_inodes_count;   /* Free inodes count */
__le16  bg_used_dirs_count; /* Directories count */
-   __u16   bg_flags;
+   __le16  bg_flags;
__u32   bg_reserved[3];
__le32  bg_block_bitmap_hi; /* Blocks bitmap block MSB */
__le32  bg_inode_bitmap_hi; /* Inodes bitmap block MSB */
@@ -596,13 +596,13 @@ struct ext4_super_block {
 /*150*/__le32  s_blocks_count_hi;  /* Blocks count */
__le32  s_r_blocks_count_hi;/* Reserved blocks count */
__le32  s_free_blocks_count_hi; /* Free blocks count */
-   __u16   s_min_extra_isize;  /* All inodes have at least # bytes */
-   __u16   s_want_extra_isize; /* New inodes should reserve # bytes */
-   __u32   s_flags;/* Miscellaneous flags */
-   __u16   s_raid_stride;  /* RAID stride */
-   __u16   s_mmp_interval; /* # seconds to wait in MMP checking */
-   __u64   s_mmp_block;/* Block for multi-mount protection */
-   __u32   s_raid_stripe_width;/* blocks on all data disks (N*stride)*/
+   __le16  s_min_extra_isize;  /* All inodes have at least # bytes */
+   __le16  s_want_extra_isize; /* New inodes should reserve # bytes */
+   __le32  s_flags;/* Miscellaneous flags */
+   __le16  s_raid_stride;  /* RAID stride */
+   __le16  s_mmp_interval; /* # seconds to wait in MMP checking */
+   __le64  s_mmp_block;/* Block for multi-mount protection */
+   __le32  s_raid_stripe_width;/* blocks on all data disks (N*stride)*/
__u32   s_reserved[163];/* Padding to the end of the block */
 };
 
-- 
1.5.3.1.91.gd3392-dirty


+ ext4-uninitialized-block-groups-fix.patch added to -mm tree

2007-09-20 Thread akpm

The patch titled
 Ext4: Uninitialized Block Groups (fix)
has been added to the -mm tree.  Its filename is
 ext4-uninitialized-block-groups-fix.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: Ext4: Uninitialized Block Groups (fix)
From: Avantika Mathur [EMAIL PROTECTED]

Andreas Dilger wrote:
> On Sep 18, 2007  20:03 -0700, Andrew Morton wrote:
>> On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur [EMAIL PROTECTED] wrote:
>>> +#if !defined(CONFIG_CRC16)
>>> +/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
>>> +__u16 const crc16_table[256] = { me
>>> +   0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
>>
>> That's rather sad.  A plain old "depends on" would be better.
>
> My bad.  We wrote this patch and it had to run on older kernels that might
> not even have lib/crc16.c (it was added around 2.6.15 or so, so e.g. RHEL4
> doesn't have it).  I forgot to remove it from the upstream submission,
> and since it didn't cause problems nobody else complained...

The incremental patch below removes the local crc16 code, and also resolves an
issue with properly updating bg_itable_unused when an inode is allocated in an
unused block group.

Cc: Andreas Dilger [EMAIL PROTECTED]
Cc: Avantika Mathur [EMAIL PROTECTED]
Cc: Mingming Cao [EMAIL PROTECTED]
Cc: Aneesh Kumar K.V [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/Kconfig       |    1 
 fs/ext4/ialloc.c |    8 ++-
 fs/ext4/super.c  |   51 -
 3 files changed, 9 insertions(+), 51 deletions(-)

diff -puN fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix fs/ext4/ialloc.c
--- a/fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix
+++ a/fs/ext4/ialloc.c
@@ -632,12 +632,18 @@ got:
if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-   free = EXT4_INODES_PER_GROUP(sb);
+   free = 0;
} else {
free = EXT4_INODES_PER_GROUP(sb) -
le16_to_cpu(gdp->bg_itable_unused);
}
 
+   /*
+* Check the relative inode number against the last used
+* relative inode number in this group. if it is greater
+* we need to  update the bg_itable_unused count
+*
+*/
if (ino > free)
gdp->bg_itable_unused =
cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
diff -puN fs/ext4/super.c~ext4-uninitialized-block-groups-fix fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-uninitialized-block-groups-fix
+++ a/fs/ext4/super.c
@@ -37,6 +37,7 @@
 #include <linux/quotaops.h>
 #include <linux/seq_file.h>
 #include <linux/log2.h>
+#include <linux/crc16.h>
 
 #include <asm/uaccess.h>
 
@@ -1337,56 +1338,6 @@ static int ext4_setup_super(struct super
return res;
 }
 
-#if !defined(CONFIG_CRC16)
-/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
-__u16 const crc16_table[256] = {
-   0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
-   0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
-   0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
-   0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
-   0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
-   0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,
-   0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0, 0x1680, 0xD641,
-   0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1, 0xD081, 0x1040,
-   0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1, 0xF281, 0x3240,
-   0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0, 0x3480, 0xF441,
-   0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0, 0x3E80, 0xFE41,
-   0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1, 0xF881, 0x3840,
-   0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0, 0x2A80, 0xEA41,
-   0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1, 0xEC81, 0x2C40,
-   0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1, 0xE681, 0x2640,
-   0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0, 0x2080, 0xE041,
-   0xA001, 0x60C0, 0x6180, 0xA141, 0x6300, 0xA3C1, 0xA281, 0x6240,
-   0x6600, 0xA6C1, 0xA781, 0x6740, 0xA501, 0x65C0, 0x6480, 0xA441,
-   0x6C00, 0xACC1, 0xAD81, 0x6D40, 0xAF01, 0x6FC0, 0x6E80, 0xAE41,
-   0xAA01, 0x6AC0, 0x6B80, 0xAB41, 0x6900, 0xA9C1, 0xA881, 0x6840,
-   0x7800, 0xB8C1, 0xB981, 0x7940, 0xBB01, 0x7BC0, 0x7A80, 0xBA41,
-   0xBE01, 0x7EC0, 0x7F80, 0xBF41, 0x7D00, 0xBDC1, 

Re: + ext4-uninitialized-block-groups-fix.patch added to -mm tree

2007-09-20 Thread Aneesh Kumar K.V



[EMAIL PROTECTED] wrote:

The patch titled
 Ext4: Uninitialized Block Groups (fix)
has been added to the -mm tree.  Its filename is
 ext4-uninitialized-block-groups-fix.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: Ext4: Uninitialized Block Groups (fix)
From: Avantika Mathur [EMAIL PROTECTED]

Andreas Dilger wrote:

On Sep 18, 2007  20:03 -0700, Andrew Morton wrote:


On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur [EMAIL PROTECTED] wrote:



+#if !defined(CONFIG_CRC16)
+/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
+__u16 const crc16_table[256] = { me
+   0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,


That's rather sad.  A plain old "depends on" would be better.


My bad.  We wrote this patch and it had to run on older kernels that might
not even have lib/crc16.c (it was added around 2.6.15 or so, so e.g. RHEL4
doesn't have it).  I forgot to remove it from the upstream submission,
and since it didn't cause problems nobody else complained...

The incremental patch below removes the local crc16 code, and also resolves an
issue with properly updating bg_itable_unused when an inode is allocated in an
unused block group.

Cc: Andreas Dilger [EMAIL PROTECTED]
Cc: Avantika Mathur [EMAIL PROTECTED]
Cc: Mingming Cao [EMAIL PROTECTED]
Cc: Aneesh Kumar K.V [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]


Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]



---

 fs/Kconfig       |    1 
 fs/ext4/ialloc.c |    8 ++-
 fs/ext4/super.c  |   51 -
 3 files changed, 9 insertions(+), 51 deletions(-)

diff -puN fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix fs/ext4/ialloc.c
--- a/fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix
+++ a/fs/ext4/ialloc.c
@@ -632,12 +632,18 @@ got:
if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-   free = EXT4_INODES_PER_GROUP(sb);

I guess a comment like the one below would help explain why we are not
decrementing bg_itable_unused by 1:

/*
 * When clearing EXT4_BG_INODE_UNINIT from the block group, we don't want to
 * depend on the value of bg_itable_unused, even though mke2fs could have
 * initialized it for us. Instead, we calculate the value below.
 */

+   free = 0;
} else {
free = EXT4_INODES_PER_GROUP(sb) -
le16_to_cpu(gdp->bg_itable_unused);
}

+   /*
+* Check the relative inode number against the last used
+* relative inode number in this group. if it is greater
+* we need to  update the bg_itable_unused count
+*
+*/
if (ino > free)
gdp->bg_itable_unused =
cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
diff -puN fs/ext4/super.c~ext4-uninitialized-block-groups-fix fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-uninitialized-block-groups-fix
+++ a/fs/ext4/super.c
@@ -37,6 +37,7 @@
 #include <linux/quotaops.h>
 #include <linux/seq_file.h>
 #include <linux/log2.h>
+#include <linux/crc16.h>

 #include <asm/uaccess.h>

@@ -1337,56 +1338,6 @@ static int ext4_setup_super(struct super
return res;
 }

-#if !defined(CONFIG_CRC16)
-/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
-__u16 const crc16_table[256] = {
-   0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
-   0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
-   0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
-   0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
-   0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
-   0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,
-   0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0, 0x1680, 0xD641,
-   0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1, 0xD081, 0x1040,
-   0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1, 0xF281, 0x3240,
-   0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0, 0x3480, 0xF441,
-   0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0, 0x3E80, 0xFE41,
-   0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1, 0xF881, 0x3840,
-   0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0, 0x2A80, 0xEA41,
-   0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1, 0xEC81, 0x2C40,
-   0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1, 0xE681, 0x2640,
-   0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0, 0x2080, 0xE041,
-