Re: Ext2/3 block remapping tool

2007-05-01 Thread Andreas Dilger
On Apr 30, 2007  08:09 -0400, Theodore Tso wrote:
 On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
  I'd prefer that such functionality be integrated with Takashi's online
  defrag tool, since it needs virtually the same functionality.  For that
  matter, this is also very similar to the block-mapped - extents tool
  from Aneesh.  It doesn't make sense to have so many separate tools for
  users, especially if they start interfering with each other (i.e. defrag
  undoes the remapping done by your tool).
 
 While we're at it, someone want to start thinking about on-line
 shrinking of ext4 filesystems?  Again, the same block remapping
 interfaces for defrag and file access optimizations should also be
 useful for shrinking filesystems (even if some of the files that need
 to be relocated are being actively used).  If not, that probably means
 we got the interface wrong.

Except one other issue with online shrinking is that we need to move
inodes on occasion and this poses a bunch of other problems over just
remapping the data blocks.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NFS] [PATCH 0/18] export operations rewrite

2007-05-01 Thread Christoph Hellwig
On Fri, Mar 30, 2007 at 01:34:53PM +1000, Neil Brown wrote:
 On Saturday March 17, [EMAIL PROTECTED] wrote:
 
 less that 2 weeks later

more than one month later :)

 My only question involves motivation.
 
   You say less complex, but to me it just looks different - though
   being very familiar with the original, that might be a biased view.
   Can you say more about how it is less complex?
   Maybe the extension to generic 64bit support will make that clear...
 
   But then generic 64bit support should just be an independent set of
   helper functions that can be plugged in to the export_operations
   structure.
 
 For what it does, the code looks fine, and I can see some definite
 improvements here and there, but I'm not clear on the over-all
 motivation.

When looking at suppor for 64bit inodes I stumbled over a few things
I really dislike in the current code:

 - I really hate the current function pointer indirection for
   find_exported_dentry.  Using the same method table for communication
   in two different ways just feels very bad to me.
 - passing the fid completely untyped to filesystem seem too error
   prone to me, and is rather contrary to how we do things elsewhere
   in the kernel.
 - the calling conversion on the decode side where we first call
   -decode_fh to split the filehandle into two blobs only to
   recurse back into exportfs and then recurse back into the filesystem
   seems rather odd.  By having two methods to get the dentry and
   parent directly from the on the wire file handle this big callstack
   collapses to a very simple one.

So yes, we probably could do 64bit inode filehandles with the old scheme
(and in fact xfs, ocfs2 and gfs2 already have support for it), but revamping
the layering will make it quite a bit cleaner.  I'll doug up those patches
again, but on the decode side 64bit inode support is really trivial
after these, it's just another case statement in the generic routines.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


file capabilities and security_task_wait failure Re: 2.6.22 -mm merge plans

2007-05-01 Thread Stephen Smalley
On Mon, 2007-04-30 at 16:20 -0700, Andrew Morton wrote:
  implement-file-posix-capabilities.patch
  file-capabilities-accomodate-future-64-bit-caps.patch
  return-eperm-not-echild-on-security_task_wait-failure.patch
 
 I think we're still waiting for the security guys to work out what to do with
 this work.

return-eperm-not-echild-on-security_task_wait-failure.patch should be
merged - it is effectively a bug fix.

On the file capabilities support, have any of the filesystem folks
(cc'd) looked at the code yet?

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread David Chinner
On Mon, Apr 30, 2007 at 09:39:06PM -0700, Nicholas Miell wrote:
 On Tue, 2007-05-01 at 14:22 +1000, David Chinner wrote:
  On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:
   This is actually for future use.  Any flags that are added into this
   range must be understood by both sides or it should be considered an
   error.  Flags outside the FIEMAP_FLAG_INCOMPAT do not necessarily need
   to be supported.  If it turns out that 8 bits is too small a range for
   INCOMPAT flags, then we can make 0x0100 an incompat flag that means
   e.g. 0x00ff are also incompat flags also.
  
  Ah, ok. So it's not really a set of compatibility flags, it's more a
  compulsory set. Under those terms, i don't really see why this is
  necessary - either the filesystem will understand the flags or it will
  return EINVAL or ignore them...
  
   I'm assuming that all flags that will be in the original FIEMAP proposal
   will be understood by the implementations.  Most filesystems can safely
   ignore FLAG_HSM_READ, for example, since they don't support HSM, and for
   that matter FLAG_SYNC is probably moot for most filesystems also because
   they do block allocation at preprw time.
  
  Exactly my point - so why do we really need to encode a compulsory set of
  
 
 Because flags have meaning, independent of whether or not the filesystem
 understands them. And if the filesystem chooses to ignore critically
 important flags (instead of returning EINVAL), bad things may happen.
 
 So, either the filesystem will understand the flag or iff the unknown flag
 is in the incompat set, it will return EINVAL or else the unknown flag will
 be safely ignored.

My point was that there is a difference between specification and
implementation - if the specification says something is compulsory,
then they must be implemented in the filesystem. This is easy
enough to ensure by code review - we don't need additional interface
complexity for this

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ext2/3 block remapping tool

2007-05-01 Thread Theodore Tso
On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
 Except one other issue with online shrinking is that we need to move
 inodes on occasion and this poses a bunch of other problems over just
 remapping the data blocks.

Well, I did say necessary, and not sufficient.  But yes, moving
inodes, especially if the inode is currently open gets interesting.  I
don't think there are that many user space applications that would
notice or care if the st_ino of an open file changed out from under
them, but there are obviously userspace applications, such as tar,
that would most definitely care.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-05-01 Thread Valerie Henson
On Fri, Apr 27, 2007 at 11:06:47AM -0400, Jeff Dike wrote:
 On Thu, Apr 26, 2007 at 09:58:25PM -0700, Valerie Henson wrote:
  Here's an example, spelled out:
  
  Allocate file 1 in chunk A.
  Grow file 1.
  Chunk A fills up.
  Allocate continuation inode for file 1 in chunk B.
  Chunk A gets some free space.
  Chunk B fills up.
  Pick chunk A for allocating next block of file 1.
  Try to look up a continuation inode for file 1 in chunk A.
  Continuation inode for file 1 found in chunk A!
  Attach newly allocated block to existing inode for file 1 in chunk A.
 
 So far, so good (and the slides are helpful, tx!).  What happens when
 file 1 keeps growing and chunk A fills up (and chunk B is still full)?
 Can the same continuation inode also point at chunk C, where the file
 is going to grow to?

You allocate a new continuation inode in chunk C.  The rule is that
only inodes inside a chunk can point to blocks inside the chunk, so
you need an inode in C if you want to allocate blocks from C.

-VAL
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] AFS: Make the match_*() functions take const options

2007-05-01 Thread David Howells
Make the match_*() functions take a const pointer to the options table and
make strings pointers in the options table const too.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/parser.h |8 
 lib/parser.c   |   10 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/parser.h b/include/linux/parser.h
index fa33328..86676f6 100644
--- a/include/linux/parser.h
+++ b/include/linux/parser.h
@@ -11,10 +11,10 @@
 /* associates an integer enumerator with a pattern string. */
 struct match_token {
int token;
-   char *pattern;
+   const char *pattern;
 };
 
-typedef struct match_token match_table_t[];
+typedef const struct match_token match_table_t[];
 
 /* Maximum number of arguments that match_token will find in a pattern */
 enum {MAX_OPT_ARGS = 3};
@@ -29,5 +29,5 @@ int match_token(char *, match_table_t table, substring_t 
args[]);
 int match_int(substring_t *, int *result);
 int match_octal(substring_t *, int *result);
 int match_hex(substring_t *, int *result);
-void match_strcpy(char *, substring_t *);
-char *match_strdup(substring_t *);
+void match_strcpy(char *, const substring_t *);
+char *match_strdup(const substring_t *);
diff --git a/lib/parser.c b/lib/parser.c
index 7ad2a48..703c8c1 100644
--- a/lib/parser.c
+++ b/lib/parser.c
@@ -22,7 +22,7 @@
  * match extremely simple token=arg style patterns. If the pattern is found,
  * the location(s) of the arguments will be returned in the @args array.
  */
-static int match_one(char *s, char *p, substring_t args[])
+static int match_one(char *s, const char *p, substring_t args[])
 {
char *meta;
int argc = 0;
@@ -43,7 +43,7 @@ static int match_one(char *s, char *p, substring_t args[])
p = meta + 1;
 
if (isdigit(*p))
-   len = simple_strtoul(p, p, 10);
+   len = simple_strtoul(p, (char **) p, 10);
else if (*p == '%') {
if (*s++ != '%')
return 0;
@@ -102,7 +102,7 @@ static int match_one(char *s, char *p, substring_t args[])
  */
 int match_token(char *s, match_table_t table, substring_t args[])
 {
-   struct match_token *p;
+   const struct match_token *p;
 
for (p = table; !match_one(s, p-pattern, args) ; p++)
;
@@ -190,7 +190,7 @@ int match_hex(substring_t *s, int *result)
  * substring_t @s to the c-style string @to. Caller guarantees that @to is
  * large enough to hold the characters of @s.
  */
-void match_strcpy(char *to, substring_t *s)
+void match_strcpy(char *to, const substring_t *s)
 {
memcpy(to, s-from, s-to - s-from);
to[s-to - s-from] = '\0';
@@ -204,7 +204,7 @@ void match_strcpy(char *to, substring_t *s)
  * the substring_t @s. The caller is responsible for freeing the returned
  * string with kfree().
  */
-char *match_strdup(substring_t *s)
+char *match_strdup(const substring_t *s)
 {
char *p = kmalloc(s-to - s-from + 1, GFP_KERNEL);
if (p)

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] AFS/AF_RXRPC: Miscellaneous fixes

2007-05-01 Thread David Howells
Make miscellaneous fixes to AFS and AF_RXRPC:

 (*) Make AF_RXRPC select KEYS rather than RXKAD or AFS_FS in Kconfig.

 (*) Don't use FS_BINARY_MOUNTDATA.

 (*) Remove a done 'TODO' item in a comemnt on afs_get_sb().

 (*) Don't pass a void * as the page pointer argument of kmap_atomic() as this
 breaks on m68k.  Patch from Geert Uytterhoeven [EMAIL PROTECTED].

 (*) Use match_*() functions rather than doing my own parsing.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig|1 -
 fs/afs/fsclient.c |3 +-
 fs/afs/super.c|  100 +++--
 net/rxrpc/Kconfig |3 +-
 4 files changed, 47 insertions(+), 60 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index a42f767..e33c089 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -2020,7 +2020,6 @@ config AFS_FS
tristate Andrew File System support (AFS) (EXPERIMENTAL)
depends on INET  EXPERIMENTAL
select AF_RXRPC
-   select KEYS
help
  If you say Y here, you will get an experimental Andrew File System
  driver. It currently only supports unsecured read-only AFS access.
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 2393d2a..e54e6c2 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -266,7 +266,8 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call,
call-unmarshall++;
 
if (call-count  PAGE_SIZE) {
-   buffer = kmap_atomic(call-reply3, KM_USER0);
+   page = call-reply3;
+   buffer = kmap_atomic(page, KM_USER0);
memset(buffer + PAGE_SIZE - call-count, 0,
   call-count);
kunmap_atomic(buffer, KM_USER0);
diff --git a/fs/afs/super.c b/fs/afs/super.c
index cebd03c..41173f8 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -20,6 +20,7 @@
 #include linux/slab.h
 #include linux/fs.h
 #include linux/pagemap.h
+#include linux/parser.h
 #include internal.h
 
 #define AFS_FS_MAGIC 0x6B414653 /* 'kAFS' */
@@ -42,7 +43,7 @@ struct file_system_type afs_fs_type = {
.name   = afs,
.get_sb = afs_get_sb,
.kill_sb= kill_anon_super,
-   .fs_flags   = FS_BINARY_MOUNTDATA,
+   .fs_flags   = 0,
 };
 
 static const struct super_operations afs_super_ops = {
@@ -58,6 +59,20 @@ static const struct super_operations afs_super_ops = {
 static struct kmem_cache *afs_inode_cachep;
 static atomic_t afs_count_active_inodes;
 
+enum {
+   afs_no_opt,
+   afs_opt_cell,
+   afs_opt_rwpath,
+   afs_opt_vol,
+};
+
+static const match_table_t afs_options_list = {
+   { afs_opt_cell, cell=%s   },
+   { afs_opt_rwpath,   rwpath},
+   { afs_opt_vol,  vol=%s},
+   { afs_no_opt,   NULL},
+};
+
 /*
  * initialise the filesystem
  */
@@ -115,31 +130,6 @@ void __exit afs_fs_exit(void)
 }
 
 /*
- * check that an argument has a value
- */
-static int want_arg(char **_value, const char *option)
-{
-   if (!_value || !*_value || !**_value) {
-   printk(KERN_NOTICE kAFS: %s: argument missing\n, option);
-   return 0;
-   }
-   return 1;
-}
-
-/*
- * check that there's no subsequent value
- */
-static int want_no_value(char *const *_value, const char *option)
-{
-   if (*_value  **_value) {
-   printk(KERN_NOTICE kAFS: %s: Invalid argument: %s\n,
-  option, *_value);
-   return 0;
-   }
-   return 1;
-}
-
-/*
  * parse the mount options
  * - this function has been shamelessly adapted from the ext3 fs which
  *   shamelessly adapted it from the msdos fs
@@ -148,48 +138,46 @@ static int afs_parse_options(struct afs_mount_params 
*params,
 char *options, const char **devname)
 {
struct afs_cell *cell;
-   char *key, *value;
-   int ret;
+   substring_t args[MAX_OPT_ARGS];
+   char *p;
+   int token;
 
_enter(%s, options);
 
options[PAGE_SIZE - 1] = 0;
 
-   ret = 0;
-   while ((key = strsep(options, ,))) {
-   value = strchr(key, '=');
-   if (value)
-   *value++ = 0;
-
-   _debug(kAFS: KEY: %s, VAL:%s, key, value ?: -);
+   while ((p = strsep(options, ,))) {
+   if (!*p)
+   continue;
 
-   if (strcmp(key, rwpath) == 0) {
-   if (!want_no_value(value, rwpath))
-   return -EINVAL;
-   params-rwpath = 1;
-   } else if (strcmp(key, vol) == 0) {
-   if (!want_arg(value, vol))
-   return -EINVAL;
-   *devname = value;
-   } else if (strcmp(key, cell) == 0) {
-   if (!want_arg(value, cell))
-

Re: [PATCH 2/2] AFS/AF_RXRPC: Miscellaneous fixes

2007-05-01 Thread Geert Uytterhoeven
Hi David,

I've just noticed another issue: if CONFIG_AFS_FS=y, the kernel build fails
with

| `afs_callback_update_kill' referenced in section `.init.text' of 
fs/built-in.o: defined in discarded section `.exit.text' of fs/built-in.o
| `afs_vlocation_purge' referenced in section `.init.text' of fs/built-in.o: 
defined in discarded section `.exit.text' of fs/built-in.o

and indeed, afs_init() calls both afs_callback_update_kill() and
afs_vlocation_purge() in the failure path.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] AFS/AF_RXRPC: Miscellaneous fixes

2007-05-01 Thread David Howells
Geert Uytterhoeven [EMAIL PROTECTED] wrote:

 I've just noticed another issue: if CONFIG_AFS_FS=y, the kernel build fails
 with

Can you send me the config you're using please?

David
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] AFS: Fix use of __exit functions from __init path

2007-05-01 Thread David Howells
Fix use of __exit functions from __init path.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/callback.c  |2 +-
 fs/afs/internal.h  |4 ++--
 fs/afs/vlocation.c |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 639399f..9bdbf36 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -468,7 +468,7 @@ int __init afs_callback_update_init(void)
 /*
  * shut down the callback update process
  */
-void __exit afs_callback_update_kill(void)
+void afs_callback_update_kill(void)
 {
destroy_workqueue(afs_callback_update_worker);
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index c0404b4..f1bd47e 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -401,7 +401,7 @@ extern void afs_give_up_callback(struct afs_vnode *);
 extern void afs_dispatch_give_up_callbacks(struct work_struct *);
 extern void afs_flush_callback_breaks(struct afs_server *);
 extern int __init afs_callback_update_init(void);
-extern void __exit afs_callback_update_kill(void);
+extern void afs_callback_update_kill(void);
 
 /*
  * cell.c
@@ -605,7 +605,7 @@ extern struct afs_vlocation *afs_vlocation_lookup(struct 
afs_cell *,
  struct key *,
  const char *, size_t);
 extern void afs_put_vlocation(struct afs_vlocation *);
-extern void __exit afs_vlocation_purge(void);
+extern void afs_vlocation_purge(void);
 
 /*
  * vnode.c
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 6c8e95a..3370cdb 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -602,7 +602,7 @@ int __init afs_vlocation_update_init(void)
 /*
  * discard all the volume location records for rmmod
  */
-void __exit afs_vlocation_purge(void)
+void afs_vlocation_purge(void)
 {
afs_vlocation_timeout = 0;
 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] AFS: Fix use of __exit functions from __init path

2007-05-01 Thread Geert Uytterhoeven
On Tue, 1 May 2007, David Howells wrote:
 Fix use of __exit functions from __init path.
 
 Signed-off-by: David Howells [EMAIL PROTECTED]

Acked-by: Geert Uytterhoeven [EMAIL PROTECTED]

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread Anton Altaparmakov

On 1 May 2007, at 05:22, David Chinner wrote:

On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:

  The FIBMAP ioctl is for privileged users
  only, and I wonder if FIEMAP should be the same, or at least  
disallow
  mapping files that the user can't access especially with  
FLAG_SYNC and/or

  FLAG_HSM_READ.


I see little reason for restricting FI[BE]MAP to privileged users -
anyone should be able to determine if files they have permission to
access are fragmented.


Allowing anyone to run FI[BE]MAP creates potential for DOS-ing the  
machine.  Perhaps for non-privileged users FIEMAP has to be read- 
only?  As soon as any of the FLAG_* flags come into play you make it  
privileged.  For example fancy any user being able to fill up your  
file system by calling FIEMAP with FLAG_HSM_READ on all files  
recursively?  This should certainly not be simply dismissed as a non- 
issue without thinking about it first...


Best regards,

Anton
--
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread Anton Altaparmakov

On 1 May 2007, at 15:20, David Chinner wrote:

On Mon, Apr 30, 2007 at 09:39:06PM -0700, Nicholas Miell wrote:

On Tue, 2007-05-01 at 14:22 +1000, David Chinner wrote:

On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:
This is actually for future use.  Any flags that are added into  
this
range must be understood by both sides or it should be  
considered an
error.  Flags outside the FIEMAP_FLAG_INCOMPAT do not  
necessarily need
to be supported.  If it turns out that 8 bits is too small a  
range for
INCOMPAT flags, then we can make 0x0100 an incompat flag  
that means

e.g. 0x00ff are also incompat flags also.


Ah, ok. So it's not really a set of compatibility flags, it's  
more a

compulsory set. Under those terms, i don't really see why this is
necessary - either the filesystem will understand the flags or it  
will

return EINVAL or ignore them...

I'm assuming that all flags that will be in the original FIEMAP  
proposal
will be understood by the implementations.  Most filesystems can  
safely
ignore FLAG_HSM_READ, for example, since they don't support HSM,  
and for
that matter FLAG_SYNC is probably moot for most filesystems also  
because

they do block allocation at preprw time.


Exactly my point - so why do we really need to encode a  
compulsory set of


Because flags have meaning, independent of whether or not the  
filesystem

understands them. And if the filesystem chooses to ignore critically
important flags (instead of returning EINVAL), bad things may happen.

So, either the filesystem will understand the flag or iff the  
unknown flag
is in the incompat set, it will return EINVAL or else the unknown  
flag will

be safely ignored.


My point was that there is a difference between specification and
implementation - if the specification says something is compulsory,
then they must be implemented in the filesystem. This is easy
enough to ensure by code review - we don't need additional interface
complexity for this


You are wrong about this because you are missing the point that you  
have no code to review.  The users that will use those flags are  
going to be applications that run in user space.  Chances are you  
will never see their code.  Heck, they might not even be open source  
applications...  And all applications will run against a multitude of  
kernels.  So version X of the application will run on kernel 2.4.*,  
2.6.*, a.b.*, etc...  For future expandability of the interface I  
think it is important to have both compulsory and non-compulsory flags.


For example there is no reason why FIEMAP_HSM_READ needs to be  
compulsory.  Most filesystems do not support HSM so can safely ignore  
it.  And applications that want to read/write the data locations that  
are obtained with the FIEMAP call will likely always supply  
FIEMAP_HSM_READ because they want to ensure the file is brought in if  
it is off line so they definitely want file systems that do not  
support this flag to ignore it.


And vice versa, an application might specify some weird and funky yet  
to be developed feature that it expects the FS to perform and if the  
FS cannot do it (either because it does not support it or because it  
failed to perform the operation) the application expects the FS to  
return an error and not to ignore the flag.  An example could be the  
asked for FIEMAP_XATTR_FORK flag.  If that is implemented, and the FS  
ignores it it will return the extent map for the file data instead of  
the XATTR_FORK!  Not what the application wanted at all.  Ouch!  So  
this is definitely a compulsory flag if I ever saw one.


So as you see you must support both voluntary and compulsory flags...

Also consider what I said above about different kernels.  A new  
feature is implemented in kernel 2.8.13 say that was not there before  
and an application is updated to use that feature.  There will be  
lots of instances where that application will still be run on older  
kernels where this feature does not exist.  Depending on the feature  
it may be quite sensible to simply ignore in the kernel that the  
application set an unknown flag whilst for a different feature it may  
be the opposite.


Best regards,

Anton
--
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ext2/3 block remapping tool

2007-05-01 Thread Andreas Dilger
On May 01, 2007  11:28 -0400, Theodore Tso wrote:
 On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
  Except one other issue with online shrinking is that we need to move
  inodes on occasion and this poses a bunch of other problems over just
  remapping the data blocks.
 
 Well, I did say necessary, and not sufficient.  But yes, moving
 inodes, especially if the inode is currently open gets interesting.  I
 don't think there are that many user space applications that would
 notice or care if the st_ino of an open file changed out from under
 them, but there are obviously userspace applications, such as tar,
 that would most definitely care.

I think rm -r does a LOT of this kind of operation, like:

stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)

I think find does the same to avoid security problems with malicious
path manipulation.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ext2/3 block remapping tool

2007-05-01 Thread Theodore Tso
On Tue, May 01, 2007 at 12:52:49PM -0600, Andreas Dilger wrote:
 I think rm -r does a LOT of this kind of operation, like:
 
 stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)
 
 I think find does the same to avoid security problems with malicious
 path manipulation.

Yep, so if you're doing an rm -rf (or any other recursive descent)
while we're doing an on-line shrink, it's going to fail.  I suppose we
could have an in-core inode mapping table that would continue to remap
inode numbers until the next reboot.  I'm not sure we would want to
keep the inode remapping indefinitely, although if we don't it could
also end up screwing up NFS as well.  Not sure I care, though.  :-)

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread Andreas Dilger
On May 01, 2007  14:22 +1000, David Chinner wrote:
 On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:
  Hmm, I'd thought offline would migrate to EXTENT_UNKNOWN, but I didn't
 
 I disagree - why would you want to indicate the state is unknown when we know
 very well that it is offline?

If you don't like UNKNOWN, what about UNMAPPED?  I just want a
catch-all flag that indicates this extent contains data but there is
nothing sensible to be returned for the extent mapping.

 Effectively, when your extent is offline in the HSM, it is inaccessable, and
 you have to bring it back from tape so it becomes accessible again. i.e. some
 action is necessary on behalf of the user to make it accessible. So I think
 that OFFLINE is a good name for this state because it really is inaccessible.

What you are calling OFFLINE I would prefer to call UNMAPPED, since that
can be used by applications as a catch-all for no mapping.  There can
be further flags that give refinements to UNMAPPED that some applications
might care about them (e.g. HSM_RESIDENT), but many users/apps will not
if they just want the number of fragments in a given file.

 Also, I don't think secondary is a good term because most large systems
 have more than one tier of storage. One possibility is HSM_RESIDENT
 which indicates the extent is current and resident with a HSM's archive

Sure.

  Can you propose reasonable flag names for these (I can't think of anything
  very good) and a clear explanation of what they mean.  I suspect it will
  only be XFS that uses them initially.  In mke2fs and ext4+mballoc there is
  the concept of stripe unit and stripe width, but as yet they are not
  communicated between the two very well.  I'd be much happier if this info
  could be queried in a standard way from the block layer instead of the
  user having to specify it and the filesystem having to track it.
 
 My preference is definitely for a separate ioctl to grab the
 filesystem geometry so this stuff can be calculated in userspace.
 i.e. the way XFS does it right now (XFS_IOC_FSGEOMETRY). I won't
 bother trying to define names until we decide which appraoch we take
 to implement this.

Hmm, previously you wrote This information could be easily passed up in the
flags fields if the filesystem has geometry information.  So, I _think_
what you are saying is that you want 4 flags to convey this start/end
alignment information, but the exact semantics of what a stripe unit and
a stripe width is filesystem specific?

I definitely do NOT want to get into any issues of querying the block
device geometry here.  I was just making a passing comment that ext4+mballoc
can already do RAID-specific allocation alignment, but it depends on the
admin to specify this information and it would be nice if there was some
easy way to get this from userspace/kernel interfaces.

Having an API that can request tell me the number of blocks from this
offset until the next physical disk boundary or similar would be useful
to any allocator, and the block layer already needs to know this when
submitting IO.

 In XFS, mkfs.xfs does the work of getting this information
 to see in the filesystem superblock. Here's the code for getting
 sunit/swidth from the underlying block device:
 
 http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsprogs/libdisk/
 
 Not much in common there ;)

It looks like this might be just what e2fsprogs needs also.

  It does make sense to specify zero for the fm_extent_count array and a
  new FIEMAP_FLAG_NO_EXTENTS to return only the count of extents and not the
  extent data itself, for the non-verbose mode of filefrag, and for
  pre-allocating a buffer large enough to hold the file if that is important.
 
 Rather than rely on implicit behaviour of pass in extent count of
 zero and a don't try to return any extents to return the number of
 extents on the file, why not just explicitly define this as a valid
 input flag? i.e. FIEMAP_FLAG_GET_NUMEXTENTS

That's what I said, isn't it?  FIEMAP_FLAG_NO_EXTENTS.  I wonder if my
clever-clever for return no extents and return number of extents
is wasted :-/.

  - does XFS return an extent for the metadata parts of the file (e.g. btree)?
 
 No, but we can return the extent map for the attribute fork (i.e.
 extended attrs) if asked for (XFS_IOC_GETBMAPA).

This seems like it would be a useful addition to the interface also, having
FIEMAP_FLAG_METADATA request the return of metadata allocations too.

  - does XFS return preallocated extents beyond EOF?
 
 Yes - they are part of the extent map for the file.

OK.

  - does XFS allow non-root users to call xfs_bmap on files they don't own, or
use by non-root users at all?
 
 Users can run xfs_bmap on any file they have permission to
 open(O_RDONLY).
 
The FIBMAP ioctl is for privileged users
only, and I wonder if FIEMAP should be the same, or at least disallow
mapping files that the user can't access especially with FLAG_SYNC and/or
FLAG_HSM_READ.
 

Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread Andreas Dilger
On May 02, 2007  00:20 +1000, David Chinner wrote:
 My point was that there is a difference between specification and
 implementation - if the specification says something is compulsory,
 then they must be implemented in the filesystem. This is easy
 enough to ensure by code review - we don't need additional interface
 complexity for this

What you seem to be missing about my proposal is that the FLAG_INCOMPAT
is for future use by that part of the specification we haven't thought
of yet...  Having COMPAT/INCOMPAT flags has been very useful for ext2/3/4,
and is much better than having version numbers for the interface.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread David Chinner
On Tue, May 01, 2007 at 07:37:20PM +0100, Anton Altaparmakov wrote:
 On 1 May 2007, at 05:22, David Chinner wrote:
 On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:
   The FIBMAP ioctl is for privileged users
   only, and I wonder if FIEMAP should be the same, or at least  
 disallow
   mapping files that the user can't access especially with  
 FLAG_SYNC and/or
   FLAG_HSM_READ.
 
 I see little reason for restricting FI[BE]MAP to privileged users -
 anyone should be able to determine if files they have permission to
 access are fragmented.
 
 Allowing anyone to run FI[BE]MAP creates potential for DOS-ing the  
 machine.  Perhaps for non-privileged users FIEMAP has to be read- 
 only?  As soon as any of the FLAG_* flags come into play you make it  
 privileged.  For example fancy any user being able to fill up your  
 file system by calling FIEMAP with FLAG_HSM_READ on all files  
 recursively?

By that reasoning, users should not be allowed to recall any files
without root privileges. HSMs don't work that way, though - any user
is allowed to recall any files they have permission to access either
by manual command or by trying to read the file daata.

If that runs the filesytem out of space, then the HSM either hasn't
been configured properly or it's failed to manage the space
correctly. Either way, that's not the fault of the user for
recalling their own files.

Hence allowing FIEMAP to be executed by the user does not open up
any DOS conditions that don't already exist in normal HSM-managed
filesystem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

2007-05-01 Thread David Chinner
On Tue, May 01, 2007 at 03:30:40PM -0700, Andreas Dilger wrote:
 On May 01, 2007  14:22 +1000, David Chinner wrote:
  On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:
   Hmm, I'd thought offline would migrate to EXTENT_UNKNOWN, but I didn't
  
  I disagree - why would you want to indicate the state is unknown when we 
  know
  very well that it is offline?
 
 If you don't like UNKNOWN, what about UNMAPPED?  I just want a
 catch-all flag that indicates this extent contains data but there is
 nothing sensible to be returned for the extent mapping.

Yes, I like that much more. Good suggestion. ;)

  Effectively, when your extent is offline in the HSM, it is inaccessable, and
  you have to bring it back from tape so it becomes accessible again. i.e. 
  some
  action is necessary on behalf of the user to make it accessible. So I think
  that OFFLINE is a good name for this state because it really is 
  inaccessible.
 
 What you are calling OFFLINE I would prefer to call UNMAPPED, since that
 can be used by applications as a catch-all for no mapping.  There can
 be further flags that give refinements to UNMAPPED that some applications
 might care about them (e.g. HSM_RESIDENT), but many users/apps will not
 if they just want the number of fragments in a given file.

Agreed - UNMAPPED does make a lot more sense in this case.

   Can you propose reasonable flag names for these (I can't think of anything
   very good) and a clear explanation of what they mean.  I suspect it will
   only be XFS that uses them initially.  In mke2fs and ext4+mballoc there is
   the concept of stripe unit and stripe width, but as yet they are not
   communicated between the two very well.  I'd be much happier if this info
   could be queried in a standard way from the block layer instead of the
   user having to specify it and the filesystem having to track it.
  
  My preference is definitely for a separate ioctl to grab the
  filesystem geometry so this stuff can be calculated in userspace.
  i.e. the way XFS does it right now (XFS_IOC_FSGEOMETRY). I won't
  bother trying to define names until we decide which appraoch we take
  to implement this.
 
 Hmm, previously you wrote This information could be easily passed up in the
 flags fields if the filesystem has geometry information.  So, I _think_
 what you are saying is that you want 4 flags to convey this start/end
 alignment information, but the exact semantics of what a stripe unit and
 a stripe width is filesystem specific?

Right.

 I definitely do NOT want to get into any issues of querying the block
 device geometry here.  I was just making a passing comment that ext4+mballoc
 can already do RAID-specific allocation alignment, but it depends on the
 admin to specify this information and it would be nice if there was some
 easy way to get this from userspace/kernel interfaces.
 
 Having an API that can request tell me the number of blocks from this
 offset until the next physical disk boundary or similar would be useful
 to any allocator, and the block layer already needs to know this when
 submitting IO.

The block layer knows this once you get inside the volume manager. I
think the issue is that there is no common export interface for this
information.

  In XFS, mkfs.xfs does the work of getting this information
  to see in the filesystem superblock. Here's the code for getting
  sunit/swidth from the underlying block device:
  
  http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsprogs/libdisk/
  
  Not much in common there ;)
 
 It looks like this might be just what e2fsprogs needs also.

More than likely.

   It does make sense to specify zero for the fm_extent_count array and a
   new FIEMAP_FLAG_NO_EXTENTS to return only the count of extents and not the
   extent data itself, for the non-verbose mode of filefrag, and for
   pre-allocating a buffer large enough to hold the file if that is 
   important.
  
  Rather than rely on implicit behaviour of pass in extent count of
  zero and a don't try to return any extents to return the number of
  extents on the file, why not just explicitly define this as a valid
  input flag? i.e. FIEMAP_FLAG_GET_NUMEXTENTS
 
 That's what I said, isn't it?  FIEMAP_FLAG_NO_EXTENTS.  I wonder if my
 clever-clever for return no extents and return number of extents
 is wasted :-/.

Too clever for an API, I think. ;)

My point is mainly that if you are going to use an API for a
specific function (e.g. query the number of extents) I think that
the API should have an obvious method for executing that specific
function. Using a command of get no extents to provide the query
of how many extents in this file is kind of obscure. When you read
the code it doesn't make a lot of sense, as opposed to seeing a
clear statement of intent from the code itself.

i.e. FIEMAP_FLAG_GET_NUMEXTENTS is self-documenting in both the API
and the code that uses it...

   - does XFS return an extent for the metadata parts of the file (e.g. 
   btree)?
  
  

Re: [PATCH] Implement renaming for debugfs

2007-05-01 Thread Greg KH
On Mon, Apr 30, 2007 at 07:55:36PM +0200, Jan Kara wrote:
   Hello,
 
   attached patch implements renaming for debugfs. I was asked for this
 feature by WLAN guys and I guess it makes sence (they have some debug info
 in the directory identified by interface name and that can change...).
 Could someone have a look at what I wrote whether it looks reasonable?
 Thanks.
 
   Honza
 
 -- 
 Jan Kara [EMAIL PROTECTED]
 SuSE CR Labs

 Implement debugfs_rename() to allow renaming files/directories in debugfs.

I think you are going to need more infrastructure here, the caller
doesn't want to have to allocate a new dentry themselves, they just want
to pass in the new filename :)

thanks,

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html