Re: [RFC] [PATCH 3/3] Recursive mtime for ext3

2007-11-08 Thread Jan Kara
On Wed 07-11-07 19:20:38, Theodore Tso wrote:
 On Wed, Nov 07, 2007 at 03:36:05PM +0100, Jan Kara wrote:
   What if more than one application wants to use this facility?
 
  That should be fine - let's see: Each application keeps somewhere a time
  when it started a scan of a subtree (or it can actually remember a time
  when it set the flag for each directory). During the scan, it sets the
  flag on each directory. When it wakes up to recheck the subtree, it just
  compares the rtime against the stored time - if rtime is greater, the
  subtree has been modified since the last scan, so we recurse into it and
  set the flag again when we are finished with it. Now notice that we don't
  care about the flag when we check for changes - we care only about rtime -
  so if several applications are interested in the same subtree, the flag
  just gets set more often and thus rtime gets updated more often, but the
  same scheme still works fine.
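
The rescan loop described above can be modelled in a few lines. This is only
a sketch in Python, not code from the patch: the Dir class, its rtime and
flag fields, and the rescan function are hypothetical stand-ins for directory
inodes and a userspace scanner. It is meant to show that a scanner compares
only rtime against its own stored time and merely re-arms the flag, which is
why several scanners can share one subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Dir:
    """Hypothetical in-memory stand-in for a directory inode."""
    name: str
    rtime: float = 0.0   # time of the last recorded change somewhere below
    flag: bool = False   # "bump rtime on the next change" request bit
    children: list = field(default_factory=list)


def rescan(d, last_scan, path=""):
    """Return paths of directories whose subtree changed after last_scan.

    Only rtime decides whether to descend; the flag is just re-armed on the
    way out, so other scanners setting it too does no harm.
    """
    here = f"{path}/{d.name}"
    changed = []
    if d.rtime > last_scan:
        changed.append(here)
        for child in d.children:
            changed.extend(rescan(child, last_scan, here))
        d.flag = True  # finished with this subtree: ask for the next update
    return changed
```

An initial scan is just rescan(root, float("-inf")), which visits and flags
every directory; later calls pass the time at which the previous scan started.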
 
 OK, so in this case you don't need to set rtime on every single
 file inode, but only directory inode, right?  Because you're only
  Yes, that's actually what I'm doing - sorry if I didn't make it clear
earlier.

 checking the rtime at the directory level, and not the flag.
 And it's just as easy for you to check the rtime flag for the file's
 containing directory (modulo magic vis-a-vis hard links) as the file's
 inode.
  Exactly.

 I'm just really wishing that rtime and the rtime flag didn't have to live
 on disk, but could rather be in memory.  If you only needed to save
 the directory flags and rtimes, that might actually be doable.
  I already gave some thought to this but there seemed to be some
drawbacks. The query I want to support is: given a directory, tell me which
of its subdirectories (arbitrarily deep below) have been modified since time
T.  That is what you need to support faster rsync, updatedb and similar
loads.  Also I want to allow a reboot to happen in between the modification
and a query (handling a crash correctly would be nice too but honestly my
current implementation is not completely reliable in this regard either) so
some permanent storage is needed in any case. What I can imagine we could
do is to report all modifications to userspace - the problem with that is
that there are *many* possible modifications but we are interested only in
whether any happened since time T. We could improve this with an in-memory
inode flag "I'm not interested in modifications any further" and report
the change only if the parent directory does not have this flag set (note
that this flag gets lost when we evict the inode from memory). But I would
say that in the end all this message passing, climbing the tree from
userspace and maintaining data structures in memory and on disk would cost
us more than the current implementation... Also it has the disadvantage
that we miss the modifications which happen before we start the userspace
daemon catching the events.
  Doing this in kernel memory has the problem of how to provide persistence
across reboots (dumping mods to userspace on request?) and also on my
system you'd have roughly a few MB of pinned memory for these purposes...
Plausible but I don't really like it...

 Note by the way that since you need to own the file/directory to set
 flags, this means that only programs that are running as root or
 running as the uid who owns the entire subtree will be able to use
 this scheme.  One advantage of doing in kernel memory is that you
 might be able to support watching a tree that is not owned by the
 watcher.
  Yes, that is the advantage. On the other hand we could allow setting that
particular flag even without being an owner of the inode. In fact, I
don't currently see a use case where you won't be either root (rsync,
updatedb) or an owner of the files (watching config file trees) but I guess
people would find some :).

  I don't get it here - you need to scan the whole subtree and set the flag
  only during the initial scan. Later, you need to scan and set the flag only
  for directories in whose subtree something changed. Similarly, rtime needs
  to be updated for each inode at most once after the scan.
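
The "at most once after the scan" property can be illustrated with a small
model of the update side. This is a sketch under assumptions, not the patch
itself: the Inode class and note_modification are hypothetical, and the rule
that propagation stops at the first directory whose flag is already clear is
my reading of the thread rather than something the patch text here confirms.

```python
class Inode:
    """Hypothetical in-memory model of a directory inode."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.rtime = 0
        self.flag = False  # set by a scanner, cleared by the first update


def note_modification(directory, now):
    """Propagate a change upward and return how many inodes were written."""
    writes = 0
    node = directory
    while node is not None and node.flag:
        node.rtime = max(node.rtime, now)  # never move rtime backwards
        node.flag = False                  # later changes stop right here
        writes += 1
        node = node.parent
    return writes
```

With all flags set, the first change below a directory writes each ancestor
once; every further change before the next scan finds the flag clear and
writes nothing, which is the cost bound claimed above.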
 
 OK, so in the worst case every single file in a kernel source tree
 might change after doing an extreme git checkout.  That means around
 36k of files get updated.  So if you have to set/clear the rtime flag
 during the checkout process 36k file inodes would have to have their
 rtime flag cleared, plus 2k worth of directory inodes; but those would
 probably be folded into other changes made to the inodes anyway.  But
  Yes, here the impact is hardly measurable as I've written in the previous
email.

 then when trackerd goes back and scans the subtree, if you are
 actually setting rtime flags for every single file inode, then that's
 38k of inodes that need updating.  If you only need to set the rtime
 flags for directories, that's only 2k worth of extra gratuitous inode
 updates.
  As I wrote above, the flag 

Re: [RFC] [PATCH 3/3] Recursive mtime for ext3

2007-11-08 Thread Theodore Tso
Ah, OK, so the two things that I didn't get from your patch
description are:

1) the rtime flag and rtime field are only set on directories
2) the intended use is not trackerd and its ilk, but rsync and updatedb,
   so it is desirable that scan/queries be persistent across reboots

But then the major hole in this scheme is still the issue of hard
links.  The rsync program is still going to have to scan the entire
subtree looking for hard links, since an inode with multiple links
into the directory tree can't guarantee that all of its parent
directories will have their rtime field updated.

A program like updatedb which only cares about filenames will be OK,
since that means it really only cares about knowing when directories
have changed, and you can't have hard links to directories.

The other problem, of course, is that this feature would become
ext2/3/4 specific, and I could see future filesystems possibly wanting
this.  So this raises the question of whether the interface should be
at the VFS layer or not --- and if so, how to handle querying whether
a particular filesystem supports it, and what happens if you have a
subtree which is covered by a filesystem that doesn't support rtime?

So a program like rsync would need to scan /proc/self/mounts to see
whether or not it would be safe to use this feature in the first
place.  And, of course, rsync would need to know whether it has write
access to the tree in order to set flags in the directory, and what to
do if some portion of the subtree isn't writeable by rsync.
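
To make the mount-table check concrete, here is a rough sketch in Python. It
assumes the usual /proc/self/mounts line layout (device, mount point,
filesystem type, options, ...); the RTIME_CAPABLE set and all function names
are hypothetical, and a real tool would need fuller handling of the octal
escapes the kernel uses in mount paths.

```python
RTIME_CAPABLE = {"ext3"}  # assumption: only the patched filesystem qualifies


def parse_mounts(text):
    """Parse mount-table text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Spaces in paths appear as \040; undo only that one escape here.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fs_type_of(path, mounts):
    """Return the type of the filesystem covering path (longest match)."""
    best = None
    for mount_point, fs_type in mounts:
        covers = (path == mount_point
                  or path.startswith(mount_point.rstrip("/") + "/"))
        # ">=" lets a later entry on the same mount point override an earlier one.
        if covers and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, fs_type)
    return best[1] if best else None


def safe_to_use_rtime(root, mounts):
    """True if root and every mount nested below it are rtime-capable."""
    if fs_type_of(root, mounts) not in RTIME_CAPABLE:
        return False
    prefix = root.rstrip("/") + "/"
    nested = [fs for mp, fs in mounts if mp != root and mp.startswith(prefix)]
    return all(fs in RTIME_CAPABLE for fs in nested)
```

On Linux the input would come from reading /proc/self/mounts; the write-access
question raised above is separate and is not covered by this check.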

On Thu, Nov 08, 2007 at 11:56:42AM +0100, Jan Kara wrote:
  Note by the way that since you need to own the file/directory to set
  flags, this means that only programs that are running as root or
  running as the uid who owns the entire subtree will be able to use
  this scheme.  One advantage of doing in kernel memory is that you
  might be able to support watching a tree that is not owned by the
  watcher.
   Yes, that is the advantage. On the other hand we could allow setting that
 particular flag even without being an owner of the inode. In fact, I
 don't currently see a use case where you won't be either root (rsync,
 updatedb) or an owner of the files (watching config file trees) but I guess
 people would find some :).

Sometimes people like to use rsync to copy a subtree to which they
have read access but not write access.  (And here note that it's not
enough to have write access, you actually need to *own* all of the
directories in the subtree).

Yes, it's safe to let any user *set* the rtime flag, but we couldn't
let them clear the rtime flag, since then they would be able to hide a
file modification from some other (potentially privileged) process.
Speaking of security, I assume your patch will never allow rtime to go
backwards (for example if the user attempts to backdate a file's mtime
field using the utime() or utimes() system call)?

I guess I'm convinced that updatedb could use this facility, but there
are enough asterisks around it that I'm not sure that rsync could
safely use this feature in production.  I don't doubt that in a cold
cache case, it would speed up rsync, but because it doesn't handle
hard links, it's not reliable.  Since rsync often gets used for
backups, this is a big deal.  There are also questions about what to
do if rsync doesn't have write access to the filesystem, or if there
is a non-rtime capable filesystem mounted in the subtree, etc., that
can be worked around, but would add a lot of complexity and grottiness
to the rsync source tree.  Is the rsync maintainer really willing to
add all of the necessary hair to support this rtime facility into
their program?

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [PATCH 3/3] Recursive mtime for ext3

2007-11-08 Thread Jan Kara
On Thu 08-11-07 09:37:59, Theodore Tso wrote:
 Ah, OK, so the two things that I didn't get from your patch
 description are:
 
 1) the rtime flag and rtime field are only set on directories
 2) the intended use is not trackerd and its ilk, but rsync and updatedb,
    so it is desirable that scan/queries be persistent across reboots
 
 But then the major hole in this scheme is still the issue of hard
 links.  The rsync program is still going to have to scan the entire
 subtree looking for hard links, since an inode with multiple links
 into the directory tree can't guarantee that all of its parent
 directories will have their rtime field updated.
  Not really - initially rsync can scan a tree for hardlinks and remember
where they are. If a hardlink to a file is created, an rtime update is
sent up the tree via the path used to create the link. So during the next
scan, rsync will see that the file is modified, find out that its
nlink is > 1, and add it to the list of hardlinked files.
  So for things like regular backups hardlinks can be dealt with
efficiently.
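
As an illustration of the bookkeeping this implies, the sketch below builds
the initial list of hardlinked files. It is written in Python with only
standard-library calls; find_hardlinks is a hypothetical helper, not anything
from rsync, and it simply records every non-directory whose link count is
above one, keyed by device and inode number.

```python
import os


def find_hardlinks(root):
    """Map (st_dev, st_ino) to the paths under root that share that inode."""
    links = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_nlink > 1:
                links.setdefault((st.st_dev, st.st_ino), []).append(path)
    return links
```

On later scans the same check would only need to run in directories whose
rtime shows a change, merging new entries into the remembered map.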

 A program like updatedb which only cares about filenames will be OK,
 since that means it really only cares about knowing when directories
 have changed, and you can't have hard links to directories.
 
 The other problem, of course, is that this feature would become
 ext2/3/4 specific, and I could see future filesystems possibly wanting
 this.  So this raises the question of whether the interface should be
 at the VFS layer or not --- and if so, how to handle querying whether
 a particular filesystem supports it, and what happens if you have a
 subtree which is covered by a filesystem that doesn't support rtime?
 
 So a program like rsync would need to scan /proc/self/mounts to see
 whether or not it would be safe to use this feature in the first
  Yes, being filesystem specific and thus requiring special handling of
mount points is a disadvantage of this approach.

 place.  And, of course, rsync would need to know whether it has write
 access to the tree in order to set flags in the directory, and what to
 do if some portion of the subtree isn't writeable by rsync.
  Yes, the cases where we cannot modify the flag in a tree would have to be
handled (similarly to the cases where the filesystem simply does not
support the feature). I don't think it would be too complicated but I have
not written the modification for rsync yet, so I may be underestimating...

 On Thu, Nov 08, 2007 at 11:56:42AM +0100, Jan Kara wrote:
   Note by the way that since you need to own the file/directory to set
   flags, this means that only programs that are running as root or
   running as the uid who owns the entire subtree will be able to use
   this scheme.  One advantage of doing in kernel memory is that you
   might be able to support watching a tree that is not owned by the
   watcher.
   Yes, that is the advantage. On the other hand we could allow setting that
  particular flag even without being an owner of the inode. In fact, I
  don't currently see a use case where you won't be either root (rsync,
  updatedb) or an owner of the files (watching config file trees) but I guess
  people would find some :).
 
 Sometimes people like to use rsync to copy a subtree to which they
 have read access but not write access.  (And here note that it's not
 enough to have write access, you actually need to *own* all of the
 directories in the subtree).
  Yes, so in such cases my feature won't be able to help. But I think
there are still enough cases where it would help.

 Yes, it's safe to let any user *set* the rtime flag, but we couldn't
 let them clear the rtime flag, since then they would be able to hide a
 file modification from some other (potentially privileged) process.
  Good point.

 Speaking of security, I assume your patch will never allow rtime to go
 backwards (for example if the user attempts to backdate a file's mtime
 field using the utime() or utimes() system call)?
  No, the patch does not allow rtime to go backwards. But anyway, if the
user has enough rights to change a file's mtime, would it really be a
security concern?

 I guess I'm convinced that updatedb could use this facility, but there
 are enough asterisks around it that I'm not sure that rsync could
 safely use this feature in production.  I don't doubt that in a cold
 cache case, it would speed up rsync, but because it doesn't handle
 hard links, it's not reliable.  Since rsync often gets used for
 backups, this is a big deal.  There are also questions about what to
 do if rsync doesn't have write access to the filesystem, or if there
 is a non-rtime capable filesystem mounted in the subtree, etc., that
 can be worked around, but would add a lot of complexity and grottiness
 to the rsync source tree.  Is the rsync maintainer really willing to
 add all of the necessary hair to support this rtime facility into
 their program?
  Hardlinks can be worked around as I wrote above and there would have to
be a fallback in case we cannot set the 

Re: delalloc space accounting problem.

2007-11-08 Thread Alex Tomas

Because the new delalloc patch doesn't have reservation integrated.
I'm going to implement this after data=ordered support.

thanks, Alex

Eric Sandeen wrote:

It appears that delalloc lets me copy 50M of data onto a 30M filesystem;
at least I never get ENOSPC back, although I wind up with several files
that have 1M length but 0 blocks.

I've filed a bug in the kernel bug tracker, I think we could use a
central place to track issues:

http://bugzilla.kernel.org/show_bug.cgi?id=9329

I'll try to find time to look into this unless someone knows offhand
where the problem is...

Thanks,
-Eric

(p.s. should I get ext[234] bug mail routed to this list, or would that
be annoying?)


delalloc space accounting problem.

2007-11-08 Thread Eric Sandeen
It appears that delalloc lets me copy 50M of data onto a 30M filesystem;
at least I never get ENOSPC back, although I wind up with several files
that have 1M length but 0 blocks.
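
The symptom (full length, too few blocks) can be checked mechanically by
comparing a file's size with its allocated blocks. The sketch below is in
Python; is_underallocated and underallocated_files are hypothetical helper
names, and the check assumes st_blocks counts 512-byte units as it does on
Linux. Intentionally sparse files match as well, so a hit is only a hint.

```python
import os


def is_underallocated(size_bytes, blocks_512):
    """True when fewer 512-byte blocks are allocated than the length needs."""
    return size_bytes > 0 and blocks_512 * 512 < size_bytes


def underallocated_files(paths):
    """Return the paths whose allocated blocks cannot hold their length."""
    result = []
    for path in paths:
        st = os.stat(path)
        if is_underallocated(st.st_size, st.st_blocks):
            result.append(path)
    return result
```

A 1048576-byte file with 0 blocks is flagged, while one with 2048 or more
blocks is not.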

I've filed a bug in the kernel bug tracker, I think we could use a
central place to track issues:

http://bugzilla.kernel.org/show_bug.cgi?id=9329

I'll try to find time to look into this unless someone knows offhand
where the problem is...

Thanks,
-Eric

(p.s. should I get ext[234] bug mail routed to this list, or would that
be annoying?)


Re: [Bugme-new] [Bug 9329] New: ext4: delalloc space accounting problem drops data

2007-11-08 Thread Andrew Morton
 On Thu,  8 Nov 2007 09:42:10 -0800 (PST) [EMAIL PROTECTED] wrote:
 http://bugzilla.kernel.org/show_bug.cgi?id=9329
 
            Summary: ext4: delalloc space accounting problem drops data
            Product: File System
            Version: 2.5
      KernelVersion: 2.6.24-rc1
           Platform: All
         OS/Version: Linux
               Tree: Mainline
             Status: NEW
           Severity: normal
           Priority: P1
          Component: ext4
         AssignedTo: [EMAIL PROTECTED]
         ReportedBy: [EMAIL PROTECTED]
 
 
 2.6.24-rc1 + ext4 git patch queue from last week or so.
 
 It appears that delalloc does not track used space properly, and fails to
 return ENOSPC as appropriate:
 
 [EMAIL PROTECTED] ~]# mkfs.ext3 -I 256 /dev/sdb7 32768
 [EMAIL PROTECTED] ~]# mount -t ext4dev -o data=writeback,delalloc,extents,mballoc /dev/sdb7 /mnt/test
 [EMAIL PROTECTED] ~]# df -h /mnt/test
 Filesystem            Size  Used Avail Use% Mounted on
 /dev/sdb7              30M  4.5M   24M  16% /mnt/test
 [EMAIL PROTECTED] ~]# du -h /tmp/1Mfile
 1.1M    /tmp/1Mfile
 [EMAIL PROTECTED] ~]# for I in `seq 1 50`; do cp /tmp/1Mfile /mnt/test/1Mfile-$I; done
 [EMAIL PROTECTED] ~]# df -h /mnt/test
 Filesystem            Size  Used Avail Use% Mounted on
 /dev/sdb7              30M   30M     0 100% /mnt/test
 
 all resulting files are 1M in length:
 [EMAIL PROTECTED] ~]# ls -l /mnt/test/1M* | grep -v 1048576
 [EMAIL PROTECTED] ~]# ls -l /mnt/test/1M* | grep 1048576 | wc -l
 50
 but many of them have silently dropped data on the floor:
 [EMAIL PROTECTED] ~]# du -hc /mnt/test/1Mfile-* | grep -v 1.0M
 596K    /mnt/test/1Mfile-26
 0       /mnt/test/1Mfile-27
 0       /mnt/test/1Mfile-28
 0       /mnt/test/1Mfile-29
 0       /mnt/test/1Mfile-30
 <snip>
 
 When mounted with nodelalloc, I get proper behavior:
 
 [EMAIL PROTECTED] ~]# for I in `seq 1 50`; do cp /tmp/1Mfile /mnt/test/1Mfile-$I; done
 cp: writing `/mnt/test/1Mfile-26': No space left on device
 cp: writing `/mnt/test/1Mfile-27': No space left on device
 cp: writing `/mnt/test/1Mfile-28': No space left on device
 cp: writing `/mnt/test/1Mfile-29': No space left on device
 <snip>
 
 


[PATCH] fix check_mntent_file() to pass mode for open(O_CREAT)

2007-11-08 Thread Andreas Dilger
On my FC8 install, ismounted.c fails to build because open(O_CREAT) is
used without passing a mode.  The following trivial patch fixes it.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]

Index: e2fsprogs-1.40.2/lib/ext2fs/ismounted.c
===================================================================
--- e2fsprogs-1.40.2.orig/lib/ext2fs/ismounted.c
+++ e2fsprogs-1.40.2/lib/ext2fs/ismounted.c
@@ -147,7 +147,7 @@ static errcode_t check_mntent_file(const
 is_root:
 #define TEST_FILE "/.ismount-test-file"
 	*mount_flags |= EXT2_MF_ISROOT;
-	fd = open(TEST_FILE, O_RDWR|O_CREAT);
+	fd = open(TEST_FILE, O_RDWR|O_CREAT, 0600);
 	if (fd < 0) {
 		if (errno == EROFS)
 			*mount_flags |= EXT2_MF_READONLY;
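
For readers less familiar with the underlying rule: in C, open() takes its
mode as an optional trailing argument that is only consulted when O_CREAT is
given, so leaving it out creates the file with unpredictable permission bits,
and newer glibc headers presumably reject the call at build time. The sketch
below shows the same call shape with an explicit 0600 mode using Python's
os.open wrapper; create_private is a hypothetical helper name, and unlike C,
Python supplies a default mode when none is passed.

```python
import os
import stat


def create_private(path):
    """Create (or open) path read-write with an explicit 0600 mode.

    Returns the permission bits the file ended up with.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        return stat.S_IMODE(os.fstat(fd).st_mode)
    finally:
        os.close(fd)
```

The process umask can only remove bits from the requested mode, so group and
others never gain access to a file newly created this way.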

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [PATCH] fix check_mntent_file() to pass mode for open(O_CREAT)

2007-11-08 Thread Eric Sandeen
Andreas Dilger wrote:
 On my FC8 install, ismounted.c fails to build because open(O_CREAT) is
 used without passing a mode.  The following trivial patch fixes it.

You can add:

Acked-by: Eric Sandeen [EMAIL PROTECTED]

'cause it's an awful lot like the patch I sent for the same issue back
on 8/16  ;-)  Guess I should have followed that up with a ping.  (though
your 0600 mode is probably better than my 0644 was)

Andreas, did you also run into trouble with struct_io_manager's ->open
calls triggering this test?

I sent a patch for that,

[PATCH] rename ->open and ->close ops in struct_io_manager

too... maybe the glibc #define tricks got smarter and don't trigger that
now?

-Eric

 Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
 
 Index: e2fsprogs-1.40.2/lib/ext2fs/ismounted.c
 ===================================================================
 --- e2fsprogs-1.40.2.orig/lib/ext2fs/ismounted.c
 +++ e2fsprogs-1.40.2/lib/ext2fs/ismounted.c
 @@ -147,7 +147,7 @@ static errcode_t check_mntent_file(const
  is_root:
  #define TEST_FILE "/.ismount-test-file"
  	*mount_flags |= EXT2_MF_ISROOT;
 -	fd = open(TEST_FILE, O_RDWR|O_CREAT);
 +	fd = open(TEST_FILE, O_RDWR|O_CREAT, 0600);
  	if (fd < 0) {
  		if (errno == EROFS)
  			*mount_flags |= EXT2_MF_READONLY;
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Software Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 