Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-15 Thread Daniel Phillips

Sorry, I couldn't  think of any good flames that haven't already been 
posted so I thought I'd be boring and post some code. ;-)

On Monday 14 May 2001 23:50, Daniel Phillips wrote:
> On Monday 14 May 2001 20:33, Andreas Dilger wrote:
> > Daniel, you write:
> > > Now, if the check routine tells us how much good data it found we
> > > could use that to set a limit for the dirent scan, thus keeping
> > > the same robustness as the old code but without having all the
> > > checks in the inner loop.  Or.  We could have separate loops for
> > > good blocks and bad blocks, it's just a very small amount of
> > > code.
> >
> > Yes, I was thinking about both of those as well.  I think the
> > latter would be easiest, because we only need to keep a single
> > error bit per buffer.
>
> Today's patch has the first part of that fix:
>
> http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-6

And today's has the second part of that mechanism.  This is the 
strategy:

  - When a directory block is first called for via ext2_bread,
 !create, and is !uptodate, ext2_check_dirents is called to
 go through and sniff anally for anything that might not
 be right.  This check only happens once per block's
 cache lifetime.

  - If check_dirents returns failure, ext2_bread sets the
 PG_error bit in the buffer's page flags.

  - Just before we do the lowlevel scan of a directory leaf we
 check the page error bit, and if it's set call check_dirents, this 
 time asking it to tell us how much of the block is good.  That 
 value is used to set the limit for the lowlevel scan.

This recovers the old behaviour where the user continues to see
all the directory entries up to the first bad one.  Is it really 
important to do this?  Now at least we can decide whether that's the 
behaviour we want instead of worrying about how it can be implemented 
efficiently.

This high level of paranoia costs practically nothing; there are no 
extra checks to slow down the inner loop.  Thanks to Al Viro for the 
inspiration.

I only implemented this check in one place,  line  807 in 
ext2_find_entry on the nonindexed path.  If the approach looks ok  I'll 
go and add it in the other places.

Extending the idea to do an equally paranoid check on the directory 
index structure will add a little messiness because check_dirents can't 
tell that a given block is actually an index block, it has to be told.  
I'll just let this sit and age a little before I uglify the code in 
that way.

On another front,  I've changed to a new COMPAT flag so that Andreas 
Gruenbacher's ACL users can stick with the flag Andreas is already 
using.  I haven't heard a definitive ruling from Ted on this yet, but I 
gather this is the one I'm now supposed to use:

  #define EXT2_FEATURE_COMPAT_DIR_INDEX  0x0020

The current patch uses this, pending Ted's approval.  I'll repeat my 
warning that any partition with an indexed directory will need to be 
mke2fsed, until the patch gets to official alpha.

The patch is at:

http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-7

This is hardly tested at all, it's for comment.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-14 Thread Daniel Phillips

On Monday 14 May 2001 22:04, Andreas Dilger wrote:
> Daniel writes:
> > I was originally thinking we should give the admin the ability to
> > create a nonindexed directory if desired, and that's how it used to
> > be before we changed the setting of INDEX_FL from directory
> > creation time to later, when the first directory block overflows.
>
> But you still got INDEX_FL on each new (unindexed) directory, which
> didn't help much, IMHO.  You can always do "chattr -h " (or
> whatever the flag will be called) for a directory which you don't
> want indexed, and it will immediately cease to be an indexed
> directory (and the index block will revert to a normal directory
> block).

This is my point.  The can't do that nicely because they have to wait 
for the directory to grow past one block.  I don't really want to go 
back to the original way of doing it - the new code is simpler.  But... 
as long as this sits fine with you, it's done.

> The 'i' flag is already used for the immutable (unchangeable)
> property. We could use 'I' for indexed directories, but sometimes you
> will make a mistake and use 'i' and in _theory_ you cannot remove the
> immutable flag from a file while in multi-user mode for security
> reasons, so it would be a hassle to fix this.  Note I haven't sent a
> patch to Ted for the hash_indexed flag for chattr, lsattr yet.

We still have lots of time.  I see "x" isn't taken.

> The only point of contention is what e2fsck will do in the case where
> COMPAT_DIR_INDEX is set on a filesystem.  Does it index all
> multi-block directories (and set INDEX_FL for all of them)?  That
> would be easiest, but it breaks the ability to selectively turn
> on/off directory indexing.
>
> If e2fsck doesn't index all multi-block directories, then how do you
> add indexing to an existing directory?  Something (kludgy) like:
>
> mkdir newdir
> mv olddir/* newdir
> rmdir olddir
>
> or the following (the "" is the tricky
> part...)(*)
>
> find  -xdev -type d -size + | xargs
> chattr +h umount
> e2fsck 
>
> In general, I can't see a good reason why you would NOT want indexing
> on some directories and not others once you have chosen to use
> indexing on a filesystem.  If people don't want to use indexing, they
> just shouldn't turn it on.

"Always on" is so much simpler, given that they asked for it.

> Since indexing is a "zero information" feature, it is easy to turn
> the flag on and off and have e2fsck fix it (assuming you have a
> version of e2fsck that understands indexing).  Even so, you can turn
> off indexing for old e2fsck (by using a backup superblock as it will
> suggest) and your filesystem is totally OK.  When you later get a new
> e2fsck you can re-enable indexing for the filesystem with no loss of
> information.
>
> (*) This brings up a problem with the "is_dx(dir)" simply checking if
> the EXT2_INDEX_FL is set.  We need to be sure that the code will bail
> out to linear directory searching properly if there is no index
> present, so it is possible to set this flag from user space for later
> e2fsck.

You mean in the case that EXT2_INDEX_FL is set, but there isn't 
actually an index there, or it's corrupt?  It's on my list of things to 
do.

> > The max link count logic in your previous patch didn't seem to be
> > complete so I didn't put it in.  I'll wait for more from you on
> > that.  I followed the ReseirFS thread on this but I'm not sure
> > exactly what form you want this to take in ext2.
>
> I'm not sure what you mean by incomplete

I didn't mean that, I found the rest of the code (in the other patch) 
shortly after I wrote it and forgot to edit the mail.

> (although I did have the
> aforementioned testing problems, so I never did try > 64k
> subdirectories in a single directory).  Basically, it works the same
> as reiserfs, with the extra caveat that we limit ourselves to
> EXT2_LINK_MAX for non-indexed directories (for performance/sanity
> reasons).  We keep link counts for indexed directories until we
> overflow 64k subdirectories (as opposed to the arbitrary 32000 limit
> of EXT2_LINK_MAX), after which we use "1" as the directory link
> count.  I also want to make sure the code degrades to the original
> (slightly simpler) code if CONFIG_EXT2_INDEX is not set.
>
> > The index create code has been moved out of the non-indexed code
> > path and into the scope of the indexed path.  This is with a view
> > to cutting ext2_add_entry into pieces at some point, for aesthetic
> > reasons.
>
> Yes, I was hoping for something like this.  That function is just
> getting way too long...
>
> I'll hopefully get a chance to look at the new code soon.

There's more new code today, you can skip yesterday's :-)

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-14 Thread Daniel Phillips

On Monday 14 May 2001 22:04, Andreas Dilger wrote:
> Maybe we can have a "noindex" mount option for this?

We need that regardless, I just keep forgetting to put it in.  I assume 
the semantics are obvious: no new indexes are created but existing ones 
are maintained.  I.e., -o noindex does not mean 'seek out and destroy 
all indexes'.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-14 Thread Daniel Phillips

On Monday 14 May 2001 20:33, Andreas Dilger wrote:
> Daniel, you write:
> > Now, if the check routine tells us how much good data it found we
> > could use that to set a limit for the dirent scan, thus keeping the
> > same robustness as the old code but without having all the checks
> > in the inner loop.  Or.  We could have separate loops for good
> > blocks and bad blocks, it's just a very small amount of code.
>
> Yes, I was thinking about both of those as well.  I think the latter
> would be easiest, because we only need to keep a single error bit per
> buffer.

Today's patch has the first part of that fix:

http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-6

I broke up Al's check_page routine into the page-specific part and the 
dirent-specific part, which I call every time a buffer is brought 
uptodate in ext2_bread.  This is roughly as efficient as Al's 
page-oriented check.  I could get rid of the code in Al's check_page 
that initializes sets the rec_lens of a new dir page to blocksize 
because I do that explicitly in ext2_add_entry now.  This will make it 
a little cleaner.

The next step is to try and incorporate the intelligence about the good 
parts of a bad dirent block into the entry lookup code cleanly.

I moved ext2_bread and ext2_append into dir.c because of their 
dir-specific nature.  (I have some plans for ext2_getblk that have 
nothing to do with directories, which is why I'm trying to keep it 
generic.)

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-14 Thread Andreas Dilger

Daniel writes:
> On Thursday 10 May 2001 09:21, Andreas Dilger wrote:
> > I have changed the code to do the following:
> > - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the
> >   INDEX mount option (the same as "mount -o index").  This removes
> >   the need to specify the "-o index" option each time for filesystems
> >   which already have indexed directories.
> 
> This applied fine.  The semantics are a little loose though: after 
> mounting -o index, the first directory that overflows its first block 
> will turn on 'sticky indexing' for that partition.

Maybe we can have a "noindex" mount option for this?  In any case, if
you have a kernel which supports indexed directories, no harm is done,
and it will be ignored by other kernels.  There is always the option
of doing "tune2fs -O ^dir_index" to turn off that bit.

> I was originally thinking we should give the admin the ability to 
> create a nonindexed directory if desired, and that's how it used to be 
> before we changed the setting of INDEX_FL from directory creation time 
> to later, when the first directory block overflows.

But you still got INDEX_FL on each new (unindexed) directory, which
didn't help much, IMHO.  You can always do "chattr -h " (or whatever
the flag will be called) for a directory which you don't want indexed,
and it will immediately cease to be an indexed directory (and the index
block will revert to a normal directory block).

The 'i' flag is already used for the immutable (unchangeable) property.
We could use 'I' for indexed directories, but sometimes you will make a
mistake and use 'i' and in _theory_ you cannot remove the immutable flag
from a file while in multi-user mode for security reasons, so it would
be a hassle to fix this.  Note I haven't sent a patch to Ted for the
hash_indexed flag for chattr, lsattr yet.

The only point of contention is what e2fsck will do in the case where
COMPAT_DIR_INDEX is set on a filesystem.  Does it index all multi-block
directories (and set INDEX_FL for all of them)?  That would be easiest,
but it breaks the ability to selectively turn on/off directory indexing.

If e2fsck doesn't index all multi-block directories, then how do you add
indexing to an existing directory?  Something (kludgy) like:

mkdir newdir
mv olddir/* newdir
rmdir olddir

or the following (the "" is the tricky part...)(*)

find  -xdev -type d -size + | xargs chattr +h
umount
e2fsck 

In general, I can't see a good reason why you would NOT want indexing on
some directories and not others once you have chosen to use indexing on
a filesystem.  If people don't want to use indexing, they just shouldn't
turn it on.

Since indexing is a "zero information" feature, it is easy to turn the
flag on and off and have e2fsck fix it (assuming you have a version of
e2fsck that understands indexing).  Even so, you can turn off indexing
for old e2fsck (by using a backup superblock as it will suggest) and
your filesystem is totally OK.  When you later get a new e2fsck you can
re-enable indexing for the filesystem with no loss of information.

(*) This brings up a problem with the "is_dx(dir)" simply checking if the
EXT2_INDEX_FL is set.  We need to be sure that the code will bail out
to linear directory searching properly if there is no index present,
so it is possible to set this flag from user space for later e2fsck.

> I broke out 'ext2_add_compat_feature' as a trial balloon.  It's not
> really very satisfying - it's annoying to have a great piece of
> machinery that's used in exactly one place.  I'd prefer to have
> 'ext2_add_feature' that takes the kind of compatibility as a parameter. 
> We can do this by treating the three ext2_super_block compatibility
> fields as an array, what do you think?  Then we'd get to use it in two
> places :-]

Actually, this is used a lot more in ext3, because we set the HAS_JOURNAL
flag, and set/unset the RECOVERY flag in several places, so it will be
useful there.

> The max link count logic in your previous patch didn't seem to be
> complete so I didn't put it in.  I'll wait for more from you on that.  I
> followed the ReseirFS thread on this but I'm not sure exactly what form
> you want this to take in ext2.

I'm not sure what you mean by incomplete (although I did have the
aforementioned testing problems, so I never did try > 64k subdirectories
in a single directory).  Basically, it works the same as reiserfs, with
the extra caveat that we limit ourselves to EXT2_LINK_MAX for non-indexed
directories (for performance/sanity reasons).  We keep link counts for
indexed directories until we overflow 64k subdirectories (as opposed
to the arbitrary 32000 limit of EXT2_LINK_MAX), after which we use "1"
as the directory link count.  I also want to make sure the code degrades
to the original (slightly simpler) code if CONFIG_EXT2_INDEX is not set.

> The index create code has been moved out of the non-indexed code path
> and into the scope of the indexed path.  This is with a

Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-14 Thread Daniel Phillips

On Monday 14 May 2001 20:33, Andreas Dilger wrote:
> Danie, you write:
> > This can go in ext2_bread, which already has dir-specific code in
> > it (readahead), and ext2_getblk remains generic, for what it's
> > worth.
>
> Note that the dir-specific code in ext2_bread() is not readahead, but
> rather directory block pre-allocation,

oops, yes that's what I meant.

> which would totally break
> indexed directory code (because the empty blocks would not be put
> into the index and would remain unreferenced thereafter).

The preallocation just does ext2_getblk, it doesn't increase i_size.  
The indexing code relies on i_size to append a new block, so this 
should work ok.

> For that matter, I'm not sure whether the dir-prealloc feature even
> works. The blocks are created/allocated, but are not initialized with
> the empty dirent data (i.e. set rec_len = blocksize), so it would
> just create a corrupt directory block.

I guess the lifetime of these blocks is the same as the lifetime of the 
directory's dcache entry, but I have to go spelunking to be sure.  If 
that's right then it's probably a good feature.  Having speculated 
about it, I should test this and see what happens.   

> > > ...(later)... I had a look at pulling out the ext2_check_page()
> > > code so that it can be used for both ext2_get_page() and
> > > ext2_bread(), but one thing concerns me - if we do checking on
> > > the whole page/buffer at once, then any error in the page/buffer
> > > will discard all of the dir entries in that page.  In the old
> > > code, we would at least return all of the entries up to the bad
> > > dir entry.  Comments on this?
> >
> > For a completely empty page that's the right thing to do, much
> > better than hanging as it does now.  We don't want to try to repair
> > damage on the fly do we?
>
> What do you mean by hanging?  You refer to new (indexed) code or old
> code?

I mean the new code, which just loops forever if it gets a zero 
rec_len.  It would be nice to get rid of this fragility without putting 
an extra check in the inner loop.  I think it's worth the effort, we 
are still cycling linearly through an average of roughly 128 directory 
entires on every directory operation.  I'll try fixing this by method 
related to what Al does, except instead of using the page flag, the 
place to do it is where the block gets mapped.

[...] No repair on the fly: Ack [...]

> > Now, if the check routine tells us how much good data it found we
> > could use that to set a limit for the dirent scan, thus keeping the
> > same robustness as the old code but without having all the checks
> > in the inner loop.  Or.  We could have separate loops for good
> > blocks and bad blocks, it's just a very small amount of code.
>
> Yes, I was thinking about both of those as well.  I think the latter
> would be easiest, because we only need to keep a single error bit per
> buffer.

Heh.  I was going to try the other one first.

By the way, all the code for the directory link limit enhancement was 
in fact in your last patch, and is included in the patch I uploaded 
yesterday.  I haven't tested it.  It would be nice to know how fast we 
can create 1,000,000 empty directories :-)

I noticed that Postfix creates an elaborate directory bucket structure 
that will be completely obsolete when the directory indexing becomes 
generally available.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-14 Thread Andreas Dilger

Danie, you write:
> This can go in ext2_bread, which already has dir-specific code in it
> (readahead), and ext2_getblk remains generic, for what it's worth.

Note that the dir-specific code in ext2_bread() is not readahead, but
rather directory block pre-allocation, which would totally break indexed
directory code (because the empty blocks would not be put into the index
and would remain unreferenced thereafter).

For that matter, I'm not sure whether the dir-prealloc feature even works.
The blocks are created/allocated, but are not initialized with the empty
dirent data (i.e. set rec_len = blocksize), so it would just create a
corrupt directory block.

> > ...(later)... I had a look at pulling out the ext2_check_page() code
> > so that it can be used for both ext2_get_page() and ext2_bread(), but
> > one thing concerns me - if we do checking on the whole page/buffer at
> > once, then any error in the page/buffer will discard all of the dir
> > entries in that page.  In the old code, we would at least return all
> > of the entries up to the bad dir entry.  Comments on this?
> 
> For a completely empty page that's the right thing to do, much better 
> than hanging as it does now.  We don't want to try to repair damage on 
> the fly do we?

What do you mean by hanging?  You refer to new (indexed) code or old code?
We definitely cannot repair this damage on-the-fly.  It would be hard/work
to find the next valid dir entry in the block, and that may be a false
positive.  Running e2fsck fixes this by clearing that dirent and setting
the rec_len to the rest of the block (hence deleting all of the remaining
dirents in the block).  The inodes they point to will be put in lost+found,
if that was the only link.

Note that e2fsck also guarantees that a directory start with "." and "..",
so we are pretty safe with the first block-split code.

> Now, if the check routine tells us how much good data it found we could 
> use that to set a limit for the dirent scan, thus keeping the same 
> robustness as the old code but without having all the checks in the 
> inner loop.  Or.  We could have separate loops for good blocks and bad 
> blocks, it's just a very small amount of code.

Yes, I was thinking about both of those as well.  I think the latter would
be easiest, because we only need to keep a single error bit per buffer.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-13 Thread Daniel Phillips

On Thursday 10 May 2001 09:21, Andreas Dilger wrote:
> I previously wrote:
> > I was looking at the new patch, and I saw something that puzzles
> > me. Why do you set the EXT2_INDEX_FL on a new (empty) directory,
> > rather than only setting it when the dx_root index is created?
> >
> > Setting the flag earlier than that makes it mostly useless, since
> > it will be set on basically every directory.  Not setting it would
> > also make your is_dx() check simply a check for the EXT2_INDEX_FL
> > bit (no need to also check size).
> >
> > Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we
> > have a (real) directory with an index, to avoid gratuitous
> > incompatibility with e2fsck.

This is fixed in today's patch.

> I have changed the code to do the following:
> - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the
>   INDEX mount option (the same as "mount -o index").  This removes
>   the need to specify the "-o index" option each time for filesystems
>   which already have indexed directories.

This applied fine.  The semantics are a little loose though: after 
mounting -o index, the first directory that overflows its first block 
will turn on 'sticky indexing' for that partition.

I was originally thinking we should give the admin the ability to 
create a nonindexed directory if desired, and that's how it used to be 
before we changed the setting of INDEX_FL from directory creation time 
to later, when the first directory block overflows.

You are probably right on both counts, but it's something to think 
about.

> - New directories NEVER have the INDEX flag set on them.
> - If the INDEX mount option is set, then when directories grow past 1
>   block (and have the index added) they will get the directory INDEX
>   flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off).

It works this way now.

> This means that you can have common code for indexed and non-indexed
> ext2 filesystems, and the admin either needs to explicitly set
> COMPAT_DIR_INDEX in the superblock or mount with "-o index" (and
> create a directory > 1 block).

Actually, I've had common code for indexed and non-indexed right from 
the first prototype, I hope there isn't any misunderstanding there.

> I have also added some tricks to ext2_inc_count() and
> ext2_dec_count() so that indexed directories are not subject to the
> EXT2_LINK_MAX.  I've done the same as reiserfs, and set i_nlink = 1
> if we overflow EXT2_LINK_MAX (which has been increased to 65500 for
> indexed directories). Apparently i_nlink = 1 is the right think to do
> w.r.t. find and other user tools.
>
> Patches need some light testing before being posted.

Today's Patch
---

The main change in today's patch is the simplified handling of the index
creation.  Besides that there are quite a few incremental improvements
and most, but not all of your recent patch set is in.

I broke out 'ext2_add_compat_feature' as a trial balloon.  It's not
really very satisfying - it's annoying to have a great piece of
machinery that's used in exactly one place.  I'd prefer to have
'ext2_add_feature' that takes the kind of compatibility as a parameter. 
We can do this by treating the three ext2_super_block compatibility
fields as an array, what do you think?  Then we'd get to use it in two
places :-]

I wrote ext2_append as a wrapper on ext2_bread, mainly to bury the
i_size handling.  Its function is to append a single block to a file.  
I pass back the block number of the appended block via a pointer, which 
is clumsy but the only other choice is to calculate the block number 
every time outside the function, which just leaves more things to get 
out of sync.

I found a real bug in the continued-hash handling - instead of advancing
to the next index block the same block would be reread.  Nobody hit this
because even with a million file directory the error condition would
exist for less than 1 in 250 million entries.  Still, correct is
correct.  Having to deal with such rare conditions is the price we have
to pay for the benefits of hashed keys.

The max link count logic in your previous patch didn't seem to be
complete so I didn't put it in.  I'll wait for more from you on that.  I
followed the ReseirFS thread on this but I'm not sure exactly what form
you want this to take in ext2.

I followed your recommendation and simplified the index creation and
INDEX_FL flag interpretation.  Now the index is created for any
directory that grows over a block, whereas before the condition was 'any
new directory created with indexing enabled that later grows over one
block'.

The index create code has been moved out of the non-indexed code path
and into the scope of the indexed path.  This is with a view to cutting
ext2_add_entry into pieces at some point, for aesthetic reasons.

The one place where I'm relying on a newly-created block to be contain
zeros got a comment.

I was unable to apply your patch all at once, and of course, I'd hacked
on the co

Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-12 Thread Daniel Phillips

On Sunday 13 May 2001 00:18, Alexander Viro wrote:
> On Sat, 12 May 2001, Andreas Dilger wrote:
> > We could use the "buffer_uptodate" flag on the buffer to signal
> > that the block has been checked.  AFAIK, a new buffer will not be
> > uptodate, and once it is it will not be read from disk again... 
> > However, if a user-space process read the buffer would also mark it
> > uptodate without doing the check...  Maybe we should use a new BH_
> > pointer... Just need to factor out the ext2_check_page() code so
> > that it works on a generic memory pointer and end pointer.
>
> Or you could simply use ext2_get_page() and forget about this crap.

I tried that first.  The resulting code was not nice and worked only 
for 4K block_size, as far as I took it.

I'm not sure what advantage you see in ext2_get_page, perhaps you can 
explain.
--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-12 Thread Daniel Phillips

On Saturday 12 May 2001 23:41, Andreas Dilger wrote:
> Daniel writes:
> > Oh yes, I'm well aware it, that's what I mean by the "bullet
> > proofing" item on my to-do list.  I don't quite agree with the idea
> > of embedding the checking of directory entry format inside the
> > ext2_get_page routine, it should be moved outside ext2_get_page, on
> > basic principal. Never mind that this routine is only used to get
> > pages of directory files, it's still not nice.  Anyway, I think the
> > thing to do is graft a similar one-time check onto ext2_bread.  We
> > can't use Al's PG_checked mechanism because we might have read only
> > one buffer out of several on the page.
>
> We could use the "buffer_uptodate" flag on the buffer to signal that
> the block has been checked.  AFAIK, a new buffer will not be
> uptodate, and once it is it will not be read from disk again... 
> However, if a user-space process read the buffer would also mark it
> uptodate without doing the check...  Maybe we should use a new BH_
> pointer... Just need to factor out the ext2_check_page() code so that
> it works on a generic memory pointer and end pointer.

Yes, exactly what I had in mind.  In other words, do the check when a 
!uptodate buffer is read.  This can go in ext2_bread, which already has 
dir-specific code in it (readahead), and ext2_getblk remains generic, 
for what it's worth.

> > > It turns out that in adding the checks for rec_len, I fixed my
> > > original bug, but added another...  Please disregard my previous
> > > patch.
> >
> > In any event, I couldn't apply it due to patch giving a 'malformed'
> > error - did you edit it after diffing?  I'll wait for a revised
> > patch - there's quite a lot of stuff in there including formatting
> > changes, ext2 warnings, etc.
>
> Yes, I think I had mentioned this in the patch, maybe not.  I removed
> all of the INDEX flag changes I had made from the resulting diff, so
> I probably screwed up somewhere...

I went through your patch and distilled out the stylistic changes you 
made.  I simplified the INDEX_FL handling along the lines you suggested 
and factored out a ext2_add_compat_feature routine among other 
incremental improvements.  I'll post the update tomorrow.

I did not attempt to address the check_ issue this time round.

> ...(later)... I had a look at pulling out the ext2_check_page() code
> so that it can be used for both ext2_get_page() and ext2_bread(), but
> one thing concerns me - if we do checking on the whole page/buffer at
> once, then any error in the page/buffer will discard all of the dir
> entries in that page.  In the old code, we would at least return all
> of the entries up to the bad dir entry.  Comments on this?

For a completely empty page that's the right thing to do, much better 
than hanging as it does now.  We don't want to try to repair damage on 
the fly do we?

Now, if the check routine tells us how much good data it found we could 
use that to set a limit for the dirent scan, thus keeping the same 
robustness as the old code but without having all the checks in the 
inner loop.  Or.  We could have separate loops for good blocks and bad 
blocks, it's just a very small amount of code.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-12 Thread Alexander Viro



On Sat, 12 May 2001, Andreas Dilger wrote:

> We could use the "buffer_uptodate" flag on the buffer to signal that
> the block has been checked.  AFAIK, a new buffer will not be uptodate,
> and once it is it will not be read from disk again...  However, if a
> user-space process read the buffer would also mark it uptodate without
> doing the check...  Maybe we should use a new BH_ pointer... Just need
> to factor out the ext2_check_page() code so that it works on a generic
> memory pointer and end pointer.

Or you could simply use ext2_get_page() and forget about this crap.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-12 Thread Andreas Dilger

Daniel writes:
> Oh yes, I'm well aware it, that's what I mean by the "bullet proofing" 
> item on my to-do list.  I don't quite agree with the idea of embedding 
> the checking of directory entry format inside the ext2_get_page 
> routine, it should be moved outside ext2_get_page, on basic principal.
> Never mind that this routine is only used to get pages of directory 
> files, it's still not nice.  Anyway, I think the thing to do is graft a 
> similar one-time check onto ext2_bread.  We can't use Al's PG_checked 
> mechanism because we might have read only one buffer out of several on 
> the page.

We could use the "buffer_uptodate" flag on the buffer to signal that
the block has been checked.  AFAIK, a new buffer will not be uptodate,
and once it is it will not be read from disk again...  However, if a
user-space process read the buffer would also mark it uptodate without
doing the check...  Maybe we should use a new BH_ pointer... Just need
to factor out the ext2_check_page() code so that it works on a generic
memory pointer and end pointer.

> > It turns out that in adding the checks for rec_len, I fixed my
> > original bug, but added another...  Please disregard my previous
> > patch.
> 
> In any event, I couldn't apply it due to patch giving a 'malformed' 
> error - did you edit it after diffing?  I'll wait for a revised patch - 
> there's quite a lot of stuff in there including formatting changes, 
> ext2 warnings, etc.

Yes, I think I had mentioned this in the patch, maybe not.  I removed
all of the INDEX flag changes I had made from the resulting diff, so
I probably screwed up somewhere...


...(later)... I had a look at pulling out the ext2_check_page() code
so that it can be used for both ext2_get_page() and ext2_bread(), but
one thing concerns me - if we do checking on the whole page/buffer at
once, then any error in the page/buffer will discard all of the dir
entries in that page.  In the old code, we would at least return all
of the entries up to the bad dir entry.  Comments on this?

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-11 Thread Daniel Phillips

On Friday 11 May 2001 18:34, Andreas Dilger wrote:
> Al writes:
> > On Fri, 11 May 2001, Andreas Dilger wrote:
> > > I've tested again, now with kdb, and the system loops in
> > > ext2_find_entry() or ext2_add_link(), because there is a
> > > directory with a zero rec_len. While the actual cause of this
> > > problem is elsewhere, the fact that ext2_next_entry() will loop
> > > forever with a bad rec_len is a bug not in the old ext2 code.
> >
> > No. Bug is that data ends up in pages without being validated.
> > That's the real thing to watch for - if ext2_get_page() is the only
> > way to get pages in cache you get all checks in one place and done
> > once.
>
> OK, I don't think that Daniel is aware of this, I wasn't either.  He
> is using ext2_bread() modified to access the page cache instead of
> the buffer cache.

Oh yes, I'm well aware it, that's what I mean by the "bullet proofing" 
item on my to-do list.  I don't quite agree with the idea of embedding 
the checking of directory entry format inside the ext2_get_page 
routine, it should be moved outside ext2_get_page, on basic principal.
Never mind that this routine is only used to get pages of directory 
files, it's still not nice.  Anyway, I think the thing to do is graft a 
similar one-time check onto ext2_bread.  We can't use Al's PG_checked 
mechanism because we might have read only one buffer out of several on 
the page.

> It turns out that in adding the checks for rec_len, I fixed my
> original bug, but added another...  Please disregard my previous
> patch.

In any event, I couldn't apply it due to patch giving a 'malformed' 
error - did you edit it after diffing?  I'll wait for a revised patch - 
there's quite a lot of stuff in there including formatting changes, 
ext2 warnings, etc.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-11 Thread Andreas Dilger

Al writes:
> On Fri, 11 May 2001, Andreas Dilger wrote:
> > I've tested again, now with kdb, and the system loops in ext2_find_entry()
> > or ext2_add_link(), because there is a directory with a zero rec_len.
> > While the actual cause of this problem is elsewhere, the fact that
> > ext2_next_entry() will loop forever with a bad rec_len is a bug not in
> > the old ext2 code.
> 
> No. Bug is that data ends up in pages without being validated. That's
> the real thing to watch for - if ext2_get_page() is the only way to
> get pages in cache you get all checks in one place and done once.

OK, I don't think that Daniel is aware of this, I wasn't either.  He
is using ext2_bread() modified to access the page cache instead of the
buffer cache.

It turns out that in adding the checks for rec_len, I fixed my original
bug, but added another...  Please disregard my previous patch.  

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-11 Thread Daniel Phillips

On Friday 11 May 2001 09:10, Andreas Dilger wrote:
> and previously wrote:
> > OK, here are the patches described above.
> >
> > Unfortunately, they haven't been tested.  I've given them several
> > eyeballings and they appear OK, but when I try to run the ext2
> > index code (even without "-o index" mount option) my system
> > deadlocks somwhere inside my ext2i module (tight loop, but SysRQ
> > still works).  I doubt it is due to these patches, but possibly
> > because I am also working on ext3 code in the same kernel.  I'm
> > just in the process of getting kdb installed into that kernel so I
> > can find out just where it is looping.
>
> I've tested again, now with kdb, and the system loops in
> ext2_find_entry() or ext2_add_link(), because there is a directory
> with a zero rec_len. While the actual cause of this problem is
> elsewhere, the fact that ext2_next_entry() will loop forever with a
> bad rec_len is a bug not in the old ext2 code.
>
> I have changed ext2_next_entry() to verify rec_len >
> EXT2_DIR_REC_LEN(0), and added a helper function ext2_next_empty()
> which is nearly the same, but it follows the de links until it finds
> one with enough space for a new entry (used in ext2_add_link()).  The
> former is useful for both Al and Daniel's code, while the latter may
> only be useful for Daniel's.

Al already answered this, but I'll amplify.  Al keeps a bit in the page 
state that tells you if a page in the page cache has been 'checked' 
since it was entered into the cache, allowing his code to carry out a 
careful, expensive check that is performed only once per cache page 
lifetime.  This will pick up bad data coming from disk, and we assume 
that newly created entries have been created correctly.  It's an 
interesting idea, but note his comment that 'this should die early'.  
I'm torn on that, it's hard to think of a cuter way of doing this.

We could use the same strategy in ext2_getblk, and in fact use the same
check_page code, providing the PG_checked bit is going to stay.

> However, the way that I currently fixed ext2_next_entry() is probably
> not very safe.  If it detects a bad rec_len, we should probably not
> touch that block anymore, but it currently just makes it look like
> the block is empty.  That may lead to deleting the rest of the
> directory entries in the block, although this is what e2fsck does
> anyways...  I at least need to set the error flag in the
> superblock...  Next patch.

We can set the PG_error flag on the page, then we get the complaint 
just once per mount (typically, unless you have cache pressure, then 
you probably want to get lots of complaints).

> I spotted another bug, namely that when allocating a new directory
> page it sets rec_len to blocksize, but does not zero the inode
> field...  This would imply that we would skip a bogus directory entry
> at the start of a new page.

The block is supposed to be zeroed when created.  I think I did that 
correctly in ext2_getblk.  This at least deserves a comment.

> As yet I haven't found the cause of the real bug, but at least now my
> kernel doesn't lock up on (relatively common) bogus data.

Yep, this counts as 'bullet proofing'.

--
Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-11 Thread Alexander Viro



On Fri, 11 May 2001, Andreas Dilger wrote:
 
> I've tested again, now with kdb, and the system loops in ext2_find_entry()
> or ext2_add_link(), because there is a directory with a zero rec_len.
> While the actual cause of this problem is elsewhere, the fact that
> ext2_next_entry() will loop forever with a bad rec_len is a bug not in
> the old ext2 code.

No. Bug is that data ends up in pages without being validated. That's
the real thing to watch for - if ext2_get_page() is the only way to
get pages in cache you get all checks in one place and done once.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-10 Thread Andreas Dilger

I previously wrote:
> OK, here are the patches described above.
> 
> Unfortunately, they haven't been tested.  I've given them several
> eyeballings and they appear OK, but when I try to run the ext2 index code
> (even without "-o index" mount option) my system deadlocks somwhere
> inside my ext2i module (tight loop, but SysRQ still works).  I doubt
> it is due to these patches, but possibly because I am also working on
> ext3 code in the same kernel.  I'm just in the process of getting kdb
> installed into that kernel so I can find out just where it is looping.

I've tested again, now with kdb, and the system loops in ext2_find_entry()
or ext2_add_link(), because there is a directory with a zero rec_len.
While the actual cause of this problem is elsewhere, the fact that
ext2_next_entry() will loop forever with a bad rec_len is a bug not in
the old ext2 code.

I have changed ext2_next_entry() to verify rec_len > EXT2_DIR_REC_LEN(0),
and added a helper function ext2_next_empty() which is nearly the same,
but it follows the de links until it finds one with enough space for a
new entry (used in ext2_add_link()).  The former is useful for both Al
and Daniel's code, while the latter may only be useful for Daniel's.

However, the way that I currently fixed ext2_next_entry() is probably
not very safe.  If it detects a bad rec_len, we should probably not
touch that block anymore, but it currently just makes it look like the
block is empty.  That may lead to deleting the rest of the directory
entries in the block, although this is what e2fsck does anyways...  I
at least need to set the error flag in the superblock...  Next patch.

I spotted another bug, namely that when allocating a new directory page
it sets rec_len to blocksize, but does not zero the inode field...  This
would imply that we would skip a bogus directory entry at the start of
a new page.

As yet I haven't found the cause of the real bug, but at least now my
kernel doesn't lock up on (relatively common) bogus data.

Cheers, Andreas

PS - patch has recieved some editing to remove the other INDEX flag stuff,
 so hopefully it is OK
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert


diff -ru linux/fs/ext2i.orig/dir.c linux/fs/ext2i/dir.c
--- linux/fs/ext2i.orig/dir.c   Thu May 10 12:10:22 2001
+++ linux/fs/ext2i/dir.cFri May 11 00:01:46 2001
@@ -573,20 +599,48 @@
 }
 
 /*
- * p is at least 6 bytes before the end of page
+ * Check record length to ensure we don't loop on a bad directory entry.
+ * de is at least 6 bytes before the end of page.
  */
-static inline ext2_dirent *ext2_next_entry(ext2_dirent *p)
+static inline ext2_dirent *ext2_next_entry(ext2_dirent *de, ext2_dirent *top)
 {
-   return (ext2_dirent *)((char*)p + le16_to_cpu(p->rec_len));
+   int len = le16_to_cpu(de->rec_len);
+   if (len < EXT2_DIR_REC_LEN(0)) {
+   printk(KERN_ERR "EXT2-fs: bad directory record length %d!\n",
+  len);
+   return top;
+   }
+   return (ext2_dirent *)((char *)de + len);
+}
+
+static inline ext2_dirent *ext2_next_empty(ext2_dirent *de, ext2_dirent *top,
+  unsigned reclen)
+{
+   while (de <= top) {
+   unsigned nlen = EXT2_DIR_REC_LEN(de->name_len);
+   unsigned rlen = le16_to_cpu(de->rec_len);
+   if (rlen < EXT2_DIR_REC_LEN(0)) {
+   printk(KERN_ERR
+  "EXT2-fs: bad directory record length %d!\n",
+  rlen);
+   break;
+   }
+   if ((de->inode? rlen - nlen: rlen) >= reclen)
+   return de;
+   de = (ext2_dirent *) ((char *) de + rlen);
+   }
+   return NULL;
 }
 
 static inline unsigned 
 ext2_validate_entry(char *base, unsigned offset, unsigned mask)
 {
ext2_dirent *de = (ext2_dirent*)(base + offset);
ext2_dirent *p = (ext2_dirent*)(base + (offset&mask));
-   while ((char*)p < (char*)de)
-   p = ext2_next_entry(p);
+   while (p < de)
+   p = ext2_next_entry(p, de);
return (char *)p - base;
 }
 
@@ -640,8 +706,8 @@
types = ext2_filetype_table;
 
for ( ; n < npages; n++, offset = 0) {
-   char *kaddr, *limit;
-   ext2_dirent *de;
+   char *kaddr;
+   ext2_dirent *de, *top;
struct page *page = ext2_get_page(inode, n);
 
if (IS_ERR(page))
@@ -652,8 +718,9 @@
need_revalidate = 0;
}
de = (ext2_dirent *)(kaddr+offset);
-   limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1);
-   for ( ;(char*)de <= limit; de = ext2_next_entry(de))
+

Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-10 Thread Daniel Phillips

On Thursday 10 May 2001 22:53, Andreas Dilger wrote:
> OK, here are the patches described above.
>
> The first one changes the use of the various INDEX flags, so that
> they only appear when we have mounted with "-o index" (or
> COMPAT_DIR_INDEX) and actually created an indexed directory.
>
> The second one changes the i_nlink counting on indexed dirs so that
> if it overflows the 16-bit i_link_count field it is set to 1 and we
> no longer track link counts on this directory.
>
> Unfortunately, they haven't been tested.  I've given them several
> eyeballings and they appear OK, but when I try to run the ext2 index
> code (even without "-o index" mount option) my system deadlocks
> somwhere inside my ext2i module (tight loop, but SysRQ still works). 
> I doubt it is due to these patches, but possibly because I am also
> working on ext3 code in the same kernel.  I'm just in the process of
> getting kdb installed into that kernel so I can find out just where
> it is looping.

I'll apply and test them in uml tomorrow (3 am here now).  By the way, 
uml was made in heaven for this sort of filesystem development.  Err, 
actually it was made by Jeff, but I guess that makes him some kind of 
angel :-)

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-10 Thread Daniel Phillips

On Wednesday 09 May 2001 23:22, you wrote:
> Daniel writes [re index directories]:
> > This is lightly tested and apparently stable.
>
> I was looking at the new patch, and I saw something that puzzles me.
> Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather
> than only setting it when the dx_root index is created?

This makes it follow the rule: new directories will be created with an 
index.  (Never mind that the index creation is deferred.)  Now, if you 
think it's ok to add an index to any directory that grows past one 
block that's fine with me.  I was looking for a little more predictable 
behavriour.

A couple of lines of code will go away and the is_dx(dir) tests will 
get a little faster if we just use 'dir grew past a block' as the rule.

> Setting the flag earlier than that makes it mostly useless, since it
> will be set on basically every directory.  Not setting it would also
> make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no
> need to also check size).

Yep.

> Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we
> have a (real) directory with an index, to avoid gratuitous
> incompatibility with e2fsck.
>
> Cheers, Andreas

Again yep,  you decide, I will fix, or I see a few posts down you have 
changed the patch, also fine.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-10 Thread Andreas Dilger

I previously wrote:
> I have changed the code to do the following:
> - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the
>   INDEX mount option (the same as "mount -o index").  This removes
>   the need to specify the "-o index" option each time for filesystems
>   which already have indexed directories.
> - New directories NEVER have the INDEX flag set on them.
> - If the INDEX mount option is set, then when directories grow past 1
>   block (and have the index added) they will get the directory INDEX
>   flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off).
> 
> This means that you can have common code for indexed and non-indexed ext2
> filesystems, and the admin either needs to explicitly set COMPAT_DIR_INDEX
> in the superblock or mount with "-o index" (and create a directory > 1 block).
> 
> I have also added some tricks to ext2_inc_count() and ext2_dec_count() so
> that indexed directories are not subject to the EXT2_LINK_MAX.  I've done
> the same as reiserfs, and set i_nlink = 1 if we overflow EXT2_LINK_MAX
> (which has been increased to 65500 for indexed directories). Apparently
> i_nlink = 1 is the right think to do w.r.t. find and other user tools.

OK, here are the patches described above.

The first one changes the use of the various INDEX flags, so that they
only appear when we have mounted with "-o index" (or COMPAT_DIR_INDEX)
and actually created an indexed directory.

The second one changes the i_nlink counting on indexed dirs so that if
it overflows the 16-bit i_link_count field it is set to 1 and we no
longer track link counts on this directory.

Unfortunately, they haven't been tested.  I've given them several
eyeballings and they appear OK, but when I try to run the ext2 index code
(even without "-o index" mount option) my system deadlocks somwhere
inside my ext2i module (tight loop, but SysRQ still works).  I doubt
it is due to these patches, but possibly because I am also working on
ext3 code in the same kernel.  I'm just in the process of getting kdb
installed into that kernel so I can find out just where it is looping.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert


diff -u linux/ext2.orig/dir.c linux/ext2/dir.c
--- linux/ext2.orig/dir.c   Thu May 10 12:10:22 2001
+++ linux/ext2/dir.cThu May 10 11:09:51 2001
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #ifndef assert
@@ -201,6 +204,21 @@
return hash0;
 }
 
+static void ext2_update_to_indexed(struct inode *dir)
+{
+   struct super_block *sb = dir->i_sb;
+
+   if (test_opt(sb, INDEX) && !EXT2_HAS_COMPAT_FEATURE(sb, DIR_INDEX)) {
+   dxtrace_on(printk("Adding COMPAT_DIR_INDEX feature flag\n"));
+   ext2_update_dynamic_rev(sb);
+   lock_kernel();
+   EXT2_SET_COMPAT_FEATURE(sb, DIR_INDEX);
+   unlock_kernel();
+   ext2_write_super(dir->i_sb);
+   }
+   dir->u.ext2_i.i_flags |= EXT2_INDEX_FL;
+}
+
 /*
  * Debug
  */
@@ -696,8 +720,7 @@
char *data, *top;
*result = NULL;
if (namelen > EXT2_NAME_LEN) return NULL;
-   if (ext2_dx && is_dx(dir))
-   {
+   if (is_dx(dir)) {
u32 hash = dx_hash (name, namelen);
struct dx_frame frames[2], *frame;
if (!(frame = dx_probe (dir, hash, frames)))
@@ -818,8 +860,7 @@
char *data1;
 
if (!namelen) return -EINVAL;
-   if (ext2_dx && is_dx(dir))
-   {
+   if (is_dx(dir)) {
unsigned count, split, hash2, block2;
struct buffer_head *bh2;
char *data2;
@@ -989,7 +1032,7 @@
if ((de->inode? rlen - nlen: rlen) >= reclen) goto add;
de = (ext2_dirent *) ((char *) de + rlen);
}
-   if (ext2_dx && blocks == 1 && dir->u.ext2_i.i_flags & EXT2_INDEX_FL)
+   if (ext2_dx && blocks == 1 && test_opt(dir->i_sb, INDEX))
{
struct dx_root *root;
struct buffer_head *newbh, *rootbh = bh;
@@ -1029,6 +1072,7 @@
dx_set_block (entries, 1);
dx_set_count (entries, 1);
dx_set_limit (entries, dx_root_limit(dir, sizeof(struct 
dx_root_info)));
+   ext2_update_to_indexed(dir);
 
/* Initialize as for dx_probe */
hash = dx_hash (name, namelen);
@@ -1107,26 +1154,11 @@
 
 int ext2_make_empty(struct inode *dir, struct inode *parent)
 {
-   struct super_block *sb = dir->i_sb;
struct buffer_head *bh;
-   int make_dx = test_opt (sb, INDEX);
-   dir->i_size = sb->s_blocksize;
+
+   dir->i_size = dir->i_sb->s_blocksize;
   

Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-10 Thread Andreas Dilger

I previously wrote:
> I was looking at the new patch, and I saw something that puzzles me.
> Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather
> than only setting it when the dx_root index is created?
> 
> Setting the flag earlier than that makes it mostly useless, since it
> will be set on basically every directory.  Not setting it would also
> make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no
> need to also check size).
> 
> Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we have
> a (real) directory with an index, to avoid gratuitous incompatibility
> with e2fsck.

I have changed the code to do the following:
- If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the
  INDEX mount option (the same as "mount -o index").  This removes
  the need to specify the "-o index" option each time for filesystems
  which already have indexed directories.
- New directories NEVER have the INDEX flag set on them.
- If the INDEX mount option is set, then when directories grow past 1
  block (and have the index added) they will get the directory INDEX
  flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off).

This means that you can have common code for indexed and non-indexed ext2
filesystems, and the admin either needs to explicitly set COMPAT_DIR_INDEX
in the superblock or mount with "-o index" (and create a directory > 1 block).

I have also added some tricks to ext2_inc_count() and ext2_dec_count() so
that indexed directories are not subject to the EXT2_LINK_MAX.  I've done
the same as reiserfs, and set i_nlink = 1 if we overflow EXT2_LINK_MAX
(which has been increased to 65500 for indexed directories). Apparently
i_nlink = 1 is the right think to do w.r.t. find and other user tools.

Patches need some light testing before being posted.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-09 Thread Andreas Dilger

Daniel writes [re index directories]:
> This is lightly tested and apparently stable.

I was looking at the new patch, and I saw something that puzzles me.
Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather
than only setting it when the dx_root index is created?

Setting the flag earlier than that makes it mostly useless, since it
will be set on basically every directory.  Not setting it would also
make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no
need to also check size).

Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we have
a (real) directory with an index, to avoid gratuitous incompatibility
with e2fsck.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-06 Thread Daniel Phillips

This patch updates ext2_getblk and ext2_bread to use the ERR_PTR style 
of error return.  As Al Viro pointed out, this is a better way of doing 
things for a function returning a pointer.  This approach would have 
prevented the bug I fixed with the previous patch.  20 20 hindsight, 
and I can only plead that I was following the interface of the old 
ext2_getblk.  But since these functions are only used only by the ext2 
directory code - which in turn is the only part of ext2 that is 
interested in file data - there was no problem changing the interface.

The patch is at:

http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-4

This is lightly tested and apparently stable.  I wish I could say the 
same for kernel 2.4.4 - cache performance sucks horribly.  
(Nontechnical evaluation.)  So it is probably not a good idea to take 
benchmarks too seriously this month.  The previous stable kernels, 
2.4.2 and 2.4.3, had their problems too, fixable via patching.  Maybe 
next month...

This patch requires Al Viro's directory-in-page-cache patch to be 
applied first, available from:

   ftp://ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz

The other flavor of indexing patch, dx.testme..., also does 
directory-in-page-cache, using the good old ext2 directory code.  This 
works fine and is stable, but IMHO Al's patches constitute a pretty 
major cleanup.

To apply:

cd source/tree
zcat ext2-dir-patch-S4.gz | patch -p1
cat dx.pcache-2.4.4-4 | patch -p0

To create an indexed directory:

mount /dev/hdxxx /test -o index
mkdir /test/foo

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-03 Thread Albert Cranford

Works fine now.  I'll keep on testing.
Thanks a bunch,
Albert
Daniel Phillips wrote:
> 
> > This combination against 2.4.4 won't allow directories to be moved.
> > Ex: mv a b #fails with I/O error.  See attached strace.
> > But with ext2-dir-patch-S4 by itself, mv works as it should.
> 
> Now it also works with my index patch as it should:
> 
> http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-3
> 

-- 
Albert Cranford Deerfield Beach FL USA
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-03 Thread Daniel Phillips

> This combination against 2.4.4 won't allow directories to be moved.
> Ex: mv a b #fails with I/O error.  See attached strace.
> But with ext2-dir-patch-S4 by itself, mv works as it should.

Now it also works with my index patch as it should:

http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-3

It was just an uninitialized *err return.  Next patch I'll follow Al
Viro's suggestion and change ext2_getblk (used *only* in the directory  
code) to use ERR_PTR instead of *err, saving an argument and eliminating
the chance for this kind of dumb error.

ext2_getblk now looks like:

struct buffer_head *ext2_getblk (struct inode *inode, u32 block, int create, int *err)
{
unsigned blockshift = inode->i_sb->s_blocksize_bits;
unsigned blocksize = 1 << blockshift;
unsigned pshift = PAGE_CACHE_SHIFT - blockshift;
struct buffer_head *bh;
struct page *page;

if (!(page = grab_cache_page(inode->i_mapping, block >> pshift)))
goto fail_page;
if (!page->buffers)
create_empty_buffers (page, inode->i_dev, blocksize);
bh = page_buffer (page, block & ((1 << pshift) - 1));
atomic_inc(&bh->b_count);
UnlockPage(page);
page_cache_release(page);
*err = 0;
if (buffer_mapped(bh))
return bh;
if ((*err = ext2_get_block(inode, block, bh, create)))
goto out_null;
if (!buffer_mapped(bh))
goto out_null;
if (!buffer_new(bh))
return bh;
lock_buffer(bh);
memset(bh->b_data, 0, blocksize);
mark_buffer_uptodate(bh, 1);
mark_buffer_dirty_inode(bh, inode);
unlock_buffer(bh);
return bh;
out_null:
brelse(bh);
return NULL;
fail_page:
*err = -EIO;
return NULL;
}

A little longer, somewhat clearer and much more correct.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-02 Thread Daniel Phillips

On Thursday 03 May 2001 03:15, you wrote:
> Hello Daniel,
> This combination against 2.4.4 won't allow directories to be moved.
> Ex: mv a b #fails with I/O error.  See attached strace.
> But with ext2-dir-patch-S4 by itself, mv works as it should.
> Later,
> Albert

Thanks Albert, this was easily reproduceable here but did not show
up with 2.4.2 (uml), which I'd used to develop the patch.  Analyzing
now...

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-05-01 Thread Albert Cranford

Hello Daniel,
This combination against 2.4.4 won't allow directories to be moved.
Ex: mv a b #fails with I/O error.  See attached strace.
But with ext2-dir-patch-S4 by itself, mv works as it should.
Later,
Albert

Daniel Phillips wrote:
> 
> > Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz
> 
> Here is my ext2 directory index as a patch against your patch:
> 
> http://kernelnewbies.org/~phillips/htree/dx.pcache-2.4.4
> 
> Changes:
> 
> - COMBSORT macro replaced by custom sort code
> - Most #ifdef CONFIG_EXT2_INDEX's changed to if ()
> 
> To do:
> 
>   - Split up the split code
>   - Finalize hash function
>   - Test/debug big endian
>   - Fall back to linear search if bad index detected
>   - Fail gracefully on random data
>   - Remove the tracing and test options
> 
> To apply:
> 
> cd source/tree
> zcat ext2-dir-patch-S4.gz | patch -p1
> cat dx.pcache-2.4.4 | patch -p0
> 
> To create an indexed directory:
> 
> mount /dev/hdxxx /test -o index
> mkdir /test/foo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Albert Cranford Deerfield Beach FL USA
[EMAIL PROTECTED]

execve("/bin/mv", ["mv", "a", "b"], [/* 47 vars */]) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40007000
mprotect(0x4000, 21025, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0x8048000, 52997, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
stat("/etc/ld.so.cache", {st_mode=S_IFREG|0644, st_size=44476, ...}) = 0
open("/etc/ld.so.cache", O_RDONLY)  = 3
mmap(NULL, 44476, PROT_READ, MAP_SHARED, 3, 0) = 0x40008000
close(3)= 0
stat("/etc/ld.so.preload", {st_mode=S_IFREG|0644, st_size=1, ...}) = 0
open("/etc/ld.so.preload", O_RDONLY)= 3
mmap(NULL, 2, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x40013000
close(3)= 0
munmap(0x40013000, 2)   = 0
open("/lib/libc.so.5", O_RDONLY)= 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\*\1\000"..., 4096) = 4096
mmap(NULL, 786432, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40013000
mmap(0x40013000, 555135, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x40013000
mmap(0x4009b000, 21344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x87000) = 
0x4009b000
mmap(0x400a1000, 204364, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 
-1, 0) = 0x400a1000
close(3)= 0
mprotect(0x40013000, 555135, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
munmap(0x40008000, 44476)   = 0
mprotect(0x8048000, 52997, PROT_READ|PROT_EXEC) = 0
mprotect(0x40013000, 555135, PROT_READ|PROT_EXEC) = 0
mprotect(0x4000, 21025, PROT_READ|PROT_EXEC) = 0
personality(PER_LINUX)  = 0
geteuid()   = 0
getuid()= 0
getgid()= 0
getegid()   = 0
brk(0x80568bc)  = 0x80568bc
brk(0x8057000)  = 0x8057000
geteuid()   = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
stat("b", 0xb76c)   = -1 ENOENT (No such file or directory)
lstat("a", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("b", 0x8056660)   = -1 ENOENT (No such file or directory)
rename("a", "b")= -1 EIO (Input/output error)
write(2, "mv: ", 4mv: ) = 4
write(2, "cannot move `a\' to `b\'", 22cannot move `a' to `b') = 22
stat("/etc/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or directory)
stat("/usr/share/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or 
directory)
stat("/usr/share/locale/libc/C", 0xb2f0) = -1 ENOENT (No such file or directory)
stat("/usr/share/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or 
directory)
stat("/usr/local/share/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or 
directory)
write(2, ": I/O error", 11: I/O error) = 11
write(2, "\n", 1
)   = 1
_exit(1)= ?



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-04-29 Thread Daniel Phillips

> Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz

Here is my ext2 directory index as a patch against your patch:

http://kernelnewbies.org/~phillips/htree/dx.pcache-2.4.4

Changes:

- COMBSORT macro replaced by custom sort code
- Most #ifdef CONFIG_EXT2_INDEX's changed to if ()

To do:

  - Split up the split code
  - Finalize hash function
  - Test/debug big endian
  - Fall back to linear search if bad index detected
  - Fail gracefully on random data
  - Remove the tracing and test options

To apply:

cd source/tree
zcat ext2-dir-patch-S4.gz | patch -p1
cat dx.pcache-2.4.4 | patch -p0

To create an indexed directory:

mount /dev/hdxxx /test -o index
mkdir /test/foo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH][CFT] (updated) ext2 directories in pagecache

2001-04-28 Thread Alexander Viro

Patch rediffed to 2.4.4, otherwise - no changes (2.4.4 has a
fix for ext2 race, but it's unrelated to the thing).

Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz
Please, help with testing.
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[PATCH][CFT] (updated) ext2 directories in pagecache

2001-04-23 Thread Alexander Viro



On Thu, 12 Apr 2001, Alexander Viro wrote:

>   Folks, IMO ext2-dir-patch got to the stable stage. Currently
> it's against 2.4.4-pre2, but it should apply to anything starting with
> 2.4.2 or so.
> 
>   Ted, could you review it for potential inclusion into 2.4 once
> it gets enough testing? It's ext2-only (the only change outside of
> ext2 is exporting waitfor_one_page()), it doesn't change fs layout,
> it seriously simplifies ext2/dir.c and ext2/namei.c and it gives better
> VM behaviour.

Previous variant left junk in ->d_type of directory entries
on "old" filesystems (i.e. ones where it should be zeroed). Harmless
(on these filesystems readdir() returned DT_UNKNOWN anyway), but
it PO'd fsck and was the wrong thing anyway.

Fixed and rediffed against current tree (2.4.4-pre6). Folks,
please help with testing.

Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4-pre6.gz

Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/