Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Sorry, I couldn't think of any good flames that haven't already been posted so I thought I'd be boring and post some code. ;-) On Monday 14 May 2001 23:50, Daniel Phillips wrote: > On Monday 14 May 2001 20:33, Andreas Dilger wrote: > > Daniel, you write: > > > Now, if the check routine tells us how much good data it found we > > > could use that to set a limit for the dirent scan, thus keeping > > > the same robustness as the old code but without having all the > > > checks in the inner loop. Or. We could have separate loops for > > > good blocks and bad blocks, it's just a very small amount of > > > code. > > > > Yes, I was thinking about both of those as well. I think the > > latter would be easiest, because we only need to keep a single > > error bit per buffer. > > Today's patch has the first part of that fix: > > http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-6 And today's has the second part of that mechanism. This is the strategy: - When a directory block is first called for via ext2_bread, !create, and is !uptodate, ext2_check_dirents is called to go through and sniff anally for anything that might not be right. This check only happens once per block's cache lifetime. - If check_dirents returns failure, ext2_bread sets the PG_error bit in the buffer's page flags. - Just before we do the lowlevel scan of a directory leaf we check the page error bit, and if it's set call check_dirents, this time asking it to tell us how much of the block is good. That value is used to set the limit for the lowlevel scan. This recovers the old behaviour where the user continues to see all the directory entries up to the first bad one. Is it really important to do this? Now at least we can decide whether that's the behaviour we want instead of worrying about how it can be implemented efficiently. This high level of paranoia costs practically nothing; there are no extra checks to slow down the inner loop. Thanks to Al Viro for the inspiration. I only implemented this check in one place, line 807 in ext2_find_entry on the nonindexed path. If the approach looks ok I'll go and add it in the other places. Extending the idea to do an equally paranoid check on the directory index structure will add a little messiness because check_dirents can't tell that a given block is actually an index block, it has to be told. I'll just let this sit and age a little before I uglify the code in that way. On another front, I've changed to a new COMPAT flag so that Andreas Gruenbacher's ACL users can stick with the flag Andreas is already using. I haven't heard a definitive ruling from Ted on this yet, but I gather this is the one I'm now supposed to use: #define EXT2_FEATURE_COMPAT_DIR_INDEX 0x0020 The current patch uses this, pending Ted's approval. I'll repeat my warning that any partition with an indexed directory will need to be mke2fsed, until the patch gets to official alpha. The patch is at: http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-7 This is hardly tested at all, it's for comment. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Monday 14 May 2001 22:04, Andreas Dilger wrote: > Daniel writes: > > I was originally thinking we should give the admin the ability to > > create a nonindexed directory if desired, and that's how it used to > > be before we changed the setting of INDEX_FL from directory > > creation time to later, when the first directory block overflows. > > But you still got INDEX_FL on each new (unindexed) directory, which > didn't help much, IMHO. You can always do "chattr -h " (or > whatever the flag will be called) for a directory which you don't > want indexed, and it will immediately cease to be an indexed > directory (and the index block will revert to a normal directory > block). This is my point. The can't do that nicely because they have to wait for the directory to grow past one block. I don't really want to go back to the original way of doing it - the new code is simpler. But... as long as this sits fine with you, it's done. > The 'i' flag is already used for the immutable (unchangeable) > property. We could use 'I' for indexed directories, but sometimes you > will make a mistake and use 'i' and in _theory_ you cannot remove the > immutable flag from a file while in multi-user mode for security > reasons, so it would be a hassle to fix this. Note I haven't sent a > patch to Ted for the hash_indexed flag for chattr, lsattr yet. We still have lots of time. I see "x" isn't taken. > The only point of contention is what e2fsck will do in the case where > COMPAT_DIR_INDEX is set on a filesystem. Does it index all > multi-block directories (and set INDEX_FL for all of them)? That > would be easiest, but it breaks the ability to selectively turn > on/off directory indexing. > > If e2fsck doesn't index all multi-block directories, then how do you > add indexing to an existing directory? Something (kludgy) like: > > mkdir newdir > mv olddir/* newdir > rmdir olddir > > or the following (the "" is the tricky > part...)(*) > > find -xdev -type d -size + | xargs > chattr +h umount > e2fsck > > In general, I can't see a good reason why you would NOT want indexing > on some directories and not others once you have chosen to use > indexing on a filesystem. If people don't want to use indexing, they > just shouldn't turn it on. "Always on" is so much simpler, given that they asked for it. > Since indexing is a "zero information" feature, it is easy to turn > the flag on and off and have e2fsck fix it (assuming you have a > version of e2fsck that understands indexing). Even so, you can turn > off indexing for old e2fsck (by using a backup superblock as it will > suggest) and your filesystem is totally OK. When you later get a new > e2fsck you can re-enable indexing for the filesystem with no loss of > information. > > (*) This brings up a problem with the "is_dx(dir)" simply checking if > the EXT2_INDEX_FL is set. We need to be sure that the code will bail > out to linear directory searching properly if there is no index > present, so it is possible to set this flag from user space for later > e2fsck. You mean in the case that EXT2_INDEX_FL is set, but there isn't actually an index there, or it's corrupt? It's on my list of things to do. > > The max link count logic in your previous patch didn't seem to be > > complete so I didn't put it in. I'll wait for more from you on > > that. I followed the ReseirFS thread on this but I'm not sure > > exactly what form you want this to take in ext2. > > I'm not sure what you mean by incomplete I didn't mean that, I found the rest of the code (in the other patch) shortly after I wrote it and forgot to edit the mail. > (although I did have the > aforementioned testing problems, so I never did try > 64k > subdirectories in a single directory). Basically, it works the same > as reiserfs, with the extra caveat that we limit ourselves to > EXT2_LINK_MAX for non-indexed directories (for performance/sanity > reasons). We keep link counts for indexed directories until we > overflow 64k subdirectories (as opposed to the arbitrary 32000 limit > of EXT2_LINK_MAX), after which we use "1" as the directory link > count. I also want to make sure the code degrades to the original > (slightly simpler) code if CONFIG_EXT2_INDEX is not set. > > > The index create code has been moved out of the non-indexed code > > path and into the scope of the indexed path. This is with a view > > to cutting ext2_add_entry into pieces at some point, for aesthetic > > reasons. > > Yes, I was hoping for something like this. That function is just > getting way too long... > > I'll hopefully get a chance to look at the new code soon. There's more new code today, you can skip yesterday's :-) -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Monday 14 May 2001 22:04, Andreas Dilger wrote: > Maybe we can have a "noindex" mount option for this? We need that regardless, I just keep forgetting to put it in. I assume the semantics are obvious: no new indexes are created but existing ones are maintained. I.e., -o noindex does not mean 'seek out and destroy all indexes'. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Monday 14 May 2001 20:33, Andreas Dilger wrote: > Daniel, you write: > > Now, if the check routine tells us how much good data it found we > > could use that to set a limit for the dirent scan, thus keeping the > > same robustness as the old code but without having all the checks > > in the inner loop. Or. We could have separate loops for good > > blocks and bad blocks, it's just a very small amount of code. > > Yes, I was thinking about both of those as well. I think the latter > would be easiest, because we only need to keep a single error bit per > buffer. Today's patch has the first part of that fix: http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-6 I broke up Al's check_page routine into the page-specific part and the dirent-specific part, which I call every time a buffer is brought uptodate in ext2_bread. This is roughly as efficient as Al's page-oriented check. I could get rid of the code in Al's check_page that initializes sets the rec_lens of a new dir page to blocksize because I do that explicitly in ext2_add_entry now. This will make it a little cleaner. The next step is to try and incorporate the intelligence about the good parts of a bad dirent block into the entry lookup code cleanly. I moved ext2_bread and ext2_append into dir.c because of their dir-specific nature. (I have some plans for ext2_getblk that have nothing to do with directories, which is why I'm trying to keep it generic.) -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Daniel writes: > On Thursday 10 May 2001 09:21, Andreas Dilger wrote: > > I have changed the code to do the following: > > - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the > > INDEX mount option (the same as "mount -o index"). This removes > > the need to specify the "-o index" option each time for filesystems > > which already have indexed directories. > > This applied fine. The semantics are a little loose though: after > mounting -o index, the first directory that overflows its first block > will turn on 'sticky indexing' for that partition. Maybe we can have a "noindex" mount option for this? In any case, if you have a kernel which supports indexed directories, no harm is done, and it will be ignored by other kernels. There is always the option of doing "tune2fs -O ^dir_index" to turn off that bit. > I was originally thinking we should give the admin the ability to > create a nonindexed directory if desired, and that's how it used to be > before we changed the setting of INDEX_FL from directory creation time > to later, when the first directory block overflows. But you still got INDEX_FL on each new (unindexed) directory, which didn't help much, IMHO. You can always do "chattr -h " (or whatever the flag will be called) for a directory which you don't want indexed, and it will immediately cease to be an indexed directory (and the index block will revert to a normal directory block). The 'i' flag is already used for the immutable (unchangeable) property. We could use 'I' for indexed directories, but sometimes you will make a mistake and use 'i' and in _theory_ you cannot remove the immutable flag from a file while in multi-user mode for security reasons, so it would be a hassle to fix this. Note I haven't sent a patch to Ted for the hash_indexed flag for chattr, lsattr yet. The only point of contention is what e2fsck will do in the case where COMPAT_DIR_INDEX is set on a filesystem. Does it index all multi-block directories (and set INDEX_FL for all of them)? That would be easiest, but it breaks the ability to selectively turn on/off directory indexing. If e2fsck doesn't index all multi-block directories, then how do you add indexing to an existing directory? Something (kludgy) like: mkdir newdir mv olddir/* newdir rmdir olddir or the following (the "" is the tricky part...)(*) find -xdev -type d -size + | xargs chattr +h umount e2fsck In general, I can't see a good reason why you would NOT want indexing on some directories and not others once you have chosen to use indexing on a filesystem. If people don't want to use indexing, they just shouldn't turn it on. Since indexing is a "zero information" feature, it is easy to turn the flag on and off and have e2fsck fix it (assuming you have a version of e2fsck that understands indexing). Even so, you can turn off indexing for old e2fsck (by using a backup superblock as it will suggest) and your filesystem is totally OK. When you later get a new e2fsck you can re-enable indexing for the filesystem with no loss of information. (*) This brings up a problem with the "is_dx(dir)" simply checking if the EXT2_INDEX_FL is set. We need to be sure that the code will bail out to linear directory searching properly if there is no index present, so it is possible to set this flag from user space for later e2fsck. > I broke out 'ext2_add_compat_feature' as a trial balloon. It's not > really very satisfying - it's annoying to have a great piece of > machinery that's used in exactly one place. I'd prefer to have > 'ext2_add_feature' that takes the kind of compatibility as a parameter. > We can do this by treating the three ext2_super_block compatibility > fields as an array, what do you think? Then we'd get to use it in two > places :-] Actually, this is used a lot more in ext3, because we set the HAS_JOURNAL flag, and set/unset the RECOVERY flag in several places, so it will be useful there. > The max link count logic in your previous patch didn't seem to be > complete so I didn't put it in. I'll wait for more from you on that. I > followed the ReseirFS thread on this but I'm not sure exactly what form > you want this to take in ext2. I'm not sure what you mean by incomplete (although I did have the aforementioned testing problems, so I never did try > 64k subdirectories in a single directory). Basically, it works the same as reiserfs, with the extra caveat that we limit ourselves to EXT2_LINK_MAX for non-indexed directories (for performance/sanity reasons). We keep link counts for indexed directories until we overflow 64k subdirectories (as opposed to the arbitrary 32000 limit of EXT2_LINK_MAX), after which we use "1" as the directory link count. I also want to make sure the code degrades to the original (slightly simpler) code if CONFIG_EXT2_INDEX is not set. > The index create code has been moved out of the non-indexed code path > and into the scope of the indexed path. This is with a
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Monday 14 May 2001 20:33, Andreas Dilger wrote: > Danie, you write: > > This can go in ext2_bread, which already has dir-specific code in > > it (readahead), and ext2_getblk remains generic, for what it's > > worth. > > Note that the dir-specific code in ext2_bread() is not readahead, but > rather directory block pre-allocation, oops, yes that's what I meant. > which would totally break > indexed directory code (because the empty blocks would not be put > into the index and would remain unreferenced thereafter). The preallocation just does ext2_getblk, it doesn't increase i_size. The indexing code relies on i_size to append a new block, so this should work ok. > For that matter, I'm not sure whether the dir-prealloc feature even > works. The blocks are created/allocated, but are not initialized with > the empty dirent data (i.e. set rec_len = blocksize), so it would > just create a corrupt directory block. I guess the lifetime of these blocks is the same as the lifetime of the directory's dcache entry, but I have to go spelunking to be sure. If that's right then it's probably a good feature. Having speculated about it, I should test this and see what happens. > > > ...(later)... I had a look at pulling out the ext2_check_page() > > > code so that it can be used for both ext2_get_page() and > > > ext2_bread(), but one thing concerns me - if we do checking on > > > the whole page/buffer at once, then any error in the page/buffer > > > will discard all of the dir entries in that page. In the old > > > code, we would at least return all of the entries up to the bad > > > dir entry. Comments on this? > > > > For a completely empty page that's the right thing to do, much > > better than hanging as it does now. We don't want to try to repair > > damage on the fly do we? > > What do you mean by hanging? You refer to new (indexed) code or old > code? I mean the new code, which just loops forever if it gets a zero rec_len. It would be nice to get rid of this fragility without putting an extra check in the inner loop. I think it's worth the effort, we are still cycling linearly through an average of roughly 128 directory entires on every directory operation. I'll try fixing this by method related to what Al does, except instead of using the page flag, the place to do it is where the block gets mapped. [...] No repair on the fly: Ack [...] > > Now, if the check routine tells us how much good data it found we > > could use that to set a limit for the dirent scan, thus keeping the > > same robustness as the old code but without having all the checks > > in the inner loop. Or. We could have separate loops for good > > blocks and bad blocks, it's just a very small amount of code. > > Yes, I was thinking about both of those as well. I think the latter > would be easiest, because we only need to keep a single error bit per > buffer. Heh. I was going to try the other one first. By the way, all the code for the directory link limit enhancement was in fact in your last patch, and is included in the patch I uploaded yesterday. I haven't tested it. It would be nice to know how fast we can create 1,000,000 empty directories :-) I noticed that Postfix creates an elaborate directory bucket structure that will be completely obsolete when the directory indexing becomes generally available. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Danie, you write: > This can go in ext2_bread, which already has dir-specific code in it > (readahead), and ext2_getblk remains generic, for what it's worth. Note that the dir-specific code in ext2_bread() is not readahead, but rather directory block pre-allocation, which would totally break indexed directory code (because the empty blocks would not be put into the index and would remain unreferenced thereafter). For that matter, I'm not sure whether the dir-prealloc feature even works. The blocks are created/allocated, but are not initialized with the empty dirent data (i.e. set rec_len = blocksize), so it would just create a corrupt directory block. > > ...(later)... I had a look at pulling out the ext2_check_page() code > > so that it can be used for both ext2_get_page() and ext2_bread(), but > > one thing concerns me - if we do checking on the whole page/buffer at > > once, then any error in the page/buffer will discard all of the dir > > entries in that page. In the old code, we would at least return all > > of the entries up to the bad dir entry. Comments on this? > > For a completely empty page that's the right thing to do, much better > than hanging as it does now. We don't want to try to repair damage on > the fly do we? What do you mean by hanging? You refer to new (indexed) code or old code? We definitely cannot repair this damage on-the-fly. It would be hard/work to find the next valid dir entry in the block, and that may be a false positive. Running e2fsck fixes this by clearing that dirent and setting the rec_len to the rest of the block (hence deleting all of the remaining dirents in the block). The inodes they point to will be put in lost+found, if that was the only link. Note that e2fsck also guarantees that a directory start with "." and "..", so we are pretty safe with the first block-split code. > Now, if the check routine tells us how much good data it found we could > use that to set a limit for the dirent scan, thus keeping the same > robustness as the old code but without having all the checks in the > inner loop. Or. We could have separate loops for good blocks and bad > blocks, it's just a very small amount of code. Yes, I was thinking about both of those as well. I think the latter would be easiest, because we only need to keep a single error bit per buffer. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Thursday 10 May 2001 09:21, Andreas Dilger wrote: > I previously wrote: > > I was looking at the new patch, and I saw something that puzzles > > me. Why do you set the EXT2_INDEX_FL on a new (empty) directory, > > rather than only setting it when the dx_root index is created? > > > > Setting the flag earlier than that makes it mostly useless, since > > it will be set on basically every directory. Not setting it would > > also make your is_dx() check simply a check for the EXT2_INDEX_FL > > bit (no need to also check size). > > > > Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we > > have a (real) directory with an index, to avoid gratuitous > > incompatibility with e2fsck. This is fixed in today's patch. > I have changed the code to do the following: > - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the > INDEX mount option (the same as "mount -o index"). This removes > the need to specify the "-o index" option each time for filesystems > which already have indexed directories. This applied fine. The semantics are a little loose though: after mounting -o index, the first directory that overflows its first block will turn on 'sticky indexing' for that partition. I was originally thinking we should give the admin the ability to create a nonindexed directory if desired, and that's how it used to be before we changed the setting of INDEX_FL from directory creation time to later, when the first directory block overflows. You are probably right on both counts, but it's something to think about. > - New directories NEVER have the INDEX flag set on them. > - If the INDEX mount option is set, then when directories grow past 1 > block (and have the index added) they will get the directory INDEX > flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off). It works this way now. > This means that you can have common code for indexed and non-indexed > ext2 filesystems, and the admin either needs to explicitly set > COMPAT_DIR_INDEX in the superblock or mount with "-o index" (and > create a directory > 1 block). Actually, I've had common code for indexed and non-indexed right from the first prototype, I hope there isn't any misunderstanding there. > I have also added some tricks to ext2_inc_count() and > ext2_dec_count() so that indexed directories are not subject to the > EXT2_LINK_MAX. I've done the same as reiserfs, and set i_nlink = 1 > if we overflow EXT2_LINK_MAX (which has been increased to 65500 for > indexed directories). Apparently i_nlink = 1 is the right think to do > w.r.t. find and other user tools. > > Patches need some light testing before being posted. Today's Patch --- The main change in today's patch is the simplified handling of the index creation. Besides that there are quite a few incremental improvements and most, but not all of your recent patch set is in. I broke out 'ext2_add_compat_feature' as a trial balloon. It's not really very satisfying - it's annoying to have a great piece of machinery that's used in exactly one place. I'd prefer to have 'ext2_add_feature' that takes the kind of compatibility as a parameter. We can do this by treating the three ext2_super_block compatibility fields as an array, what do you think? Then we'd get to use it in two places :-] I wrote ext2_append as a wrapper on ext2_bread, mainly to bury the i_size handling. Its function is to append a single block to a file. I pass back the block number of the appended block via a pointer, which is clumsy but the only other choice is to calculate the block number every time outside the function, which just leaves more things to get out of sync. I found a real bug in the continued-hash handling - instead of advancing to the next index block the same block would be reread. Nobody hit this because even with a million file directory the error condition would exist for less than 1 in 250 million entries. Still, correct is correct. Having to deal with such rare conditions is the price we have to pay for the benefits of hashed keys. The max link count logic in your previous patch didn't seem to be complete so I didn't put it in. I'll wait for more from you on that. I followed the ReseirFS thread on this but I'm not sure exactly what form you want this to take in ext2. I followed your recommendation and simplified the index creation and INDEX_FL flag interpretation. Now the index is created for any directory that grows over a block, whereas before the condition was 'any new directory created with indexing enabled that later grows over one block'. The index create code has been moved out of the non-indexed code path and into the scope of the indexed path. This is with a view to cutting ext2_add_entry into pieces at some point, for aesthetic reasons. The one place where I'm relying on a newly-created block to be contain zeros got a comment. I was unable to apply your patch all at once, and of course, I'd hacked on the co
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Sunday 13 May 2001 00:18, Alexander Viro wrote: > On Sat, 12 May 2001, Andreas Dilger wrote: > > We could use the "buffer_uptodate" flag on the buffer to signal > > that the block has been checked. AFAIK, a new buffer will not be > > uptodate, and once it is it will not be read from disk again... > > However, if a user-space process read the buffer would also mark it > > uptodate without doing the check... Maybe we should use a new BH_ > > pointer... Just need to factor out the ext2_check_page() code so > > that it works on a generic memory pointer and end pointer. > > Or you could simply use ext2_get_page() and forget about this crap. I tried that first. The resulting code was not nice and worked only for 4K block_size, as far as I took it. I'm not sure what advantage you see in ext2_get_page, perhaps you can explain. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Saturday 12 May 2001 23:41, Andreas Dilger wrote: > Daniel writes: > > Oh yes, I'm well aware it, that's what I mean by the "bullet > > proofing" item on my to-do list. I don't quite agree with the idea > > of embedding the checking of directory entry format inside the > > ext2_get_page routine, it should be moved outside ext2_get_page, on > > basic principal. Never mind that this routine is only used to get > > pages of directory files, it's still not nice. Anyway, I think the > > thing to do is graft a similar one-time check onto ext2_bread. We > > can't use Al's PG_checked mechanism because we might have read only > > one buffer out of several on the page. > > We could use the "buffer_uptodate" flag on the buffer to signal that > the block has been checked. AFAIK, a new buffer will not be > uptodate, and once it is it will not be read from disk again... > However, if a user-space process read the buffer would also mark it > uptodate without doing the check... Maybe we should use a new BH_ > pointer... Just need to factor out the ext2_check_page() code so that > it works on a generic memory pointer and end pointer. Yes, exactly what I had in mind. In other words, do the check when a !uptodate buffer is read. This can go in ext2_bread, which already has dir-specific code in it (readahead), and ext2_getblk remains generic, for what it's worth. > > > It turns out that in adding the checks for rec_len, I fixed my > > > original bug, but added another... Please disregard my previous > > > patch. > > > > In any event, I couldn't apply it due to patch giving a 'malformed' > > error - did you edit it after diffing? I'll wait for a revised > > patch - there's quite a lot of stuff in there including formatting > > changes, ext2 warnings, etc. > > Yes, I think I had mentioned this in the patch, maybe not. I removed > all of the INDEX flag changes I had made from the resulting diff, so > I probably screwed up somewhere... I went through your patch and distilled out the stylistic changes you made. I simplified the INDEX_FL handling along the lines you suggested and factored out a ext2_add_compat_feature routine among other incremental improvements. I'll post the update tomorrow. I did not attempt to address the check_ issue this time round. > ...(later)... I had a look at pulling out the ext2_check_page() code > so that it can be used for both ext2_get_page() and ext2_bread(), but > one thing concerns me - if we do checking on the whole page/buffer at > once, then any error in the page/buffer will discard all of the dir > entries in that page. In the old code, we would at least return all > of the entries up to the bad dir entry. Comments on this? For a completely empty page that's the right thing to do, much better than hanging as it does now. We don't want to try to repair damage on the fly do we? Now, if the check routine tells us how much good data it found we could use that to set a limit for the dirent scan, thus keeping the same robustness as the old code but without having all the checks in the inner loop. Or. We could have separate loops for good blocks and bad blocks, it's just a very small amount of code. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Sat, 12 May 2001, Andreas Dilger wrote: > We could use the "buffer_uptodate" flag on the buffer to signal that > the block has been checked. AFAIK, a new buffer will not be uptodate, > and once it is it will not be read from disk again... However, if a > user-space process read the buffer would also mark it uptodate without > doing the check... Maybe we should use a new BH_ pointer... Just need > to factor out the ext2_check_page() code so that it works on a generic > memory pointer and end pointer. Or you could simply use ext2_get_page() and forget about this crap. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Daniel writes: > Oh yes, I'm well aware it, that's what I mean by the "bullet proofing" > item on my to-do list. I don't quite agree with the idea of embedding > the checking of directory entry format inside the ext2_get_page > routine, it should be moved outside ext2_get_page, on basic principal. > Never mind that this routine is only used to get pages of directory > files, it's still not nice. Anyway, I think the thing to do is graft a > similar one-time check onto ext2_bread. We can't use Al's PG_checked > mechanism because we might have read only one buffer out of several on > the page. We could use the "buffer_uptodate" flag on the buffer to signal that the block has been checked. AFAIK, a new buffer will not be uptodate, and once it is it will not be read from disk again... However, if a user-space process read the buffer would also mark it uptodate without doing the check... Maybe we should use a new BH_ pointer... Just need to factor out the ext2_check_page() code so that it works on a generic memory pointer and end pointer. > > It turns out that in adding the checks for rec_len, I fixed my > > original bug, but added another... Please disregard my previous > > patch. > > In any event, I couldn't apply it due to patch giving a 'malformed' > error - did you edit it after diffing? I'll wait for a revised patch - > there's quite a lot of stuff in there including formatting changes, > ext2 warnings, etc. Yes, I think I had mentioned this in the patch, maybe not. I removed all of the INDEX flag changes I had made from the resulting diff, so I probably screwed up somewhere... ...(later)... I had a look at pulling out the ext2_check_page() code so that it can be used for both ext2_get_page() and ext2_bread(), but one thing concerns me - if we do checking on the whole page/buffer at once, then any error in the page/buffer will discard all of the dir entries in that page. In the old code, we would at least return all of the entries up to the bad dir entry. Comments on this? Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Friday 11 May 2001 18:34, Andreas Dilger wrote: > Al writes: > > On Fri, 11 May 2001, Andreas Dilger wrote: > > > I've tested again, now with kdb, and the system loops in > > > ext2_find_entry() or ext2_add_link(), because there is a > > > directory with a zero rec_len. While the actual cause of this > > > problem is elsewhere, the fact that ext2_next_entry() will loop > > > forever with a bad rec_len is a bug not in the old ext2 code. > > > > No. Bug is that data ends up in pages without being validated. > > That's the real thing to watch for - if ext2_get_page() is the only > > way to get pages in cache you get all checks in one place and done > > once. > > OK, I don't think that Daniel is aware of this, I wasn't either. He > is using ext2_bread() modified to access the page cache instead of > the buffer cache. Oh yes, I'm well aware it, that's what I mean by the "bullet proofing" item on my to-do list. I don't quite agree with the idea of embedding the checking of directory entry format inside the ext2_get_page routine, it should be moved outside ext2_get_page, on basic principal. Never mind that this routine is only used to get pages of directory files, it's still not nice. Anyway, I think the thing to do is graft a similar one-time check onto ext2_bread. We can't use Al's PG_checked mechanism because we might have read only one buffer out of several on the page. > It turns out that in adding the checks for rec_len, I fixed my > original bug, but added another... Please disregard my previous > patch. In any event, I couldn't apply it due to patch giving a 'malformed' error - did you edit it after diffing? I'll wait for a revised patch - there's quite a lot of stuff in there including formatting changes, ext2 warnings, etc. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Al writes: > On Fri, 11 May 2001, Andreas Dilger wrote: > > I've tested again, now with kdb, and the system loops in ext2_find_entry() > > or ext2_add_link(), because there is a directory with a zero rec_len. > > While the actual cause of this problem is elsewhere, the fact that > > ext2_next_entry() will loop forever with a bad rec_len is a bug not in > > the old ext2 code. > > No. Bug is that data ends up in pages without being validated. That's > the real thing to watch for - if ext2_get_page() is the only way to > get pages in cache you get all checks in one place and done once. OK, I don't think that Daniel is aware of this, I wasn't either. He is using ext2_bread() modified to access the page cache instead of the buffer cache. It turns out that in adding the checks for rec_len, I fixed my original bug, but added another... Please disregard my previous patch. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Friday 11 May 2001 09:10, Andreas Dilger wrote: > and previously wrote: > > OK, here are the patches described above. > > > > Unfortunately, they haven't been tested. I've given them several > > eyeballings and they appear OK, but when I try to run the ext2 > > index code (even without "-o index" mount option) my system > > deadlocks somwhere inside my ext2i module (tight loop, but SysRQ > > still works). I doubt it is due to these patches, but possibly > > because I am also working on ext3 code in the same kernel. I'm > > just in the process of getting kdb installed into that kernel so I > > can find out just where it is looping. > > I've tested again, now with kdb, and the system loops in > ext2_find_entry() or ext2_add_link(), because there is a directory > with a zero rec_len. While the actual cause of this problem is > elsewhere, the fact that ext2_next_entry() will loop forever with a > bad rec_len is a bug not in the old ext2 code. > > I have changed ext2_next_entry() to verify rec_len > > EXT2_DIR_REC_LEN(0), and added a helper function ext2_next_empty() > which is nearly the same, but it follows the de links until it finds > one with enough space for a new entry (used in ext2_add_link()). The > former is useful for both Al and Daniel's code, while the latter may > only be useful for Daniel's. Al already answered this, but I'll amplify. Al keeps a bit in the page state that tells you if a page in the page cache has been 'checked' since it was entered into the cache, allowing his code to carry out a careful, expensive check that is performed only once per cache page lifetime. This will pick up bad data coming from disk, and we assume that newly created entries have been created correctly. It's an interesting idea, but note his comment that 'this should die early'. I'm torn on that, it's hard to think of a cuter way of doing this. We could use the same strategy in ext2_getblk, and in fact use the same check_page code, providing the PG_checked bit is going to stay. > However, the way that I currently fixed ext2_next_entry() is probably > not very safe. If it detects a bad rec_len, we should probably not > touch that block anymore, but it currently just makes it look like > the block is empty. That may lead to deleting the rest of the > directory entries in the block, although this is what e2fsck does > anyways... I at least need to set the error flag in the > superblock... Next patch. We can set the PG_error flag on the page, then we get the complaint just once per mount (typically, unless you have cache pressure, then you probably want to get lots of complaints). > I spotted another bug, namely that when allocating a new directory > page it sets rec_len to blocksize, but does not zero the inode > field... This would imply that we would skip a bogus directory entry > at the start of a new page. The block is supposed to be zeroed when created. I think I did that correctly in ext2_getblk. This at least deserves a comment. > As yet I haven't found the cause of the real bug, but at least now my > kernel doesn't lock up on (relatively common) bogus data. Yep, this counts as 'bullet proofing'. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Fri, 11 May 2001, Andreas Dilger wrote: > I've tested again, now with kdb, and the system loops in ext2_find_entry() > or ext2_add_link(), because there is a directory with a zero rec_len. > While the actual cause of this problem is elsewhere, the fact that > ext2_next_entry() will loop forever with a bad rec_len is a bug not in > the old ext2 code. No. Bug is that data ends up in pages without being validated. That's the real thing to watch for - if ext2_get_page() is the only way to get pages in cache you get all checks in one place and done once. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
I previously wrote: > OK, here are the patches described above. > > Unfortunately, they haven't been tested. I've given them several > eyeballings and they appear OK, but when I try to run the ext2 index code > (even without "-o index" mount option) my system deadlocks somwhere > inside my ext2i module (tight loop, but SysRQ still works). I doubt > it is due to these patches, but possibly because I am also working on > ext3 code in the same kernel. I'm just in the process of getting kdb > installed into that kernel so I can find out just where it is looping. I've tested again, now with kdb, and the system loops in ext2_find_entry() or ext2_add_link(), because there is a directory with a zero rec_len. While the actual cause of this problem is elsewhere, the fact that ext2_next_entry() will loop forever with a bad rec_len is a bug not in the old ext2 code. I have changed ext2_next_entry() to verify rec_len > EXT2_DIR_REC_LEN(0), and added a helper function ext2_next_empty() which is nearly the same, but it follows the de links until it finds one with enough space for a new entry (used in ext2_add_link()). The former is useful for both Al and Daniel's code, while the latter may only be useful for Daniel's. However, the way that I currently fixed ext2_next_entry() is probably not very safe. If it detects a bad rec_len, we should probably not touch that block anymore, but it currently just makes it look like the block is empty. That may lead to deleting the rest of the directory entries in the block, although this is what e2fsck does anyways... I at least need to set the error flag in the superblock... Next patch. I spotted another bug, namely that when allocating a new directory page it sets rec_len to blocksize, but does not zero the inode field... This would imply that we would skip a bogus directory entry at the start of a new page. As yet I haven't found the cause of the real bug, but at least now my kernel doesn't lock up on (relatively common) bogus data. Cheers, Andreas PS - patch has recieved some editing to remove the other INDEX flag stuff, so hopefully it is OK -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert diff -ru linux/fs/ext2i.orig/dir.c linux/fs/ext2i/dir.c --- linux/fs/ext2i.orig/dir.c Thu May 10 12:10:22 2001 +++ linux/fs/ext2i/dir.cFri May 11 00:01:46 2001 @@ -573,20 +599,48 @@ } /* - * p is at least 6 bytes before the end of page + * Check record length to ensure we don't loop on a bad directory entry. + * de is at least 6 bytes before the end of page. */ -static inline ext2_dirent *ext2_next_entry(ext2_dirent *p) +static inline ext2_dirent *ext2_next_entry(ext2_dirent *de, ext2_dirent *top) { - return (ext2_dirent *)((char*)p + le16_to_cpu(p->rec_len)); + int len = le16_to_cpu(de->rec_len); + if (len < EXT2_DIR_REC_LEN(0)) { + printk(KERN_ERR "EXT2-fs: bad directory record length %d!\n", + len); + return top; + } + return (ext2_dirent *)((char *)de + len); +} + +static inline ext2_dirent *ext2_next_empty(ext2_dirent *de, ext2_dirent *top, + unsigned reclen) +{ + while (de <= top) { + unsigned nlen = EXT2_DIR_REC_LEN(de->name_len); + unsigned rlen = le16_to_cpu(de->rec_len); + if (rlen < EXT2_DIR_REC_LEN(0)) { + printk(KERN_ERR + "EXT2-fs: bad directory record length %d!\n", + rlen); + break; + } + if ((de->inode? rlen - nlen: rlen) >= reclen) + return de; + de = (ext2_dirent *) ((char *) de + rlen); + } + return NULL; } static inline unsigned ext2_validate_entry(char *base, unsigned offset, unsigned mask) { ext2_dirent *de = (ext2_dirent*)(base + offset); ext2_dirent *p = (ext2_dirent*)(base + (offset&mask)); - while ((char*)p < (char*)de) - p = ext2_next_entry(p); + while (p < de) + p = ext2_next_entry(p, de); return (char *)p - base; } @@ -640,8 +706,8 @@ types = ext2_filetype_table; for ( ; n < npages; n++, offset = 0) { - char *kaddr, *limit; - ext2_dirent *de; + char *kaddr; + ext2_dirent *de, *top; struct page *page = ext2_get_page(inode, n); if (IS_ERR(page)) @@ -652,8 +718,9 @@ need_revalidate = 0; } de = (ext2_dirent *)(kaddr+offset); - limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1); - for ( ;(char*)de <= limit; de = ext2_next_entry(de)) +
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Thursday 10 May 2001 22:53, Andreas Dilger wrote: > OK, here are the patches described above. > > The first one changes the use of the various INDEX flags, so that > they only appear when we have mounted with "-o index" (or > COMPAT_DIR_INDEX) and actually created an indexed directory. > > The second one changes the i_nlink counting on indexed dirs so that > if it overflows the 16-bit i_link_count field it is set to 1 and we > no longer track link counts on this directory. > > Unfortunately, they haven't been tested. I've given them several > eyeballings and they appear OK, but when I try to run the ext2 index > code (even without "-o index" mount option) my system deadlocks > somwhere inside my ext2i module (tight loop, but SysRQ still works). > I doubt it is due to these patches, but possibly because I am also > working on ext3 code in the same kernel. I'm just in the process of > getting kdb installed into that kernel so I can find out just where > it is looping. I'll apply and test them in uml tomorrow (3 am here now). By the way, uml was made in heaven for this sort of filesystem development. Err, actually it was made by Jeff, but I guess that makes him some kind of angel :-) -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Wednesday 09 May 2001 23:22, you wrote: > Daniel writes [re index directories]: > > This is lightly tested and apparently stable. > > I was looking at the new patch, and I saw something that puzzles me. > Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather > than only setting it when the dx_root index is created? This makes it follow the rule: new directories will be created with an index. (Never mind that the index creation is deferred.) Now, if you think it's ok to add an index to any directory that grows past one block that's fine with me. I was looking for a little more predictable behavriour. A couple of lines of code will go away and the is_dx(dir) tests will get a little faster if we just use 'dir grew past a block' as the rule. > Setting the flag earlier than that makes it mostly useless, since it > will be set on basically every directory. Not setting it would also > make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no > need to also check size). Yep. > Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we > have a (real) directory with an index, to avoid gratuitous > incompatibility with e2fsck. > > Cheers, Andreas Again yep, you decide, I will fix, or I see a few posts down you have changed the patch, also fine. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
I previously wrote: > I have changed the code to do the following: > - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the > INDEX mount option (the same as "mount -o index"). This removes > the need to specify the "-o index" option each time for filesystems > which already have indexed directories. > - New directories NEVER have the INDEX flag set on them. > - If the INDEX mount option is set, then when directories grow past 1 > block (and have the index added) they will get the directory INDEX > flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off). > > This means that you can have common code for indexed and non-indexed ext2 > filesystems, and the admin either needs to explicitly set COMPAT_DIR_INDEX > in the superblock or mount with "-o index" (and create a directory > 1 block). > > I have also added some tricks to ext2_inc_count() and ext2_dec_count() so > that indexed directories are not subject to the EXT2_LINK_MAX. I've done > the same as reiserfs, and set i_nlink = 1 if we overflow EXT2_LINK_MAX > (which has been increased to 65500 for indexed directories). Apparently > i_nlink = 1 is the right think to do w.r.t. find and other user tools. OK, here are the patches described above. The first one changes the use of the various INDEX flags, so that they only appear when we have mounted with "-o index" (or COMPAT_DIR_INDEX) and actually created an indexed directory. The second one changes the i_nlink counting on indexed dirs so that if it overflows the 16-bit i_link_count field it is set to 1 and we no longer track link counts on this directory. Unfortunately, they haven't been tested. I've given them several eyeballings and they appear OK, but when I try to run the ext2 index code (even without "-o index" mount option) my system deadlocks somwhere inside my ext2i module (tight loop, but SysRQ still works). I doubt it is due to these patches, but possibly because I am also working on ext3 code in the same kernel. I'm just in the process of getting kdb installed into that kernel so I can find out just where it is looping. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert diff -u linux/ext2.orig/dir.c linux/ext2/dir.c --- linux/ext2.orig/dir.c Thu May 10 12:10:22 2001 +++ linux/ext2/dir.cThu May 10 11:09:51 2001 @@ -27,6 +27,7 @@ #include #include #include +#include #include #ifndef assert @@ -201,6 +204,21 @@ return hash0; } +static void ext2_update_to_indexed(struct inode *dir) +{ + struct super_block *sb = dir->i_sb; + + if (test_opt(sb, INDEX) && !EXT2_HAS_COMPAT_FEATURE(sb, DIR_INDEX)) { + dxtrace_on(printk("Adding COMPAT_DIR_INDEX feature flag\n")); + ext2_update_dynamic_rev(sb); + lock_kernel(); + EXT2_SET_COMPAT_FEATURE(sb, DIR_INDEX); + unlock_kernel(); + ext2_write_super(dir->i_sb); + } + dir->u.ext2_i.i_flags |= EXT2_INDEX_FL; +} + /* * Debug */ @@ -696,8 +720,7 @@ char *data, *top; *result = NULL; if (namelen > EXT2_NAME_LEN) return NULL; - if (ext2_dx && is_dx(dir)) - { + if (is_dx(dir)) { u32 hash = dx_hash (name, namelen); struct dx_frame frames[2], *frame; if (!(frame = dx_probe (dir, hash, frames))) @@ -818,8 +860,7 @@ char *data1; if (!namelen) return -EINVAL; - if (ext2_dx && is_dx(dir)) - { + if (is_dx(dir)) { unsigned count, split, hash2, block2; struct buffer_head *bh2; char *data2; @@ -989,7 +1032,7 @@ if ((de->inode? rlen - nlen: rlen) >= reclen) goto add; de = (ext2_dirent *) ((char *) de + rlen); } - if (ext2_dx && blocks == 1 && dir->u.ext2_i.i_flags & EXT2_INDEX_FL) + if (ext2_dx && blocks == 1 && test_opt(dir->i_sb, INDEX)) { struct dx_root *root; struct buffer_head *newbh, *rootbh = bh; @@ -1029,6 +1072,7 @@ dx_set_block (entries, 1); dx_set_count (entries, 1); dx_set_limit (entries, dx_root_limit(dir, sizeof(struct dx_root_info))); + ext2_update_to_indexed(dir); /* Initialize as for dx_probe */ hash = dx_hash (name, namelen); @@ -1107,26 +1154,11 @@ int ext2_make_empty(struct inode *dir, struct inode *parent) { - struct super_block *sb = dir->i_sb; struct buffer_head *bh; - int make_dx = test_opt (sb, INDEX); - dir->i_size = sb->s_blocksize; + + dir->i_size = dir->i_sb->s_blocksize;
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
I previously wrote: > I was looking at the new patch, and I saw something that puzzles me. > Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather > than only setting it when the dx_root index is created? > > Setting the flag earlier than that makes it mostly useless, since it > will be set on basically every directory. Not setting it would also > make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no > need to also check size). > > Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we have > a (real) directory with an index, to avoid gratuitous incompatibility > with e2fsck. I have changed the code to do the following: - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the INDEX mount option (the same as "mount -o index"). This removes the need to specify the "-o index" option each time for filesystems which already have indexed directories. - New directories NEVER have the INDEX flag set on them. - If the INDEX mount option is set, then when directories grow past 1 block (and have the index added) they will get the directory INDEX flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off). This means that you can have common code for indexed and non-indexed ext2 filesystems, and the admin either needs to explicitly set COMPAT_DIR_INDEX in the superblock or mount with "-o index" (and create a directory > 1 block). I have also added some tricks to ext2_inc_count() and ext2_dec_count() so that indexed directories are not subject to the EXT2_LINK_MAX. I've done the same as reiserfs, and set i_nlink = 1 if we overflow EXT2_LINK_MAX (which has been increased to 65500 for indexed directories). Apparently i_nlink = 1 is the right think to do w.r.t. find and other user tools. Patches need some light testing before being posted. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Daniel writes [re index directories]: > This is lightly tested and apparently stable. I was looking at the new patch, and I saw something that puzzles me. Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather than only setting it when the dx_root index is created? Setting the flag earlier than that makes it mostly useless, since it will be set on basically every directory. Not setting it would also make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no need to also check size). Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we have a (real) directory with an index, to avoid gratuitous incompatibility with e2fsck. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
This patch updates ext2_getblk and ext2_bread to use the ERR_PTR style of error return. As Al Viro pointed out, this is a better way of doing things for a function returning a pointer. This approach would have prevented the bug I fixed with the previous patch. 20 20 hindsight, and I can only plead that I was following the interface of the old ext2_getblk. But since these functions are only used only by the ext2 directory code - which in turn is the only part of ext2 that is interested in file data - there was no problem changing the interface. The patch is at: http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-4 This is lightly tested and apparently stable. I wish I could say the same for kernel 2.4.4 - cache performance sucks horribly. (Nontechnical evaluation.) So it is probably not a good idea to take benchmarks too seriously this month. The previous stable kernels, 2.4.2 and 2.4.3, had their problems too, fixable via patching. Maybe next month... This patch requires Al Viro's directory-in-page-cache patch to be applied first, available from: ftp://ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz The other flavor of indexing patch, dx.testme..., also does directory-in-page-cache, using the good old ext2 directory code. This works fine and is stable, but IMHO Al's patches constitute a pretty major cleanup. To apply: cd source/tree zcat ext2-dir-patch-S4.gz | patch -p1 cat dx.pcache-2.4.4-4 | patch -p0 To create an indexed directory: mount /dev/hdxxx /test -o index mkdir /test/foo -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Works fine now. I'll keep on testing. Thanks a bunch, Albert Daniel Phillips wrote: > > > This combination against 2.4.4 won't allow directories to be moved. > > Ex: mv a b #fails with I/O error. See attached strace. > > But with ext2-dir-patch-S4 by itself, mv works as it should. > > Now it also works with my index patch as it should: > > http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-3 > -- Albert Cranford Deerfield Beach FL USA [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
> This combination against 2.4.4 won't allow directories to be moved. > Ex: mv a b #fails with I/O error. See attached strace. > But with ext2-dir-patch-S4 by itself, mv works as it should. Now it also works with my index patch as it should: http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-3 It was just an uninitialized *err return. Next patch I'll follow Al Viro's suggestion and change ext2_getblk (used *only* in the directory code) to use ERR_PTR instead of *err, saving an argument and eliminating the chance for this kind of dumb error. ext2_getblk now looks like: struct buffer_head *ext2_getblk (struct inode *inode, u32 block, int create, int *err) { unsigned blockshift = inode->i_sb->s_blocksize_bits; unsigned blocksize = 1 << blockshift; unsigned pshift = PAGE_CACHE_SHIFT - blockshift; struct buffer_head *bh; struct page *page; if (!(page = grab_cache_page(inode->i_mapping, block >> pshift))) goto fail_page; if (!page->buffers) create_empty_buffers (page, inode->i_dev, blocksize); bh = page_buffer (page, block & ((1 << pshift) - 1)); atomic_inc(&bh->b_count); UnlockPage(page); page_cache_release(page); *err = 0; if (buffer_mapped(bh)) return bh; if ((*err = ext2_get_block(inode, block, bh, create))) goto out_null; if (!buffer_mapped(bh)) goto out_null; if (!buffer_new(bh)) return bh; lock_buffer(bh); memset(bh->b_data, 0, blocksize); mark_buffer_uptodate(bh, 1); mark_buffer_dirty_inode(bh, inode); unlock_buffer(bh); return bh; out_null: brelse(bh); return NULL; fail_page: *err = -EIO; return NULL; } A little longer, somewhat clearer and much more correct. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
On Thursday 03 May 2001 03:15, you wrote: > Hello Daniel, > This combination against 2.4.4 won't allow directories to be moved. > Ex: mv a b #fails with I/O error. See attached strace. > But with ext2-dir-patch-S4 by itself, mv works as it should. > Later, > Albert Thanks Albert, this was easily reproduceable here but did not show up with 2.4.2 (uml), which I'd used to develop the patch. Analyzing now... -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Hello Daniel, This combination against 2.4.4 won't allow directories to be moved. Ex: mv a b #fails with I/O error. See attached strace. But with ext2-dir-patch-S4 by itself, mv works as it should. Later, Albert Daniel Phillips wrote: > > > Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz > > Here is my ext2 directory index as a patch against your patch: > > http://kernelnewbies.org/~phillips/htree/dx.pcache-2.4.4 > > Changes: > > - COMBSORT macro replaced by custom sort code > - Most #ifdef CONFIG_EXT2_INDEX's changed to if () > > To do: > > - Split up the split code > - Finalize hash function > - Test/debug big endian > - Fall back to linear search if bad index detected > - Fail gracefully on random data > - Remove the tracing and test options > > To apply: > > cd source/tree > zcat ext2-dir-patch-S4.gz | patch -p1 > cat dx.pcache-2.4.4 | patch -p0 > > To create an indexed directory: > > mount /dev/hdxxx /test -o index > mkdir /test/foo > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Albert Cranford Deerfield Beach FL USA [EMAIL PROTECTED] execve("/bin/mv", ["mv", "a", "b"], [/* 47 vars */]) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40007000 mprotect(0x4000, 21025, PROT_READ|PROT_WRITE|PROT_EXEC) = 0 mprotect(0x8048000, 52997, PROT_READ|PROT_WRITE|PROT_EXEC) = 0 stat("/etc/ld.so.cache", {st_mode=S_IFREG|0644, st_size=44476, ...}) = 0 open("/etc/ld.so.cache", O_RDONLY) = 3 mmap(NULL, 44476, PROT_READ, MAP_SHARED, 3, 0) = 0x40008000 close(3)= 0 stat("/etc/ld.so.preload", {st_mode=S_IFREG|0644, st_size=1, ...}) = 0 open("/etc/ld.so.preload", O_RDONLY)= 3 mmap(NULL, 2, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x40013000 close(3)= 0 munmap(0x40013000, 2) = 0 open("/lib/libc.so.5", O_RDONLY)= 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\*\1\000"..., 4096) = 4096 mmap(NULL, 786432, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40013000 mmap(0x40013000, 555135, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x40013000 mmap(0x4009b000, 21344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x87000) = 0x4009b000 mmap(0x400a1000, 204364, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x400a1000 close(3)= 0 mprotect(0x40013000, 555135, PROT_READ|PROT_WRITE|PROT_EXEC) = 0 munmap(0x40008000, 44476) = 0 mprotect(0x8048000, 52997, PROT_READ|PROT_EXEC) = 0 mprotect(0x40013000, 555135, PROT_READ|PROT_EXEC) = 0 mprotect(0x4000, 21025, PROT_READ|PROT_EXEC) = 0 personality(PER_LINUX) = 0 geteuid() = 0 getuid()= 0 getgid()= 0 getegid() = 0 brk(0x80568bc) = 0x80568bc brk(0x8057000) = 0x8057000 geteuid() = 0 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 stat("b", 0xb76c) = -1 ENOENT (No such file or directory) lstat("a", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat("b", 0x8056660) = -1 ENOENT (No such file or directory) rename("a", "b")= -1 EIO (Input/output error) write(2, "mv: ", 4mv: ) = 4 write(2, "cannot move `a\' to `b\'", 22cannot move `a' to `b') = 22 stat("/etc/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or directory) stat("/usr/share/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or directory) stat("/usr/share/locale/libc/C", 0xb2f0) = -1 ENOENT (No such file or directory) stat("/usr/share/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or directory) stat("/usr/local/share/locale/C/libc.cat", 0xb2f0) = -1 ENOENT (No such file or directory) write(2, ": I/O error", 11: I/O error) = 11 write(2, "\n", 1 ) = 1 _exit(1)= ?
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
> Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz Here is my ext2 directory index as a patch against your patch: http://kernelnewbies.org/~phillips/htree/dx.pcache-2.4.4 Changes: - COMBSORT macro replaced by custom sort code - Most #ifdef CONFIG_EXT2_INDEX's changed to if () To do: - Split up the split code - Finalize hash function - Test/debug big endian - Fall back to linear search if bad index detected - Fail gracefully on random data - Remove the tracing and test options To apply: cd source/tree zcat ext2-dir-patch-S4.gz | patch -p1 cat dx.pcache-2.4.4 | patch -p0 To create an indexed directory: mount /dev/hdxxx /test -o index mkdir /test/foo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][CFT] (updated) ext2 directories in pagecache
Patch rediffed to 2.4.4, otherwise - no changes (2.4.4 has a fix for ext2 race, but it's unrelated to the thing). Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz Please, help with testing. Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][CFT] (updated) ext2 directories in pagecache
On Thu, 12 Apr 2001, Alexander Viro wrote: > Folks, IMO ext2-dir-patch got to the stable stage. Currently > it's against 2.4.4-pre2, but it should apply to anything starting with > 2.4.2 or so. > > Ted, could you review it for potential inclusion into 2.4 once > it gets enough testing? It's ext2-only (the only change outside of > ext2 is exporting waitfor_one_page()), it doesn't change fs layout, > it seriously simplifies ext2/dir.c and ext2/namei.c and it gives better > VM behaviour. Previous variant left junk in ->d_type of directory entries on "old" filesystems (i.e. ones where it should be zeroed). Harmless (on these filesystems readdir() returned DT_UNKNOWN anyway), but it PO'd fsck and was the wrong thing anyway. Fixed and rediffed against current tree (2.4.4-pre6). Folks, please help with testing. Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4-pre6.gz Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/