Erez Zadok <[EMAIL PROTECTED]> writes:

> In message <[EMAIL PROTECTED]>, Chris Wedgwood writes:
> > On Tue, May 02, 2000 at 12:15:20AM -0400, Theodore Y. Ts'o wrote:
> > 
> >        Date:        Mon, 1 May 2000 11:27:04 -0400 (EDT)
> >        From: Alexander Viro <[EMAIL PROTECTED]>
> >     
> >        Keep in mind that userland may need to be taught how to deal with
> >        getdents() returning duplicates - there is no reasonable way to
> >        serve that in the kernel.
> >     
> >     *BSD does this in libc, for exactly the same reason; there's no good way
> >     to do this in the kernel.
> [...]
> > I'm not sure how efficient and fast the code would be to make this
> > work quickly, for large numbers of file systems it might prove
> > horribly slow.
> 
> IMHO the BSD hacks to libc to support unionfs were ugly.  To write unionfs,
> they used the existing nullfs "template", but then they had to modify the
> VFS *and* other user-land stuff.
> 
> It depends what you mean by "reasonable way" and "good way".  I've done it
> in my prototype implementation of unionfs which uses fan-out stackable f/s:
> 
> (1) you read directory 1, and store the names you see in a hash table.
> (2) as you read each entry from subsequent directories, you check if it's in
>     the hash table.  If it is, skip it; if it's not, add it to the getdents
>     output buf, and add the entry to the hash-table.
> 
> This was a simple design and easy to implement.  Yes it added overhead to
> readdir(2) but not as much as you'd think.  It was certainly not "horribly
> slow", nor did it chew up lots of ram.  I tried it on several directories
> with several dozen entries each (i.e., typical directory sizes), not on
> directories with thousands or more entries.
> 
> I think that if we're adding directory unification features into the linux
> kernel, then we should add unique-fication of names as well to the kernel.
> One possible way would be to take advantage of the fact that most
> readdir()'s are followed by lstat()s of each entry (hence NFSv3's
> READDIRPLUS): so when you do a readdir, maybe it's best to pre-create a
> mini-dentry for each such entry, in anticipation of its probable use.  The
> advantage there is that the dentry already has the name, and we already have
> code to do dentry lookups based on their name.
> 
> >  --cw
> 
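
The two-step hash-table scheme quoted above can be sketched in user space as
follows.  This is only an illustration, not the prototype's code: the name
union_readdir and its list_dir parameter are hypothetical, a Python set stands
in for the hash table, and a list stands in for the getdents output buffer.

```python
import os


def union_readdir(layers, list_dir=os.listdir):
    """Merge the listings of several directories, topmost layer first.

    `layers` is a sequence of directory paths.  `list_dir` is a hypothetical
    hook (defaulting to os.listdir) so the merge can be tried without
    touching a real filesystem.
    """
    seen = set()   # stands in for the hash table of names already emitted
    merged = []    # stands in for the getdents output buffer
    for layer in layers:
        for name in list_dir(layer):
            if name in seen:
                continue       # a higher layer already supplied this name
            seen.add(name)
            merged.append(name)
    return merged
```

The set grows with the number of distinct names, which is the memory cost
that matters once directories hold millions of entries.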

Could someone tell me in small words why we think there is a guarantee of
uniqueness for entries in directories?  I admit that on a stable directory
that is not modified this is true.  However, if anyone is touching the
directory while you are reading it, the guarantee of uniqueness breaks down.

If we were to do database things like read consistency, then yes, I could see
this, but we don't do that today.

On top of that, reiserfs is working on making it possible to have fast
directories with millions of entries, larger than will fit in RAM.

Though I suspect the problem with duplicate inodes is more likely to bite us
if we do a union fs.

Stepping back, the one VFS operation that a directory with millions of entries
has to make fast is lookup.  So it might be reasonable in
union_getdents to do:
    getdents from fs1
    union_lookup ...
    if the entry comes from fs1, return it ...
    else drop it.

Which at least scales to millions of entries if not to thousands of directories
mounted on top of each other. 
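
A minimal user-space sketch of that lookup-based filter, under stated
assumptions: union_lookup here is a hypothetical helper that returns the
index of the topmost layer containing a name, and list_dir and exists are
hypothetical hooks defaulting to os.listdir and os.path.lexists.  It is not
kernel code, and it does nothing about entries that change while the
directories are being read.

```python
import os


def union_lookup(layers, name, exists=os.path.lexists):
    """Return the index of the topmost layer that contains `name`, or None."""
    for index, layer in enumerate(layers):
        if exists(os.path.join(layer, name)):
            return index
    return None


def union_readdir_by_lookup(layers, list_dir=os.listdir,
                            exists=os.path.lexists):
    """Yield each name once, keeping no table of names already seen.

    An entry read from layer i is returned only if a union lookup of its
    name resolves to layer i; otherwise a higher layer owns it and the
    entry is dropped.
    """
    for index, layer in enumerate(layers):
        for name in list_dir(layer):
            if union_lookup(layers, name, exists) == index:
                yield name
```

No per-name state is kept, so memory use does not grow with directory size.
The price is up to one lookup per layer for every entry, which is why this
holds up for huge directories but not for thousands of stacked layers.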
 
Having enough knowledge (somewhere) to know that you don't need these kinds
of shenanigans is even better.

Seeing whether we can avoid deduping the list for userspace, or simply exporting
the problem to user space as a user space fs, sounds like a more reasonable solution.

Why do what's unnecessary?

Eric
