I looked at the code a little just now. The getdents system call passes
a filldir() callback function into the file system readdir()
implementation that lets it fill entries until the user's dentry buffer
is full. The dentries at this level use variable-length strings. The
only remaining cap at this point is the size of the dentry buffer passed
in from user space (and any artificial cap introduced by the file system
implementation).
http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L270
http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L232
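To make the mechanism concrete, here is a rough user-space model of it, not the kernel code itself. All names in it (model_buf, model_filldir, model_readdir, MODEL_HDR) are made up for illustration. It assumes each record is a 19-byte fixed header plus the NUL-terminated name, rounded up to 8 bytes, which is my reading of how getdents64 sizes its records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed fixed header size: ino (8) + off (8) + reclen (2) + type (1). */
#define MODEL_HDR 19

/* Hypothetical stand-in for the state the callback keeps about the user buffer. */
struct model_buf {
    char  *cur;   /* next free byte in the caller's buffer */
    size_t left;  /* bytes still available */
    int    count; /* entries stored so far */
};

/* Record length: header + name + NUL, rounded up to a multiple of 8. */
static size_t model_reclen(size_t namelen)
{
    return (MODEL_HDR + namelen + 1 + 7) & ~(size_t)7;
}

/* Callback model: store one entry, or return -1 if it no longer fits. */
static int model_filldir(struct model_buf *b, const char *name)
{
    size_t namelen = strlen(name);
    size_t reclen = model_reclen(namelen);

    assert(b != NULL && name != NULL);
    if (reclen > b->left)
        return -1;                      /* buffer full: tell readdir to stop */
    memset(b->cur, 0, reclen);          /* zero header, padding and NUL */
    memcpy(b->cur + MODEL_HDR, name, namelen);
    b->cur += reclen;
    b->left -= reclen;
    b->count++;
    return 0;
}

/* File system readdir model: feed entries from *pos until the callback
 * refuses one. *pos is only advanced past entries that were accepted. */
static int model_readdir(struct model_buf *b, const char *const *names,
                         size_t n, size_t *pos)
{
    while (*pos < n) {
        if (model_filldir(b, names[*pos]) != 0)
            break;
        (*pos)++;
    }
    return b->count;
}
```

With 302 names of four characters or fewer (300 files plus "." and "..") and a 4096-byte buffer, every record in this model is 24 bytes, so the only limit on a call is the buffer size.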
If I do an strace on a directory with 300 entries on ext3, this is what
happens:
getdents64(3, /* 170 entries */, 4096) = 4080
getdents64(3, /* 132 entries */, 4096) = 3168
getdents64(3, /* 0 entries */, 4096) = 0
If I do the same thing on a PVFS volume, this is what happens:
getdents64(3, /* 34 entries */, 4096) = 816
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 12 entries */, 4096) = 288
getdents64(3, /* 0 entries */, 4096) = 0
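For anyone who wants to reproduce these per-call counts without strace, here is a minimal Linux-only sketch that issues getdents64 directly and counts the records in each returned chunk. The helper name count_per_call and the struct my_dirent64 are mine; the struct is assumed to mirror the kernel's linux_dirent64 layout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to match the kernel's struct linux_dirent64. */
struct my_dirent64 {
    uint64_t d_ino;
    int64_t  d_off;
    uint16_t d_reclen;
    uint8_t  d_type;
    char     d_name[];
};

/* Hypothetical helper: call getdents64 on fd with a buffer of bufsize bytes
 * (capped at 64 KiB) until it returns 0. If verbose is nonzero, print the
 * entry and byte count of each call. Returns the total number of entries,
 * or -1 on error. */
static long count_per_call(int fd, size_t bufsize, int verbose)
{
    static char buf[65536];
    long total = 0;

    assert(bufsize > 0);
    if (bufsize > sizeof buf)
        bufsize = sizeof buf;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, bufsize);
        long entries = 0;

        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* end of directory */

        /* Walk the variable-length records using each record's d_reclen. */
        for (long off = 0; off < n; ) {
            uint16_t reclen;
            memcpy(&reclen,
                   buf + off + offsetof(struct my_dirent64, d_reclen),
                   sizeof reclen);
            if (reclen == 0)
                return -1;              /* malformed record, avoid looping */
            off += reclen;
            entries++;
        }
        if (verbose)
            printf("getdents64: %ld entries, %ld bytes\n", entries, n);
        total += entries;
    }
    return total;
}
```

Open the directory with open(path, O_RDONLY) and pass the descriptor with a bufsize of 4096 to match what ls used here. Run against the two directories above, it should report the same per-call counts as the ext3 and PVFS traces.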
PVFS is not filling up the getdents buffer because our code stops at
32 entries per iteration. If I then apply Bart's patch, more entries
fit into a single getdents system call, but on my box at least
(2.6.24-19, 32-bit, current PVFS trunk) something new breaks:
getdents64(3, /* 170 entries */, 4096) = 4080
getdents64(3, /* 0 entries */, 4096) = 0
It looks like it stopped after one getdents (the actual output from ls
only shows 170 entries).
So... I would like to apply this patch, but first I need to dig a little
more and find out what the bug is on my system that is making it stop at
the first getdents call. It must not be handling the token right in the
case where PVFS returns more entries than filldir() can consume.
-Phil
Rob Ross wrote:
Has the internal kernel value changed since we last looked?
Rob
On Sep 4, 2008, at 4:16 PM, Phil Carns wrote:
Sam Lang wrote:
Hi Bart,
Thanks for the patch. For users with that many files in a directory,
using pvfs2-ls is probably a good alternative.
The kernel does readdir requests 32 entries at a time, so increasing
MAX_NUM_DIRENTS won't help for ls. Long listings require getting
the size of files, which in PVFS is fairly expensive.
Unfortunately, we haven't kept up with the readdirplus
implementation, and some bugs have probably crept in since Murali added
that tool. If you were motivated to look at where the servers were
crashing, we'd certainly be interested in helping with the debugging
there.
Thanks again,
-sam
It does look like ls improved with the patches for some reason, though.
The 256 and 512 results are also just about close enough to be noise.
It looks like most of the benefit came from the jump from 32/64 to 256.
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers