On Oct 9, 2006, at 11:02 AM, Rob Ross wrote:

Hi Phil,

That's been around for a while (years). It's an artifact of ls reading the directory listing a piece at a time while the directory is changing between those reads.

We've talked about having pvfs2-client (or the kernel module) pull out the duplicates in the cases where one of them breaks a readdir into multiple operations, but we haven't spent much time investigating where the duplication actually happens, which we'd need to understand in order to do this.

Other solutions could include:

- locking the directory (not going to happen)
- restarting the ls entirely if the directory changes during the read (would cause starvation)
- improving ls to remove duplicates on its own (probably realistic for pvfs2-ls, unlikely to get accepted by the GNU tools group for stock ls; see the sketch below)
- reordering the directory entries returned so that the most recently changed entries come last (high sorting overhead on the server, and probably lots of coding)
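For the pvfs2-ls option, the cleanup itself would be simple. Something along these lines would do it (just a sketch; the function and the in-memory name array it assumes are made up, not existing pvfs2-ls code):

   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical post-processing step for pvfs2-ls: given the complete
    * listing sorted by name, drop adjacent duplicate entries in place and
    * return the new count.  Assumes the names are heap-allocated strings
    * that have already been sorted by name (the order ls prints them). */
   static int dedup_sorted_names(char **entries, int count)
   {
       int i, out = 0;

       for (i = 0; i < count; i++)
       {
           if (out == 0 || strcmp(entries[out - 1], entries[i]) != 0)
           {
               entries[out++] = entries[i];
           }
           else
           {
               /* same name handed back twice by a split readdir */
               free(entries[i]);
           }
       }
       return out;
   }

Of course that only hides the duplicates; it can't recover entries that were missed entirely, so it's more of a band-aid than a fix.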

Any other ideas?


Yeah, I guess my proposed changes wouldn't help in this case. Berkeley DB has the notion of a secondary (read-only) database built on a primary, where keys in the secondary are derived from the primary's data by a callback function you provide. So it might be possible to create a secondary database for iterating by update time instead of alphabetically by component name. I'm not sure how efficient that would be, though... we might just be pushing the sorting problem down to the db layer. Also, we would have to start storing update times in the keyval dirent entries.
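Roughly, the Berkeley DB plumbing would look something like the sketch below. To be clear, the dirent_val layout and all of the names here are made up for illustration; this is not our actual keyval format, and as noted we don't store an update time today.

   #include <db.h>
   #include <stdint.h>
   #include <string.h>

   /* Made-up dirent value layout, assuming an update time were added. */
   struct dirent_val
   {
       uint64_t handle;
       uint64_t update_time;
   };

   /* Key-extraction callback: the secondary key is the update time
    * pulled out of the primary record's data. */
   static int updtime_callback(DB *secondary, const DBT *pkey,
                               const DBT *pdata, DBT *skey)
   {
       struct dirent_val *val = pdata->data;

       memset(skey, 0, sizeof(*skey));
       skey->data = &val->update_time;
       skey->size = sizeof(val->update_time);
       return 0;
   }

   /* Open a secondary database keyed by update time and associate it
    * with the primary dirent database. */
   static int open_updtime_index(DB_ENV *env, DB *primary, DB **secondaryp)
   {
       DB *secondary = NULL;
       int ret;

       ret = db_create(&secondary, env, 0);
       if (ret != 0)
           return ret;

       /* many entries can share the same update time */
       ret = secondary->set_flags(secondary, DB_DUPSORT);
       if (ret != 0)
           return ret;

       ret = secondary->open(secondary, NULL, "dirent_by_updtime.db",
                             NULL, DB_BTREE, DB_CREATE, 0600);
       if (ret != 0)
           return ret;

       ret = primary->associate(primary, NULL, secondary,
                                updtime_callback, 0);
       if (ret != 0)
           return ret;

       *secondaryp = secondary;
       return 0;
   }

A readdir ordered by update time would then just be a cursor walk over the secondary (DBC->pget hands back the primary key and data too). One caveat: integer keys in a btree sort as byte strings, so the update time would need a big-endian encoding or a custom comparison function to sort numerically.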

-sam

Rob

Phil Carns wrote:
We are seeing a strange bug where if we list the contents of a directory while files are being created in it, we sometimes get duplicates and/or
missing files in the output.
I can reproduce it on a single machine by running these two scripts at
the same time:
tester.sh:
-----------------------------------
   #!/bin/tcsh
   foreach file ( `seq 1 10000` )
      touch /mnt/pvfs2/testdir/${file}
   end
watcher.sh:
-----------------------------------
   #!/bin/tcsh
   there:
      set foo=`ls /mnt/pvfs2/testdir | wc -l`
      set bar=`ls /mnt/pvfs2/testdir | uniq -d | wc -l`
      echo listing count: $foo, duplicates: $bar
      sleep 1
   goto there
The test machine that I am using is pretty slow. On faster machines you
may need to create more than 10,000 files, or maybe slow it down by
actually writing a little bit of data into each file.
At any rate, the output looks normal for a while, but then we start
seeing results like this from watcher.sh:
...
listing count: 6310, duplicates: 0
listing count: 6320, duplicates: 0
listing count: 6334, duplicates: 0
listing count: 6371, duplicates: 0
listing count: 6382, duplicates: 0
listing count: 6396, duplicates: 5024
listing count: 6406, duplicates: 0
listing count: 10896, duplicates: 5344
listing count: 6430, duplicates: 5120
listing count: 6434, duplicates: 0
listing count: 11574, duplicates: 6048
listing count: 6472, duplicates: 0
...
The listing count is supposed to steadily increase, and the duplicates
field should always be zero.  The problem only occurs while files are
being created.  Once tester.sh is done, the listing looks perfectly
normal.
Anyone have any ideas? I think this problem has been hanging around for a little while, but we just now figured out how to reliably trigger it. It is present at least in current CVS head and was in a snapshot from August 21.
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

