Yeah, that is odd. Setting the cursor on each call to iterate_handles
may be the reason it starts over. Do you know how many times it
starts over? iterate_handles will be called roughly (# of files / 4096)
times.
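The call count is really a ceiling division, since a final partial batch still costs a full call (and one more cursor positioning). A minimal sketch, assuming each iterate_handles call returns at most 4096 handles as the estimate above implies; the name iterate_handles_calls is hypothetical, not a PVFS2 function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each iterate_handles call returns at most this many
 * handles, per the "(# of files / 4096)" estimate in this thread. */
#define HANDLES_PER_CALL ((size_t)4096)

/* Hypothetical helper: number of iterate_handles calls needed to walk
 * nfiles entries. Ceiling division, so a trailing partial batch still
 * counts as one call (and one more cursor positioning). */
static size_t iterate_handles_calls(size_t nfiles)
{
    assert(HANDLES_PER_CALL > 0);
    return (nfiles + HANDLES_PER_CALL - 1) / HANDLES_PER_CALL;
}
```

For example, 100000 files works out to 25 calls: 24 full batches of 4096 plus one partial batch of 1696.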
It only goes through the file twice, if I am reading the log
correctly. Also, I just realized that on both passes (the one jumping
backwards 40KB at a time and the one jumping backwards 4KB at a time) it
is only reading 4KB per pread. I don't know what it is doing from a db
point of view, but from an access point of view it looks like it first
walks backwards with a 40KB stride and then walks backwards again,
reading the entire file. There are some other reads scattered here and
there, but those two passes account for the overwhelming majority of the
preads in the strace output. Spot-checking, I don't see any significant
divergence from these patterns.
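To spot-check an strace log against the two passes described above, one could generate the pread offsets each pattern would produce. A minimal sketch, assuming every pread is 4KB and that both passes start at the last 4KB-aligned block of the file; backward_offsets is a hypothetical helper, not part of PVFS2 or Berkeley DB.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: fill out[] with the pread offsets expected from a
 * scan that starts at the last chunk-aligned block of the file and walks
 * toward offset 0 in steps of `stride` bytes.
 *   stride == chunk      -> the full backward read (4KB steps, 4KB reads)
 *   stride == 10 * chunk -> the 40KB-stride pass
 * Returns the number of offsets written (at most max). */
static size_t backward_offsets(off_t file_size, off_t chunk, off_t stride,
                               off_t *out, size_t max)
{
    assert(chunk > 0 && stride > 0);
    size_t n = 0;
    if (file_size < chunk)
        return 0;
    /* Assumption: start at the last block that fits entirely in the file. */
    off_t off = (file_size / chunk - 1) * chunk;
    while (n < max) {
        out[n++] = off;
        if (off < stride)
            break;          /* next step would go below offset 0 */
        off -= stride;
    }
    return n;
}
```

For a 400KB file (409600 bytes), the 40KB-stride pass yields 10 offsets, from 405504 down to 36864, while the 4KB-stride pass touches all 100 blocks.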
It also just occurred to me that I should probably repeat the strace
and capture it with timestamps; I'm not sure whether both of these
pread cycles actually happen during the scan.
-Phil
Maybe it has to do with using the RECNUM flag for the iterator,
which we set so that we can keep track of positions across
iterate_handles calls. Since we already use the handles to sort the
entries, maybe the two are conflicting with each other. The Berkeley
DB docs do mention that RECNUM will hurt performance, but only on
writes:
--
Configuring a Btree for record numbers should not be done lightly.
While often useful, it may significantly slow down the speed at which
items can be stored into the database, and can severely impact
application throughput. Generally it should be avoided in trees with a
need for high write concurrency.
--
If we could return the handle as the position, we could get rid of the
RECNUM flag and set the cursor from the last handle, but the position
field is only a uint32_t. It's really annoying that we only use the first
32 bits of the PVFS_handle right now, too. Can we change the
PVFS_ds_position type to be 64-bit?
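A sketch of why the width matters and how resuming by handle could replace record numbers. It assumes PVFS_handle is a 64-bit unsigned integer and that handles are kept in numeric key order (in Berkeley DB that would take a custom btree comparison, since the default is bytewise). The typedefs and both functions are local, hypothetical stand-ins, not PVFS2 code. resume_index finds the first handle strictly greater than the last one returned, which is comparable to positioning a cursor with DB_SET_RANGE at the last handle and stepping past it if it is still present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t handle_t;      /* stand-in for a 64-bit PVFS_handle */
typedef uint32_t position32_t;  /* current PVFS_ds_position width */
typedef uint64_t position64_t;  /* proposed 64-bit position */

/* Storing a handle in a 32-bit position silently drops the upper bits. */
static position32_t handle_to_pos32(handle_t h)
{
    return (position32_t)h;
}

/* Resume after the last handle returned by the previous call: binary
 * search a sorted array for the first handle strictly greater than
 * `last`. No record numbers are needed. Returns n if none remain. */
static size_t resume_index(const handle_t *sorted, size_t n,
                           position64_t last)
{
    assert(sorted != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= last)
            lo = mid + 1;   /* already returned; resume to the right */
        else
            hi = mid;
    }
    return lo;
}
```

With handles {5, 9, 0x100000002, 0x200000000}, resuming after 0x100000002 with a 64-bit position lands at index 3, but the truncated 32-bit value is 2, which would resume at index 0. So if handles ever used their upper 32 bits, a 32-bit position would send the iteration back to the start.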
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers