Hi Phil,
Cool. It might help to increase that MAX_NUM_VERIFY_HANDLE_COUNT
value in trove-handle-mgmt.h to something even bigger than 4096. I'm
guessing your SAN will give you the best performance when you're
doing reads in the megabytes range, so putting it as high as 16000
might be ok.
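That's just the one #define (header and macro names as above; the
surrounding code may of course look different than this):

    /* trove-handle-mgmt.h: currently 4096 */
    #define MAX_NUM_VERIFY_HANDLE_COUNT 16000

Scaling from the numbers in my earlier mail below, ~300K / 4096 is
about 73 bytes per entry, so at 16000 each bulk get would be roughly
1.1 MB.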
Given these results it might make sense to make similar changes to
the keyval iterate code as well. Even with a max of 32 entries at
once, we're still likely to get some improvement.
-sam
On Feb 23, 2007, at 11:23 AM, Phil Carns wrote:
Ok, I have tried several iterations both with and without these
patches. The test system is again using a SAN, this time with a
dataspace_attributes.db file of about 451 MB on a particular
server. I'm not sure how many files are on the file system; I just
cranked out files on it until the db file looked big enough to get
good measurements on the startup time. I was able to turn on the
"trove,server" logging mask along with the "usec" timestamp to see
the scan time on both versions without any logging occurring during
the actual scan itself.
for example:
[D 10:00:46.541646] dbpf collection 752900094 - Setting collection
handle ranges to 4-536870914,4294967292-4831838202
[D 10:04:19.414723] dbpf collection 752900094 - Setting
HIGH_WATERMARK to -1
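In case anyone wants to reproduce this, the logging was enabled with
roughly the following in the server config file (option names as on
my setup; yours may differ):

    <Defaults>
        EventLogging trove,server
        LogStamp usec
    </Defaults>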
If I unmount between each server start, the original version takes
an average of 3 minutes, 17 seconds to complete the scan.
The patched version takes an average of 2 minutes, 22 seconds to
complete the same scan.
This is definitely a big improvement, almost 30% in my test case.
-Phil
Phil Carns wrote:
Thanks Sam! We will give these patches a try and report back.
-Phil
Sam Lang wrote:
Hi Phil,
Attached mult.patch implements iterating over the dspace db
using DB_MULTIPLE_KEY. This may allow the db get call to do
larger reads from your SAN. I was seeing slightly better
performance with local disk after creating 20K files in a fresh
storage space. Running strace doesn't show fewer mmaps or larger
reads though, so I'm not sure how Berkeley DB pulls in its
pages. Anyway, if it helps improve performance for you guys, I
can clean it up a bit and commit it. I don't think anything
uses dspace_iterate_handles besides that ledger handle
management code.
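The heart of the patch is just the standard Berkeley DB bulk-get
loop. Here's a simplified sketch (variable names are mine and error
handling is omitted, so don't take this as the literal patch):

    DBC *dbc = NULL;
    DBT key, data;
    void *ptr, *k, *v;
    u_int32_t klen, vlen;
    int ret;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));

    /* bulk buffer: Berkeley DB wants ulen to be a multiple of 1024
     * and big enough to hold at least one key/value pair */
    data.ulen = bufsize;
    data.data = malloc(bufsize);
    data.flags = DB_DBT_USERMEM;

    db_p->cursor(db_p, NULL, &dbc, 0);
    while ((ret = dbc->c_get(dbc, &key, &data,
                             DB_MULTIPLE_KEY | DB_NEXT)) == 0)
    {
        DB_MULTIPLE_INIT(ptr, &data);
        for (;;)
        {
            DB_MULTIPLE_KEY_NEXT(ptr, &data, k, klen, v, vlen);
            if (ptr == NULL)
                break;
            /* k/klen is a handle, v/vlen its stored attributes */
        }
    }
    dbc->c_close(dbc);
    free(data.data);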
You can fiddle the MAX_NUM_VERIFY_HANDLE_COUNT value to set how
many handles to get at a time. Right now it's set to 4096. Keep
in mind that this requires a much larger buffer allocated in
dbpf_dspace_iterate_handles_op_svc, since we have to get keys
and values, so essentially we do a get with a buffer that's
4096 * (sizeof(handle) + sizeof(stored_attr)), which ends up being
about 300K.
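To spell out the arithmetic: with an 8-byte handle and (guessing)
around 64 bytes of stored attributes per entry, 4096 * 72 = 294912
bytes, i.e. just under 300K.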
I also attached a patch (server-start.patch) that prints out the
start message as well as ready message after server
initialization has completed. If you set the LogStamp to usec,
you'll be able to see the time it takes to initialize the
server. This might also help in knowing when you can mount the
clients, although hopefully at some point we'll be able to add
the zero-conf stuff and then we can return EAGAIN or something.
I'm not sure it's time to replace the ledger code. It seems to
work ok, and fixing the slowness you're seeing would mean
switching to some kind of range tree that could be serialized to
disk so that we wouldn't have to iterate through the entire
dspace db on startup. That opens up the possibility of the
dspace db and the ledger-on-disk getting out of sync, which I'd
rather avoid.
We could hand out new handles by choosing one randomly, and then
checking if it's in the DB, getting rid of the need for a ledger
entirely, but I assume this idea was already scrapped to avoid
the potential costs at creation time, especially as the
filesystem grows.
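For the record, here's a minimal sketch of that alternative
(everything in it is hypothetical; dspace_db_contains() stands in
for whatever existence check we'd actually use):

    #include <stdint.h>
    #include <stdlib.h>

    /* assumed helper: returns nonzero if the handle already has a
     * dspace db entry */
    extern int dspace_db_contains(uint64_t handle);

    /* pick an unused handle by random probing, no ledger needed
     * (srand48() seeding omitted for brevity) */
    static uint64_t alloc_handle(uint64_t range_min, uint64_t range_max)
    {
        uint64_t h;
        do
        {
            h = range_min + (uint64_t)(drand48() *
                (double)(range_max - range_min + 1));
        } while (dspace_db_contains(h));
        return h;
    }

Every pass through that loop is a synchronous db lookup, which I
assume is the creation-time cost in question.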
-sam
On Feb 20, 2007, at 11:23 AM, Phil Carns wrote:
Robert Latham wrote:
On Tue, Feb 20, 2007 at 07:29:16AM -0500, Phil Carns wrote:
Oh, and one other detail; the memory usage of the servers
looks fine during startup, so this doesn't appear to be a
memory leak. There is quite a bit of CPU work, but I am
guessing that is just Berkeley DB keeping busy in the
iteration function.
How long does it take to scan 1.4 million files on startup?
==rob
That's an interesting issue :)
A few observations:
- we were looking at this on SAN; the results may be different
on local disks
- the db files are on the order of 500 MB for this particular setup
- the time to scan varies depending on if the db files are hot
in the Linux buffer cache
If we start the daemon right after killing another one that
just did the same scan, then the process is CPU intensive, but
fast (about 5 seconds). If we unmount/mount the SAN between
the two runs so that the buffer cache is cleared, then it is
very slow (about 5 minutes).
An interesting trick is to use dd with a healthy buffer size to
read the .db files and throw the output into /dev/null before
starting the servers. This only takes a few seconds, and makes
it so that the scan consistently finishes in just a few seconds
as well. I think the reason is just that it forces the db data
into the Linux buffer cache using an efficient access pattern
so that Berkeley DB doesn't have to wait on disk latency for
whatever small accesses it is performing.
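For example, something like this, with the path adjusted for your
storage space (bs=4M is just one reasonable "healthy" size):

    dd if=/path/to/storage/dataspace_attributes.db of=/dev/null bs=4M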
This seems to indicate that the access pattern PVFS2 generates
through Berkeley DB in this case isn't very friendly, at least
to SANs that aren't specifically tuned for it.
The 5 minute scan time is a problem, because it makes it hard
to tell when you will actually be able to mount the file system
after the daemons appear to have started. We would be happy to
try out any optimizations here :)
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers