Re: [Pvfs2-developers] server crash on startup with millions of files

Phil Carns Thu, 15 Mar 2007 18:33:50 -0800

Thanks Sam!

I tried your patches, and here are the performance results for startup of a file system that was created with a 356 MB dataspace_attributes.db file:


- old format storage space, stock server: 7 minutes, 10 seconds
- old format storage space, patched server: 6 minutes, 49 seconds
- new format storage space, patched server: 1 minute, 13 seconds

As expected, much better performance if the file system is generated with the new storage format!

All of the functionality seemed fine regardless of whether the patched server was run on the new or old storage format. In all three cases I created 960,000 files on a single meta server to get to that db size. There was also a slight improvement (about 1%) in file creation performance using the new format and patched server.


Being able to get the servers started quicker is definitely a big help.

-Phil


Sam Lang wrote:

Here's a patch with the suggested changes. Handling the comparison function with a different storage format ended up being a bit uglier than I expected. Removing the DB_RECNUM flag from our dbs was also not as easy as expected. I did the following:
* Changed the dspace comparison function to use > instead of <. This should allow the iterate_handles function to get Berkeley DB to read in pages from front to back instead of back to front.
* Modified the semantics of our storage format version a bit. The previous version was 0.1.2, and unless I'm mistaken, the individual components of the version didn't carry much meaning. Any version change meant that the new code would abort on older storage versions. I've given the components names: major.minor.incremental, and allowed the incremental value to be changed (0.1.3) so that new code can support older formats, but all major and minor value changes are _not_ backward compatible.
* With the storage version changes, we now accept 0.1.3 and 0.1.2, and call the appropriate comparison function based on the version.
* Changed the PVFS_ds_position from int32_t to uint64_t. Note that this required changing many of the request encoding/decoding functions that pass a position field, and incrementing the protocol major version (do we zero the minor version when we increment the major version?). It required getting the alignment right for device requests as well.
* It turns out that once a db is created with DB_RECNUM, it always has to be opened with DB_RECNUM, so that's another storage format change. For now, I try to open without DB_RECNUM, and if that returns EINVAL I retry with DB_RECNUM. Newly created dbs don't get the DB_RECNUM flag, so hopefully that will improve performance (the doc says it can really slow things down).
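
The version handling and comparison function selection described in the list above could look roughly like the following. This is only a sketch under stated assumptions, not the code from the patch: every name here (storage_version, select_handle_cmp, cmp_handles_lt, cmp_handles_gt) is hypothetical, the comparator return convention is assumed, and the accepted incremental range (2 through 3) is inferred from "we now accept 0.1.3 and 0.1.2".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names throughout; these are not the actual PVFS2 symbols. */
struct storage_version { int major, minor, incremental; };

/* Version written by the new code (0.1.3 in the message). */
static const struct storage_version CODE_VERSION = { 0, 1, 3 };
/* Assumption: 0.1.2 is the oldest format the new code still accepts. */
#define OLDEST_SUPPORTED_INCREMENTAL 2

/* Parse "major.minor.incremental"; returns 0 on success, -1 if malformed. */
static int parse_storage_version(const char *s, struct storage_version *out)
{
    char trailing;
    assert(s != NULL && out != NULL);
    /* Exactly three integers must convert; a fourth match means junk follows. */
    if (sscanf(s, "%d.%d.%d%c", &out->major, &out->minor,
               &out->incremental, &trailing) != 3)
        return -1;
    if (out->major < 0 || out->minor < 0 || out->incremental < 0)
        return -1;
    return 0;
}

/* Major and minor must match exactly; only the incremental part may lag. */
static int storage_version_supported(const struct storage_version *disk)
{
    return disk->major == CODE_VERSION.major
        && disk->minor == CODE_VERSION.minor
        && disk->incremental >= OLDEST_SUPPORTED_INCREMENTAL
        && disk->incremental <= CODE_VERSION.incremental;
}

typedef int (*handle_cmp_fn)(uint64_t a, uint64_t b);

/* Assumed shape of the 0.1.2 comparison: the test uses <. */
static int cmp_handles_lt(uint64_t a, uint64_t b)
{
    if (a == b) return 0;
    return (a < b) ? -1 : 1;
}

/* Same comparison with > in place of <, for the 0.1.3 format. */
static int cmp_handles_gt(uint64_t a, uint64_t b)
{
    if (a == b) return 0;
    return (a > b) ? -1 : 1;
}

/* Pick the comparator matching the on-disk format; NULL means "abort". */
static handle_cmp_fn select_handle_cmp(const struct storage_version *disk)
{
    if (!storage_version_supported(disk))
        return NULL;
    return (disk->incremental >= 3) ? cmp_handles_gt : cmp_handles_lt;
}
```

With this sketch, an on-disk version of 0.1.2 selects cmp_handles_lt, 0.1.3 selects cmp_handles_gt, and anything with a different major or minor value (0.2.0, say) yields NULL so the caller can refuse to start.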
Let me know how these changes look, and if someone gets a chance to look at performance differences, that would be great.
Thanks,

-sam
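
The DB_RECNUM fallback from the last bullet (try without the flag, retry with it on EINVAL) might be structured as below. To keep the sketch self-contained it does not call Berkeley DB at all: open_fn is a hypothetical callback standing in for whatever actually creates and opens the db handle, and fake_open is a test double that mimics a file created with or without DB_RECNUM. As far as I recall from the Berkeley DB documentation, a handle whose open failed has to be closed and recreated before another attempt, so a real callback would be expected to create a fresh handle on each call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the real open path. `use_recnum` says whether
 * DB_RECNUM should be requested. Returns 0 on success or an errno value.
 */
typedef int (*open_fn)(const char *path, int use_recnum, void *ctx);

/*
 * Try to open without DB_RECNUM first; if that fails with EINVAL, assume
 * the db was created with DB_RECNUM and retry with it. On return,
 * *used_recnum records which variant the last attempt used.
 */
static int open_with_recnum_fallback(open_fn do_open, const char *path,
                                     void *ctx, int *used_recnum)
{
    int ret;

    assert(do_open != NULL && used_recnum != NULL);

    *used_recnum = 0;
    ret = do_open(path, 0, ctx);
    if (ret == EINVAL) {
        /* Older storage format: the flag must match what is on disk. */
        *used_recnum = 1;
        ret = do_open(path, 1, ctx);
    }
    return ret;
}

/*
 * Test double, not a real db: ctx points to an int saying whether the
 * pretend file was created with DB_RECNUM. A mismatch yields EINVAL.
 */
static int fake_open(const char *path, int use_recnum, void *ctx)
{
    int created_with_recnum = *(int *)ctx;
    (void)path;
    return (use_recnum == created_with_recnum) ? 0 : EINVAL;
}
```

Any error other than EINVAL is passed straight back to the caller, so an unrelated failure (a missing file, for example) does not trigger a pointless second attempt.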



On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
Can we conclude this discussion? In summary:
* The current comparison function causes bad IO patterns for iterate on the dspace db. We can change it but the disk format will change in new releases.
  - If we change it, either we check a version number and provide the right comparison function, or we perform migration to the new storage format.
  - If we don't change it, we can still improve performance by iterating from the last entry to the first, but we can't use DB_MULTIPLE_KEY, which also improves performance for big filesystems.
I don't really have a preference either way.
* If we change PVFS_ds_position from uint32_t to uint64_t, we can use the handle as the position, and avoid opening the dspace db with the RECNO flag, which is killing our performance on writes.
I think this sounds good too. We would be happy to help test any combination of the options you list.
-Phil
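
To illustrate why the wider position type matters for using the handle as the position, here is a small sketch. It assumes handles are 64-bit values and that 0 is never a valid handle (so 0 can mean "start of iteration"); ds_handle, fits_in_32bit_position and iterate_from_position are hypothetical names, and a sorted array stands in for the dspace db.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ds_handle;   /* assumption: handles are 64 bits wide */

/* A 32-bit position can only represent handles up to UINT32_MAX. */
static int fits_in_32bit_position(ds_handle h)
{
    return h <= UINT32_MAX;
}

/*
 * Resume iteration by key rather than by record number: copy up to `max`
 * handles greater than *position from the ascending array `sorted`, then
 * store the last handle returned in *position so the next call can pick
 * up from there. Position 0 means "start" (assumes 0 is not a handle).
 */
static size_t iterate_from_position(const ds_handle *sorted, size_t n,
                                    uint64_t *position,
                                    ds_handle *out, size_t max)
{
    size_t i, count = 0;

    assert(sorted != NULL && position != NULL && out != NULL);

    for (i = 0; i < n && count < max; i++) {
        if (sorted[i] > *position)
            out[count++] = sorted[i];
    }
    if (count > 0)
        *position = out[count - 1];
    return count;
}
```

A handle such as 0x100000007 does not fit in 32 bits (truncation would leave 7), so a 32-bit position could not serve as a resume key, whereas a uint64_t position can carry the handle unchanged between calls.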


_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
