Thanks Sam!

I tried your patches, and here are the performance results for startup of a file system that was created with a 356 MB dataspace_attributes.db file:

- old format storage space, stock server: 7 minutes, 10 seconds
- old format storage space, patched server: 6 minutes, 49 seconds
- new format storage space, patched server: 1 minute, 13 seconds

As expected, much better performance if the file system is generated with the new storage format!

All of the functionality seemed fine regardless of whether the patched server was run on the new or old storage format. In all three cases I created 960,000 files on a single meta server to get to that db size. There was also a slight improvement (about 1%) in file creation performance using the new format and patched server.

Being able to start the servers more quickly is definitely a big help.

-Phil


Sam Lang wrote:

Here's a patch with the suggested changes. Handling the comparison function with a different storage format ended up being a bit uglier than I expected. Removing the DB_RECNUM flag from our dbs was also not as easy as expected. I did the following:

* Changed the dspace comparison function to use > instead of <. This should allow the iterate_handles function to get Berkeley DB to read in pages from front to back instead of back to front.

* Modified the semantics of our storage format version a bit. The previous version was 0.1.2, and unless I'm mistaken, the individual components of the version didn't carry much meaning. Any version change meant that the new code would abort on older storage versions. I've given the components names: major.minor.incremental, and allowed the incremental value to be changed (0.1.3) so that new code can support older formats, but all major and minor value changes are _not_ backward compatible.

* With the storage version changes, we now accept 0.1.3 and 0.1.2, and call the appropriate comparison function based on the version.

* Changed the PVFS_ds_position from int32_t to uint64_t. Note that this required changing many of the request encoding/decoding functions that pass a position field, and incrementing the protocol major version (do we zero the minor version when we increment the major version?). It required getting the alignment right for device requests as well.

* It turns out that once a db is created with DB_RECNUM, it always has to be opened with DB_RECNUM, so that's another storage format change. For now, I try to open without DB_RECNUM, and if that returns EINVAL I retry with DB_RECNUM. Newly created dbs don't get the DB_RECNUM flag, so hopefully that will improve performance (the doc says it can really slow things down).
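
To make the version handling above concrete, here is a minimal sketch of selecting a comparison function from a major.minor.incremental version. It is not the code from the patch: storage_version, handle_cmp_fn, select_cmp and both comparators are hypothetical names, the comparators use a qsort-style signature rather than the Berkeley DB callback signature, and which sort direction belongs to which format is an assumption made only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical version triple: major.minor.incremental. */
typedef struct {
    int major;
    int minor;
    int incremental;
} storage_version;

/* Hypothetical qsort-style comparator type. The real Berkeley DB
 * callback takes DB and DBT arguments instead of raw pointers. */
typedef int (*handle_cmp_fn)(const void *a, const void *b);

/* Larger handles sort first (assumed here to be the 0.1.2 ordering). */
static int cmp_descending(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/* Smaller handles sort first (assumed here to be the 0.1.3 ordering). */
static int cmp_ascending(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Return the comparator matching the stored format version, or NULL
 * when the format is not supported by this code. */
static handle_cmp_fn select_cmp(storage_version stored)
{
    const storage_version current = {0, 1, 3};

    /* Major and minor changes are not backward compatible. */
    if (stored.major != current.major || stored.minor != current.minor)
        return NULL;

    /* Only the incremental value may differ between accepted formats. */
    if (stored.incremental == 3)
        return cmp_ascending;
    if (stored.incremental == 2)
        return cmp_descending;
    return NULL;
}
```

With this sketch, a stored version of 0.1.2 or 0.1.3 yields a comparator, while something like 0.2.0 yields NULL so the caller can refuse to open the storage space.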

Let me know how these changes look, and if someone gets a chance to look at performance differences, that would be great.

Thanks,

-sam



On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:


Can we conclude this discussion?  In summary:
* The current comparison function causes bad IO patterns for iterate on the dspace db. We can change it but the disk format will change in new releases.
  - If we change it, either we check a version number and provide the right comparison function, or we perform migration to the new storage format.
  - If we don't change it, we can still improve performance by iterating from the last entry to the first, but we can't use DB_MULTIPLE_KEY, which also improves performance for big filesystems.


I don't really have a preference either way.

* If we change PVFS_ds_position from uint32_t to uint64_t, we can use the handle as the position, and avoid opening the dspace db with the RECNO flag, which is killing our performance on writes.
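
As a rough illustration of using the handle itself as the iteration position, the sketch below resumes a batched scan from the last handle returned instead of from a record number. It works on a plain sorted array rather than a Berkeley DB cursor, iterate_handles_sketch is a hypothetical name, and it assumes handles are nonzero so that a position of 0 can mean "start from the beginning".

```c
#include <stddef.h>
#include <stdint.h>

/* Copy up to max handles that are greater than *position from an
 * ascending array into out, then advance *position to the last handle
 * copied. Returns the number of handles copied (0 when finished).
 * Assumes handles are nonzero, so *position == 0 starts the scan. */
static size_t iterate_handles_sketch(const uint64_t *sorted, size_t n,
                                     uint64_t *position,
                                     uint64_t *out, size_t max)
{
    /* Binary search for the first handle greater than *position. */
    size_t lo = 0;
    size_t hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= *position)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Copy the next batch starting at that handle. */
    size_t count = 0;
    while (lo < n && count < max)
        out[count++] = sorted[lo++];

    /* The last handle returned becomes the resume position. */
    if (count > 0)
        *position = out[count - 1];
    return count;
}
```

Because the position is a key rather than a record number, the lookup needs no record counts from the store, which is the property that would let the db be opened without record numbering.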


I think this sounds good too. We would be happy to help test any combination of the options you list.

-Phil



_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
