Thanks Sam!
I tried your patches, and here are the performance results for startup
of a file system that was created with a 356 MB dataspace_attributes.db
file:
- old format storage space, stock server: 7 minutes, 10 seconds
- old format storage space, patched server: 6 minutes, 49 seconds
- new format storage space, patched server: 1 minute, 13 seconds
As expected, much better performance if the file system is generated
with the new storage format!
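For reference, converting the three startup times to seconds makes the gains easier to compare. The small C helpers below (hypothetical names, not part of PVFS) do that arithmetic: the patched server on the old format is about 5% faster than stock (430 s vs. 409 s), and the new format is roughly 5.9 times faster (430 s vs. 73 s).

```c
/* Convert a "minutes, seconds" timing into total seconds. */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Speedup of a run relative to a baseline: baseline time / run time. */
static double speedup(int baseline_s, int run_s)
{
    return (double)baseline_s / (double)run_s;
}
```

For example, speedup(to_seconds(7, 10), to_seconds(1, 13)) compares the stock server on the old format with the patched server on the new format.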
All of the functionality seemed fine regardless of whether the patched
server was run on the new or old storage format. In all three cases I
created 960,000 files on a single metadata server to get to that db size.
There was also a slight improvement (about 1%) in file creation
performance using the new format and patched server.
Being able to get the servers started more quickly is definitely a big help.
-Phil
Sam Lang wrote:
Here's a patch with the suggested changes. Handling the comparison
function with a different storage format ended up being a bit uglier
than I expected. Removing the DB_RECNUM flag from our dbs was also
not as easy as expected. I did the following:
* Changed the dspace comparison function to use > instead of <. This
should allow the iterate_handles function to get Berkeley DB to read in
pages from front to back instead of back to front.
* Modified the semantics of our storage format version a bit. The
previous version was 0.1.2, and unless I'm mistaken, the individual
components of the version didn't carry much meaning. Any version
change meant that the new code would abort on older storage versions.
I've given the components names: major.minor.incremental, and allowed
the incremental value to be changed (0.1.3) so that new code can
support older formats, but all major and minor value changes are _not_
backward compatible.
* With the storage version changes, we now accept 0.1.3 and 0.1.2, and
call the appropriate comparison function based on the version.
* Changed the PVFS_ds_position from int32_t to uint64_t. Note that
this required changing many of the request encoding/decoding functions
that pass a position field, and incrementing the protocol major version
(do we zero the minor version when we increment the major version?).
It required getting the alignment right for device requests as well.
* It turns out that once a db is created with DB_RECNUM, it always has
to be opened with DB_RECNUM, so that's another storage format change.
For now, I try to open without DB_RECNUM, and if that returns EINVAL I
retry with DB_RECNUM. Newly created dbs don't get the DB_RECNUM flag,
so hopefully that will improve performance (the Berkeley DB documentation
says DB_RECNUM can really slow things down).
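A minimal sketch of the version-dependent comparator idea, assuming each dspace key is a single native-endian 64-bit handle. `struct handle_key` stands in for a Berkeley DB `DBT`, and `cmp_lt`, `cmp_gt`, and `select_cmp` are hypothetical names; the real PVFS callback would be registered with `DB->set_bt_compare` before `DB->open`, and its exact return conventions are not shown in this thread. The point is that swapping `<` for `>` reverses the key order, and because a btree must always be opened with the comparator it was built with, the code has to pick the function from the on-disk version.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a Berkeley DB DBT holding one handle. */
struct handle_key {
    const void *data;
    size_t size;   /* assumed to be sizeof(uint64_t) */
};

typedef int (*handle_cmp_fn)(const struct handle_key *,
                             const struct handle_key *);

static uint64_t key_to_handle(const struct handle_key *k)
{
    uint64_t h = 0;
    memcpy(&h, k->data, sizeof(h));   /* avoids unaligned access */
    return h;
}

/* Comparison built on '<': smaller handles sort first. */
static int cmp_lt(const struct handle_key *a, const struct handle_key *b)
{
    uint64_t ha = key_to_handle(a), hb = key_to_handle(b);
    if (ha < hb)
        return -1;
    return ha == hb ? 0 : 1;
}

/* Same comparison with '>' in place of '<': larger handles sort first. */
static int cmp_gt(const struct handle_key *a, const struct handle_key *b)
{
    uint64_t ha = key_to_handle(a), hb = key_to_handle(b);
    if (ha > hb)
        return -1;
    return ha == hb ? 0 : 1;
}

/* Pick the comparator for the on-disk incremental version (0.1.x).
 * Assumption: 0.1.2 dbs were built with the '<' form, 0.1.3 with '>'. */
static handle_cmp_fn select_cmp(int incremental)
{
    switch (incremental) {
    case 2:
        return cmp_lt;
    case 3:
        return cmp_gt;
    default:
        return NULL;   /* unknown format: refuse to open */
    }
}
```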
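The major.minor.incremental rule could be checked roughly as follows. This is a sketch, not the patch's code: `struct storage_version`, `parse_version`, `version_supported`, and the constants are hypothetical, and it assumes a build that writes 0.1.3 and still reads 0.1.2, while any major or minor mismatch is rejected.

```c
#include <stdio.h>

/* Hypothetical parsed form of a "major.minor.incremental" string. */
struct storage_version {
    int major;
    int minor;
    int incremental;
};

/* Format this (hypothetical) build writes, and the oldest incremental
 * value it still knows how to read. */
static const struct storage_version current_version = { 0, 1, 3 };
static const int oldest_incremental = 2;

/* Returns 0 on success, -1 unless s is exactly three dot-separated ints. */
static int parse_version(const char *s, struct storage_version *v)
{
    char extra;   /* catches trailing garbage such as "0.1.3x" */

    if (sscanf(s, "%d.%d.%d%c", &v->major, &v->minor,
               &v->incremental, &extra) != 3)
        return -1;
    return 0;
}

/* Returns 1 if this build can read the given on-disk version. */
static int version_supported(const struct storage_version *v)
{
    /* Major and minor changes are never backward compatible. */
    if (v->major != current_version.major ||
        v->minor != current_version.minor)
        return 0;
    /* Incremental changes are readable within the known range. */
    return v->incremental >= oldest_incremental &&
           v->incremental <= current_version.incremental;
}
```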
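Using the handle itself as the iteration position could look like this sketch, where a sorted array stands in for the dspace db (an assumption for illustration; `struct handle_table` and `iterate_handles_from` are hypothetical). The position is the handle of the next entry to return, so it needs all 64 bits, and no record numbers are required. With Berkeley DB the linear seek would instead be a cursor get with `DB_SET_RANGE`, which returns the smallest key greater than or equal to the given key under the btree's comparison function; this sketch assumes ascending handle order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dspace db: unique handles, ascending. */
struct handle_table {
    const uint64_t *handles;
    size_t count;
};

/* Copy up to max handles that are >= *position into out. On return,
 * *position holds the handle of the next unread entry (if any) and
 * *done is 1 once the table is exhausted. Returns the number copied. */
static size_t iterate_handles_from(const struct handle_table *t,
                                   uint64_t *position, uint64_t *out,
                                   size_t max, int *done)
{
    size_t i = 0, n = 0;

    /* Seek to the first handle >= *position (linear here for brevity). */
    while (i < t->count && t->handles[i] < *position)
        i++;

    while (i < t->count && n < max)
        out[n++] = t->handles[i++];

    /* The next key itself becomes the resume position. */
    if (i < t->count)
        *position = t->handles[i];
    *done = (i >= t->count);
    return n;
}
```

A caller starts with a position of 0 and calls repeatedly until `done` is set; handles above 2^32 resume correctly because the position is 64 bits wide.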
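The open-then-retry logic for DB_RECNUM can be sketched without tying it to a specific Berkeley DB version by passing the open step in as a callback. `open_attempt_fn`, `open_with_recnum_fallback`, and `fake_open` are hypothetical; `fake_open` only simulates a db that rejects a mismatched flag with EINVAL. In a real attempt the callback would call `db_create`, optionally `DB->set_flags(db, DB_RECNUM)`, and then `DB->open`; since a handle whose `DB->open` failed has to be discarded with `DB->close`, each attempt should use a fresh handle.

```c
#include <errno.h>

/* Hypothetical callback: try to open the db with or without DB_RECNUM.
 * Returns 0 on success or an errno-style error code. */
typedef int (*open_attempt_fn)(void *ctx, int with_recnum);

/* Try the new format first (no DB_RECNUM). If that fails with EINVAL,
 * assume an older db created with DB_RECNUM and retry with the flag. */
static int open_with_recnum_fallback(open_attempt_fn attempt, void *ctx,
                                     int *used_recnum)
{
    int ret = attempt(ctx, 0);

    *used_recnum = 0;
    if (ret == EINVAL) {
        ret = attempt(ctx, 1);
        if (ret == 0)
            *used_recnum = 1;
    }
    return ret;
}

/* Simulated open for demonstration only: ctx points to an int saying
 * whether the "on-disk" db was created with DB_RECNUM; a mismatch
 * yields EINVAL. */
static int fake_open(void *ctx, int with_recnum)
{
    int created_with_recnum = *(const int *)ctx;

    return created_with_recnum == with_recnum ? 0 : EINVAL;
}
```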
Let me know how these changes look, and if someone gets a chance to
look at performance differences, that would be great.
Thanks,
-sam
On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
Can we conclude this discussion? In summary:
* The current comparison function causes bad IO patterns for
iterate on the dspace db. We can change it but the disk format
will change in new releases.
- If we change it, either we check a version number and provide
the right comparison function, or we perform migration to the new
storage format.
- If we don't change it, we can still improve performance by
iterating from the last entry to the first, but we can't use
DB_MULTIPLE_KEY, which also improves performance for big filesystems.
I don't really have a preference either way.
* If we change PVFS_ds_position from uint32_t to uint64_t, we can
use the handle as the position, and avoid opening the dspace db
with the DB_RECNUM flag, which is killing our performance on writes.
I think this sounds good too. We would be happy to help test any
combination of the options you list.
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers