Yeah, zero the minor when changing the major version. Thanks for putting
this together! -- rob
Sam Lang wrote:
Here's a patch with the suggested changes. Handling the comparison
function with a different storage format ended up being a bit uglier
than I expected. Removing the DB_RECNUM flag from our dbs was also not
as easy as expected. I did the following:
* changed the dspace comparison function to use > instead of <. This
should allow the iterate_handles function to get Berkeley DB to read in
pages from front to back instead of back to front.
* Modified the semantics of our storage format version a bit. The
previous version was 0.1.2, and unless I'm mistaken, the individual
components of the version didn't carry much meaning. Any version change
meant that the new code would abort on older storage versions. I've
given the components names: major.minor.incremental, and allowed the
incremental value to be changed (0.1.3) so that new code can support
older formats, but all major and minor value changes are _not_ backward
compatible.
* With the storage version changes, we now accept 0.1.3 and 0.1.2, and
call the appropriate comparison function based on the version.
* Changed the PVFS_ds_position from int32_t to uint64_t. Note that this
required changing many of the request encoding/decoding functions that
pass a position field, and incrementing the protocol major version (do
we zero the minor version when we increment the major version?). It
required getting the alignment right for device requests as well.
* It turns out that once a db is created with DB_RECNUM, it always has
to be opened with DB_RECNUM, so that's another storage format change.
For now, I try to open without DB_RECNUM, and if that returns EINVAL I
retry with DB_RECNUM. Newly created dbs don't get the DB_RECNUM flag,
so hopefully that will improve performance (the doc says it can really
slow things down).
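
The version rule and the comparison-function selection described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the names storage_version, cmp_legacy, cmp_current and select_handle_cmp are hypothetical, handles are assumed to be plain 64-bit integers, and the sort direction of each comparator is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of major.minor.incremental. */
typedef struct {
    int major;
    int minor;
    int incremental;
} storage_version;

/* Version the running code writes (0.1.3 in this thread). */
#define CODE_MAJOR 0
#define CODE_MINOR 1

typedef int (*handle_cmp_fn)(uint64_t a, uint64_t b);

/* Assumed ordering for 0.1.2 storage: larger handles sort first. */
static int cmp_legacy(uint64_t a, uint64_t b)
{
    return (a < b) - (a > b);
}

/* Assumed ordering for 0.1.3 storage: smaller handles sort first. */
static int cmp_current(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/*
 * Pick the comparison function for an on-disk version.
 * Returns NULL when the storage cannot be opened by this code:
 * any major or minor mismatch is incompatible, and only the
 * incremental values 2 and 3 are accepted.
 */
static handle_cmp_fn select_handle_cmp(const storage_version *on_disk)
{
    if (on_disk->major != CODE_MAJOR || on_disk->minor != CODE_MINOR)
        return NULL;

    switch (on_disk->incremental) {
    case 2:
        return cmp_legacy;
    case 3:
        return cmp_current;
    default:
        return NULL;
    }
}
```

A caller would refuse to start when select_handle_cmp returns NULL, and otherwise install the returned function as the btree comparison before opening the db.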
Let me know how these changes look, and if someone gets a chance to look
at performance differences, that would be great.
Thanks,
-sam
On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
Can we conclude this discussion? In summary:
* The current comparison function causes bad IO patterns for iterate
on the dspace db. We can change it but the disk format will change
in new releases.
- If we change it, either we check a version number and provide
the right comparison function, or we perform migration to the new
storage format.
- If we don't change it, we can still improve performance by
iterating from the last entry to the first, but we can't use
DB_MULTIPLE_KEY, which also improves performance for big filesystems.
I don't really have a preference either way.
* If we change PVFS_ds_position from uint32_t to uint64_t, we can
use the handle as the position, and avoid opening the dspace db with
the DB_RECNUM flag, which is killing our performance on writes.
I think this sounds good too. We would be happy to help test any
combination of the options you list.
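
As a rough illustration of using the handle as the position: the sketch below resumes an iteration from the last handle returned rather than from a record number. It is not the PVFS2 code. A sorted array stands in for the dspace db, a binary search stands in for a cursor range seek, the function name iterate_handles_sketch is hypothetical, and it assumes that handles are 64 bits wide and that 0 is never a valid handle, so 0 can mean "start from the beginning".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy up to max handles that follow *position into out.
 * On return *position holds the last handle copied, so the next
 * call continues from there. Returns the number of handles copied;
 * 0 means the iteration is finished.
 */
size_t iterate_handles_sketch(const uint64_t *sorted, size_t n,
                              uint64_t *position,
                              uint64_t *out, size_t max)
{
    /* Find the first handle strictly greater than *position. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= *position)
            lo = mid + 1;
        else
            hi = mid;
    }

    size_t count = 0;
    while (lo < n && count < max)
        out[count++] = sorted[lo++];

    /* The position token is simply the last key seen. */
    if (count > 0)
        *position = out[count - 1];

    return count;
}
```

Because the token is a key and not a record number, nothing here depends on the db maintaining record counts. It also shows why the position type has to be as wide as the handle: a handle above 2^32 would not survive a round trip through a 32-bit position.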
-Phil
------------------------------------------------------------------------
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers