jhutz found some old notes as well; the main thing I take from them is backwards compatibility.

Up to now I had only been concerned with staying within the existing block structure, which is fairly intertwined with backwards compatibility.

I'll try to list the design considerations I know of for a prdb extension and say a bit about some of them. Backwards and forwards compatibility, the ability for db-maintenance tools to recover from (e.g.) hash table corruption, and preserving existing invariants come to mind. Also, jhutz has just about convinced me that it is irresponsible to use the last spare field for a specific extension (as opposed to a general extensible structure), even if we think that a full format revision is coming "soon", so that should be added to the list of design considerations.

Are there other considerations that we should take into account?

Within these considerations (and any others that come up in this discussion), I am working on a concrete proposal. Would people prefer to see this in the form of ptserver.h struct declarations and comments, or an addition to the prdb format writeup I have at https://github.com/kaduk/openafs/blob/prdb/doc/txt/prdb.txt ?

Per-consideration notes:
%%%

Backwards compatibility is pretty easy: all we have to do is leave the existing structures untouched and stick strictly to extensions of the existing format. Then the new code will handle existing databases just fine.

%%%

It is strongly desirable to have forwards compatibility, namely, an old ptserver should not choke on or scribble over the new style entries. It is hard to guarantee that an old ptserver will never see new style entries without updating all dbservers at once, and there are operational reasons to want to phase in new code gradually. By lucky chance, forwards compatibility is possible -- the old code recognizes PRFOREIGN and PRINST in the flags field as being valid entries, but does not generate them. This lets us steal one of these bits, say PRINST, to indicate that an entry is an "extended entry", and within such extended entries use the unallocated flags bits to distinguish between types of entries. There are eight unused "type flags" bits, though perhaps we need not claim all of them, particularly if we use them as an integral enumeration of types and not as flag bits. I'm not entirely sure what other types of extended entries we might want and whether the enum treatment is appropriate. The old notes I'm looking at sketch out a generic "optentry" to hold "option blocks", with a field for what kind of option and an afsUUID to which they belong (to prevent option blocks from being incorrectly reused when a pts id is recycled), but I'm not tied to that. The comments indicate it could be used for supergroup information if someone wanted to clean up/reimplement that code.
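To make the flag-stealing idea concrete, here is a rough C sketch, assuming the enum treatment of the spare bits. The PRINST and PRFOREIGN values are what I believe ptserver.h uses (check against the tree); EXT_TYPE_SHIFT, EXT_TYPE_MASK, enum ext_type, struct optentry_sketch and the helper functions are hypothetical names of mine, and uuid_sketch is only a stand-in for afsUUID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Believed to match ptserver.h; verify before relying on them. */
#define PRFOREIGN 16
#define PRINST    32

/* Hypothetical: the eight spare bits (8-15) of the low 16 "type_flags"
 * bits, read as an enumeration of extended-entry kinds. */
#define EXT_TYPE_SHIFT 8
#define EXT_TYPE_MASK  0xffu

enum ext_type {            /* hypothetical kinds */
    EXT_NONE = 0,
    EXT_OPTENTRY = 1
};

/* Under this scheme, PRINST marks an entry as "extended". */
static int is_extended_entry(int32_t flags)
{
    return (flags & PRINST) != 0;
}

/* Pull the extended-entry kind out of the spare type bits. */
static unsigned ext_entry_type(int32_t flags)
{
    return ((uint32_t)flags >> EXT_TYPE_SHIFT) & EXT_TYPE_MASK;
}

/* Build the flags word for a new extended entry of the given kind. */
static int32_t make_ext_flags(unsigned type)
{
    assert(type <= EXT_TYPE_MASK);
    return (int32_t)(PRINST | (type << EXT_TYPE_SHIFT));
}

/* Stand-in for afsUUID; only byte equality matters here. */
struct uuid_sketch {
    unsigned char bytes[16];
};

/* Hypothetical optentry: the invariant header fields, then the option
 * kind and the UUID of the entry the option block belongs to. */
struct optentry_sketch {
    int32_t flags, id, cellid, next;
    int32_t kind;
    struct uuid_sketch owner;
};

/* An option block applies only if both the pts id and the owner UUID
 * match, so a recycled id does not inherit stale option blocks. */
static int optentry_belongs_to(const struct optentry_sketch *o, int32_t id,
                               const struct uuid_sketch *owner)
{
    return is_extended_entry(o->flags)
        && ext_entry_type(o->flags) == EXT_OPTENTRY
        && o->id == id
        && memcmp(o->owner.bytes, owner->bytes, sizeof owner->bytes) == 0;
}
```

The enum reading lets eight bits name up to 256 kinds instead of eight independent flags; whether that is the right trade depends on what other extended entries we end up wanting.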

%%%

The following fields are invariant in all existing entry structures; retaining them should allow old ptservers to recognize (and print, to some extent) the new entries we add:

flags (really only the low 16 bits, which I call "type_flags" in my format writeup)
id
cellid
next

Note that cellid is only rarely used.
Flags including PRINST will tell old code that this block is allocated, and next allows a utility reading the database to follow the chain of blocks in the same logical structure, even if it does not know exactly how to interpret those blocks.
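A sketch of what a utility can do with only these four fields: walk a chain of blocks via next without knowing what the blocks contain. struct block_header and chain_length are hypothetical names, and for simplicity next is treated as an index into an in-memory array with 0 ending the chain, whereas the real database stores file offsets.

```c
#include <assert.h>
#include <stdint.h>

/* The four fields shared by every existing entry type. */
struct block_header {
    int32_t flags;
    int32_t id;
    int32_t cellid;
    int32_t next;    /* simplified: array index, 0 ends the chain */
};

/* Follow next links from start and count the blocks visited.
 * Returns -1 on a dangling link, or when the walk has visited more
 * blocks than could be distinct (which means the chain loops). */
static int chain_length(const struct block_header *blocks, int nblocks,
                        int32_t start)
{
    int count = 0;
    int32_t cur = start;

    while (cur != 0) {
        if (cur < 0 || cur >= nblocks || count >= nblocks)
            return -1;
        count++;
        cur = blocks[cur].next;
    }
    return count;
}
```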

%%%

Another desired property for a format extension is recoverability from minor corruption. Extension entries will include the id of the entry they correspond to, and link fields help tie related entries together. That should be enough to (say) reconstruct a hash table if it gets lost or corrupted. This design goal is necessarily less well specified than the others, as it will always be possible to corrupt a database to an unrecoverable state. There is a tradeoff between resiliency and efficiency -- lots of link fields ease reconstruction but consume space and resources. I don't think that our application is particularly sensitive to this tradeoff; any reasonable level of linking is probably fine.
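As an illustration of the hash-table case, here is a sketch that throws away the buckets and rebuilds them by scanning every block and re-linking each allocated, non-continuation entry by its id. The PRFREE and PRCONT values are what I believe ptserver.h uses; struct entry_sketch, its nextID bucket link, rebuild_id_hash, and the hash itself (unsigned id modulo the bucket count) are illustrative assumptions, not the ptserver's actual layout or hash, and the sketch ignores extended entries.

```c
#include <assert.h>
#include <stdint.h>

/* Believed to match ptserver.h; verify before relying on them. */
#define PRFREE 1
#define PRCONT 4

/* Simplified block: invariant header fields plus a hypothetical
 * nextID link chaining entries within one hash bucket. */
struct entry_sketch {
    int32_t flags;
    int32_t id;
    int32_t next;
    int32_t nextID;    /* 0 ends the bucket chain */
};

/* Rebuild buckets[] from scratch using only data stored in the blocks.
 * Block 0 is skipped, standing in for the database header. */
static void rebuild_id_hash(struct entry_sketch *blocks, int nblocks,
                            int32_t *buckets, int nbuckets)
{
    for (int i = 0; i < nbuckets; i++)
        buckets[i] = 0;

    for (int i = 1; i < nblocks; i++) {
        /* Free and continuation blocks are not hashed by id. */
        if (blocks[i].flags & (PRFREE | PRCONT))
            continue;
        /* Illustrative hash; unsigned arithmetic avoids abs(INT_MIN). */
        int b = (int)((uint32_t)blocks[i].id % (uint32_t)nbuckets);
        blocks[i].nextID = buckets[b];    /* push onto the bucket head */
        buckets[b] = i;
    }
}
```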

%%%


On Sat, 18 May 2013, Simon Wilkinson wrote:

Across the tree, I've been moving OpenAFS towards using jhash for hashing. However, there are some challenges about using this for ubik databases. In particular, the current code doesn't attempt to cater for endianness. I suspect you will get different answers for jhash2 on big and little endian processors. Fixing this shouldn't be that complex - the original lookup3.c code does the right thing, it's just a case of adapting that for OpenAFS.

Yeah, we'd need to either make a wrapper that does byteswaps or pull in a new snapshot. A new snapshot with 'nbo' or similar in the name sounds promising.
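One way such a wrapper could look: define the hash over an explicit byte sequence, serializing each 32-bit key most-significant byte first with shifts, so every host hashes the same bytes regardless of its native byte order. To keep the sketch short it uses Bob Jenkins's one-at-a-time hash as a stand-in for jhash/lookup3; hash_id_nbo is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bob Jenkins's one-at-a-time hash, a small stand-in for lookup3.
 * It consumes bytes, so the result depends only on the byte sequence. */
static uint32_t one_at_a_time(const unsigned char *key, size_t len)
{
    uint32_t h = 0;

    for (size_t i = 0; i < len; i++) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Serialize the id in network byte order using shifts (no dependence
 * on host endianness), then hash those four bytes. */
static uint32_t hash_id_nbo(int32_t id)
{
    uint32_t u = (uint32_t)id;
    unsigned char buf[4] = {
        (unsigned char)(u >> 24), (unsigned char)(u >> 16),
        (unsigned char)(u >> 8),  (unsigned char)u
    };

    return one_at_a_time(buf, sizeof buf);
}
```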

The Jenkins family of hashes also has the nice property that the table size need not be a prime -- we can use a size of 8192 and a mask to get the table index instead of a modular division.
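The mask trick in C, as a sketch: because a well-mixed hash spreads its input across the low bits, a power-of-two table can take the low 13 bits directly rather than reducing modulo a prime. HASH_TABLE_SIZE and bucket_index are hypothetical names; the static assertion needs C11.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_TABLE_SIZE 8192u    /* hypothetical power-of-two size */
#define HASH_TABLE_MASK (HASH_TABLE_SIZE - 1)

/* A power of two shares no set bits with itself minus one. */
_Static_assert((HASH_TABLE_SIZE & HASH_TABLE_MASK) == 0,
               "table size must be a power of two");

/* Equivalent to hash % HASH_TABLE_SIZE, without a division. */
static uint32_t bucket_index(uint32_t hash)
{
    return hash & HASH_TABLE_MASK;
}
```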

-Ben
_______________________________________________
OpenAFS-devel mailing list
https://lists.openafs.org/mailman/listinfo/openafs-devel