Re: New revlog format, plan page

Joerg Sonnenberger Thu, 07 Jan 2021 10:22:36 -0800

On Thu, Jan 07, 2021 at 12:04:06PM -0500, Josef 'Jeff' Sipek wrote:
> On Tue, Jan 05, 2021 at 19:33:36 +0100, Joerg Sonnenberger wrote:
> > On Tue, Jan 05, 2021 at 04:38:20PM +0100, Raphaël Gomès wrote:
> > > I've opened a very much draft plan page [1] to try to list all the things we
> > > want to do in that version and try to figure out an efficient new format.
> > 
> > "No support for hash version"
> > 
> > I don't think that point really matters. The plan for the hash
> > migration allows them in theory to coexist fully on the revlog layer and
> > the main problems for mixing them are on the changeset/manifest layer
> > anyway. That is, any migration strategy will IMO rewrite all revlogs to
> > the newer hash anyway and only keep a secondary index for changesets and
> > maybe manifests.
> 
> At the same time, I think it is sensible (and very useful when looking at a
> revlog without repo-level info) for revlogs to identify which hash they
> contain.  Either in some sort of revlog header or in each entry (if hash can
> vary between entries).


I plan for the replacement hash to be tagged, so yes, they can be
individually distinguished.

> 
> > "No support for sidedata"
> > 
> > My big design level concern is that revlog ATM is optimized for fast
> > integer indexing and append-only storage.
> 
> This is an interesting point.  What *are* the most common revlog operations?
> It probably varies between repos, but I suspect that they are mostly reads
> rather than writes.  As a consequence, a good revlog format would optimize
> for the common case (without making the less common cases completely suck).

The problem is that anything that needs in-place writes is a lot more
difficult to get right for on-disk consistency and for concurrent
read access. Normal revision data does not change, by design. That's
quite different from any unversioned metadata, which can include
signatures, obsolescence data, etc. Separating mutable and immutable
data is a natural design choice.
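
A minimal in-memory sketch of that separation, not actual Mercurial
code. The names AppendOnlyLog and MutableMetadata are hypothetical, and
on-disk consistency is not modelled at all. It only shows that a reader
which recorded the log length earlier still sees the same records after
later appends and metadata changes:

```python
class AppendOnlyLog:
    """Immutable records, written once and addressed by integer revision."""

    def __init__(self):
        self._data = bytearray()
        self._offsets = []  # start offset of each record in _data

    def __len__(self):
        return len(self._offsets)

    def append(self, payload):
        # Existing bytes are never touched; new data only goes at the end.
        self._offsets.append(len(self._data))
        self._data += payload
        return len(self._offsets) - 1

    def read(self, rev):
        start = self._offsets[rev]
        if rev + 1 < len(self._offsets):
            end = self._offsets[rev + 1]
        else:
            end = len(self._data)
        return bytes(self._data[start:end])


class MutableMetadata:
    """Unversioned metadata keyed by revision, kept outside the log."""

    def __init__(self):
        self._by_rev = {}

    def set(self, rev, key, value):
        self._by_rev.setdefault(rev, {})[key] = value

    def get(self, rev, key, default=None):
        return self._by_rev.get(rev, {}).get(key, default)


log = AppendOnlyLog()
meta = MutableMetadata()
r0 = log.append(b"first revision")
seen = len(log)                   # a reader records the length it saw
before = log.read(r0)

log.append(b"second revision")    # a writer appends later
meta.set(r0, "obsolete", True)    # and changes metadata for rev 0
assert log.read(r0) == before     # the old record is unchanged
assert seen == 1 and len(log) == 2
```

All mutation happens in the metadata store, so that is the only place
that needs the harder consistency handling.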

> hg already makes use of CBOR, so it'd be reasonable to use here - either for
> the whole entry or just for parts of it.  For example, CBOR's integers are
> encoded as 1 byte type, followed by 0, 1, 2, 4, or 8 byte integer.  Smaller
> values use less space.  For example, values less than 2^32 use 1-5 bytes.

Needing a separate index from the index for efficient access would
defeat the point of revlog being an index format in the first place...
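
To make this concrete, here is a rough sketch, not Mercurial code. The
helper names are made up for illustration, and cbor_uint is a
hand-written encoder for CBOR unsigned integers (major type 0) rather
than a call into any CBOR library. With fixed-width entries the offset
of revision N is a multiplication. With variable-width entries you
either scan or keep a second offset table:

```python
import struct


def cbor_uint(n):
    """Encode n as a CBOR unsigned integer (major type 0)."""
    if n < 0 or n >= 1 << 64:
        raise ValueError("out of range for a CBOR unsigned integer")
    if n < 24:
        return bytes([n])                       # value fits in the initial byte
    if n < 1 << 8:
        return b"\x18" + struct.pack(">B", n)   # 1 + 1 bytes
    if n < 1 << 16:
        return b"\x19" + struct.pack(">H", n)   # 1 + 2 bytes
    if n < 1 << 32:
        return b"\x1a" + struct.pack(">I", n)   # 1 + 4 bytes
    return b"\x1b" + struct.pack(">Q", n)       # 1 + 8 bytes


FIXED_ENTRY_SIZE = 64  # illustrative fixed entry width


def fixed_offset(rev):
    # Fixed-width entries: O(1), no extra data structure needed.
    return rev * FIXED_ENTRY_SIZE


def variable_offset(entries, rev):
    # Variable-width entries: sum the sizes of everything before rev,
    # or keep a separate offset table to avoid the scan.
    return sum(len(e) for e in entries[:rev])


entries = [cbor_uint(v) for v in (1, 300, 70000)]
assert [len(e) for e in entries] == [1, 3, 5]
assert fixed_offset(2) == 128
assert variable_offset(entries, 2) == 4
```

The space saving is real, but the offset table needed to get O(1)
lookups back is exactly the "index for the index".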

Joerg
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
