Re: New revlog format, plan page

Pierre-Yves David Thu, 07 Jan 2021 14:21:52 -0800


On 1/7/21 8:52 PM, Pierre-Yves David wrote:



On 1/5/21 7:33 PM, Joerg Sonnenberger wrote:

On Tue, Jan 05, 2021 at 04:38:20PM +0100, Raphaël Gomès wrote:

I've opened a very much draft plan page [1] to try to list all the things we want to do in that version and try to figure out an efficient new format.


"No support for hash version"

I don't think that point really matters. The plan for the hash
migration allows them in theory to coexist fully on the revlog layer and
the main problems for mixing them are on the changeset/manifest layer
anyway. That is, any migration strategy will IMO rewrite all revlogs to
the newer hash anyway and only keep a secondary index for changesets and
maybe manifests.

I agree here, the hash used will likely be defined at repository level (or at least revlog level).


"No support for sidedata"

My big design level concern is that revlog ATM is optimized for fast
integer indexing and append-only storage. At least for some sidedata
use cases I have, that is an ill fit.


The current spirit for sidedata is to have


Looks like this sentence got interru…

The current spirit for sidedata is for them to contain computed data that is inherent to the changesets (or revisions in general) and can be computed once and for all when the changeset is added.

The storage proposed in revlog v2 requires the data to be added at "revision addition time" but does not require the sidedata to be next to the changeset data. This simplifies operations that need the rest of the changegroup (manifest, filelog) to be added before computation.

It also means one could "update" the sidedata by "simply" rewriting the index.

I am sympathetic to a more generic storage for more volatile data. However, the current proposal is good enough for the current goal and a couple of others, and is quite simple to implement. So the plan is to go with it for now.
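
To make the idea concrete, here is a rough sketch of an index whose entries point at sidedata stored away from the revision data. It is only a toy model: the class names, field names and in-memory layout are assumptions for illustration, not the actual revlog v2 format.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    # Hypothetical fields; the real index layout is not described here.
    data_offset: int
    data_length: int
    sidedata_offset: int
    sidedata_length: int


class ToyRevlog:
    """Toy model: the index points at data and sidedata independently."""

    def __init__(self):
        self.index = []
        self.data = bytearray()
        self.sidedata = bytearray()

    def add_revision(self, data, sidedata=b""):
        # Sidedata is recorded when the revision is added, but it lives in
        # its own area instead of sitting next to the revision data.
        entry = IndexEntry(
            len(self.data), len(data), len(self.sidedata), len(sidedata)
        )
        self.data += data
        self.sidedata += sidedata
        self.index.append(entry)
        return len(self.index) - 1

    def read_data(self, rev):
        e = self.index[rev]
        return bytes(self.data[e.data_offset:e.data_offset + e.data_length])

    def read_sidedata(self, rev):
        e = self.index[rev]
        end = e.sidedata_offset + e.sidedata_length
        return bytes(self.sidedata[e.sidedata_offset:end])

    def update_sidedata(self, rev, new_sidedata):
        # "Updating" appends the new blob and rewrites only the index
        # entry; the revision data itself is never touched.
        e = self.index[rev]
        e.sidedata_offset = len(self.sidedata)
        e.sidedata_length = len(new_sidedata)
        self.sidedata += new_sidedata
```

In this sketch the old sidedata blob simply becomes unreferenced after an update; whether and how such space would be reclaimed is left open.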

"No support for unified revlog"

IMO this should be the driving feature. The biggest issue for me is that
it creates two challenges that didn't exist so far:
(1) Inter-file patches and how they interact with the wire protocol
I am not worried here; inter-file patches should be as simple as using a delta base pointing to the content of another file. And regarding the wire protocol, we are already very bad at dealing with deltas to non-parents, so we should be about as bad.
(2) Identical revisions stored in different places.
The broad plan for the unified revlog is to store things using a pair of identifiers: a content hash (e.g. filenodeid) and a content identifier. The two main options here are:
* using a hash of the target content (taking 32bits, "expensive" to search)
* using some integer identifier and an associated side mapping for content → ID mapping (and over-the-wire translation to non-local identifiers).
The second option seems more time and space efficient, so I am leaning toward it.
Either way, I think similar content (i.e. same nodeid) should probably be stored twice in the index to keep the current properties; we can reuse the data segment, however. So the uniqueness and indexing would happen using the (nodeid, contentid) pairs.
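
As a rough illustration of the second option, the sketch below indexes revisions by (nodeid, contentid) pairs, allocates integer content IDs through a side mapping, and reuses one data segment when two targets hold the same content. Everything in it is an assumption made for the example: the class and attribute names are invented, and the nodeid is simplified to a plain SHA-1 of the data, without the parents.

```python
import hashlib


class ToyUnifiedRevlog:
    """Toy model of a unified index keyed by (nodeid, contentid)."""

    def __init__(self):
        self.content_ids = {}      # target (e.g. a path) -> local integer ID
        self.index = {}            # (nodeid, contentid) -> segment number
        self.segments = []         # stored data segments
        self.segment_by_node = {}  # nodeid -> segment number, for reuse

    def content_id(self, target):
        # Side mapping: allocate the next integer for an unseen target.
        return self.content_ids.setdefault(target, len(self.content_ids))

    def add(self, target, data):
        # Simplification: hash only the data to get a "nodeid".
        nodeid = hashlib.sha1(data).digest()
        key = (nodeid, self.content_id(target))
        if key not in self.index:
            segment = self.segment_by_node.get(nodeid)
            if segment is None:
                # First time this content is seen: store it once.
                segment = len(self.segments)
                self.segments.append(data)
                self.segment_by_node[nodeid] = segment
            # Same nodeid under another target gets its own index entry
            # but points at the existing data segment.
            self.index[key] = segment
        return key

    def read(self, key):
        return self.segments[self.index[key]]
```

Adding the same bytes under two different targets yields two index entries and a single stored segment. The local integer IDs would still need translating to something non-local over the wire, which this sketch does not attempt.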
"No support for larger files"

Supporting large revlog files is sensible and having a store for
design-challenged file systems might be necessary. Microsoft, I'm
looking at you. Otherwise the concern is space use in the revlog file
and RAM use during operations. I don't think the latter is as big an
issue now as it was 15 years ago, but the former is real. But it might
be a good point in time to just go for 64bit offsets by default...
Right now, offsets are 6 bytes, so we can use a revlog.d of up to 281 TB; that seems good enough. The main "limitation" is about the file content, currently limited to 4GB. Given that we hold these in RAM for now, I don't think we need to bump it. We can bump it when introducing smarter RAM handling for such files.
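
For reference, the arithmetic behind those two limits can be checked quickly. In the sketch below the 6-byte offset comes from the text above, while the 4-byte length field is an assumption inferred from the 4GB figure; the helper names are invented for the example.

```python
OFFSET_BYTES = 6  # offset width mentioned above
LENGTH_BYTES = 4  # assumption: a 32-bit length field gives the 4GB limit

max_data_file = 2 ** (8 * OFFSET_BYTES)  # largest addressable data file
max_revision = 2 ** (8 * LENGTH_BYTES)   # largest single file content


def pack_offset(offset):
    """Encode an offset on 6 bytes; raises OverflowError if it is too big."""
    return offset.to_bytes(OFFSET_BYTES, "big")


def unpack_offset(raw):
    """Decode a 6-byte big-endian offset."""
    return int.from_bytes(raw, "big")


print(max_data_file)  # 281474976710656 bytes, about 281 TB
print(max_revision)   # 4294967296 bytes, i.e. 4 GiB
```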


--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
