Re: Call for comments on new dirstate format contents

Kyle Lippincott Tue, 29 Jun 2021 11:49:15 -0700

On Mon, Jun 28, 2021 at 2:50 AM Raphaël Gomès <raphael.go...@octobus.net>
wrote:


> Hello all,
>
> As you probably know my colleagues at Octobus and I have been working on
> a new version of the dirstate, and we're coming pretty close to
> something usable in production, so we need to freeze the format soon.
> This email is not meant to discuss the exact byte-per-byte layout
> details of the format, but rather its contents: what do you think should
> be included (or at least have space reserved for) in the new version?
>
> We have already discussed this at previous sprints and various other
> discussion channels, but I thought it'd be better to give a "last call"
> chance for people to get their voices heard.
>
> I remember Google people saying they'd like to separate information that
> is frequently written to a separate file to help with their filesystem
> shenanigans. What exactly would be the plan and can we do it easily? I
> may be pessimistic, but this looks like it would require a lot of work
> which (so far) no one wants to sponsor, though I'm happy to be proven
> wrong either way.
>

The original thinking had been that we'd have two or three files:
1. p1/p2
2. anything the user did (`hg mv/cp/add/rm`)
3. anything hg can generate in `hg debugrebuilddirstate`

The thinking was that #3 could either be generated by the filesystem
itself, or, if there was a network write race (when using filesystems like
our internal CitC filesystem, or maybe with things like NFS, if it can
determine write races, I'm honestly not sure...), the filesystem could just
let one side win arbitrarily.

After learning more, neither of those really works for us. Our virtual
filesystem is "dumb" - it honestly knows very little about the files it's
being asked to store, so it would be a huge change to have it track enough
information to feasibly produce something that could replace the data in #3
above. Additionally, in the network write race scenario, letting one side
win arbitrarily just opens you up to dirstate corruption, which is not a
place anyone wants to be in. :) In the network write race case, we could
teach the virtual filesystem server to delete/poison the file (triggering a
dirstate rebuild on the next command), but it's probably not worthwhile at
this point.

I was then thinking that we could just store #3 "where it belongs", in
.hg/wcache, and just not replicate it. That still opens you up to dirstate
corruption issues (modify the working directory on machine A, and then use
it on machine B - we still need some way of telling machine B it's out of
date; that could be a timestamp in the non-cache part of the dirstate, I
guess?).
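
A minimal sketch of one way the "tell machine B it's out of date" check
could work. It swaps the timestamp suggested above for a random token, which
is an assumption of this sketch and not something proposed in the thread.
The function names and the "token + payload" cache layout are hypothetical.

```python
import os


def new_token():
    """Return a fresh random token identifying one version of the dirstate."""
    return os.urandom(16)


def write_cache(cache_path, token, payload):
    """Write the local cache, prefixed with the token it was derived from."""
    with open(cache_path, "wb") as f:
        f.write(token + payload)


def read_cache(cache_path, expected_token):
    """Return the cached payload, or None if the cache is missing or stale."""
    try:
        with open(cache_path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if data[:len(expected_token)] != expected_token:
        return None  # built from another dirstate version: caller rebuilds
    return data[len(expected_token):]
```

The replicated (non-cache) part would carry the current token. A machine
whose local cache holds a different token treats the cache as absent and
rebuilds it.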


> To Matt Harbison: you said something about storing exec bit and symlink
> info explicitly to help platforms like Windows that don't have them,
> could you please elaborate?
>
> As a general recap (and to help understand some decisions), the new
> format will be an append-only tree with no stem compression for
> performance reasons.


Can you elaborate a bit on what this append-only tree looks like (and why
that's preferred) and why stem compression would cause performance issues?

When loading this new dirstate, would it require loading the entire thing
from the beginning and replacing entries with the newer ones? IMHO, we
should be optimizing as much as possible for the read performance, even if
it costs some small amount of write performance. Writes seem less frequent
to me (and more tolerant of slightly higher latency) than things like `hg
status` (being executed by an IDE, or by someone in `watch`, or in a shell
prompt, or something...)


> The Python implementation will be functional but
> very basic and will offer no purposeful performance improvements (unless
> someone wants to have fun!), as we currently only have the bandwidth for
> optimizing the Rust implementation.
>

You say the Python implementation will offer no purposeful performance
improvements, but how likely is it that it will be slower than the current
format? What level of performance degradation would be considered
acceptable?


>
> An overview of the current target (some implementation-detail level
> contents omitted):
>
>      - A docket file that contains global metadata about the dirstate:
>

What happens if the docket and data file get out of sync somehow (maybe hg
crashes in the middle of writing, or Google has a network write race)?


>          - NodeID of the parents (32 bytes reserved, 20 used for now)
>          - A total count of files (including Removed ones)
>          - A count of dead (unreachable) bytes
>          - A count of alive (reachable) bytes
>

What are these two?


>          - A hash of ignore patterns (see
> https://phab.mercurial-scm.org/D10836)
>      - In the data file, for each directory/file (it can be both at the
> same time):
>          - The full path in bytes of the file (or directory)
>          - The full path of the copy source (optional)
>          - How many tracked recursive descendants it has
>          - How many recursive copies it has
>          - Exec bit
>          - mtime (probably up to nanosecond precision, both files and
> directories)
>

Is there a good way of determining what the timestamp resolution of a
filesystem is? (I can google it, of course, but that doesn't help us
determine it programmatically.) Being able to store nanosecond precision
seems good, even if we don't have a reliable way to confirm that what we're
getting is actually more precise than 1s. (I don't know how various OSes
treat these timestamps when the underlying filesystem doesn't support
higher precision; is it 100% guaranteed that they just extend it with
zeroes?)
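
A rough sketch of a programmatic probe, assuming it's acceptable to create
a few temporary files in the directory being tested. It is a heuristic
only: a non-zero sub-second part proves the filesystem stores more than 1s
precision, but all-zero results don't strictly prove the opposite. The
function name is made up for illustration.

```python
import os
import tempfile
import time


def observed_subsecond_mtime(directory, samples=5):
    """Return True if any freshly written file in `directory` reports an
    mtime with a non-zero sub-second part.

    A False result is only suggestive: every sample could, in principle,
    have landed exactly on a second boundary.
    """
    for _ in range(samples):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, b"x")
            os.close(fd)
            # st_mtime_ns is the mtime as an integer number of nanoseconds.
            if os.stat(path).st_mtime_ns % 1_000_000_000 != 0:
                return True
        finally:
            os.unlink(path)
        time.sleep(0.01)  # spread the samples out a little
    return False
```

This says nothing about how fine the resolution is beyond "better than 1s",
and it doesn't answer the zero-extension question above.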


>          - Clean file size when applicable
>          - Its state: if it's removed, added, clean, etc.
>          - Whether it's from p1 or p2
>          - Whether it's ambiguous (it appears clean but the mtime is the
> same as the last status, probably will only happen with the Python
> implementation)
>          - All of the info needed to get the previous state of a Removed
> file in case we `hg add` it back
>

Can you explain the use case for this (and/or what would be in it)? I would
think that `hg rm foo && echo hi > foo && hg add foo` should be equivalent
to `echo hi > foo`, but I might be missing something?


>          - (My idea as I type this: ) store the "raw bytes" version of
> the OS path if it differs from the normalized hg version (on Windows and
> MacOS for example) to cache the filefoldmap.
>

It might be useful to describe what's in the current dirstate for
comparison. I believe it's:
- p1, p2 (20 byte hashes, 40 bytes total)
- for each *file*:
  - file name (bytes)
  - mtime (1s precision); this is -1 if it's unknown/ambiguous
  - state (normal/add/remove/etc.)
  - copy info (just path? not sure what's in that)

So the following would be new:
- tracking directories
- count of recursive descendants
- count of recursive copies
- exec bit
- mtime with increased precision
- whether it's from p1 or p2
- whether it's ambiguous
- info to recover 'Removed' file state
- "raw bytes" version of the OS path if it differs
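
For comparison, a minimal Python sketch of a reader for the current (v1)
file. The layout is an assumption to check against the Mercurial source: a
40-byte parents header, then per file a 17-byte big-endian header (state,
mode, size, mtime, name length) followed by the name, with an optional copy
source after a NUL byte. If that's right, v1 also carries a mode and a size
per file, which the list of current contents above doesn't mention.

```python
import struct

# Assumed v1 entry header: state (1 byte), then mode, size, mtime and
# name length as four signed 32-bit big-endian integers.
ENTRY_HEADER = struct.Struct(">cllll")


def parse_dirstate_v1(data):
    """Parse the bytes of a v1 dirstate file into (p1, p2, entries)."""
    p1, p2 = data[:20], data[20:40]
    entries = {}
    pos = 40
    while pos < len(data):
        state, mode, size, mtime, name_len = ENTRY_HEADER.unpack_from(data, pos)
        pos += ENTRY_HEADER.size
        name = data[pos:pos + name_len]
        pos += name_len
        # A copy source, when present, follows the file name after a NUL.
        copy_source = None
        if b"\0" in name:
            name, copy_source = name.split(b"\0", 1)
        entries[name] = (state, mode, size, mtime, copy_source)
    return p1, p2, entries
```

Reading `.hg/dirstate` in binary mode and passing the bytes to
`parse_dirstate_v1` would give a dict keyed by file name.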


> I *think* that's it? I might be wrong, if so, please tell me!
>

My biggest concern is extensibility. As an example, as you were writing
this up, you thought of something else to add, so we probably don't want to
restrict ourselves too much :) The file format is already not going to be
anything resembling fixed record size, so having a section for generic
key/value data that extensions can use might be quite useful (and maybe
future core code could use it too, though I'm assuming the format can be
designed so that core additions work without the size/parsing complexity of
key/value).
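
To make the generic key/value idea concrete, here is a small sketch of a
length-prefixed encoding. Everything in it (the 16-bit key length, the
32-bit value length, the function names) is hypothetical and only meant to
show how little parsing such a section would need, not to propose a layout.

```python
import struct

# Hypothetical record header: key length (16-bit), value length (32-bit).
_LEN = struct.Struct(">HI")


def encode_extras(extras):
    """Serialize a {bytes: bytes} mapping as length-prefixed records."""
    out = []
    for key in sorted(extras):  # sorted for deterministic output
        value = extras[key]
        out.append(_LEN.pack(len(key), len(value)))
        out.append(key)
        out.append(value)
    return b"".join(out)


def decode_extras(data):
    """Inverse of encode_extras."""
    extras = {}
    pos = 0
    while pos < len(data):
        key_len, value_len = _LEN.unpack_from(data, pos)
        pos += _LEN.size
        key = data[pos:pos + key_len]
        pos += key_len
        extras[key] = data[pos:pos + value_len]
        pos += value_len
    return extras
```

A reader that doesn't know a key can skip its value using the length
prefix, which is what would let extensions add data without core changes.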

Random things that might make sense to keep in dirstate (I don't have any
real uses for them right now, just thinking what might be desired now or in
the indefinite future):
- some merge conflict information (just a bit saying a file is in a merge
conflict state?)
  - somewhat similar, for cases where hg generates files such as .orig,
.rej, etc., maybe we store these in the dirstate marked as a new state
("generated"? "internal"?). `hg status` could show them using a new
character, and there could be a command (`hg resolve --cleanup`?) that
deletes these files. Right now I believe people generally add these file
names to their ignore patterns, but that's caused us some problems with
overly generic globs in build files.
- information that might be useful for interacting with other VCSes. This
would almost certainly be done via Mercurial extension modules and thus
relegated to the generic key/value store, but as a specific example: when
interfacing with a Perforce repository (
https://www.mercurial-scm.org/wiki/PerfarceExtension), it might be useful
to be able to keep track of the Perforce file type (
https://www.perforce.com/manuals/v15.1/p4guide/appendix.filetypes.html), if
it's non-obvious (something like symlink would be "obvious", some of the
modifiers [specifically the exec bit] are tracked elsewhere, but there's no
way of storing that you want the file to be 'binary', for example).
- (really unsure): case collision information?
- (unsure): some information about files being "smeared", maybe just a list
of filters applied? Things like the EOL extension (
https://www.mercurial-scm.org/wiki/EolExtension) could possibly use it.
- Is there anything LFS would want to put in here?
- Is there anything fsmonitor would want to add?
- Would it make sense to put the "active" bookmark into the dirstate?
Active topic? Active branch? (I'm leaning towards no for all three, but
figured I should mention them :))



> Raphaël
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
