Hello all,
As you probably know, my colleagues at Octobus and I have been working
on a new version of the dirstate. We're getting pretty close to
something usable in production, so we need to freeze the format soon.
This email is not meant to discuss the exact byte-per-byte layout
details of the format, but rather its contents: what do you think should
be included (or at least have space reserved for) in the new version?
We have already discussed this at previous sprints and various other
discussion channels, but I thought it'd be better to give a "last call"
chance for people to get their voices heard.
I remember people from Google saying they'd like frequently written
information to be split out into a separate file, to help with their
filesystem shenanigans. What exactly would the plan be, and can we do it
easily? I may be pessimistic, but this looks like it would require a lot
of work which (so far) no one wants to sponsor, though I'm happy to be
proven wrong.
To Matt Harbison: you said something about storing the exec bit and
symlink info explicitly to help platforms like Windows that don't have
them. Could you please elaborate?
As a general recap (and to help understand some decisions), the new
format will be an append-only tree with no stem compression, for
performance reasons. The Python implementation will be functional but
very basic and will not be deliberately optimized for performance
(unless someone wants to have fun!), as we currently only have the
bandwidth for optimizing the Rust implementation.
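
To make the "append-only, no stem compression" part a bit more concrete,
here is a rough Python sketch. It is not the actual format or code: the
`AppendOnlyStore` class, its in-memory index and the NUL-separated record
layout are all made up for illustration. It only shows the general idea
that an update appends a new record with the full path and leaves the old
bytes behind as unreachable.

```python
class AppendOnlyStore:
    """Hypothetical append-only byte store, not Mercurial's actual code."""

    def __init__(self):
        self.data = bytearray()  # stands in for the on-disk data file
        self.index = {}          # path -> (offset, length) of the live record
        self.dead_bytes = 0      # bytes no longer reachable from the index

    def put(self, path: bytes, payload: bytes) -> int:
        """Append a new record for `path`; any older record becomes dead."""
        # The full path is stored in every record (no stem compression).
        # The NUL separator is an assumption made for this sketch only.
        record = path + b"\0" + payload
        old = self.index.get(path)
        if old is not None:
            # Existing bytes are never rewritten, only counted as dead.
            self.dead_bytes += old[1]
        offset = len(self.data)
        self.data += record
        self.index[path] = (offset, len(record))
        return offset

    def get(self, path: bytes) -> bytes:
        """Return the payload of the live record for `path`."""
        offset, length = self.index[path]
        record = bytes(self.data[offset:offset + length])
        return record[len(path) + 1:]

    @property
    def alive_bytes(self) -> int:
        """Bytes still reachable from the index."""
        return len(self.data) - self.dead_bytes


store = AppendOnlyStore()
store.put(b"dir/file.txt", b"state=added")
store.put(b"dir/file.txt", b"state=clean")  # appends, first record is now dead
```

Once the share of dead bytes gets too high, an implementation along these
lines would presumably rewrite the file from scratch instead of appending.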
An overview of the current target (some implementation-detail level
contents omitted):
- A docket file that contains global metadata about the dirstate:
    - NodeID of the parents (32 bytes reserved, 20 used for now)
    - A total count of files (including Removed ones)
    - A count of dead (unreachable) bytes
    - A count of alive (reachable) bytes
    - A hash of ignore patterns (see
      https://phab.mercurial-scm.org/D10836)
- In the data file, for each directory/file (it can be both at the
  same time):
    - The full path in bytes of the file (or directory)
    - The full path of the copy source (optional)
    - How many tracked recursive descendants it has
    - How many recursive copies it has
    - Exec bit
    - mtime (probably up to nanosecond precision, both files and
      directories)
    - Clean file size when applicable
    - Its state: if it's removed, added, clean, etc.
    - Whether it's from p1 or p2
    - Whether it's ambiguous (it appears clean but the mtime is the
      same as the last status, probably will only happen with the
      Python implementation)
    - All of the info needed to get the previous state of a Removed
      file in case we `hg add` it back
    - (My idea as I type this: ) store the "raw bytes" version of
      the OS path if it differs from the normalized hg version (on
      Windows and macOS for example) to cache the filefoldmap.
I *think* that's it? I might be wrong; if so, please tell me!
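
For those who prefer reading code, the overview above could be sketched
roughly as the following Python dataclasses. This is only an illustration
of the contents, not the byte layout: every field name, the state labels,
the `previous_state` placeholder and the zero-padding of node IDs in
`pad_node_id` are assumptions of mine for the sake of the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def pad_node_id(node_id: bytes, width: int = 32) -> bytes:
    """Pad a node ID to the reserved width (zero padding is an assumption)."""
    if len(node_id) > width:
        raise ValueError("node ID longer than the reserved space")
    return node_id.ljust(width, b"\0")


@dataclass
class Docket:
    """Global metadata about the dirstate (hypothetical field names)."""
    parents: Tuple[bytes, bytes]   # 32 bytes reserved each, 20 used for now
    file_count: int                # includes Removed files
    dead_bytes: int                # unreachable bytes in the data file
    alive_bytes: int               # reachable bytes in the data file
    ignore_patterns_hash: bytes


@dataclass
class Node:
    """One directory and/or file entry (hypothetical field names)."""
    path: bytes                            # full path, no stem compression
    state: str                             # e.g. "added", "removed", "clean"
    copy_source: Optional[bytes] = None    # full path of the copy source
    tracked_descendants: int = 0           # recursive count
    copy_descendants: int = 0              # recursive count
    exec_bit: bool = False
    mtime_ns: Optional[int] = None         # up to nanosecond precision
    clean_size: Optional[int] = None       # only when applicable
    from_p2: bool = False                  # False means "from p1"
    ambiguous: bool = False                # clean, but mtime == last status
    previous_state: Optional[str] = None   # to restore a Removed file
    raw_os_path: Optional[bytes] = None    # only if it differs from hg's


docket = Docket(
    parents=(pad_node_id(b"\x11" * 20), pad_node_id(b"\x00" * 20)),
    file_count=1,
    dead_bytes=0,
    alive_bytes=0,
    ignore_patterns_hash=b"",
)
node = Node(
    path=b"dir/file.txt",
    state="clean",
    exec_bit=True,
    mtime_ns=1_625_000_000_123_456_789,
    clean_size=42,
)
```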
Raphaël
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel