Hello all,

As you probably know, my colleagues at Octobus and I have been working on a new version of the dirstate. We are getting close to something usable in production, so we need to freeze the format soon. This email is not meant to discuss the exact byte-per-byte layout of the format, but rather its contents: what do you think should be included in the new version, or at least have space reserved for it?

We have already discussed this at previous sprints and on various other channels, but I thought it'd be better to give people a "last call" chance to make their voices heard.

I remember Google people saying they'd like to move frequently written information into a separate file to help with their filesystem shenanigans. What exactly would be the plan, and can we do it easily? I may be pessimistic, but this looks like it would require a lot of work which (so far) no one wants to sponsor, though I'm happy to be proven wrong.

To Matt Harbison: you said something about storing exec bit and symlink info explicitly to help platforms like Windows that don't have them. Could you please elaborate?

As a general recap (and to help understand some decisions), the new format will be an append-only tree with no stem compression for performance reasons. The Python implementation will be functional but very basic and will offer no purposeful performance improvements (unless someone wants to have fun!), as we currently only have the bandwidth for optimizing the Rust implementation.
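
To make "append-only" and "no stem compression" concrete, here is a toy sketch in Python. It is not the real layout: the class name, the in-memory index and the NUL separator are all made up for illustration. The only points it tries to show are that an update appends a new record instead of rewriting the old one, that the old bytes become unreachable, and that each record carries its full path.

```python
class AppendOnlyStore:
    """Toy append-only byte store; NOT the real dirstate layout."""

    def __init__(self):
        self.data = bytearray()
        # Hypothetical in-memory index: full path -> (offset, length).
        self.index = {}
        self.dead_bytes = 0

    def write(self, path, payload):
        """Append a new record for `path` and return its offset."""
        previous = self.index.get(path)
        if previous is not None:
            # The old record stays in the data but is no longer reachable.
            self.dead_bytes += previous[1]
        offset = len(self.data)
        # No stem compression: the record carries the full path,
        # not just the part relative to its parent directory.
        record = path + b"\0" + payload
        self.data += record
        self.index[path] = (offset, len(record))
        return offset

    def read(self, path):
        """Return the payload of the latest record written for `path`."""
        offset, length = self.index[path]
        record = bytes(self.data[offset:offset + length])
        return record[len(path) + 1:]

    @property
    def alive_bytes(self):
        """Bytes still reachable through the index."""
        return len(self.data) - self.dead_bytes
```

Writing the same path twice leaves both records in `data`, but only the second one is reachable, so `dead_bytes` grows by the size of the first record.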

An overview of the current target (some implementation-detail level contents omitted):

    - A docket file that contains global metadata about the dirstate:
        - NodeID of the parents (32 bytes reserved, 20 used for now)
        - A total count of files (including Removed ones)
        - A count of dead (unreachable) bytes
        - A count of alive (reachable) bytes
        - A hash of ignore patterns (see https://phab.mercurial-scm.org/D10836)
    - In the data file, for each directory/file (it can be both at the same time):
        - The full path in bytes of the file (or directory)
        - The full path of the copy source (optional)
        - How many tracked recursive descendants it has
        - How many recursive copies it has
        - Exec bit
        - mtime (probably up to nanosecond precision, both files and directories)
        - Clean file size when applicable
        - Its state: if it's removed, added, clean, etc.
        - Whether it's from p1 or p2
        - Whether it's ambiguous (it appears clean but the mtime is the same as the last status, probably will only happen with the Python implementation)
        - All of the info needed to get the previous state of a Removed file in case we `hg add` it back
        - (My idea as I type this:) store the "raw bytes" version of the OS path if it differs from the normalized hg version (on Windows and macOS for example) to cache the filefoldmap.
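
For those who prefer reading code, the list above could be modelled roughly as follows. This is only a sketch of the contents, not of the on-disk encoding: every class, field and function name here is hypothetical, the state names are limited to the ones mentioned above, and zero-padding node IDs to 32 bytes is my assumption about how the reserved space might be used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    # Only the states named in the list above; the real set may differ.
    ADDED = "added"
    REMOVED = "removed"
    CLEAN = "clean"


def pad_node_id(node_id: bytes, reserved: int = 32) -> bytes:
    """Zero-pad a node ID to the reserved width (padding is an assumption)."""
    if len(node_id) > reserved:
        raise ValueError("node ID longer than the reserved space")
    return node_id.ljust(reserved, b"\0")


@dataclass
class Docket:
    p1: bytes                    # 32 bytes reserved, 20 used for now
    p2: bytes
    file_count: int              # includes Removed files
    dead_bytes: int              # unreachable bytes in the data file
    alive_bytes: int             # reachable bytes in the data file
    ignore_patterns_hash: bytes


@dataclass
class Node:
    path: bytes                           # full path, no stem compression
    copy_source: Optional[bytes] = None
    tracked_descendants: int = 0          # recursive
    copies: int = 0                       # recursive
    exec_bit: bool = False
    mtime_ns: Optional[int] = None        # up to nanosecond precision
    clean_size: Optional[int] = None      # only when applicable
    state: Optional[State] = None         # None for a pure directory
    from_p2: bool = False
    ambiguous: bool = False


def count_tracked_descendants(tracked_files):
    """Map each directory to its number of tracked recursive descendants."""
    counts = {}
    for path in tracked_files:
        parts = path.split(b"/")
        # Every proper prefix of the path is an ancestor directory.
        for depth in range(1, len(parts)):
            directory = b"/".join(parts[:depth])
            counts[directory] = counts.get(directory, 0) + 1
    return counts
```

For example, with tracked files `a/b/c`, `a/d` and `e`, `count_tracked_descendants` gives 2 for `a` and 1 for `a/b`, and nothing for the top-level file `e`.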

I *think* that's it? I might be wrong, if so, please tell me!

Raphaël

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
