Re: Call for comments on new dirstate format contents
> On Oct 18, 2021, at 4:16 AM, Pierre-Yves David wrote:
>
> On 10/15/21 2:22 PM, Pierre-Yves David wrote:
>> On 10/13/21 10:57 AM, Simon Sapin wrote:
>>> Please let us know of any question or comment!
>>
>> I remember discussion about storing the WC exec-bit and symlink status to
>> help systems without support for those (Windows, we are looking at you).
>> That is necessary to solve things like "issue5883".
>>
>> Storage-wise this should be fairly simple, so we should be able to reserve
>> some useful value for that in the new format. Regarding the implementation
>> of a behavior fixing the associated issues, it seems complicated to get
>> something done as the freeze is a couple of days away.
>>
>> I just remembered this and I am not actively working on it today so I don't
>> have a very concrete idea about it yet. Matt Harbison might have more
>> concrete ideas about this.
>
> Okay, so the core of the issue is: Windows has no way of storing the exec
> bit in the file system, and since we read it from the file system at commit
> time, we have no way to set it (or unset it) in general, and during merges in
> particular. The solution people have been leaning toward for years is "store
> the exec bit in the dirstate entry for the file and have some API (+ UI) to
> set/unset it".
>
> So it seems like we need to reserve at least 2 bits for this (WC_EXEC_BITS,
> WC_SYMLINK). Then comes the question of how we manage these two bits:
>
> 1) We could enforce them all the time, requiring `hg status` runs to
> synchronize the value between the fs and the dirstate when they differ, as 1
> means set and 0 means unset. That approach requires more work but would allow
> a repository to move to a different fs without losing information.
>
> 2) We could have a third bit to indicate whether the feature is in use.
> Repositories on file systems that need it could start using the feature
> conditionally. This has the advantage that we just need to reserve the flags,
> and can implement the feature later, the code behaving as it does today if
> that third flag is unset. However, this means that moving a repository
> between file systems might lose information.
> In this scenario we would still need code to at least preserve the existing
> value in the code.
>
> I guess I'll start poking at (2) and see how much work it actually saves us
> compared to (1).

This feels like it comes back to past desires to separate the parts of dirstate
that track user intent and the parts of dirstate that record caches of
filesystem state, but it also seems awfully late in the process of v2 to address
that here.

I read through the design: it seems like a good improvement over the old
dirstate format, with my only gripe that we (still) haven't resolved the
cache-vs-userdata duality of dirstate. Seems fine enough to me.

AF

> --
> Pierre-Yves David

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Call for comments on new dirstate format contents
On 10/15/21 2:22 PM, Pierre-Yves David wrote:
> On 10/13/21 10:57 AM, Simon Sapin wrote:
>> Please let us know of any question or comment!
>
> I remember discussion about storing the WC exec-bit and symlink status to
> help systems without support for those (Windows, we are looking at you).
> That is necessary to solve things like "issue5883".
>
> Storage-wise this should be fairly simple, so we should be able to reserve
> some useful value for that in the new format. Regarding the implementation
> of a behavior fixing the associated issues, it seems complicated to get
> something done as the freeze is a couple of days away.
>
> I just remembered this and I am not actively working on it today so I don't
> have a very concrete idea about it yet. Matt Harbison might have more
> concrete ideas about this.

Okay, so the core of the issue is: Windows has no way of storing the exec bit
in the file system, and since we read it from the file system at commit time,
we have no way to set it (or unset it) in general, and during merges in
particular. The solution people have been leaning toward for years is "store
the exec bit in the dirstate entry for the file and have some API (+ UI) to
set/unset it".

So it seems like we need to reserve at least 2 bits for this (WC_EXEC_BITS,
WC_SYMLINK). Then comes the question of how we manage these two bits:

1) We could enforce them all the time, requiring `hg status` runs to
synchronize the value between the fs and the dirstate when they differ, as 1
means set and 0 means unset. That approach requires more work but would allow
a repository to move to a different fs without losing information.

2) We could have a third bit to indicate whether the feature is in use.
Repositories on file systems that need it could start using the feature
conditionally. This has the advantage that we just need to reserve the flags,
and can implement the feature later, the code behaving as it does today if
that third flag is unset. However, this means that moving a repository between
file systems might lose information. In this scenario we would still need code
to at least preserve the existing value in the code.

I guess I'll start poking at (2) and see how much work it actually saves us
compared to (1).

--
Pierre-Yves David
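Option (2) above can be sketched roughly as follows. This is only an illustration, not the dirstate-v2 layout: the bit positions, the name WC_FALLBACK_IN_USE for the "third bit", and the helper functions are all assumptions made up for this example; only WC_EXEC_BITS and WC_SYMLINK are named in the message.

```python
from enum import IntFlag


class EntryFlags(IntFlag):
    # Bit positions are hypothetical, chosen only for this sketch.
    WC_EXEC_BITS = 1 << 0        # named in the message above
    WC_SYMLINK = 1 << 1          # named in the message above
    WC_FALLBACK_IN_USE = 1 << 2  # hypothetical name for the "third bit"


def effective_mode(flags, fs_exec, fs_symlink):
    """Return (is_exec, is_symlink) for one working-copy file.

    If the "in use" bit is unset, behave as today and trust the
    file system; otherwise trust the bits stored in the entry.
    """
    if flags & EntryFlags.WC_FALLBACK_IN_USE:
        return (
            bool(flags & EntryFlags.WC_EXEC_BITS),
            bool(flags & EntryFlags.WC_SYMLINK),
        )
    return (fs_exec, fs_symlink)


def preserve_reserved_bits(old_flags, new_flags):
    """Carry the three reserved bits over from old_flags unchanged.

    Models "code to at least preserve the existing value" for an
    implementation that does not otherwise use the feature.
    """
    reserved = (
        EntryFlags.WC_EXEC_BITS
        | EntryFlags.WC_SYMLINK
        | EntryFlags.WC_FALLBACK_IN_USE
    )
    return (new_flags & ~reserved) | (old_flags & reserved)
```

With the third bit unset the stored bits are ignored, which is the "behaves as it does today" property described for option (2).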
Re: Call for comments on new dirstate format contents
On 10/13/21 10:57 AM, Simon Sapin wrote:
> On 28/06/2021 11:49, Raphaël Gomès wrote:
>> Hello all,
>>
>> As you probably know my colleagues at Octobus and I have been working on
>> a new version of the dirstate, and we're coming pretty close to
>> something usable in production, so we need to freeze the format soon.
>
> Hello again,
>
> Together with the Rust implementation of the new status algorithm, this
> dirstate-v2 file format enables great performance improvements of `hg status`
> on large repositories.
>
> We at Octobus are hoping to stabilize it very soon after a few remaining
> changes, so that the format will not be experimental anymore in the upcoming
> Mercurial 6.0 release. It will not yet be enabled by default, but future
> Mercurial versions will need to be compatible both ways with 6.0 when
> accessing a given local repository that uses dirstate-v2.
>
> A short user guide (how to enable, upgrade, or downgrade) as well as detailed
> documentation of the file format can be found at:
>
> https://www.mercurial-scm.org/repo/hg-committed/file/tip/mercurial/helptext/internals/dirstate-v2.txt
>
> … or in a source repository by running `make local` then
> `./hg help internals.dirstate-v2`
>
> The remaining format changes we have planned are:
>
> * Add sub-second precision to stored file/symlink mtime, and share its
>   location with that of directory mtime. (This part of the format is a bit
>   of a mess right now since we're in the middle of this change.)
>
> * Maybe add a flag bit to allow marking files as "known modified at this
>   mtime". `hg status` sometimes needs to read the contents of files in case
>   of possible size-preserving changes. If there is indeed a change, currently
>   this read is repeated every time status runs again. The new bit would
>   record that result.
>
> * Maybe add some node-specific or dirstate-wide flags or a "mode switch" to
>   make the format and its storage of directory mtimes less tied to details of
>   the current readdir-skipping optimization. (For example, a future version
>   of Mercurial might want to add dirstate nodes for unknown and/or ignored
>   files to skip readdir in more cases.)
>
> Non-format changes that we want to have in 6.0:
>
> * Merge D11520 and the rest of that stack to have a Python implementation of
>   the format, so that repositories that use it are usable when Rust
>   extensions are not enabled. This is slower, on the order of 0.1 to 0.3
>   seconds added to `hg status` commands taking 0.4 to 2.5 seconds with
>   dirstate-v1 without Rust on various repositories.
>
> * Add configuration to either abort, warn, or silently continue when this
>   slow code path is or would be used. And decide its default. I'm personally
>   inclined at least not to abort by default since the slow path is not
>   *horribly* slow.
>
> Please let us know of any question or comment!

I remember discussion about storing the WC exec-bit and symlink status to help
systems without support for those (Windows, we are looking at you). That is
necessary to solve things like "issue5883".

Storage-wise this should be fairly simple, so we should be able to reserve
some useful value for that in the new format. Regarding the implementation of
a behavior fixing the associated issues, it seems complicated to get something
done as the freeze is a couple of days away.

I just remembered this and I am not actively working on it today so I don't
have a very concrete idea about it yet. Matt Harbison might have more concrete
ideas about this.

Cheers,

--
Pierre-Yves David
Re: Call for comments on new dirstate format contents
On 28/06/2021 11:49, Raphaël Gomès wrote:
> Hello all,
>
> As you probably know my colleagues at Octobus and I have been working on
> a new version of the dirstate, and we're coming pretty close to
> something usable in production, so we need to freeze the format soon.

Hello again,

Together with the Rust implementation of the new status algorithm, this
dirstate-v2 file format enables great performance improvements of `hg status`
on large repositories.

We at Octobus are hoping to stabilize it very soon after a few remaining
changes, so that the format will not be experimental anymore in the upcoming
Mercurial 6.0 release. It will not yet be enabled by default, but future
Mercurial versions will need to be compatible both ways with 6.0 when
accessing a given local repository that uses dirstate-v2.

A short user guide (how to enable, upgrade, or downgrade) as well as detailed
documentation of the file format can be found at:

https://www.mercurial-scm.org/repo/hg-committed/file/tip/mercurial/helptext/internals/dirstate-v2.txt

… or in a source repository by running `make local` then
`./hg help internals.dirstate-v2`

The remaining format changes we have planned are:

* Add sub-second precision to stored file/symlink mtime, and share its
  location with that of directory mtime. (This part of the format is a bit of
  a mess right now since we're in the middle of this change.)

* Maybe add a flag bit to allow marking files as "known modified at this
  mtime". `hg status` sometimes needs to read the contents of files in case of
  possible size-preserving changes. If there is indeed a change, currently
  this read is repeated every time status runs again. The new bit would record
  that result.

* Maybe add some node-specific or dirstate-wide flags or a "mode switch" to
  make the format and its storage of directory mtimes less tied to details of
  the current readdir-skipping optimization. (For example, a future version of
  Mercurial might want to add dirstate nodes for unknown and/or ignored files
  to skip readdir in more cases.)

Non-format changes that we want to have in 6.0:

* Merge D11520 and the rest of that stack to have a Python implementation of
  the format, so that repositories that use it are usable when Rust extensions
  are not enabled. This is slower, on the order of 0.1 to 0.3 seconds added to
  `hg status` commands taking 0.4 to 2.5 seconds with dirstate-v1 without Rust
  on various repositories.

* Add configuration to either abort, warn, or silently continue when this slow
  code path is or would be used. And decide its default. I'm personally
  inclined at least not to abort by default since the slow path is not
  *horribly* slow.

Please let us know of any question or comment!

--
Simon Sapin
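To make the "known modified at this mtime" idea concrete, here is a minimal sketch of how such a bit could save repeated reads. It is one interpretation of the description above, not the actual status algorithm: the Entry class, its field names, and the check function are hypothetical, and real status code has to handle more cases (ambiguous mtimes, for instance).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # All names here are hypothetical, for illustration only.
    size: int                     # size of the file when it was clean
    mtime_ns: int                 # mtime recorded by an earlier status
    known_modified: bool = False  # "known modified at this mtime" bit


def check(entry, st_size, st_mtime_ns, contents_differ):
    """Return "clean" or "modified" for one tracked file.

    contents_differ is a callable that reads the file and compares it
    with the parent revision; it stands for the expensive step.
    """
    if st_size != entry.size:
        return "modified"  # a size change needs no content read
    if st_mtime_ns == entry.mtime_ns:
        # Nothing touched the file since the recorded result.
        return "modified" if entry.known_modified else "clean"
    # Same size but different mtime: a size-preserving change is possible.
    changed = contents_differ()
    entry.mtime_ns = st_mtime_ns
    entry.known_modified = changed  # record the result for the next run
    return "modified" if changed else "clean"
```

In this sketch a second status run over an unchanged but modified file returns "modified" from the recorded bit without calling contents_differ again.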
Re: Call for comments on new dirstate format contents
On Mon, Jun 28, 2021 at 2:50 AM Raphaël Gomès wrote:
>
> Hello all,
>
> As you probably know my colleagues at Octobus and I have been working on
> a new version of the dirstate, and we're coming pretty close to
> something usable in production, so we need to freeze the format soon.
> This email is not meant to discuss the exact byte-per-byte layout
> details of the format, but rather its contents: what do you think should
> be included (or at least have space reserved for) in the new version?
>
> We have already discussed this at previous sprints and various other
> discussion channels, but I thought it'd be better to give a "last call"
> chance for people to get their voices heard.
>
> I remember Google people saying they'd like to separate information that
> is frequently written to a separate file to help with their filesystem
> shenanigans. What exactly would be the plan and can we do it easily? I
> may be pessimistic, but this looks like it would require a lot of work
> which (so far) no one wants to sponsor, though I'm happy to be proven
> wrong either way.
>
> To Matt Harbison: you said something about storing exec bit and symlink
> info explicitly to help platforms like Windows that don't have them,
> could you please elaborate?
>
> As a general recap (and to help understand some decisions), the new
> format will be an append-only tree with no stem compression for
> performance reasons. The Python implementation will be functional but
> very basic and will offer no purposeful performance improvements (unless
> someone wants to have fun!), as we currently only have the bandwidth for
> optimizing the Rust implementation.
>
> An overview of the current target (some implementation-detail level
> contents omitted):
>
>     - A docket file that contains global metadata about the dirstate:
>         - NodeID of the parents (32 bytes reserved, 20 used for now)
>         - A total count of files (including Removed ones)
>         - A count of dead (unreachable) bytes
>         - A count of alive (reachable) bytes
>         - A hash of ignore patterns (see
>           https://phab.mercurial-scm.org/D10836)
>     - In the data file, for each directory/file (it can be both at the
>       same time):
>         - The full path in bytes of the file (or directory)
>         - The full path of the copy source (optional)
>         - How many tracked recursive descendants it has
>         - How many recursive copies it has
>         - Exec bit
>         - mtime (probably up to nanosecond precision, both files and
>           directories)
>         - Clean file size when applicable
>         - Its state: if it's removed, added, clean, etc.
>         - Whether it's from p1 or p2
>         - Whether it's ambiguous (it appears clean but the mtime is the
>           same as the last status, probably will only happen with the Python
>           implementation)
>         - All of the info needed to get the previous state of a Removed
>           file in case we `hg add` it back
>         - (My idea as I type this: ) store the "raw bytes" version of
>           the OS path if it differs from the normalized hg version (on
>           Windows and MacOS for example) to cache the filefoldmap.
>
> I *think* that's it? I might be wrong, if so, please tell me!

My recollection of previous discussions can be summarized as "the dirstate
file does multiple things: we should split it up."

Given the breadth of things tracked in this list, I'm a bit concerned about
potential for write amplification where changing something small results in
writing out a large number of bytes. But a lot of this hinges on the layout of
this file. If we start adding complexity to the file layout to minimize I/O, I
worry that we'd be reinventing a bespoke data store and we'd be better served
by splitting the content or leveraging something designed for the purpose
(like SQLite or LevelDB or somesuch).

The only other thing I'd consider adding to this list is something that could
help unify with external filesystem tracking tools. Maybe an append only list
of "externally monitored" filesystem changes [found from watchman] that could
be used to speed up aspects of `hg status`. I haven't thought too much about
this and my comment may be off base. But my recollection is that the way
fsmonitor integrates today is somewhat hacky. I suspect there's a way to
integrate that functionality more tightly into the "dirstate umbrella" so
things are less hacky.
Re: Call for comments on new dirstate format contents
On 29/06/2021 20:48, Kyle Lippincott wrote:
> Can you elaborate a bit on what this append-only tree looks like (and why
> that's preferred)

It's a tree in that there are nodes for files and directories. We can quickly
find root nodes, and from a given node we can quickly find its direct child
nodes, all without parsing the entire file.

It's "append-mostly" because changes are made by adding new nodes at the end
of the file and reusing nodes for unchanged sub-trees. Nodes that have been
replaced become unreachable but still take up space. Occasionally, based on
some heuristic, the whole file is rewritten without unreachable nodes. This
makes most writes cheaper than re-serializing and writing the entire file.

> and why stem compression would cause performance issues?

Each node contains its full path from the repository root. This allows status
code to pass around a slice (pointer + length) to the middle of the mmap'ed
file. If a node only had its basename we'd have to allocate a string to
reconstitute a path by concatenating the names of ancestor directories. The
cost of many memory allocations can add up.

> When loading this new dirstate, would it require loading the entire thing
> from the beginning and replacing entries with the newer ones?

No, that's the point of making it a tree of fixed-size nodes that contain data
at fixed-size offsets, with pseudo-pointers for variable-size data (paths and
child nodes).

> You say the Python implementation will offer no purposeful performance
> improvements, but how likely is it that it will be slower than the current
> format?

The current implementations (Python and C) of dirstate-v1 work by parsing the
entire dirstate into large Python dicts. The Python implementation of
dirstate-v2 would do the same, only parsing a different format.

> What level of performance degradation would be considered acceptable?

Good question. We don't have a hard criterion. However, this fallback
implementation of dirstate-v2 will only be used when accessing an existing
local repository that uses that format. When creating a new clone, dirstate-v2
is only used if a fast implementation is available.

> What happens if the docket and data file get out of sync somehow (maybe hg
> crashes in the middle of writing, or Google has a network write race)?

A docket that refers to a new data file is only swap-renamed after the data
file was finished writing. I don't know what ordering guarantees between
writes exist or not on Google's network filesystem.

>> - A count of dead (unreachable) bytes
>> - A count of alive (reachable) bytes
>
> What are these two?

Only one of them is needed, the other can be deduced by subtracting from the
size of the file. Unreachable means obsolete parts of the file that have been
replaced by other nodes, see "append-mostly" above. The heuristic for
rewriting the whole file to get rid of unreachable data is based on this
counter.

> Is there a good way of determining what the timestamp resolution of a
> filesystem is?

As far as I know there is not. What we can do is create a temporary file and
take its mtime as the current time with the same (unknown) truncation as other
files' mtimes. If we observe a "current mtime" strictly later than a given
file's mtime, we know that further changes to that file are extremely
likely[1] to cause a different mtime since the clock has already ticked since
the last change.

([1] The system clock is not monotonic, so it could jump back and still have
the same clock-reported date happen again. If we get unlucky another change to
the file could happen exactly then, modulo truncation.)

See comments starting at
https://www.mercurial-scm.org/repo/hg-committed/file/5fa083a5ff04/rust/hg-core/src/dirstate_tree/status.rs#l401

> (I don't know how various OSes treat these timestamps when the underlying
> filesystem doesn't support higher precision; is it 100% guaranteed that they
> just extend it with zeroes?)

Regardless, there's also the case where the filesystem can store enough bits
but the kernel only updates an internal clock at some arbitrary ticks:
https://stackoverflow.com/a/14393315/1162888

>> - All of the info needed to get the previous state of a Removed file in
>> case we `hg add` it back
>
> Can you explain the use case for this (and/or what would be in it)? I would
> think that `hg rm foo && echo hi > foo && hg add foo` should be equivalent to
> `echo hi > foo`, but I might be missing something?

I still don't fully understand this, but it also exists in dirstate-v1. I
think it's relevant when in the middle of merging.
https://www.mercurial-scm.org/wiki/DirState#Summary

> My biggest concern is extensibility. As an example, as you were writing this
> up, you thought of something else to add, so we probably don't want to
> restrict ourselves too much :) The file format is already going to not be
> anything resembling fixed record size, having a s
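The temporary-file trick described in this message for dealing with unknown timestamp resolution can be sketched as below. This is a simplified illustration, not the code in status.rs: the function names are hypothetical, and the footnote about non-monotonic clocks applies here just as well.

```python
import os
import tempfile


def filesystem_now_ns(directory):
    """Return the file system's idea of "now", in nanoseconds.

    A fresh temporary file gets an mtime truncated the same (unknown)
    way as the mtimes of other files in the same file system.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        return os.fstat(fd).st_mtime_ns
    finally:
        os.close(fd)
        os.unlink(path)


def mtime_is_reliable(file_mtime_ns, fs_now_ns):
    """True if the file system clock has ticked since the file changed.

    Only a strictly later "current mtime" counts: if both values are
    equal, another write could still land on the same timestamp.
    """
    return file_mtime_ns < fs_now_ns
```

A caller would take filesystem_now_ns once per status run, on the same file system as the working copy, and only cache an mtime in the dirstate when mtime_is_reliable holds for it.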
Re: Call for comments on new dirstate format contents
On Mon, Jun 28, 2021 at 2:50 AM Raphaël Gomès wrote:
> Hello all,
>
> As you probably know my colleagues at Octobus and I have been working on
> a new version of the dirstate, and we're coming pretty close to
> something usable in production, so we need to freeze the format soon.
> This email is not meant to discuss the exact byte-per-byte layout
> details of the format, but rather its contents: what do you think should
> be included (or at least have space reserved for) in the new version?
>
> We have already discussed this at previous sprints and various other
> discussion channels, but I thought it'd be better to give a "last call"
> chance for people to get their voices heard.
>
> I remember Google people saying they'd like to separate information that
> is frequently written to a separate file to help with their filesystem
> shenanigans. What exactly would be the plan and can we do it easily? I
> may be pessimistic, but this looks like it would require a lot of work
> which (so far) no one wants to sponsor, though I'm happy to be proven
> wrong either way.

The original thinking had been that we'd have two or three files:

1. p1/p2
2. anything the user did (`hg mv/cp/add/rm`)
3. anything hg can generate in `hg debugrebuilddirstate`

The thinking was that #3 could either be generated by the filesystem itself,
or, if there was a network write race (when using filesystems like our
internal CitC filesystem, or maybe with things like NFS, if it can determine
write races, I'm honestly not sure...), it could just let one side win
arbitrarily.

After learning more, neither of those really work for us. Our virtual
filesystem is "dumb" - it honestly knows very little about the files it's
being asked to store, so it would be a huge change to have it track enough
information to feasibly produce something that could replace the data in #3
above. Additionally, in the network write race scenario, letting one side win
arbitrarily just opens you up to dirstate corruption, which is not a place
anyone wants to be in. :) In the network write race case, we could teach the
virtual filesystem server to delete/poison the file (triggering a dirstate
rebuild on the next command), but it's probably not worthwhile at this point.

I was then thinking that we could just store #3 "where it belongs", in
.hg/wcache, and just not replicate it. That still opens you up for dirstate
corruption issues (modify the working directory on machine A, and then use it
on machine B - we still need some way of telling machine B it's out of date;
that could be a timestamp in the non-cache part of the dirstate, I guess?).

> To Matt Harbison: you said something about storing exec bit and symlink
> info explicitly to help platforms like Windows that don't have them,
> could you please elaborate?
>
> As a general recap (and to help understand some decisions), the new
> format will be an append-only tree with no stem compression for
> performance reasons.

Can you elaborate a bit on what this append-only tree looks like (and why
that's preferred) and why stem compression would cause performance issues?
When loading this new dirstate, would it require loading the entire thing from
the beginning and replacing entries with the newer ones? IMHO, we should be
optimizing as much as possible for the read performance, even if it costs some
small amount of write performance. Writes seem less frequent to me (and more
tolerant of slightly higher latency) than things like `hg status` (being
executed by an IDE, or by someone in `watch`, or in a shell prompt, or
something...)

> The Python implementation will be functional but
> very basic and will offer no purposeful performance improvements (unless
> someone wants to have fun!), as we currently only have the bandwidth for
> optimizing the Rust implementation.

You say the Python implementation will offer no purposeful performance
improvements, but how likely is it that it will be slower than the current
format? What level of performance degradation would be considered acceptable?

> An overview of the current target (some implementation-detail level
> contents omitted):
>
>     - A docket file that contains global metadata about the dirstate:

What happens if the docket and data file get out of sync somehow (maybe hg
crashes in the middle of writing, or Google has a network write race)?

>         - NodeID of the parents (32 bytes reserved, 20 used for now)
>         - A total count of files (including Removed ones)
>         - A count of dead (unreachable) bytes
>         - A count of alive (reachable) bytes

What are these two?

>         - A hash of ignore patterns (see
>           https://phab.mercurial-scm.org/D10836)
>     - In the data file, for each directory/file (it can be both at the
>       same time):
>         - The full path in bytes of the file (or directory)
>         - The full path of the copy source (optional)
>         - How many tracked recursive descendants it has
>         -
Re: Call for comments on new dirstate format contents
On Mon, Jun 28, 2021 at 5:49 AM Raphaël Gomès wrote:
>
> To Matt Harbison: you said something about storing exec bit and symlink
> info explicitly to help platforms like Windows that don't have them,
> could you please elaborate?

The idea is essentially this, without having to steal undefined/undocumented
bits:

https://www.mercurial-scm.org/wiki/DirState#Proposed_extensions

I'm not sure what the point of having fallback vs real(?) bits was, but I
guess it could be useful to disambiguate on a system that *does* support +x or
symlink. Support for this would fix issue2020, issue5883, and maybe a few
other corner cases.

> As a general recap (and to help understand some decisions), the new
> format will be an append-only tree with no stem compression for
> performance reasons. The Python implementation will be functional but
> very basic and will offer no purposeful performance improvements (unless
> someone wants to have fun!), as we currently only have the bandwidth for
> optimizing the Rust implementation.

Is a toggle-able bit like this a hassle for an append-only data structure? I
suppose it's not much different than adding and removing a file several times,
but I haven't paid a lot of attention to this discussion up to this point.
Call for comments on new dirstate format contents
Hello all,

As you probably know my colleagues at Octobus and I have been working on a new
version of the dirstate, and we're coming pretty close to something usable in
production, so we need to freeze the format soon. This email is not meant to
discuss the exact byte-per-byte layout details of the format, but rather its
contents: what do you think should be included (or at least have space
reserved for) in the new version?

We have already discussed this at previous sprints and various other
discussion channels, but I thought it'd be better to give a "last call" chance
for people to get their voices heard.

I remember Google people saying they'd like to separate information that is
frequently written to a separate file to help with their filesystem
shenanigans. What exactly would be the plan and can we do it easily? I may be
pessimistic, but this looks like it would require a lot of work which (so far)
no one wants to sponsor, though I'm happy to be proven wrong either way.

To Matt Harbison: you said something about storing exec bit and symlink info
explicitly to help platforms like Windows that don't have them, could you
please elaborate?

As a general recap (and to help understand some decisions), the new format
will be an append-only tree with no stem compression for performance reasons.
The Python implementation will be functional but very basic and will offer no
purposeful performance improvements (unless someone wants to have fun!), as we
currently only have the bandwidth for optimizing the Rust implementation.

An overview of the current target (some implementation-detail level contents
omitted):

    - A docket file that contains global metadata about the dirstate:
        - NodeID of the parents (32 bytes reserved, 20 used for now)
        - A total count of files (including Removed ones)
        - A count of dead (unreachable) bytes
        - A count of alive (reachable) bytes
        - A hash of ignore patterns (see
          https://phab.mercurial-scm.org/D10836)
    - In the data file, for each directory/file (it can be both at the same
      time):
        - The full path in bytes of the file (or directory)
        - The full path of the copy source (optional)
        - How many tracked recursive descendants it has
        - How many recursive copies it has
        - Exec bit
        - mtime (probably up to nanosecond precision, both files and
          directories)
        - Clean file size when applicable
        - Its state: if it's removed, added, clean, etc.
        - Whether it's from p1 or p2
        - Whether it's ambiguous (it appears clean but the mtime is the same
          as the last status, probably will only happen with the Python
          implementation)
        - All of the info needed to get the previous state of a Removed file
          in case we `hg add` it back
        - (My idea as I type this: ) store the "raw bytes" version of the OS
          path if it differs from the normalized hg version (on Windows and
          macOS for example) to cache the filefoldmap.

I *think* that's it? I might be wrong, if so, please tell me!

Raphaël
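For readers who want a picture of the docket contents listed above, here is a rough sketch as a Python dataclass. It is not the on-disk format (the message explicitly leaves byte-level layout out of scope): the class, its field names, the padding helper, and above all the 0.5 threshold in should_rewrite are assumptions for illustration. The only grounded parts are the listed fields, the "32 bytes reserved, 20 used" note, and the statement elsewhere in this thread that a whole-file rewrite is triggered by some heuristic based on the unreachable-bytes counter.

```python
from dataclasses import dataclass

NODE_ID_RESERVED = 32  # bytes reserved per parent node ID
NODE_ID_USED = 20      # bytes used for now


def pad_node_id(node):
    """Pad a 20-byte node ID with zero bytes to the reserved width."""
    if len(node) != NODE_ID_USED:
        raise ValueError("expected a %d-byte node ID" % NODE_ID_USED)
    return node.ljust(NODE_ID_RESERVED, b"\0")


@dataclass
class Docket:
    # Field names are hypothetical; the contents follow the list above.
    p1: bytes
    p2: bytes
    file_count: int            # includes Removed files
    unreachable_bytes: int     # "dead" bytes in the data file
    reachable_bytes: int       # "alive" bytes in the data file
    ignore_patterns_hash: bytes

    def should_rewrite(self, max_unreachable_ratio=0.5):
        """Hypothetical heuristic: rewrite when too much of the file is dead.

        The real threshold is not given in this thread; 0.5 is a placeholder.
        """
        total = self.unreachable_bytes + self.reachable_bytes
        if total == 0:
            return False
        return self.unreachable_bytes / total > max_unreachable_ratio
```

As noted in one of the replies, only one of the two byte counters is strictly needed, since the other can be derived from the data file size; the sketch keeps both simply to mirror the list.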