Re: Call for comments on new dirstate format contents
On Mon, Jun 28, 2021 at 2:50 AM Raphaël Gomès wrote:
>
> Hello all,
>
> As you probably know my colleagues at Octobus and I have been working on
> a new version of the dirstate, and we're coming pretty close to
> something usable in production, so we need to freeze the format soon.
> This email is not meant to discuss the exact byte-per-byte layout
> details of the format, but rather its contents: what do you think should
> be included (or at least have space reserved for) in the new version?
>
> We have already discussed this at previous sprints and various other
> discussion channels, but I thought it'd be better to give a "last call"
> chance for people to get their voices heard.
>
> I remember Google people saying they'd like to separate information that
> is frequently written to a separate file to help with their filesystem
> shenanigans. What exactly would be the plan and can we do it easily? I
> may be pessimistic, but this looks like it would require a lot of work
> which (so far) no one wants to sponsor, though I'm happy to be proven
> wrong either way.
>
> To Matt Harbison: you said something about storing exec bit and symlink
> info explicitly to help platforms like Windows that don't have them,
> could you please elaborate?
>
> As a general recap (and to help understand some decisions), the new
> format will be an append-only tree with no stem compression for
> performance reasons. The Python implementation will be functional but
> very basic and will offer no purposeful performance improvements (unless
> someone wants to have fun!), as we currently only have the bandwidth for
> optimizing the Rust implementation.
>
> An overview of the current target (some implementation-detail level
> contents omitted):
>
> - A docket file that contains global metadata about the dirstate:
>     - NodeID of the parents (32 bytes reserved, 20 used for now)
>     - A total count of files (including Removed ones)
>     - A count of dead (unreachable) bytes
>     - A count of alive (reachable) bytes
>     - A hash of ignore patterns (see
>       https://phab.mercurial-scm.org/D10836)
> - In the data file, for each directory/file (it can be both at the
>   same time):
>     - The full path in bytes of the file (or directory)
>     - The full path of the copy source (optional)
>     - How many tracked recursive descendants it has
>     - How many recursive copies it has
>     - Exec bit
>     - mtime (probably up to nanosecond precision, both files and
>       directories)
>     - Clean file size when applicable
>     - Its state: if it's removed, added, clean, etc.
>     - Whether it's from p1 or p2
>     - Whether it's ambiguous (it appears clean but the mtime is the
>       same as the last status, probably will only happen with the Python
>       implementation)
>     - All of the info needed to get the previous state of a Removed
>       file in case we `hg add` it back
>     - (My idea as I type this: ) store the "raw bytes" version of
>       the OS path if it differs from the normalized hg version (on Windows and
>       MacOS for example) to cache the filefoldmap.
>
> I *think* that's it? I might be wrong, if so, please tell me!

My recollection of previous discussions can be summarized as "the dirstate file does multiple things: we should split it up." Given the breadth of things tracked in this list, I'm a bit concerned about potential for write amplification, where changing something small results in writing out a large number of bytes. But a lot of this hinges on the layout of this file.
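To make the docket's role concrete, here is a hypothetical Python model of the global metadata listed in the quoted message. The message deliberately leaves the byte layout open, so the struct format, the field names, `ignore_patterns_hash`, and the 50% dead-byte threshold are all assumptions for illustration, not the actual dirstate-v2 encoding. It shows why an append-only data file needs the dead/alive byte counters: each update appends new data and leaves superseded bytes unreachable, and the counters let a writer decide when a full rewrite is worth it. That rewrite is exactly where the write-amplification trade-off raised in the reply shows up.

```python
import hashlib
import struct
from dataclasses import dataclass

NODE_RESERVED = 32  # bytes reserved for each parent node ID
NODE_USED = 20      # SHA-1 node IDs currently fill the first 20 of them

# Hypothetical layout, for illustration only: two zero-padded parents, three
# big-endian unsigned 64-bit counters, and a 20-byte SHA-1 of ignore patterns.
DOCKET_FORMAT = ">32s32sQQQ20s"


@dataclass
class Docket:
    p1: bytes           # 20-byte node ID of the first parent
    p2: bytes           # 20-byte node ID of the second parent
    file_count: int     # tracked files, including Removed ones
    dead_bytes: int     # bytes in the data file no longer reachable
    alive_bytes: int    # bytes still reachable from the tree root
    ignore_hash: bytes  # SHA-1 over the ignore patterns

    def pack(self) -> bytes:
        # struct's "32s" pads the 20-byte node IDs with NUL bytes up to 32.
        return struct.pack(DOCKET_FORMAT, self.p1, self.p2, self.file_count,
                           self.dead_bytes, self.alive_bytes, self.ignore_hash)

    @classmethod
    def unpack(cls, data: bytes) -> "Docket":
        p1, p2, files, dead, alive, ignore = struct.unpack(DOCKET_FORMAT, data)
        # Only the first NODE_USED bytes of each reserved slot are meaningful.
        return cls(p1[:NODE_USED], p2[:NODE_USED], files, dead, alive, ignore)

    def should_rewrite(self, max_dead_ratio: float = 0.5) -> bool:
        # Appending leaves superseded nodes behind as dead bytes; once they
        # dominate the file, rewriting only the live data (compaction) pays off.
        total = self.dead_bytes + self.alive_bytes
        return total > 0 and self.dead_bytes / total > max_dead_ratio


def ignore_patterns_hash(ignore_file_contents: list) -> bytes:
    # Hypothetical: hash the concatenated ignore-file contents. A mismatch with
    # the docket's ignore_hash would mean cached status results may be stale.
    return hashlib.sha1(b"".join(ignore_file_contents)).digest()
```

For example, `Docket(b"\x11" * 20, b"\x00" * 20, 3, 100, 900, ignore_patterns_hash([b"*.pyc\n"])).pack()` yields a 108-byte blob that round-trips through `Docket.unpack`. Keeping these counters in a small separate docket means a writer can decide between appending and rewriting without first scanning the much larger data file.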
If we start adding complexity to the file layout to minimize I/O, I worry that we'd be reinventing a bespoke data store, and we'd be better served by splitting the content or leveraging something designed for the purpose (like SQLite or LevelDB or somesuch).

The only other thing I'd consider adding to this list is something that could help unify with external filesystem tracking tools: maybe an append-only list of "externally monitored" filesystem changes [found from watchman] that could be used to speed up aspects of `hg status`. I haven't thought too much about this and my comment may be off base. But my recollection is that the way fsmonitor integrates today is somewhat hacky. I suspect there's a way to integrate that functionality more tightly into the "dirstate umbrella" so things are less hacky.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
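The reply above only floats the idea of an append-only list of externally monitored changes, so the following is a speculative sketch of how such a journal could narrow the work `hg status` does. Nothing here is fsmonitor's or watchman's actual interface: `ChangeJournal`, `narrow_status`, and the integer "clock" are hypothetical, modeling a watcher that reports each changed path along with a monotonically increasing clock value.

```python
import bisect


class ChangeJournal:
    """Hypothetical append-only log of (clock, path) events from a file watcher."""

    def __init__(self):
        self._clocks = []  # non-decreasing watcher clock values
        self._paths = []   # path reported at the matching clock

    def append(self, clock, path):
        # Append only: earlier entries are never modified or reordered.
        if self._clocks and clock < self._clocks[-1]:
            raise ValueError("watcher clock went backwards")
        self._clocks.append(clock)
        self._paths.append(path)

    def changed_since(self, clock):
        # Clocks are sorted, so a binary search finds the first newer entry.
        start = bisect.bisect_right(self._clocks, clock)
        return set(self._paths[start:])


def narrow_status(journal, last_status_clock, tracked):
    """Split paths changed since the last status into two sorted lists.

    Returns (tracked files that need a fresh stat, untracked candidates).
    Untracked candidates would still go through the ignore rules.
    """
    changed = journal.changed_since(last_status_clock)
    return sorted(changed & tracked), sorted(changed - tracked)
```

With entries `(1, b"a")`, `(2, b"b")`, `(3, b"new")` and a last status at clock 1, only `b"b"` needs a stat and `b"new"` is a candidate unknown file; `b"a"` is skipped entirely. A real design would also have to detect when the watcher's history is incomplete (for example after it restarts) and fall back to a full directory walk.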
[Bug 6532] New: Pulling from user repo with evolve extension enabled
https://bz.mercurial-scm.org/show_bug.cgi?id=6532

Bug ID: 6532
Summary: Pulling from user repo with evolve extension enabled
Product: Mercurial
Version: 5.7.1
Hardware: PC
OS: Linux
Status: UNCONFIRMED
Severity: bug
Priority: wish
Component: evolution
Assignee: bugzi...@mercurial-scm.org
Reporter: zeger...@me.com
CC: mercurial-devel@mercurial-scm.org, pierre-yves.da...@ens-lyon.org
Python Version: ---

Created attachment 2117
  --> https://bz.mercurial-scm.org/attachment.cgi?id=2117=edit
full traceback

I have `evolution = all` enabled and am using the evolve extension (version 10.3.2). When I try to pull from another user's repo I get an error. The other user also has evolution enabled, but is not using the evolve extension.

The error message is similar to https://bz.mercurial-scm.org/show_bug.cgi?id=6432

```
remote:   File "/usr/lib64/python3.6/site-packages/mercurial/util.py", line 1747, in __get__
remote:     result = self.func(obj)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 518, in stablerange
remote:     cache.update(self)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/genericcaches.py", line 111, in update
remote:     self.load(repo)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 400, in load
remote:     if self._con is not None:
remote:   File "/usr/lib64/python3.6/site-packages/mercurial/util.py", line 1747, in __get__
remote:     result = self.func(obj)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 262, in _con
remote:     con = self._db()
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 251, in _db
remote:     isolation_level=r"IMMEDIATE")
remote: sqlite3.OperationalError: unable to open database file
```

Temporarily disabling the evolve extension allows me to pull the changes.
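The report does not identify a cause. What the traceback does show is that evolve's stable-range cache fails on the remote side while opening its SQLite database, before any query runs; `sqlite3.connect` raises this exact error when, for example, the parent directory is missing or the database file cannot be created there. Whether the reporter's case is a permissions problem in the other user's repository is an unconfirmed guess. As an illustration only, a cache that treats such a failure as non-fatal might look like this; `open_cache_db` is a hypothetical helper, not evolve's code.

```python
import sqlite3


def open_cache_db(path, timeout=30.0):
    """Open an SQLite cache at `path`, degrading to an in-memory one on failure.

    Hypothetical helper, not evolve's implementation. Returns a tuple
    (connection, persistent) where persistent is False if the on-disk
    database could not be opened.
    """
    try:
        con = sqlite3.connect(path, timeout=timeout, isolation_level="IMMEDIATE")
        return con, True
    except sqlite3.OperationalError:
        # "unable to open database file" is raised here when, for example, the
        # parent directory is missing or the file cannot be created in it.
        # A cache is only an optimization, so continue with a throwaway
        # in-memory database whose contents vanish when the connection closes.
        return sqlite3.connect(":memory:", isolation_level="IMMEDIATE"), False
```

Falling back to `":memory:"` keeps the calling code unchanged at the cost of recomputing the cache on every invocation; returning `None` and skipping the cache altogether would be the other obvious design.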
D10918: Backed out changeset 9b8f326731ac
Mathiasdm created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  Unfortunately, disabling the rust extensions means newer Mercurial versions no longer have the persistent-nodemap feature enabled.
  This means that Mercurial 5.8.1 on RPM-based Linux distributions will no longer be able to read repositories created by Mercurial 5.8 on RPM-based Linux distributions.
  This violates the compatibility rules (see https://www.mercurial-scm.org/wiki/CompatibilityRules).
  
  For this reason, I have to back out this change. I'll try to find another solution to the 'hg purge' crashes.

REPOSITORY
  rHG Mercurial

BRANCH
  stable

REVISION DETAIL
  https://phab.mercurial-scm.org/D10918

AFFECTED FILES
  contrib/packaging/docker/centos7
  contrib/packaging/docker/centos8
  contrib/packaging/mercurial.spec

CHANGE DETAILS

diff --git a/contrib/packaging/mercurial.spec b/contrib/packaging/mercurial.spec
--- a/contrib/packaging/mercurial.spec
+++ b/contrib/packaging/mercurial.spec
@@ -110,14 +110,14 @@
 LD_LIBRARY_PATH=$PYPATH $PYPATH/python setup.py install --root="$RPM_BUILD_ROOT"
 cd -
 
-PATH=$PYPATH:$PATH LD_LIBRARY_PATH=$PYPATH make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{hgpyprefix} MANDIR=%{_mandir}
+PATH=$PYPATH:$PATH LD_LIBRARY_PATH=$PYPATH make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{hgpyprefix} MANDIR=%{_mandir} PURE="--rust"
 mkdir -p $RPM_BUILD_ROOT%{_bindir}
 ( cd $RPM_BUILD_ROOT%{_bindir}/ && ln -s ../..%{hgpyprefix}/bin/hg . )
 ( cd $RPM_BUILD_ROOT%{_bindir}/ && ln -s ../..%{hgpyprefix}/bin/python2.? %{pythonhg} )
 
 %else
 
-make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{_prefix} MANDIR=%{_mandir}
+make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{_prefix} MANDIR=%{_mandir} PURE="--rust"
 
 %endif
 
diff --git a/contrib/packaging/docker/centos8 b/contrib/packaging/docker/centos8
--- a/contrib/packaging/docker/centos8
+++ b/contrib/packaging/docker/centos8
@@ -13,3 +13,6 @@
 
 # For creating repo meta data
 RUN yum install -y createrepo
+
+# For rust extensions
+RUN yum install -y cargo
diff --git a/contrib/packaging/docker/centos7 b/contrib/packaging/docker/centos7
--- a/contrib/packaging/docker/centos7
+++ b/contrib/packaging/docker/centos7
@@ -15,3 +15,6 @@
 
 # For creating repo meta data
 RUN yum install -y createrepo
+
+# For rust extensions
+RUN yum install -y cargo

To: Mathiasdm, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
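The compatibility concern in the revision summary rests on how Mercurial gates on-disk format features: a repository lists them, one per line, in `.hg/requires`, and a client refuses to open a repository that names a requirement it does not support. The following is a hypothetical sketch of that general check only. `missing_requirements` and `can_open` are invented names, the requirement spelling `persistent-nodemap` in the usage note is an assumption, and the sketch does not model how a particular build decides which requirements it supports.

```python
import os


def missing_requirements(repo_root, supported):
    """Return the requirements in <repo_root>/.hg/requires this client lacks.

    Hypothetical helper illustrating the check, not Mercurial's implementation.
    `supported` is a set of requirement names as bytes.
    """
    path = os.path.join(repo_root, ".hg", "requires")
    try:
        with open(path, "rb") as fp:
            # One requirement per line; blank lines are ignored.
            listed = {line.strip() for line in fp if line.strip()}
    except FileNotFoundError:
        # Assumption of this sketch: a missing file lists no requirements.
        listed = set()
    return sorted(listed - supported)


def can_open(repo_root, supported):
    # The repository is usable only if every listed requirement is understood.
    return not missing_requirements(repo_root, supported)
```

Under these assumptions, a repository whose `requires` file lists `persistent-nodemap` would be rejected by a client whose supported set lacks that entry, which is the kind of breakage between package versions that the backout aims to avoid.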