Re: Call for comments on new dirstate format contents

2021-06-30 Thread Gregory Szorc
On Mon, Jun 28, 2021 at 2:50 AM Raphaël Gomès  wrote:
>
> Hello all,
>
> As you probably know, my colleagues at Octobus and I have been working on
> a new version of the dirstate, and we're coming pretty close to
> something usable in production, so we need to freeze the format soon.
> This email is not meant to discuss the exact byte-per-byte layout
> details of the format, but rather its contents: what do you think should
> be included (or at least have space reserved for) in the new version?
>
> We have already discussed this at previous sprints and various other
> discussion channels, but I thought it'd be better to give a "last call"
> chance for people to get their voices heard.
>
> I remember Google people saying they'd like to move information that
> is frequently written into a separate file to help with their filesystem
> shenanigans. What exactly would be the plan, and can we do it easily? I
> may be pessimistic, but this looks like it would require a lot of work
> which (so far) no one wants to sponsor, though I'm happy to be proven
> wrong either way.
>
> To Matt Harbison: you said something about storing exec bit and symlink
> info explicitly to help platforms like Windows that don't have them,
> could you please elaborate?
>
> As a general recap (and to help understand some decisions), the new
> format will be an append-only tree with no stem compression for
> performance reasons. The Python implementation will be functional but
> very basic and will offer no purposeful performance improvements (unless
> someone wants to have fun!), as we currently only have the bandwidth for
> optimizing the Rust implementation.
>
> An overview of the current target (some implementation-detail level
> contents omitted):
>
>  - A docket file that contains global metadata about the dirstate:
>      - NodeID of the parents (32 bytes reserved, 20 used for now)
>      - A total count of files (including Removed ones)
>      - A count of dead (unreachable) bytes
>      - A count of alive (reachable) bytes
>      - A hash of ignore patterns (see
>        https://phab.mercurial-scm.org/D10836)
>  - In the data file, for each directory/file (it can be both at the
>    same time):
>      - The full path in bytes of the file (or directory)
>      - The full path of the copy source (optional)
>      - How many tracked recursive descendants it has
>      - How many recursive copies it has
>      - Exec bit
>      - mtime (probably up to nanosecond precision, both files and
>        directories)
>      - Clean file size when applicable
>      - Its state: if it's removed, added, clean, etc.
>      - Whether it's from p1 or p2
>      - Whether it's ambiguous (it appears clean but the mtime is the
>        same as the last status, probably will only happen with the
>        Python implementation)
>      - All of the info needed to get the previous state of a Removed
>        file in case we `hg add` it back
>      - (My idea as I type this: ) store the "raw bytes" version of
>        the OS path if it differs from the normalized hg version (on
>        Windows and MacOS for example) to cache the filefoldmap.
>
> I *think* that's it? I might be wrong, if so, please tell me!
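
To make the proposed contents easier to scan, the list above can be restated as plain record types. This is only a sketch: the class names, field names, Python types and the `pad_node_id` helper are invented for illustration, and nothing here reflects the actual on-disk layout, which the email leaves out of scope.

```python
from dataclasses import dataclass
from typing import Optional

NODE_ID_RESERVED = 32  # bytes reserved per parent NodeID (20 used for now)


def pad_node_id(node: bytes) -> bytes:
    """Right-pad a NodeID with zero bytes up to the reserved width (hypothetical helper)."""
    if len(node) > NODE_ID_RESERVED:
        raise ValueError("NodeID longer than the reserved space")
    return node.ljust(NODE_ID_RESERVED, b"\0")


@dataclass
class Docket:
    """Global metadata about the dirstate (field names are assumptions)."""
    p1: bytes
    p2: bytes
    file_count: int            # includes Removed files
    dead_bytes: int            # unreachable bytes in the data file
    alive_bytes: int           # reachable bytes in the data file
    ignore_patterns_hash: bytes


@dataclass
class Node:
    """One directory/file entry in the data file (field names are assumptions)."""
    path: bytes
    state: str                         # e.g. "removed", "added", "clean"
    parent: int                        # 1 if from p1, 2 if from p2
    exec_bit: bool = False
    ambiguous: bool = False            # looks clean, but mtime equals last status
    copy_source: Optional[bytes] = None
    tracked_descendants: int = 0
    copy_descendants: int = 0
    mtime_ns: Optional[int] = None     # nanosecond precision, if recorded
    clean_size: Optional[int] = None   # only meaningful for clean files
    raw_os_path: Optional[bytes] = None  # only if it differs from the hg path
```

For example, `pad_node_id(b"\x11" * 20)` yields a 32-byte value whose last 12 bytes are zero, matching the "32 bytes reserved, 20 used" note.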

My recollection of previous discussions can be summarized as "the
dirstate file does multiple things: we should split it up."

Given the breadth of things tracked in this list, I'm a bit concerned
about potential for write amplification where changing something small
results in writing out a large number of bytes. But a lot of this
hinges on the layout of this file. If we start adding complexity to
the file layout to minimize I/O, I worry that we'd be reinventing a
bespoke data store and we'd be better served by splitting the content
or leveraging something designed for the purpose (like SQLite or
LevelDB or somesuch).
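
As a rough illustration of the trade-off, here is a toy append-only store that tracks dead and alive bytes the way the proposed docket does, and rewrites itself once too much of the file is unreachable. It is a sketch under loud assumptions: it is a flat key/record store rather than the proposed tree, and `AppendOnlyStore`, `max_dead_ratio` and the 50% threshold are hypothetical names and values, not anything from the actual design.

```python
class AppendOnlyStore:
    """Toy append-only store; NOT the real dirstate format."""

    def __init__(self, max_dead_ratio: float = 0.5):
        self.data = bytearray()
        self.index = {}          # key -> (offset, length) of the live record
        self.dead_bytes = 0      # bytes no longer reachable from the index
        self.max_dead_ratio = max_dead_ratio

    @property
    def alive_bytes(self) -> int:
        return len(self.data) - self.dead_bytes

    def put(self, key: str, record: bytes) -> int:
        """Append a record and return how many bytes had to be written."""
        old = self.index.get(key)
        if old is not None:
            # The previous record stays in the file but becomes unreachable.
            self.dead_bytes += old[1]
        self.index[key] = (len(self.data), len(record))
        self.data += record
        written = len(record)
        # Rewrite everything once dead bytes dominate the file.
        if self.dead_bytes > self.max_dead_ratio * len(self.data):
            written += self._compact()
        return written

    def _compact(self) -> int:
        """Copy only live records into a fresh buffer; return its size."""
        fresh = bytearray()
        fresh_index = {}
        for key, (offset, length) in self.index.items():
            fresh_index[key] = (len(fresh), length)
            fresh += self.data[offset:offset + length]
        self.data, self.index, self.dead_bytes = fresh, fresh_index, 0
        return len(fresh)
```

In this toy model a small update normally costs only the size of the new record, but the update that triggers a rewrite also pays for every live byte, which is the kind of write amplification the layout has to keep in check.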

The only other thing I'd consider adding to this list is something
that could help unify with external filesystem tracking tools. Maybe
an append-only list of "externally monitored" filesystem changes
(as reported by Watchman) that could be used to speed up aspects of
`hg status`. I haven't thought too much about this and my comment may
be off base. But my recollection is that the way fsmonitor integrates
today is somewhat hacky, and I suspect there's a way to integrate that
functionality more tightly under the "dirstate umbrella".
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[Bug 6532] New: Pulling from user repo with evolve extension enabled

2021-06-30 Thread mercurial-bugs
https://bz.mercurial-scm.org/show_bug.cgi?id=6532

            Bug ID: 6532
           Summary: Pulling from user repo with evolve extension enabled
           Product: Mercurial
           Version: 5.7.1
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: evolution
          Assignee: bugzi...@mercurial-scm.org
          Reporter: zeger...@me.com
                CC: mercurial-devel@mercurial-scm.org,
                    pierre-yves.da...@ens-lyon.org
    Python Version: ---

Created attachment 2117
  --> https://bz.mercurial-scm.org/attachment.cgi?id=2117&action=edit
full traceback

I have `evolution = all` enabled and am using the evolve extension (version
10.3.2).

When I try to pull from another user's repo I get an error.
The other user also has evolution enabled, but is not using the evolve
extension.
The error message is similar to
https://bz.mercurial-scm.org/show_bug.cgi?id=6432

```
remote:   File "/usr/lib64/python3.6/site-packages/mercurial/util.py", line 1747, in __get__
remote:     result = self.func(obj)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 518, in stablerange
remote:     cache.update(self)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/genericcaches.py", line 111, in update
remote:     self.load(repo)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 400, in load
remote:     if self._con is not None:
remote:   File "/usr/lib64/python3.6/site-packages/mercurial/util.py", line 1747, in __get__
remote:     result = self.func(obj)
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 262, in _con
remote:     con = self._db()
remote:   File "/home/zvandeva/.local/lib/python3.6/site-packages/hgext3rd/evolve/stablerangecache.py", line 251, in _db
remote:     isolation_level=r"IMMEDIATE")
remote: sqlite3.OperationalError: unable to open database file
```

Temporarily disabling the evolve extension allows me to pull the changes.
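
The traceback does not say why the database could not be opened. One way Python's `sqlite3` module produces this exact message is when the database file cannot be created at the requested path, for instance because the parent directory is missing or not writable by the current user. Whether that is what happens here is an assumption. The sketch below only demonstrates the error itself, using a made-up path inside a temporary directory and a hypothetical `try_open` helper; it does not touch evolve's cache.

```python
import os
import sqlite3
import tempfile


def try_open(path):
    """Return the error message if the database cannot be opened, else None."""
    try:
        con = sqlite3.connect(path, isolation_level="IMMEDIATE")
    except sqlite3.OperationalError as exc:
        return str(exc)
    con.close()
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Parent directory does not exist, so SQLite cannot create the file.
    error_missing_dir = try_open(os.path.join(tmp, "no-such-dir", "cache.sqlite"))
    # Same file name in an existing, writable directory opens fine.
    error_writable_dir = try_open(os.path.join(tmp, "cache.sqlite"))

print(error_missing_dir)
print(error_writable_dir)
```

If the cause is indeed a cache location the pulling user cannot write to, checking the permissions on the other user's repository would be a reasonable first step.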



D10918: Backed out changeset 9b8f326731ac

2021-06-30 Thread Mathias De Maré
Mathiasdm created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  Unfortunately, disabling the Rust extensions means newer
  Mercurial versions no longer have the persistent-nodemap
  feature enabled.
  As a result, Mercurial 5.8.1 on RPM-based Linux distributions
  can no longer read repositories created
  by Mercurial 5.8 on those same distributions.
  
  This violates the compatibility rules
  (see https://www.mercurial-scm.org/wiki/CompatibilityRules ).
  
  For this reason, I have to back out this change.
  I'll try to find another solution to the 'hg purge' crashes.
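
A quick way to see which repositories would be affected is to look at their recorded requirements. The sketch below assumes that the requirements live in `.hg/requires`, one per line, and that the feature is recorded there as `persistent-nodemap`. Both the requirement string and the helper names are assumptions to verify against the Mercurial version in use.

```python
import os

# Assumed name of the requirement written by repositories using the feature.
NODEMAP_REQUIREMENT = "persistent-nodemap"


def read_requirements(repo_root):
    """Return the set of requirement names listed in <repo_root>/.hg/requires."""
    path = os.path.join(repo_root, ".hg", "requires")
    with open(path, encoding="ascii") as fh:
        return {line.strip() for line in fh if line.strip()}


def needs_persistent_nodemap(repo_root):
    """True if the repository declares the persistent-nodemap requirement."""
    return NODEMAP_REQUIREMENT in read_requirements(repo_root)
```

Calling `needs_persistent_nodemap("/path/to/repo")` on each repository created with 5.8 would list the ones a build without that feature could not open.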

REPOSITORY
  rHG Mercurial

BRANCH
  stable

REVISION DETAIL
  https://phab.mercurial-scm.org/D10918

AFFECTED FILES
  contrib/packaging/docker/centos7
  contrib/packaging/docker/centos8
  contrib/packaging/mercurial.spec

CHANGE DETAILS

diff --git a/contrib/packaging/mercurial.spec b/contrib/packaging/mercurial.spec
--- a/contrib/packaging/mercurial.spec
+++ b/contrib/packaging/mercurial.spec
@@ -110,14 +110,14 @@
 LD_LIBRARY_PATH=$PYPATH $PYPATH/python setup.py install --root="$RPM_BUILD_ROOT"
 cd -
 
-PATH=$PYPATH:$PATH LD_LIBRARY_PATH=$PYPATH make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{hgpyprefix} MANDIR=%{_mandir}
+PATH=$PYPATH:$PATH LD_LIBRARY_PATH=$PYPATH make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{hgpyprefix} MANDIR=%{_mandir} PURE="--rust"
 mkdir -p $RPM_BUILD_ROOT%{_bindir}
 ( cd $RPM_BUILD_ROOT%{_bindir}/ && ln -s ../..%{hgpyprefix}/bin/hg . )
 ( cd $RPM_BUILD_ROOT%{_bindir}/ && ln -s ../..%{hgpyprefix}/bin/python2.? %{pythonhg} )
 
 %else
 
-make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{_prefix} MANDIR=%{_mandir}
+make install PYTHON=%{pythonexe} DESTDIR=$RPM_BUILD_ROOT PREFIX=%{_prefix} MANDIR=%{_mandir} PURE="--rust"
 
 %endif
 
diff --git a/contrib/packaging/docker/centos8 b/contrib/packaging/docker/centos8
--- a/contrib/packaging/docker/centos8
+++ b/contrib/packaging/docker/centos8
@@ -13,3 +13,6 @@
 
 # For creating repo meta data
 RUN yum install -y createrepo
+
+# For rust extensions
+RUN yum install -y cargo
diff --git a/contrib/packaging/docker/centos7 b/contrib/packaging/docker/centos7
--- a/contrib/packaging/docker/centos7
+++ b/contrib/packaging/docker/centos7
@@ -15,3 +15,6 @@
 
 # For creating repo meta data
 RUN yum install -y createrepo
+
+# For rust extensions
+RUN yum install -y cargo



To: Mathiasdm, #hg-reviewers
Cc: mercurial-patches, mercurial-devel