Re: [RFC] Interaction between strip and caches

2021-02-27 Thread Joerg Sonnenberger
On Fri, Feb 26, 2021 at 10:52:52PM -0500, Augie Fackler wrote:
> 
> 
> > On Dec 14, 2020, at 5:03 PM, Joerg Sonnenberger  wrote:
> > 
> > Hello all,
> > while looking at the revbranchcache, I noticed that it is doing quite an
> > expensive probabilistic invalidation dance. It essentially looks up
> > the revision in the changelog again and compares the first 32 bits to see
> > if they (still) match. Other caches do cheaper checks, like
> > remembering the head revision and node and checking that they still match.
> > The goal is in all cases to detect one of two cases:
> > 
> > (1) Repository additions by a hg instance without support for the cache.
> > (2) Repository removals by strip without update support specific to the
> > cache in use.
> > 
> > The first part is generally handled reasonably well and cheaply. Keeping
> > track of the number of revisions and processing all missing changesets
> > is something code has to support anyway. The really difficult problem is
> > the second part. I would like us to adopt a more explicit way of dealing
> > with this and opt-in support via a repository requirement. Given that
> > the strip command has finally become part of core, it looks like a good
> > time to do this now.
> > 
> > The first option is to require strip to nuke all caches that it can't
> > update. This is easy to implement and works reliably by nature with all
> > existing caches. It is also the more blunt option.
> 
> Won’t the caches invalidate themselves when this defect happens today?

Only if the cache implementation hooks into strip and is active at the
time. As mentioned at the start, it is expensive and complex. I'd say
80% of the complexity of the new .hgtags cache version I am working on
is dealing with the current cache invalidation.
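
For illustration, the two validation styles described at the start of the thread can be sketched as follows. This is a minimal sketch, not Mercurial's code: the changelog is modelled as a plain list of node bytes, and the names `prefix_matches` and `tip_check_valid` are hypothetical.

```python
def prefix_matches(record: bytes, node: bytes) -> bool:
    """Per-revision check: the cache record is assumed to start with
    the first 32 bits of the node it was computed for."""
    return record[:4] == node[:4]


def tip_check_valid(cached_tiprev: int, cached_tipnode: bytes,
                    changelog_nodes: list) -> bool:
    """Cheaper check: the cache remembers the head revision and node
    it was built against and compares them to the changelog once.

    changelog_nodes is a hypothetical stand-in for the changelog,
    indexed by revision number.
    """
    if cached_tiprev >= len(changelog_nodes):
        # The remembered revision no longer exists: something was stripped.
        return False
    # A strip followed by new commits can reuse the revision number,
    # so the node has to match as well.
    return changelog_nodes[cached_tiprev] == cached_tipnode
```

If `tip_check_valid` returns true and the changelog has grown, the cache only has to process the missing revisions, which is case (1). A false result corresponds to case (2) and forces a rebuild.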

> > The second option is to keep a journal of strips. This can be a single
> > monotonically increasing counter and every cache just reads the counter
> > and rebuilds itself. Alternatively it could be a full journal that lists
> > the revisions and associated nodes removed. This requires changes to
> > existing caches but has the advantage that strip can be replayed by the
> > cache logic to avoid a full rebuild.
> 
> Potentially complicated, but could be worthwhile in a large repo with
> strips. Is that something you expect to encounter? For the most part
> we’ve historically considered strip an anti-pattern of sorts and not
> worried super hard about optimizing it.

My hope is that if we can handle additions by non-cache-aware clients as
we do now, it is good enough. Replaying changes is moderately cheap if
we don't have to deal with strip.

There is also the related issue of cache invalidation for obsstore, but
the same concerns apply -- replaying changes is easy as long as we don't
have to handle removal of entries.
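
The simpler variant of the second option, a single monotonically increasing strip counter, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the counter file, the helper names and the `CounterCheckedCache` class are hypothetical and do not exist in Mercurial.

```python
import os


def read_strip_counter(path: str) -> int:
    """Return the current strip counter, or 0 if no strip was recorded."""
    try:
        with open(path, "rb") as fp:
            return int(fp.read().strip() or b"0")
    except FileNotFoundError:
        return 0


def bump_strip_counter(path: str) -> int:
    """Called by strip: increment the counter and replace the file atomically."""
    value = read_strip_counter(path) + 1
    tmp = path + ".tmp"
    with open(tmp, "wb") as fp:
        fp.write(b"%d\n" % value)
    os.replace(tmp, path)
    return value


class CounterCheckedCache:
    """Cache that drops all entries when the strip counter has changed."""

    def __init__(self, counter_path: str):
        self.counter_path = counter_path
        self.seen_counter = read_strip_counter(counter_path)
        self.entries = {}
        self.rebuilds = 0

    def get(self, rev, compute):
        current = read_strip_counter(self.counter_path)
        if current != self.seen_counter:
            # A strip happened since the cache was filled: start over.
            self.entries.clear()
            self.seen_counter = current
            self.rebuilds += 1
        if rev not in self.entries:
            self.entries[rev] = compute(rev)
        return self.entries[rev]
```

A full journal would store the stripped revisions and nodes instead of a bare number, so that a cache could replay the removals rather than rebuild from scratch.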

Joerg
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[Bug 6493] New: Changeset 53a5e100b497 references nonexistent "content" subrepo

2021-02-27 Thread mercurial-bugs
https://bz.mercurial-scm.org/show_bug.cgi?id=6493

    Bug ID: 6493
   Summary: Changeset 53a5e100b497 references nonexistent "content" subrepo
   Product: Mercurial project
   Version: unspecified
  Hardware: All
        OS: All
    Status: UNCONFIRMED
  Severity: bug
  Priority: wish
 Component: website
  Assignee: bugzi...@mercurial-scm.org
  Reporter: p...@bissex.net
        CC: mercurial-devel@mercurial-scm.org

I cloned the hg-website repo to make a patch for one of the two open bugs. When
I tried to push my local clone to my server (i.e. for my fork), it died thus:

$ hg push ssh://h...@hg.sr.ht/~paulbissex/hg-website
pushing to ssh://h...@hg.sr.ht/~paulbissex/hg-website
searching for changes
abort: subrepo 'content' not found in revision 53a5e100b497

Here's the commit. You can see it does make a change to .hgsub:
https://www.mercurial-scm.org/repo/hg-website/rev/53a5e100b497

Attempting to check out that rev produces a 404:

cloning subrepo content from https://www.mercurial-scm.org/repo/hg-website/content
abort: HTTP Error 404: Not Found

Not sure what tactics are applicable here. But this is an obstacle to users
(like me) who want to contribute fixes to the website. Happy to pitch in on a
fix such as I'm able.



Re: SHA1 replacement steps

2021-02-27 Thread Joerg Sonnenberger
On Fri, Feb 26, 2021 at 10:49:45PM -0500, Augie Fackler wrote:
> 
> 
> > On Feb 15, 2021, at 9:18 AM, Joerg Sonnenberger  wrote:
> > 
> > Hello all,
> > to help the review process along, here are the rough steps I see in
> > preparation for supporting 256bit hashes:
> > 
> > (1) Move the current 160bit constants from mercurial.node into a
> > subclass. Instead of a global constant, derive the correct constant from
> > the repo/revlog/... instance and pass it down as necessary. The API
> > change itself is in D9750. The expectation for this step is that a
> > repository has one hash size and one set of magic values, but it doesn't
> > change anything regarding the hash function itself. A follow-up change
> > is necessary to replace the global constants (approximately D9465 minus
> > D9750).
> 
> I was somewhat assuming we’d alter various codepaths to always emit
> 256-bit hashes, even if they end in all NUL. Your way sounds a little
> more complicated but is also fine, I don’t feel strongly.

I was considering it. It is harder to ensure that nothing breaks as we
would have to fix all external interactions at the same time from
output commands to network IO. Keeping the code parts somewhat separate
has the advantage that we can sort out the new hash first and the compat
migration as a second step without having to worry about compat as much.
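
As a rough illustration of step (1), the constants can hang off an object that the repository carries, instead of living at module level. The names below (`NodeConstants`, `SHA1_CONSTANTS`, `WIDE_CONSTANTS`, `FakeRepo`) are hypothetical and are not the API from D9750; this only sketches the idea of one hash size and one set of magic values per repository.

```python
class NodeConstants:
    """Magic node values for one hash size (hypothetical helper)."""

    def __init__(self, nodelen: int):
        self.nodelen = nodelen
        # The null node is all zero bytes at the given length.
        self.nullid = b"\x00" * nodelen
        self.nullhex = self.nullid.hex()
        # All-0xff marker value at the given length.
        self.wdirid = b"\xff" * nodelen


SHA1_CONSTANTS = NodeConstants(20)  # current 160bit nodes
WIDE_CONSTANTS = NodeConstants(32)  # 256bit nodes


class FakeRepo:
    """Stand-in for a repository object that carries its constants."""

    def __init__(self, nodeconstants: NodeConstants):
        self.nodeconstants = nodeconstants


def is_null(repo: FakeRepo, node: bytes) -> bool:
    # Callers ask the repository instead of comparing to a global constant.
    return node == repo.nodeconstants.nullid
```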

> > 
> > (2) Adjust various on-disk formats to switch between the current 160bit
> > and 256bit encoding based on the node constants in use. This would be a
> > non-functional change for existing repositories.
> 
> Are any on-disk formats not already using 256-bit storage?
> I know revlogs are, so I _think_ this is only going to matter for caches.

It's a mixed bag. The textual formats are fine, revlog is fine. Most of
the ad-hoc formats are 160bit only.
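
For an ad-hoc binary format, step (2) mostly means deriving the record layout from the node length instead of hard-coding 20 bytes. The record below (a node followed by a 32-bit revision number) is an invented example layout, not one of Mercurial's actual cache formats.

```python
import struct


def record_struct(nodelen: int) -> struct.Struct:
    """Layout: nodelen bytes of node, then a big-endian 32-bit revision."""
    return struct.Struct(">%dsI" % nodelen)


def pack_record(nodelen: int, node: bytes, rev: int) -> bytes:
    if len(node) != nodelen:
        raise ValueError("node has the wrong length for this repository")
    return record_struct(nodelen).pack(node, rev)


def unpack_record(nodelen: int, raw: bytes):
    """Return the (node, rev) pair stored in raw."""
    return record_struct(nodelen).unpack(raw)
```

With a node length of 20 this produces the same bytes as a hard-coded 160bit layout would, which is what makes the change non-functional for existing repositories.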

> > 
> > (3) Introduce the tagged 256(*) hash function. My plan here is to use
> > Blake2b configured for 248bit output and a suffix of b'\x01'. It is a
> > bit wasteful to reserve 8bit for the tag, but simplifies code. Biggest
> > downside is that the full Blake2b support is not available in Python 2.
> 
> Honestly I think new hash functions are exactly the kind of thing we
> should gate on Python 3. If someone is _really_ enthusiastic they can
> write a backport extension or something, for the users that are
> (bafflingly) caring about modern hashes but stuck on an ancient Python.

Yeah, I'm mostly fine with not supporting Python 2 here. Just needs some
small feature testing I think to not mess up the current status.

> > The tag would allow different hash functions to co-exist and embed
> > existing SHA1 hashes by zero padding.
> > 
> > (4) Adjust hash verification logic to derive the hash function from the
> > tag of a node, not just hard-coding it.
> > 
> > At the end of step 4, most repositories can be converted in a mostly
> > transparent way.
> 
> Notably, you can allow people to only upgrade new hashes if they’re so
> inclined, which lets you preserve gpg signatures etc.

Right, migration will not be forced in the near future. Maybe in a few
years, but not now.

> > Some additional changes might be necessary for allowing
> > "short" node ids for things like .hgtags, but overall, existing hashes
> > should just continue to work as before.
> 
> Overall +1. We can arm-wrestle later about if allowing a “new commits
> are blake2b” mode (vs convert-the-repo mode) is reasonable, I don’t
> think it’ll take a ton of code either way.
> 
> One request: I think we should reserve a couple of suffixes (0xff and
> 0xfe, perhaps?) for “private use” - this would be useful for large
> installations that do strange things with hashing out of necessity.

All F is currently used as marker entry by the hgtags cache for example,
so yeah, it certainly sounds sensible to reserve certain entries.
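
Steps (3) and (4) can be sketched with Python 3's hashlib. This is a simplified illustration under stated assumptions: it hashes the raw data only (no parent nodes), treats a trailing zero byte as the tag of a zero-padded SHA1 node, and takes 0xfe and 0xff as the reserved tags, which were only floated as a possibility above. All function and constant names are hypothetical.

```python
import hashlib

TAG_SHA1 = 0x00      # legacy SHA1 node, zero padded to 32 bytes
TAG_BLAKE2B = 0x01   # Blake2b with 248bit output plus a one-byte tag
RESERVED_TAGS = {0xFE, 0xFF}  # assumed private-use range


def blake2b_node(data: bytes) -> bytes:
    """31 bytes of Blake2b output followed by the tag byte."""
    digest = hashlib.blake2b(data, digest_size=31).digest()
    return digest + bytes([TAG_BLAKE2B])


def embed_sha1(node20: bytes) -> bytes:
    """Embed an existing 160bit node by zero padding it to 256 bits."""
    if len(node20) != 20:
        raise ValueError("expected a 20-byte SHA1 node")
    return node20 + b"\x00" * 12


def verify(node: bytes, data: bytes) -> bool:
    """Pick the hash function from the tag of the node, then compare."""
    if len(node) != 32:
        raise ValueError("expected a 32-byte node")
    tag = node[-1]
    if tag in RESERVED_TAGS:
        raise ValueError("reserved tag, no hash function defined")
    if tag == TAG_BLAKE2B:
        return node == blake2b_node(data)
    if tag == TAG_SHA1 and node[20:] == b"\x00" * 12:
        return node[:20] == hashlib.sha1(data).digest()
    raise ValueError("unknown tag")
```

Because the tag selects the function, padded SHA1 nodes and Blake2b nodes can coexist in one repository, which is what would allow existing hashes to stay as they are.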

Joerg


Failed pipeline for branch/default | mercurial-devel | 1593d180

2021-02-27 Thread Heptapod


Pipeline #18553 has failed!

Project: mercurial-devel ( https://foss.heptapod.net/octobus/mercurial-devel )
Branch: branch/default ( https://foss.heptapod.net/octobus/mercurial-devel/-/commits/branch/default )

Commit: 1593d180 ( https://foss.heptapod.net/octobus/mercurial-devel/-/commit/1593d1803d2f150fa99bd6f705b0aded326e5852 )
Commit Message: config: test priority involving alias and cli

...
Commit Author: Pierre-Yves David ( https://foss.heptapod.net/marmoute )

Pipeline #18553 ( https://foss.heptapod.net/octobus/mercurial-devel/-/pipelines/18553 ) triggered by Administrator ( https://foss.heptapod.net/root )
had 1 failed build.

Job #171339 ( https://foss.heptapod.net/octobus/mercurial-devel/-/jobs/171339/raw )

Stage: tests
Name: test-py3-chg
