Evgeny Kotkov via dev wrote on Tue, Dec 20, 2022 at 11:14:00 +0300:
> [Moving discussion to a new thread]
>
> We currently have a problem that a working copy relies on the checksum type
> with known collisions (SHA1). A solution to that problem
Why is libsvn_wc's use of SHA-1 a problem? What's the scenario wherein
Subversion will behave differently than it should?
> is to switch to a different checksum type without known collisions in
> one of the newer working copy formats.
Such as SHA-1 salted by NODES.LOCAL_RELPATH and NODES.WC_ID (or a per-wc UUID)?
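(To make the suggestion concrete, here is a minimal sketch of what per-wc salting could look like. This is purely illustrative and assumes nothing about libsvn_wc's actual code; the column names are just borrowed from wc.db, and the salt encoding is made up:)

```python
import hashlib

def salted_checksum(content: bytes, wc_id: str, local_relpath: str) -> str:
    """Hypothetical per-working-copy salted SHA-1.

    Prefixing the input with wc-specific values means a precomputed
    public SHA-1 collision pair (e.g. the shattered.io PDFs) would no
    longer collide here, because the hash's internal state already
    differs before the attacker-controlled bytes are consumed.
    """
    h = hashlib.sha1()
    # NUL separators prevent ambiguity between salt fields and content.
    h.update(wc_id.encode("utf-8") + b"\0")
    h.update(local_relpath.encode("utf-8") + b"\0")
    h.update(content)
    return h.hexdigest()
```

(Whether salting by path is workable at all is a separate question, since the same pristine text would then hash differently at different paths.)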
> Since we plan on shipping a new working copy format in 1.15, this seems to
> be an appropriate moment of time to decide whether we'd also want to switch
> to a checksum type without known collisions in that new format.
>
What's the acceptance test we use for candidate checksum algorithms?
You say we should switch to a checksum algorithm that doesn't have known
collisions, but why should we require that? Consider the following
160-bit checksum algorithm:

1. If the input consists of 40 ASCII lowercase hex digits and
nothing else, return the input.
2. Else, return the SHA-1 of the input.
This algorithm has a trivial first preimage attack. If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
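(For the avoidance of doubt, the thought-experiment algorithm above, as a sketch; the function name is mine:)

```python
import hashlib

def identity_then_sha1(data: bytes) -> str:
    """Step 1: 40 lowercase ASCII hex digits, and nothing else, map to
    themselves.  Step 2: everything else maps to its SHA-1."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        text = None
    if (text is not None and len(text) == 40
            and all(c in "0123456789abcdef" for c in text)):
        return text
    return hashlib.sha1(data).hexdigest()

# The trivial first preimage attack: for any target digest d, the
# 40-hex-digit input d itself hashes to d.
```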
> Below are the arguments for including a switch to a different checksum type
> in the working copy format for 1.15:
>
> 1) Since the "is the file modified?" check now compares checksums, leaving
>    everything as-is may be considered a regression, because it would
>    introduce additional cases where a working copy currently relies on
>    comparing checksums with known collisions.
>
Well, SHA-1 is still collision-free in practice so long as no one is
deliberately constructing collisions, so this would only be a regression
if we consider "deliberately store files that have the same checksum" to
be a use-case. Do we?
I recall we discussed this when shattered.io was announced, and we
didn't rush to upgrade the checksums we use everywhere, so I guess back
then we came to the conclusion that wasn't a use-case. (Of course we
can change our opinion; that's just a datapoint, and there may be more,
on both sides, in the old thread.)
I looked for the old thread and didn't find it. (I looked in the
private@ archives too in case the thread was there.)
> 2) We already need a working copy format bump for the pristines-on-demand
>    feature. So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).
>
Considering that 1.15 will support reading and writing both f31 and f32,
the "overall number of required bumps" between 1.8 and trunk@HEAD is
zero, meaning the proposed change can't reduce that number.
> 3) While the pristines-on-demand feature is not released, upgrading
>    with a switch to the new checksum type seems to be possible without
>    requiring a network fetch.
I infer the scenario in question here is upgrading a (say) pristineless
wc to a newer format that supports a new checksum algorithm.
>    But if some of the pristines are optional, we lose the possibility
>    to rehash all contents in place. So we might find ourselves having
>    to choose between two bad alternatives: either requiring
>    a network fetch during upgrade, or entirely prohibiting an upgrade
>    of working copies with optional pristines.
Why would we want to rehash everything in place? The 1.15→1.16 upgrade
could simply leave pristineless files' checksums as SHA-1 until the next
«svn up», just like «svnadmin upgrade» of FSFS doesn't retroactively add
SHA-1 checksums to node-rev headers or "-file" or "-dir" indicators in
the changed-paths section.
There may be yet other alternatives.
> Thoughts?
I'm not voting either -0 or +0 at this time.
Cheers,
Daniel