Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Tue, Dec 20, 2022 at 11:14:00 +0300:
> [Moving discussion to a new thread]
> 
> We currently have a problem: a working copy relies on a checksum type
> with known collisions (SHA1).  A solution to that problem

Why is libsvn_wc's use of SHA-1 a problem?  What's the scenario wherein
Subversion will behave differently than it should?

> is to switch to a different checksum type without known collisions in
> one of the newer working copy formats.

Such as SHA-1 salted by NODES.LOCAL_RELPATH and NODES.WC_ID (or a per-wc UUID)?
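A minimal sketch of what such a salted checksum could look like. It assumes the salt is simply NODES.WC_ID and NODES.LOCAL_RELPATH prepended to the file contents, with a length prefix on the path. The function name and byte encoding are hypothetical; this is not something libsvn_wc implements.

```python
import hashlib
import struct

def salted_sha1(wc_id: int, local_relpath: str, contents: bytes) -> str:
    """Hypothetical SHA-1 salted by NODES.WC_ID and NODES.LOCAL_RELPATH.

    The path is length-prefixed so that different (path, contents) splits
    of the same byte string cannot produce the same hash input.
    """
    relpath = local_relpath.encode("utf-8")
    h = hashlib.sha1()
    h.update(struct.pack(">q", wc_id))         # 8-byte big-endian WC_ID
    h.update(struct.pack(">I", len(relpath)))  # 4-byte length of the path
    h.update(relpath)
    h.update(contents)
    return h.hexdigest()
```

Identical contents at two paths now get different checksums, so a colliding pair precomputed against plain SHA-1 would in general no longer collide once salted. An attacker who knows a given wc's salt in advance could still compute a fresh collision for that prefix.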

> Since we plan on shipping a new working copy format in 1.15, this seems to
> be an appropriate moment to decide whether we'd also want to switch
> to a checksum type without known collisions in that new format.
> 

What's the acceptance test we use for candidate checksum algorithms?

You say we should switch to a checksum algorithm that doesn't have known
collisions, but, why should we require that?  Consider the following
160-bit checksum algorithm:
1. If the input consists of 40 ASCII lowercase hex digits and
   nothing else, return the input.
2. Else, return the SHA-1 of the input.

This algorithm has a trivial first preimage attack.  If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
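For concreteness, the identity-then-sha1 algorithm above can be written out as a short sketch. The function name is made up for illustration.

```python
import hashlib
import re

# Exactly 40 lowercase hex digits and nothing else (no trailing newline).
_HEX40 = re.compile(rb"[0-9a-f]{40}")

def identity_then_sha1(data: bytes) -> str:
    """Illustrative 160-bit checksum: identity on hex digests, else SHA-1."""
    # Step 1: if the input is exactly 40 lowercase hex digits, return it.
    if _HEX40.fullmatch(data):
        return data.decode("ascii")
    # Step 2: otherwise, return the SHA-1 of the input.
    return hashlib.sha1(data).hexdigest()
```

A first preimage for any target digest `t` is just the bytes of `t` itself. Collisions are equally cheap: a file whose contents are the hex SHA-1 of some other file `f` has the same checksum as `f`.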

> Below are the arguments for including a switch to a different checksum type
> in the working copy format for 1.15:
> 
> 1) Since the "is the file modified?" check now compares checksums, leaving
>    everything as-is may be considered a regression, because it would
>    introduce additional cases where a working copy currently relies on
>    comparing checksums with known collisions.
> 

Well, SHA-1 is still collision-free so long as one is not deliberately
trying to use collisions, so this would only be a regression if we
consider "Deliberately store files that have the same checksum" to be
a use-case.  Do we?

I recall we discussed this when shattered.io was announced, and we
didn't rush to upgrade the checksums we use everywhere, so I guess back
then we came to the conclusion that wasn't a use-case.  (Of course we
can change our opinion; that's just a datapoint, and there may be more,
on both sides, in the old thread.)

I looked for the old thread and didn't find it.  (I looked in the
private@ archives too in case the thread was there.)

> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).
> 

Considering that 1.15 will support reading and writing both f31 and f32,
the "overall number of required bumps" between 1.8 and trunk@HEAD is
zero, meaning the proposed change can't reduce that number.

> 3) While the pristines-on-demand feature is not released, upgrading
>    with a switch to the new checksum type seems to be possible without
>    requiring a network fetch.

I infer the scenario in question here is upgrading a (say) pristineless
wc to a newer format that supports a new checksum algorithm.

>    But if some of the pristines are optional, we lose the possibility
>    to rehash all contents in place.  So we might find ourselves having
>    to choose between two worse alternatives of either requiring
>    a network fetch during upgrade or entirely prohibiting an upgrade
>    of working copies with optional pristines.

Why would we want to rehash everything in place?  The 1.15→1.16 upgrade
could simply leave pristineless files' checksums as SHA-1 until the next
«svn up», just like «svnadmin upgrade» of FSFS doesn't retroactively add
SHA-1 checksums to node-rev headers or "-file" or "-dir" indicators in
the changed-paths section.
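A minimal sketch of that lazy approach. It assumes each stored checksum carries a tag naming its algorithm, and it uses SHA-256 purely as a placeholder successor, since the thread does not pick one. The `Node` type and the function names are hypothetical and do not correspond to libsvn_wc's schema or API.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Placeholder successor algorithm (an assumption, not a project decision).
NEW_KIND = "sha256"

@dataclass
class Node:
    relpath: str
    checksum_kind: str          # algorithm name, e.g. "sha1" or "sha256"
    checksum: str               # hex digest computed with checksum_kind
    pristine: Optional[bytes]   # None when the pristine was never fetched

def upgrade_node(node: Node) -> None:
    """Rehash in place only when the pristine text is available locally."""
    if node.checksum_kind == NEW_KIND or node.pristine is None:
        # Already upgraded, or pristineless: keep the existing checksum
        # until a later update delivers the text and records NEW_KIND.
        return
    node.checksum = hashlib.new(NEW_KIND, node.pristine).hexdigest()
    node.checksum_kind = NEW_KIND

def is_modified(node: Node, working: bytes) -> bool:
    """Compare using whichever algorithm the stored checksum was made with."""
    return hashlib.new(node.checksum_kind, working).hexdigest() != node.checksum
```

Because every comparison dispatches on the stored tag, SHA-1 and new-style checksums can coexist in one wc, and no network fetch is needed at upgrade time.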

There may be yet other alternatives.

> Thoughts?

I'm not voting either -0 or +0 at this time.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Branko Čibej

On 20.12.2022 09:14, Evgeny Kotkov wrote:

> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).


Using a new hashing algorithm in the working copy is relatively simple. 
Making such a change backwards-compatible is not. It would be really 
nice if this could be done in a way that allows newer clients to still 
support older working copies without upgrading them; after all, we have 
the infrastructure for this in place now.
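One way to picture that, as a minimal sketch: the client looks up the checksum algorithm from the working copy's format number instead of hard-coding SHA-1. Formats 31 and 32 are the ones mentioned elsewhere in this thread and are assumed here to use SHA-1. Format 33 and SHA-256 are invented placeholders, and `checksum_for_wc` is a hypothetical name, not a libsvn_wc function.

```python
import hashlib

# Hypothetical table: 31 and 32 as discussed in the thread (SHA-1);
# 33 is an invented number standing in for a future format with a new hash.
CHECKSUM_KIND_BY_FORMAT = {31: "sha1", 32: "sha1", 33: "sha256"}

def checksum_for_wc(wc_format: int, data: bytes) -> str:
    """Hash with the algorithm the working copy's own format uses, so a
    newer client can work with an older wc without upgrading it."""
    try:
        kind = CHECKSUM_KIND_BY_FORMAT[wc_format]
    except KeyError:
        raise ValueError(f"unsupported working copy format {wc_format}") from None
    return hashlib.new(kind, data).hexdigest()
```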


-- Brane

Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> > While here, I would like to raise a topic of incorporating a switch from
> > SHA1 to a different checksum type (without known collisions) for the new
> > working copy format.  This topic is relevant to the pristines-on-demand
> > branch, because the new "is the file modified?" check relies on the
> > checksum comparison, instead of comparing the contents of working and
> > pristine files.
> >
> > And so while I consider it to be out of the scope of the pristines-on-
> > demand branch, I think that we might want to evaluate if this is something
> > that should be a part of the next release.
>
> Good point.  Maybe worth a new thread?

[Moving discussion to a new thread]

We currently have a problem: a working copy relies on a checksum type
with known collisions (SHA1).  A solution to that problem is to switch to a
different checksum type without known collisions in one of the newer working
copy formats.

Since we plan on shipping a new working copy format in 1.15, this seems to
be an appropriate moment to decide whether we'd also want to switch
to a checksum type without known collisions in that new format.

Below are the arguments for including a switch to a different checksum type
in the working copy format for 1.15:

1) Since the "is the file modified?" check now compares checksums, leaving
   everything as-is may be considered a regression, because it would
   introduce additional cases where a working copy currently relies on
   comparing checksums with known collisions.

2) We already need a working copy format bump for the pristines-on-demand
   feature.  So using that format bump to solve the SHA1 issue might reduce
   the overall number of required bumps for users (assuming that we'll still
   need to switch from SHA1 at some point later).

3) While the pristines-on-demand feature is not released, upgrading with a
   switch to the new checksum type seems to be possible without requiring a
   network fetch.  But if some of the pristines are optional, we lose the
   possibility to rehash all contents in place.  So we might find ourselves
   having to choose between two worse alternatives of either requiring a
   network fetch during upgrade or entirely prohibiting an upgrade of
   working copies with optional pristines.
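A minimal sketch contrasting the old and new ways of answering "is the file modified?" from item 1. The function names are hypothetical. The sketch ignores any cheaper pre-checks a real implementation may perform and isolates only the final comparison.

```python
import hashlib
from typing import Callable

def modified_by_content(working: bytes, pristine: bytes) -> bool:
    """Old approach: compare the working text against the pristine text."""
    return working != pristine

def modified_by_checksum(working: bytes, pristine_digest: str,
                         digest: Callable[[bytes], str]) -> bool:
    """Pristines-on-demand approach: only the recorded checksum is needed.
    A working file that collides with the pristine under `digest` is
    therefore reported as unmodified."""
    return digest(working) != pristine_digest

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()
```

With a deliberately weak toy digest such as `lambda b: str(len(b))`, an edit from `b"ab"` to `b"cd"` is reported as unmodified by `modified_by_checksum` but as modified by `modified_by_content`. With SHA-1, the same false negative requires a deliberately constructed collision.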

Thoughts?


Thanks,
Evgeny Kotkov