On 30. 12. 25 17:01, Evgeny Kotkov via dev wrote:
Johan Corveleyn writes:

I think what is really missing here is a clear analysis of the
*problem*. What problem are we trying to solve?
I would say the problem statement is there, starting from the very first
email in the original thread [1] and the BRANCH-README [2]:

- The working copy relies on the assumption that files with identical checksum
   values have identical content.

- This assumption is not true if there is a checksum collision.

   In such a case, the working copy behaves as if the files had identical
   content, whereas in fact they do not.  This can result in unintended and
   buggy behavior, possibly accompanied by security issues.

- We currently use SHA-1, a checksum with known collisions, so there are known
   cases where this assumption does not hold.

- As the SHA-1 collision situation worsens, techniques like a chosen-prefix
   attack could allow finding more meaningful collisions, for example working
   executables or scripts, which would have greater exploitation potential.

- There is an example use case [3] of data forgery (a content change)
   during checkout when the repository contains files with different content
   but colliding checksums.

- The pristines-on-demand feature makes this worse by starting to rely on
   checksum comparison for "svn st".

I explored the feasibility of the following technical changes and proposed
that we implement them to improve the situation:

- Lay the technical groundwork so that the working copy is not limited to
   SHA-1 as its only checksum kind:
   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind

- Build on that and start using dynamically salted SHA-1 checksums:
   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

   ```
   With the dynamic salt:

   - Publicly known SHA-1 collisions no longer result in collisions when
     checksummed by the working copy.  This is because the actually
     checksummed content now includes the random prefix salt.

   - Constructing a chosen-prefix SHA-1 collision no longer results in a
     collision when checksummed by the working copy.  This is because the
     constructed collision cannot account for the random prefix salt, because
     it's unknown in advance.
   ```
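   To make the salting idea concrete, here is a minimal Python sketch,
   assuming a per-working-copy random salt that is generated once and kept
   private, with the checksum computed over salt || content.  The names
   new_wc_salt, salted_sha1 and looks_modified and the 32-byte salt size are
   hypothetical and only illustrate the technique; the actual branch may
   store and apply its salt differently.

   ```python
   import hashlib
   import secrets

   # Hypothetical salt size; the branch may use a different length.
   SALT_SIZE = 32

   def new_wc_salt() -> bytes:
       """Generate a fresh random salt for one working copy (hypothetical)."""
       return secrets.token_bytes(SALT_SIZE)

   def salted_sha1(salt: bytes, content: bytes) -> str:
       """SHA-1 over salt || content.

       The salt is a random prefix unknown to an attacker in advance, so a
       collision computed for unsalted SHA-1 (or for a guessed prefix) does
       not carry over to the digest the working copy actually records.
       """
       h = hashlib.sha1()
       h.update(salt)     # random prefix first
       h.update(content)  # then the file content
       return h.hexdigest()

   def looks_modified(salt: bytes, recorded: str, working: bytes) -> bool:
       """Checksum-only status check, as done without a local pristine.

       Equal checksums are treated as equal content, which is exactly the
       assumption described in the problem statement.
       """
       return salted_sha1(salt, working) != recorded

   salt = new_wc_salt()
   pristine = b"echo hello\n"
   recorded = salted_sha1(salt, pristine)
   print(looks_modified(salt, recorded, pristine))       # False
   print(looks_modified(salt, recorded, b"echo bye\n"))  # True
   ```

   Because each working copy draws its own salt, the same content yields
   different recorded checksums in different working copies, and an attacker
   preparing colliding files cannot know which prefix will be hashed.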


[Not having looked at the code, just some general observations.]

I don't see anything in particular missing from the standpoint of on-list discussion. The old mail threads are there for anyone to see, and so are the BRANCH-READMEs and diffs. The problem statement is clear and well discussed, and there is (was) consensus that the problem is real.

*There was never any requirement, expectation or, dogs forfend, tradition that, e.g., Evgeny must go cap in hand to the dev@ list and beg for his actual implementation design to be rubber-stamped in advance.*

If anyone has technical concerns, by all means review the branches and share your observations here. Procedural matters are not subject to vetoes, and furthermore they have been amply catered to.

I won't go into technical discussion here, other than one observation: using a salted hash seems like an eminently good idea, /regardless/ [1] of whether we stay with SHA-1 for now or not.


-- Brane

[1] Don't get me started about "irregardless" illiterate newspeak...

P.S.: I'm not sure why there are two branches, since they appear to implement essentially the same functionality. Anyway, that's not important.

P.P.S.: What about FSFS? It uses SHA-1 content indexing for representation sharing, but that's off by default.
