On 30. 12. 25 17:01, Evgeny Kotkov via dev wrote:
Johan Corveleyn<[email protected]> writes:
I think what is really missing here is a clear analysis of the
*problem*. What problem are we trying to solve?
I would say the problem statement is there, starting from the very first
email in the original thread [1] and the BRANCH-README [2]:
- Working copy relies on an assumption that files with identical checksum
values have identical content.
- This assumption is not true if there is a checksum collision.
In such a case the working copy behaves as if the files had identical content,
whereas in fact they do not. This can result in unintended and buggy
behavior, possibly accompanied by security issues.
- We currently use SHA1 checksum with known collisions, so there are known
cases when this assumption does not hold.
- When the SHA1 collision situation worsens, things like a chosen-prefix
attack could allow finding more meaningful collisions such as working
executables/scripts that would have bigger exploitation potential.
- There's an example use-case [3] with data forgery (content change)
during checkout if the repository contains files with different content
and colliding checksums.
- The pristines-on-demand feature regresses this further by starting to
rely on checksum comparison for "svn st".
I explored the feasibility of the following technical changes and was
proposing that we implement them to improve the whole situation:
- Make the technical groundwork so that the working copy would not be limited
to just using the SHA1 as the checksum kind:
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind
- Build upon that, and start using dynamically-salted SHA1 checksums:
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
```
With the dynamic salt:
- Publicly known SHA-1 collisions no longer result in collisions when
checksummed by the working copy. This is because the actually
checksummed content now includes the random prefix salt.
- Constructing a chosen-prefix SHA-1 collision no longer results in a
collision when checksummed by the working copy. This is because the
constructed collision cannot account for the random prefix salt, because
it's unknown in advance.
```
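For readers who have not looked at the branch, here is a minimal sketch of the prefix-salt idea described above. It is not the code on the branch. The function name salted_sha1, the 16-byte salt size, and the idea of one salt per working copy are assumptions made only for illustration. The point it shows is that the digest is computed over salt + content, so the same content gets a different checksum under each salt, and a collision prepared against plain SHA-1 was built for a different starting state than the one the salted hash actually reaches.

```python
import hashlib
import os


def salted_sha1(salt: bytes, content: bytes) -> str:
    """Return the SHA-1 hex digest of the salt followed by the content."""
    h = hashlib.sha1()
    h.update(salt)     # random prefix, unknown to whoever crafted the content
    h.update(content)  # the actual file content
    return h.hexdigest()


# Hypothetical salts for two different working copies.
# The size (16 bytes) and the storage location are assumptions.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

content = b"example file content\n"

plain = hashlib.sha1(content).hexdigest()
digest_a = salted_sha1(salt_a, content)
digest_b = salted_sha1(salt_b, content)

print("plain SHA-1:      ", plain)
print("salted, WC A:     ", digest_a)
print("salted, WC B:     ", digest_b)
```

The three printed digests differ from each other even though the content is identical, which is why a pair of files prepared to collide under plain SHA-1 is not expected to collide under a salt chosen after the fact.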
[Not having looked at the code, just some general observations.]
I don't see anything in particular missing from the on-list discussion
standpoint. The old mail threads are there for anyone to see, and the
BRANCH-READMEs and diffs are also there. The problem statement is clear
and well discussed and there is (was) consensus that the problem is real.
*There was never any requirement or expectation or dogs forfend
tradition that, e.g., Evgeny must go cap in hand to the dev@ list and
beg for his actual implementation design to be rubber-stamped in advance.*
If anyone has technical concerns, by all means review the branches and
share your observations here. Procedural matters are not subject to
vetoes and, furthermore, they have been amply catered to.
I won't go into technical discussion here, other than an observation:
using a salted hash seems like an eminently good idea, /regardless/ [1]
of whether we stay with SHA-1 for now or not.
-- Brane
[1] Don't get me started about "irregardless" illiterate newspeak...
P.S.: I'm not sure why there are two branches, since they appear to
implement essentially the same functionality. Anyway, that's not important.
P.P.S.: What about fsfs? It uses SHA-1 content indexing for
representation sharing. But that's off by default.