On 20 Dec 2022, Evgeny Kotkov via dev wrote:
[Moving discussion to a new thread]

We currently have a problem: a working copy relies on a checksum type with known collisions (SHA1). A solution to that problem is to switch to a checksum type without known collisions in one of the newer working copy formats.

Since we plan on shipping a new working copy format in 1.15, this seems to be an appropriate moment to decide whether we'd also want to switch to a checksum type without known collisions in that new format.

Below are the arguments for including a switch to a different checksum type
in the working copy format for 1.15:

1) Since the "is the file modified?" check now compares checksums, leaving everything as-is may be considered a regression, because it would add more cases where a working copy relies on comparing checksums of a type with known collisions (see the sketch just after this list).

2) We already need a working copy format bump for the pristines-on-demand feature. So using that format bump to solve the SHA1 issue might reduce the overall number of required bumps for users (assuming that we'll still
  need to switch from SHA1 at some point later).

3) While the pristines-on-demand feature is not yet released, upgrading with a switch to the new checksum type seems to be possible without requiring a network fetch. But if some of the pristines are optional, we lose the ability to rehash all contents in place. So we might find ourselves having to choose between two unattractive alternatives: either requiring a network fetch during the upgrade or entirely prohibiting an upgrade of working copies with optional pristines.
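
For concreteness, here is a minimal Python sketch of the checksum-based "is the file modified?" check that point 1 refers to. This is not Subversion's actual C code; the function names and the chunked-read helper are hypothetical:

[[[
import hashlib

def file_checksum(path, algo="sha1"):
    # Hash the file in chunks; "algo" would become e.g. "sha256" in a
    # working copy format that has switched away from SHA1.
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def is_modified(working_path, recorded_pristine_checksum, algo="sha1"):
    # With SHA1, two different contents can produce the same digest
    # (see shattered.io), so a carefully crafted modification could be
    # reported as "not modified" by this comparison.
    return file_checksum(working_path, algo) != recorded_pristine_checksum
]]]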

Thoughts?

A few thoughts:

First, Daniel Shahaf raises the question of whether there is really a problem here. That is: why do we care about possible collisions when they're unlikely to happen in practice unless deliberately caused?

My answer is: we should care because it's very difficult to imagine all the consequences -- including but not limited to clever deliberate attacks -- that might follow from losing a property we formerly had. The hash semantics we have always assumed are "If the file is modified, the hash will change." When those semantics change, we don't need to be able to think immediately of a specific problematic scenario to know that this is a significant development. We've lost the guarantee; that's enough to be worth worrying about.
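
To make the lost guarantee concrete: the two public SHAttered PDFs from https://shattered.io/ have different contents but the same SHA1 digest. A small Python sketch, assuming both files have been downloaded into the current directory:

[[[
import hashlib

def digest(path, algo):
    with open(path, "rb") as f:
        return hashlib.new(algo, f.read()).hexdigest()

a, b = "shattered-1.pdf", "shattered-2.pdf"
print(digest(a, "sha1") == digest(b, "sha1"))      # True: SHA1 collides
print(digest(a, "sha256") == digest(b, "sha256"))  # False: contents differ
]]]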

BUT, if you want a scenario, here's one:

I have put WordPress installations under Subversion version control before. Once, I detected an attack on one of those WordPress servers when one of the things the attacker did was modify some of the WordPress scripts on the server. Those files showed up as modified when I ran 'svn st', and from there I ran 'svn diff' and figured out what had happened. But a super-careful attacker could make modifications that leave the version-controlled files with the same SHA1 hash they had before, thus making it harder to detect the attack.

Yes, I realize there are other ways to detect modifications, and that random attackers are unlikely to take the trouble to preserve hashes. On the other hand, a well-resourced spear-phishing attacker who knows something about their target's use of SVN might indeed try a hash-preserving approach to breaking in. The point is, if we're counting on the hashes having certain semantics, then our users are counting on it too. If SHA1 no longer has those semantics, we should upgrade.

Second, +1 to what Branko said: we should upgrade to a new hash when we upgrade a working copy anyway, but new clients should still be able to handle the old hash in old working copies without upgrading them.
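
As a rough illustration of what that dual-hash handling might look like (the names and format numbers below are invented for this sketch, not Subversion's actual API), a client could key the checksum kind off the working copy format it finds on disk:

[[[
import hashlib

# Illustrative only: pretend older formats record SHA1 checksums and a
# hypothetical newer format records SHA-256.
CHECKSUM_KIND_BY_WC_FORMAT = {31: "sha1", 32: "sha256"}

def wc_checksum(path, wc_format):
    # Hash the file with whichever checksum kind this working copy
    # format records, instead of assuming SHA1 everywhere.
    algo = CHECKSUM_KIND_BY_WC_FORMAT[wc_format]
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return algo, h.hexdigest()
]]]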

Now, how hard would this be to actually implement? The pristineless-format WC upgrade is an opportunity to make other format changes, but I'd hate to block the release of pristineless working copies on this...

Best regards,
-Karl
