Evgeny Kotkov via dev wrote on Mon, Jan 23, 2023 at 02:28:50 +0300: > Daniel Shahaf <d...@daniel.shahaf.name> writes: > > > > I can complete the work on this branch and bring it to a production-ready > > > state, assuming there are no objections. > > > > Your assumption is counterfactual: > > > > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E > > > > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E > > I don't see any explicit objections in these two emails (here I assume that > if something is not clear to a PMC member, it doesn't automatically become > an objection). If the "why?" question is indeed an objection, then I would > say it has already been discussed and responded to in the thread. >
The "Why?" was sent _after_ the post you're quoting, and in any case was just an elevator pitch summary of something I had explained more verbosely. The first post in this thread asserts X is a problem and Y is a solution to it, and argues that Y is a good thing. However, that post does not explain /why/ X is a problem, does not consider alternatives to Y, and does not consider possible cons of Y. That's what's missing. > Now, returning to the problem: > > As described in the advisory [1], we have a supported configuration that > makes data forgery possible: > > - A repository with disabled rep-sharing allows storing different files with > colliding SHA-1 values. > - Having a repository with disabled rep-sharing is a supported configuration. > There may be a certain number of such repositories in the wild > (for example, created with SVN < 1.6 and not upgraded afterwise). > - A working copy uses an assumption that the pristine contents are equal if > their SHA-1 hashes are equal. > - So committing different files with colliding SHA-1 values makes it possible > to forge the contents of a file that will be checked-out and used by the > client. > > I would say that this state is worrying just by itself. > I assume this situation could happen accidentally, say, if someone adds shattered-1.pdf and shattered-2.pdf to the same wc in a particular way. That is, I'm not assuming "forgery" (which implies Mallory is involved). Still, this is a potential data integrity issue with the new-in-1.15 wc format, so we should address it before the release. What are our options to address that? Switching to another checksum is an option, yes, but we [as in, dev@] don't seem to have considered any alternatives to that. Just off the top of my head, we could: - Encourage or require use of rep-sharing [the advisory already recommends this] - Encourage or require use of tools/hook-scripts/reject-detected-sha1-collisions.sh [the advisory already recommends this] - Have f32 wc's refuse to talk to servers that don't detect SHA-1 collisions. (1.15 users will still be able to interoperate with old servers by using f31.) And there may be more options. (Lurkers are invited to speak up!) > However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's > probably only a matter of time until the situation becomes worse. > Quoting the third hunk of <https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C20221220201300.GH32332%40tarpaulin.shahaf.local2%3E>: What's the acceptance test we use for candidate checksum algorithms? You say we should switch to a checksum algorithm that doesn't have known collisions, but, why should we require that? Consider the following 160-bit checksum algorithm: . 1. If the input consists of 40 ASCII lowercase hex digits and nothing else, return the input. 2. Else, return the SHA-1 of the input. This algorithm has a trivial first preimage attack. If a wc used this identity-then-sha1 algorithm instead of SHA-1, then… what? > That could happen after a public disclosure of a pair of executable > files/scripts where the forged version allows for remote code execution. > Or maybe something similar with a file format that is often stored in > repositories and that can be executed or used by a build script, etc. > Err, hang on. Your reference described a chosen-prefix attack, while this scenario concerns a single public collision. These are two different things. Disclosure of of a pair of executable files/scripts isn't by itself a problem unless one of the pair ("file A") is in a repository somewhere. Now, was the colliding file ("file B") generated _before_ or _after_ file A was committed? - If _before_, then it would seem Mallory had somehow managed to: 1. get a file of his choosing committed to Alice's repository; and 2. get a wc of Alice's repository into one of the codepaths that assume SHA-1 is one-to-one / collission-free (currently that's the ra_serf optimization and the 1.15 wc status). Now, step #1 seems plausible enough. As to step #2, it's not clear to me how file B would reach the wc in step #2… but insofar as security assumptions go, it seems reasonable to assume Mallory can make this happen. So, I agree it's a scenario we should address. What options do we have to address it? (I grant that migrating away from SHA-1 is one option.) - If _after_, then you're presuming not simply a collision attack but a second preimage attack. Should we assume Mallory to be able to mount a second preimage attack? Chosen-prefix collision attacks can help Mallory in a variant of the "before" case: Mallory computes a collision, sends file A to Alice (who commits it), and invokes his assumed ability to inject file B into Alice's wc. This would work for file formats that ignore the unchosen suffix. > [1] https://subversion.apache.org/security/sha1-advisory.txt > [2] https://sha-mbles.github.io/ > > > Speaking of the proposed switch to SHA-256 or a different checksum, there's > an argument by contradiction: if we were designing the pristineless working > copy from scratch today, would we choose SHA-1 as the best available hash > that can be used to assert content equality? If we were designing f32 from the ground up, I hope we'd first nail down our requirements and then check what are the possible ways to address them. We might specify that "The probability of birthday collisions in <USE-CASE> must not exceed <PERCENTAGE>.". [E.g., my parents named each of their kids for the CRC32 of the timestamp,longitude,latitude,altitude of that kid's birth, and that was fine: they had no collisions. In contrast, sea turtles shouldn't use CRC32 to name their kids, since they'd have a ≈5% chance of a collision due to their larger number of offspring. However, sea turtles would have no collisions if they used MD5.] We might specify that "The hash of an <N>-byte file can be computed within <T> milliseconds on <SUCH AND SUCH> hardware." [E.g., the existence of https://en.wikipedia.org/wiki/Intel_SHA_extensions is a consideration.] We might specify that "An attacker who is capable of <SUCH AND SUCH> will not be able to cause a false positive or a false negative in the wc status optimization.". [E.g., see above about second preimage attacks.] And then we'd brainstorm possible solutions (plural) and run each of them through the specifications, which would be our acceptance test checklist. (And since we aren't designing from scratch, our actual acceptance test would also include implementation and maintenance costs for us and upgrade costs for our users.) > If yes, how can one prove that? Well, for starters, rep-sharing was released in in 2009, the first public collision (shattered) was published in 2017, a chosen-prefix attack (shambled) in 2020, and we haven't had any complaints since then <fine print>other than from people literally trying to store shattered-1.pdf and shattered-2.pdf in their repos</fine print>? And so long as we're doing thought experiments, here's another: If we switched to using only MD5 internally, would anyone notice? (Cf. above about identity-then-sha1, which is even weaker than MD5.) > > Objections have been raised, been left unanswered, and now implementation > > work has commenced following the original design. That's not acceptable. > > I'm vetoing the change until a non-rubber-stamp design discussion has > > been completed on the public dev@ list. > > I would like to note that vetoing a code modification should be accompanied > with a technical justification, and I have certain doubts that the above > arguments qualify as such: > > https://www.apache.org/foundation/voting.html > [[[ > To prevent vetoes from being used capriciously, the voter must provide > with the veto a technical justification showing why the change is bad > (opens a security exposure, negatively affects performance, etc. ). > A veto without a justification is invalid and has no weight. > ]]] > > (I'm not saying that the above rules have to be used in this particular case > and that a veto is invalid, but still thought it’s worth mentioning.) > I vetoed the change because it hadn't been designed on the dev@ list, had not garnered dev@'s consensus, and was being railroaded through. (as far as I could tell) > Anyway, I'll stop working on the branch, because a veto has been casted. That's your decision. Implementing one design on a branch while other options are being considered by dev@ /is/ possible, but there are some risks with that; cf. my remarks in <https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230121092231.GA3174%40tarpaulin.shahaf.local2%3E>. And once again, for clarity: I'm not vetoing migrating away from SHA-1. (In fact, my intuition was that it'd be a good idea.) Daniel > > Regards, > Evgeny Kotkov