On Fri, Jan 20, 2023 at 9:51 AM Nathan Hartman <hartman.nat...@gmail.com> wrote:
>
> On Fri, Jan 20, 2023 at 7:18 AM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
> >
> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > > I can complete the work on this branch and bring it to a production-ready
> > > state, assuming there are no objections.
> >
> > Your assumption is counterfactual:
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> >
> > Objections have been raised, been left unanswered, and now
> > implementation work has commenced following the original design.  That's
> > not acceptable.  I'm vetoing the change until a non-rubber-stamp design
> > discussion has been completed on the public dev@ list.
>
>
> I think we can start by discussing some of the pros and cons.
>
> There are two separate things here but they end up being mixed
> together in the discussions:
>
> 1. Pros/cons of switching from SHA1 to another hash.
> 2. Supporting different hash types in f32.
>
> Regarding the first item:
>
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility, but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden, but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.
>
> There were concerns about collisions; since the space of possible
> input datasets is infinite and the hash code size is fixed and finite
> (pretty large, but very much finite), there will always be collisions
> with any hash. The significant questions are: how small is the
> probability of a collision, and (for the purposes of security) how
> hard is it to generate input data that produces a collision? The
> answer to the first question is fixed; the second one is probably
> expected to change over time, as algorithms are studied and new
> vulnerabilities are found. Which hash type do you pick, and who knows
> if a hash thought to be very strong (today) later proves easier to
> crack than one that is thought not as strong? We can only guess.
>
> Taking a step back, this discussion started because pristine-free WCs
> are IIUC more dependent on comparing hashes than pristineful WCs, and
> therefore a hash collision could have more impact in a pristine-free
> WC. "Guarantees" were mentioned, but I think it's important to state
> that there's only a guarantee of probability, since as mentioned above
> all hashes will have collisions.
>
> We already can't store files with identical SHA1 hashes, but AFAIK the
> only meaningful impact we've ever heard of is that security researchers
> cannot track files they generate with deliberate collisions. The same
> would be true with any hash type, for collisions within that hash
> type.
>
> Advantages of switching to a new hash type might include: reducing the
> already small probability of collisions; choosing an algorithm that is
> faster or that has (or is expected to have in the future) hardware
> acceleration on commodity systems; and perhaps addressing user
> perception (if SHA1 is seen as old and uncool). Then again, we can't
> really get rid of SHA1...
>
> [1] https://lists.apache.org/thread/v3dv1dtod2t9yrf920h4838g2t0l94cw
>
> Regarding the second item:
>
> Since the premise of this feature is to support adding new hash types
> without bumping wc formats, it follows that any new hash type will
> create compatibility problems for clients that support f32 but not the
> specific new hash type. In light of that, it might just be better to
> bump the wc format and then you know at the outset that you need to
> upgrade your client. Just thinking out loud here but this might be
> (partly) mitigated by trying to guess which hash types we might want
> in the future and supporting them now, even if no existing client will
> actually use them, but I don't really like this idea.
>
> I'll have to return later with more thoughts...

Just quickly: although I mentioned mostly cons above, I don't want to
appear to be against switching hashes or against supporting multiple
hash types in f32. Since the i525-pod feature necessitated a format
bump anyway, I do think it makes sense to consider adding such changes
now, to avoid a future format bump. I'm raising the arguments against
that only out of a desire to be unbiased about it.

I have more thoughts (including more pros) but have some things to
attend to now.

Looking forward to hearing others' thoughts as well.

Cheers,
Nathan
