Nathan Hartman wrote on Fri, 20 Jan 2023 14:51 +00:00:
> 1. Pros/cons of switching from SHA1 to another hash.
⋮
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility,
Actually, I think it's MD5, not SHA-1, that we have to support
indefinitely, since our uses of SHA-1 fall into two categories:
- Accompanied by MD5. (wc.db PRISTINE table, FSFS node-rev headers,
dumpfiles' Text-content-* headers)
- An optional optimization. (ra_serf, rep-cache.db)
> but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden,
Good point. Then perhaps we should continue to record two checksums, as
both wc.db and FSFS do? If we record, say, both «(svn_checksum_kind_t)42»
checksums and «(svn_checksum_kind_t)value_of_the_month» checksums, then
we'll only need to be able to upgrade from the former.
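Here is a minimal Python sketch of the two-checksum idea. The choices are illustrative only: SHA-256 stands in for the long-lived «42» kind, and SHA3-256 and BLAKE2b stand in for successive «value_of_the_month» kinds. PristineRecord, make_record(), verify() and upgrade() are hypothetical names, not Subversion APIs. The sketch shows that upgrade() only recomputes the anchor kind. It therefore needs no support for whichever rotating kind an old record happens to carry.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical: the long-lived kind that every record always carries.
ANCHOR_KIND = "sha256"


@dataclass(frozen=True)
class PristineRecord:
    """Hypothetical record holding two checksums of the same text."""
    anchor: str        # hex digest of ANCHOR_KIND
    current_kind: str  # hashlib name of the rotating ("of the month") kind
    current: str       # hex digest of current_kind


def make_record(text: bytes, current_kind: str) -> PristineRecord:
    # Compute both checksums over the same bytes.
    return PristineRecord(
        anchor=hashlib.new(ANCHOR_KIND, text).hexdigest(),
        current_kind=current_kind,
        current=hashlib.new(current_kind, text).hexdigest(),
    )


def verify(record: PristineRecord, text: bytes) -> bool:
    # A text matches a record only if both checksums agree.
    return (
        hashlib.new(ANCHOR_KIND, text).hexdigest() == record.anchor
        and hashlib.new(record.current_kind, text).hexdigest() == record.current
    )


def upgrade(record: PristineRecord, text: bytes, new_kind: str) -> PristineRecord:
    # Trust the stored text via the anchor checksum alone; the old record's
    # rotating kind is never recomputed, so it need not be supported here.
    if hashlib.new(ANCHOR_KIND, text).hexdigest() != record.anchor:
        raise ValueError("text does not match the anchor checksum")
    return make_record(text, new_kind)


if __name__ == "__main__":
    text = b"hello\n"
    old = make_record(text, "sha3_256")
    new = upgrade(old, text, "blake2b")
    print(verify(new, text), new.anchor == old.anchor)  # True True
```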
> but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.
Cheers,
Daniel
P.S. wc-metadata.sql implies that having MD5 collisions in a wc is supported:
/* wc-metadata.sql -- schema used in the wc-metadata SQLite database
 * This is intended for use with SQLite 3
⋮
CREATE TABLE PRISTINE (
  /* The SHA-1 checksum of the pristine text. This is a unique key. The
     SHA-1 checksum of a pristine text is assumed to be unique among all
     pristine texts referenced from this database. */
  checksum TEXT NOT NULL PRIMARY KEY,

⋮
  /* Alternative MD5 checksum used for communicating with older
     repositories. Not strictly guaranteed to be unique among table rows. */
  md5_checksum TEXT NOT NULL
);

CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);
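To make the P.S. concrete, here is a small sketch using Python's standard-library sqlite3 module. It runs against a reduced copy of the PRISTINE table that keeps only the two columns quoted above and omits the elided ones. The digest strings are placeholders, not real checksums of colliding texts. Two rows may share an md5_checksum because I_PRISTINE_MD5 is an ordinary (non-UNIQUE) index. A repeated SHA-1 key, by contrast, is rejected by the PRIMARY KEY constraint.

```python
import sqlite3

# Reduced PRISTINE table: only the two columns quoted above; the columns
# elided in the excerpt are omitted for this illustration.
SCHEMA = """
CREATE TABLE PRISTINE (
  checksum TEXT NOT NULL PRIMARY KEY,
  md5_checksum TEXT NOT NULL
);
CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def rows_for_md5(conn: sqlite3.Connection, md5: str) -> list:
    # Lookup by MD5 may return several rows, since the index is not UNIQUE.
    cur = conn.execute(
        "SELECT checksum FROM PRISTINE WHERE md5_checksum = ? ORDER BY checksum",
        (md5,),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = open_db()
    # Placeholder values: two distinct pristines that share an MD5.
    conn.execute("INSERT INTO PRISTINE VALUES (?, ?)", ("sha1-A", "md5-X"))
    conn.execute("INSERT INTO PRISTINE VALUES (?, ?)", ("sha1-B", "md5-X"))
    print(rows_for_md5(conn, "md5-X"))  # ['sha1-A', 'sha1-B']

    # A second row with an existing SHA-1 key violates the PRIMARY KEY.
    try:
        conn.execute("INSERT INTO PRISTINE VALUES (?, ?)", ("sha1-A", "md5-Y"))
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```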