Nathan Hartman wrote on Fri, 20 Jan 2023 14:51 +00:00:
> 1. Pros/cons of switching from SHA1 to another hash.
⋮
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility,

Actually, I think it's MD5, not SHA-1, that we have to support
indefinitely, since our uses of SHA-1 fall into two categories:

- Accompanied by MD5.  (wc.db PRISTINE table, FSFS node-rev headers,
  dumpfiles' Text-content-* headers)

- An optional optimization.  (ra_serf, rep-cache.db)

> but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden,

Good point.  Then perhaps we should continue to record two checksums,
as both wc.db and FSFS do?  If we record, say, both
«(svn_checksum_kind_t)42» checksums and
«(svn_checksum_kind_t)value_of_the_month» checksums, then we'll only
need to be able to upgrade from the former.

> but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.

Cheers,

Daniel

P.S. wc-metadata.sql implies that having MD5 collisions in a wc is
supported:

    /* wc-metadata.sql -- schema used in the wc-metadata SQLite database
     *     This is intended for use with SQLite 3
    ⋮
    CREATE TABLE PRISTINE (
      /* The SHA-1 checksum of the pristine text. This is a unique key. The
         SHA-1 checksum of a pristine text is assumed to be unique among all
         pristine texts referenced from this database. */
      checksum  TEXT NOT NULL PRIMARY KEY,
    ⋮
      /* Alternative MD5 checksum used for communicating with older
         repositories. Not strictly guaranteed to be unique among table rows. */
      md5_checksum  TEXT NOT NULL
      );

    CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);
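
To make the P.S. concrete, here is a minimal sketch of what the quoted schema permits at the SQLite level. It assumes a cut-down PRISTINE table with only the two checksum columns (the elided columns are left out), uses placeholder strings in place of real digests, and the function name `demo` is purely illustrative. It says nothing about how libsvn_wc itself behaves when it meets such rows.

```python
import sqlite3

# Cut-down version of the PRISTINE table quoted above. Assumption: only the
# two checksum columns matter here, so the elided columns are omitted.
SCHEMA = """
CREATE TABLE PRISTINE (
  checksum      TEXT NOT NULL PRIMARY KEY,
  md5_checksum  TEXT NOT NULL
);
CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);
"""


def demo():
    """Return (rows sharing one MD5, whether a duplicate SHA-1 was rejected)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)

    # Two distinct pristines (different SHA-1 keys) that share one MD5 value.
    # The strings are placeholders, not real digests.
    db.execute("INSERT INTO PRISTINE VALUES (?, ?)",
               ("sha1-of-text-A", "same-md5"))
    db.execute("INSERT INTO PRISTINE VALUES (?, ?)",
               ("sha1-of-text-B", "same-md5"))
    md5_rows = db.execute(
        "SELECT COUNT(*) FROM PRISTINE WHERE md5_checksum = ?",
        ("same-md5",)).fetchone()[0]

    # Reusing a SHA-1 key, by contrast, violates the PRIMARY KEY constraint.
    try:
        db.execute("INSERT INTO PRISTINE VALUES (?, ?)",
                   ("sha1-of-text-A", "other-md5"))
        sha1_duplicate_rejected = False
    except sqlite3.IntegrityError:
        sha1_duplicate_rejected = True

    db.close()
    return md5_rows, sha1_duplicate_rejected
```

The index on md5_checksum is a plain (non-UNIQUE) index, so both rows are accepted; only the SHA-1 column is enforced as unique.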