I think venti could deal with it: Rwrite returns a score, Tread provides a score, and the caller typically treats it as an opaque value. A caller that does not loses either way: whether venti returns a modified SHA-1 score or switches to a new algorithm, the caller can no longer rely on sha1(block) = score.
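As a rough illustration of the opaque-score idea, here is a toy Python sketch. It is not venti's protocol or disk format: `ToyStore`, `weak_score` and the probing scheme are all made up for the example, and the one-byte hash exists only so that a collision is easy to trigger. On a collision the store derives a different score and returns that, so the caller can still read the block back but cannot assume score = hash(block).

```python
import hashlib


def sha1_score(data: bytes) -> bytes:
    """Full 20-byte SHA-1 digest, used here as the default score."""
    return hashlib.sha1(data).digest()


def weak_score(data: bytes) -> bytes:
    """Deliberately terrible 1-byte hash (assumption, demo only)."""
    return bytes([sum(data) % 256])


class ToyStore:
    """Hypothetical in-memory content-addressed store."""

    def __init__(self, hash_fn=sha1_score):
        self.hash_fn = hash_fn
        self.blocks = {}  # score -> block

    def write(self, block: bytes) -> bytes:
        base = self.hash_fn(block)
        score, attempt = base, 0
        # Probe until the slot is free or already holds this exact block.
        # (With the 1-byte demo hash this loops forever once all 256
        # slots are taken; a real design would need a bound.)
        while score in self.blocks and self.blocks[score] != block:
            attempt += 1
            score = self.hash_fn(base + attempt.to_bytes(4, "big"))
        self.blocks[score] = block
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]


if __name__ == "__main__":
    store = ToyStore(hash_fn=weak_score)
    a = store.write(b"ab")
    b = store.write(b"ba")  # same weak hash as b"ab", so it gets a new score
    print(a.hex(), b.hex(), store.read(b))
```

Because the probe sequence depends only on the block's own hash, writing the same colliding block again lands on the same modified score, so duplicate writes still coalesce.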
In any case, fossil needs a fix to cope with venti returning "score collision", so that archiving does not fail once it hits one of the SHAttered files, or rather the first venti-sized block of one.

On Mon, 27 Feb 2017, 21:37 Riddler, <riddler...@gmail.com> wrote:

> I think much in the same vein as git, venti doesn't need to worry too
> much about collisions given the behavior when collisions occur is
> well-defined and sensible in both systems.
> It's second preimages that are more of a concern (and still not
> possible with SHA1). The lack of preimage attacks on SHA1 prevents
> people from maliciously creating a file with the same hash as one you
> created. They can only duplicate ones they created which should limit
> the scope of any maliciousness to stuff they have control over.
> At the point preimages are practical, I'd want to be long gone from
> SHA1 but IIRC even MD5 still has no practical second-preimage attacks
> so we're probably a long way off from there.
>
> Technically, anything relying on venti should handle the collision
> detected response gracefully, as it's always a possibility no matter
> the algorithm.
> If fossil doesn't handle it very well perhaps it's not venti that
> needs changed (given it detects & reports) but fossil.
> A top-of-the-head suggestion would be for fossil to respond to the
> collision notice by doing something to the block that can be undone
> later (as others above have hinted at) such as appending something,
> XOR, etc., marking it as such in its own data structures then passing
> it back to venti. It could then reverse the operation when retrieving
> the files with the 'collision fixed' flag set.
> I don't know how feasible that idea is (been a while since I looked at
> fossil) but worth looking into maybe? It would seem, at a cursory
> glance, to fix the problem for fossil+venti indefinitely at the cost of a
> minor computational overhead for retrieving collided files.
>
> As Charles pointed out, you could also just do that in venti, I guess
> it depends if the write API call contract in venti is "returns SHA1 of
> file" or "returns arbitrary file id".
> If the behavior was put into venti you couldn't assume the ID returned
> = sha1(block) anymore - but I don't know if anything relies on that
> behavior.
> As for venti, I wouldn't say 'no point' to an algorithm update, but
> I'd rather have fossil updated to manage to deal with collisions
> better first.
>
>
> On Mon, Feb 27, 2017 at 8:14 PM, Bakul Shah <ba...@bitblocks.com> wrote:
> > On Mon, 27 Feb 2017 19:02:29 GMT Charles Forsyth <charles.fors...@gmail.com> wrote:
> >> On 27 February 2017 at 18:30, Charles Forsyth <charles.fors...@gmail.com>
> >> wrote:
> >>
> >> > that's a separate argument that venti would never work for you, regardless
> >> > of the hash algorithm used.
> >
> >> since venti returns the resulting score from each write, and it knows
> >> whether there's been a collision,
> >> it appears it could return a modified score (having ensured that is now
> >> unique, "and the next judge said that's a very shaggy dog")
> >
> > Consider what can happen when you want to consolidate two venti
> > archives into another one. Each source venti has a different
> > file with the same hash. When you discover in the destination
> > venti that they collide, it is too late to return a modified
> > score -- you have to find and fix all pointer blocks that
> > refer to this block as well.
> >
> > In theory the chance of a random collision with SHA1 may be
> > 1 in 2^80 but we have existing files that collide (unlike the
> > hypothetical argument of someone wanting to store 10^21 byte
> > size files -- but if they can produce it, we can store it!).
> > Your argument is that since venti is readonly, existing data
> > in it is not vulnerable but not everyone stores their archives
> > on readonly medium. Another argument would be that almost
> > always venti is privately used and unlikely to be accessible
> > to the badguys. Yet another argument is that hardly anyone
> > uses venti so why even bother. These are behavior patterns
> > that are true today but why limit its usefulness?
> >
> > Just as we move archived data we care about to more modern
> > media (as we no longer have easy access to floppies, 9track
> > tapes, 1.4" streamer tape etc.), and update our crypto keys,
> > since they too have limited shelf-life, we can replace the use
> > of SHA1. This is a fixable problem. [It is much much worse
> > for git given the amount of s/w that relies on it. I think
> > it is a matter of time before someone comes up with a
> > collision between two different types of git objects (such as
> > a blob and a tree) but we'll let Linus worry about it :-)]
> >
> > The solution is to convert from sha1 to blake2b or something
> > strong and be prepared to move the data again in 10-20 years.
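
The client-side workaround suggested in the quoted message (undo-able change to the block plus a 'collision fixed' flag) could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: `ReportingStore` stands in for a store that detects and reports collisions, `archive_block` and `fetch_block` are hypothetical helpers, the reversible change is a single appended salt byte, and the one-byte `weak_score` hash is there only to force a collision. It is not fossil's or venti's actual code, and it ignores limits such as a maximum block size.

```python
class ScoreCollision(Exception):
    """Raised when a score is already taken by a different block."""


def weak_score(data: bytes) -> bytes:
    """Deliberately terrible 1-byte hash (assumption, demo only)."""
    return bytes([sum(data) % 256])


class ReportingStore:
    """Hypothetical store that refuses colliding writes instead of hiding them."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.blocks = {}  # score -> block

    def write(self, block: bytes) -> bytes:
        score = self.hash_fn(block)
        if self.blocks.get(score, block) != block:
            raise ScoreCollision(score.hex())
        self.blocks[score] = block
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]


def archive_block(store, block: bytes):
    """Return (score, fixed); fixed=True means one salt byte was appended."""
    try:
        return store.write(block), False
    except ScoreCollision:
        pass
    # Retry with a reversible change: append one salt byte until a write succeeds.
    for salt in range(256):
        try:
            return store.write(block + bytes([salt])), True
        except ScoreCollision:
            continue
    raise ScoreCollision("no salt byte avoided the collision")


def fetch_block(store, score: bytes, fixed: bool) -> bytes:
    """Undo the salt byte when the entry carries the 'collision fixed' flag."""
    data = store.read(score)
    return data[:-1] if fixed else data
```

The flag lives in the client's own pointer entry, next to the score, so the store never needs to know that a block was altered.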