Dermot wrote:
> 2010/1/13 Roger Burton West <[email protected]>:
> > You may want to be slightly cleverer about it - taking a SHAsum is
> > computationally expensive, and it's only worth doing if the files
> > have the same size.
>
> Unfortunately the size varies quite a bit.

You might've missed his point. If two files are of different sizes, they cannot be identical, and getting the size of a file is substantially cheaper than hashing it. So you check all your file sizes first, and need only hash the pairs or groups of files that share the same size.

-- 
Avi Greenbury
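
For what it's worth, here is a rough sketch of the size-first approach described above, in Python. The function names (find_duplicates, sha256_of) and the choice of SHA-256 are my own assumptions for illustration, not anything from the original script; any SHA variant would do the same job.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return a list of groups (lists of paths) whose contents are identical."""
    # Step 1: group by size. This only needs a stat() call per file.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate

        # Step 2: hash only the files that share a size with another file.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[sha256_of(path)].append(path)
        duplicates.extend(
            group for group in by_digest.values() if len(group) > 1
        )
    return duplicates
```

If the sizes really do vary quite a bit, most files fall out at step 1 and never get hashed at all, which is where the saving comes from.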
