On Fri, Feb 12, 2021 at 07:23:12AM +0000, frame via Digitalmars-d-learn wrote:
> On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
>
> > This turns the OP's O(n log n) algorithm into an O(n) algorithm,
> > doesn't need to copy the entire content of the file into memory, and
> > also uses much less memory by storing only hashes.
>
> But this kind of hash is maybe insufficient to avoid hash collisions.
> For such big data slower but stronger algorithms like SHA are
> advisable.
I used toHash merely as an example. Obviously, you should use a hash
that works well with the input data you're trying to process (i.e.,
minimal chance of collision, not too slow to compute, etc.). SHA hashes
are probably a safe bet, as the chances of collision are negligible.

> Also associative arrays uses the same weak algorithm where you can run
> into collision issues. Thus using the hash from string data as key can
> be a problem. I always use a quick hash as key but hold actually a
> collection of hashes in them and do a lookup to be on the safe side.
[...]

You can use a struct wrapper for your AA key that implements its own
toHash method (along with a matching opEquals, so that equality stays
consistent with the hash).

T

-- 
Mediocrity has been pushed to extremes.
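Something along these lines, perhaps (just a sketch; `ShaKey` is a made-up
name, and it uses std.digest.sha.sha256Of, truncating the digest for the
AA hash while opEquals compares the full 32-byte digest to guard against
collisions in the truncated value):

```d
import std.digest.sha : sha256Of;

// Hypothetical wrapper: keys an associative array on the SHA-256 of the
// data's content instead of druntime's built-in string hash.
struct ShaKey
{
    ubyte[32] digest;

    this(const(char)[] data)
    {
        digest = sha256Of(data);
    }

    size_t toHash() const @safe pure nothrow @nogc
    {
        // Fold the first size_t.sizeof bytes of the digest into the
        // value the AA buckets on.
        size_t h;
        foreach (i; 0 .. size_t.sizeof)
            h = (h << 8) | digest[i];
        return h;
    }

    bool opEquals(ref const ShaKey rhs) const @safe pure nothrow @nogc
    {
        // Compare the full digest, not just the truncated hash, so a
        // bucket collision never conflates two different inputs.
        return digest == rhs.digest;
    }
}

void main()
{
    bool[ShaKey] seen;
    size_t dups;
    foreach (line; ["foo", "bar", "foo"])
    {
        auto key = ShaKey(line);
        if (key in seen) { ++dups; continue; }  // duplicate line
        seen[key] = true;
    }
    assert(seen.length == 2 && dups == 1);
}
```

This keeps the O(1) AA lookup while only ever storing 32 bytes per
distinct line, which is the same idea as holding a collection of strong
hashes behind a quick key, just folded into one type.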