On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
> This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses much less memory by storing only hashes.

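Here is a rough sketch of the approach described in the quote. It assumes the OP's task is to drop duplicate lines from a file, which is an assumption because the original problem isn't shown here. Each line is hashed once and only the hashes go into a set, so the pass takes O(n) expected time and the file is streamed instead of being loaded whole. `unique_lines` is a hypothetical helper name, and Python's built-in `hash()` stands in for whatever quick hash is actually used.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time its hash is seen (hypothetical helper).

    Only the hashes are kept, not the lines, so memory grows with the number
    of distinct hashes. Caveat: two *different* lines that happen to share a
    hash would wrongly be treated as duplicates.
    """
    seen: set[int] = set()
    for line in lines:
        h = hash(line)  # quick, non-cryptographic hash (randomized per process for str)
        if h not in seen:
            seen.add(h)
            yield line


if __name__ == "__main__":
    data = ["apple\n", "banana\n", "apple\n", "cherry\n", "banana\n"]
    print(list(unique_lines(data)))
```

For a real file you would pass the open file object directly, e.g. `for line in unique_lines(open(path)): ...`, so lines are read one at a time.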
But this kind of hash may not be strong enough to avoid collisions. For data this large, slower but stronger algorithms such as SHA are advisable.
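Switching to a stronger hash only changes what gets stored. Below is a minimal sketch that uses SHA-256 from Python's `hashlib`. Picking SHA-256 specifically is an assumption, and `unique_lines_sha256` is a hypothetical name. Each line costs more to hash and each stored digest takes 32 bytes. In exchange, an accidental collision becomes vanishingly unlikely, although still not impossible.

```python
import hashlib
from typing import Iterable, Iterator


def unique_lines_sha256(lines: Iterable[str]) -> Iterator[str]:
    """Yield first occurrences, keyed by SHA-256 digests (hypothetical helper)."""
    seen: set[bytes] = set()
    for line in lines:
        # Strong but slower hash: 32-byte digest of the UTF-8 encoded line.
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


if __name__ == "__main__":
    print(list(unique_lines_sha256(["a\n", "b\n", "a\n"])))
```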
Associative arrays also use the same weak hash algorithm, so you can run into collision issues there too. Using the hash of the string data as the key can therefore be a problem. To be on the safe side, I always use a quick hash as the key but actually store a collection of hashes under it and do a lookup in that collection.
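The last scheme can be sketched under one reading of it. A cheap hash selects the dictionary slot, and each slot holds a list of stronger SHA-256 digests that is searched to confirm a match. The class name `TwoLevelHashSet`, the `quick_mask` parameter, and the use of CRC-32 (`zlib.crc32`) as the quick hash are all assumptions made for illustration. Storing digests rather than the original strings keeps memory low. The trade-off is that the final check relies on SHA-256 not colliding.

```python
import hashlib
import zlib
from collections import defaultdict


class TwoLevelHashSet:
    """Quick hash as dict key; each bucket holds the strong digests seen under it.

    Hypothetical illustration: the quick hash only narrows the search, while
    the SHA-256 digests in the bucket decide membership.
    """

    def __init__(self, quick_mask: int = 0xFFFFFFFF) -> None:
        # quick_mask shrinks the quick hash; a small mask provokes collisions.
        self._quick_mask = quick_mask
        self._buckets: defaultdict[int, list[bytes]] = defaultdict(list)

    def _keys(self, text: str) -> tuple[int, bytes]:
        data = text.encode("utf-8")
        quick = zlib.crc32(data) & self._quick_mask  # cheap 32-bit CRC, optionally masked
        strong = hashlib.sha256(data).digest()       # 32-byte confirming hash
        return quick, strong

    def add(self, text: str) -> bool:
        """Insert text; return True if it was new, False if already present."""
        quick, strong = self._keys(text)
        bucket = self._buckets[quick]
        if strong in bucket:  # linear scan; buckets are expected to be tiny
            return False
        bucket.append(strong)
        return True

    def __contains__(self, text: str) -> bool:
        quick, strong = self._keys(text)
        # .get() avoids creating an empty bucket on a miss.
        return strong in self._buckets.get(quick, [])


if __name__ == "__main__":
    s = TwoLevelHashSet(quick_mask=0)  # every string lands in the same bucket
    print(s.add("foo"), s.add("bar"), s.add("foo"), "bar" in s, "baz" in s)
```

With `quick_mask=0` every string collides on the quick hash, yet `"foo"` and `"bar"` are still told apart by their SHA-256 digests. A key built from the weak hash alone could not make that distinction.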