On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:

This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses
much less memory by storing only hashes.

But this kind of hash is maybe insufficient to avoid hash collisions. For such big data slower but stronger algorithms like SHA are advisable.

Also associative arrays uses the same weak algorithm where you can run into collision issues. Thus using the hash from string data as key can be a problem. I always use a quick hash as key but hold actually a collection of hashes in them and do a lookup to be on the safe side.


Reply via email to