OK, to wrap this up for now: in r1102471 I finally put these thoughts into notes/diff-optimizations.txt, integrating some of Stefan2's feedback and ideas.
I also added another idea to the notes file. I had mentioned it before, but forgot to bring it up in this mail thread:

--- 8< ---
Avoid some hashing by exploiting the fact that matching lines often come in series.

- If the previous line had a match with the other file, first try to directly compare (memcmp) the next line with the successor of the matched line. Only if it doesn't match, calculate the hash to insert it into the container.

- This approach probably conflicts with the "Merge hash calculation with EOL scanning" suggestion.
--- 8< ---

(Not sure if this is a worthwhile idea, but I thought I'd mention it.)

Cheers,
--
Johan