OK, to wrap this up for now: in r1102471 I finally put these thoughts into
notes/diff-optimizations.txt, with some of Stefan2's feedback and ideas
integrated.

I also added another idea to the notes file, one that was mentioned
earlier but that I forgot to bring up in this mail thread:

--- 8< ---
Avoid some hashing by exploiting the fact that matching lines often come
   in series.

  - If the previous line had a match with the other file, first try to
    directly compare (memcmp) the next line with the successor of the
    matched line. Only if it doesn't match, calculate the hash to insert
    it into the container.
  - This approach probably conflicts with the "Merge hash calculation with
    EOL scanning" suggestion.
--- 8< ---
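A minimal C sketch of the "try the successor first" idea, under stated assumptions: `line_t`, `hash_line` and `match_lines` are hypothetical names, not Subversion's diff API. FNV-1a is used as an arbitrary line hash, and a linear scan comparing precomputed hashes stands in for the real container. The function counts how many lines of the second file had to be hashed, so the saving from the memcmp fast path is visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical line record: pointer into a file buffer plus length. */
typedef struct line_t {
  const char *data;
  size_t len;
} line_t;

/* FNV-1a over the line's bytes (an arbitrary choice for this sketch). */
static unsigned long hash_line(const line_t *l)
{
  unsigned long h = 2166136261UL;
  for (size_t i = 0; i < l->len; i++) {
    h ^= (unsigned char)l->data[i];
    h *= 16777619UL;
  }
  return h;
}

/* Byte-wise equality: lengths first, then memcmp. */
static int lines_equal(const line_t *x, const line_t *y)
{
  return x->len == y->len && memcmp(x->data, y->data, x->len) == 0;
}

/* For each line of b, store in match[j] the index of an equal line in a,
   or -1. a_hash[i] must hold hash_line(&a[i]). Returns how many lines of
   b had to be hashed. */
static size_t match_lines(const line_t *a, const unsigned long *a_hash,
                          size_t na, const line_t *b, size_t nb, long *match)
{
  size_t hashes = 0;
  for (size_t j = 0; j < nb; j++) {
    match[j] = -1;
    /* Fast path: the previous line matched a[i], so compare directly
       with a[i + 1] before doing any hashing. */
    if (j > 0 && match[j - 1] >= 0) {
      size_t next = (size_t)match[j - 1] + 1;
      if (next < na && lines_equal(&b[j], &a[next])) {
        match[j] = (long)next;
        continue; /* matched without computing a hash */
      }
    }
    /* Slow path: hash the line and look it up. The linear scan is only a
       stand-in for inserting into / searching a real hash container. */
    unsigned long h = hash_line(&b[j]);
    hashes++;
    for (size_t i = 0; i < na; i++) {
      if (a_hash[i] == h && lines_equal(&b[j], &a[i])) {
        match[j] = (long)i;
        break;
      }
    }
  }
  return hashes;
}
```

With a = {"a", "b", "c", "d"} and b = {"x", "b", "c", "d"}, only "x" and "b" get hashed; "c" and "d" are matched by the memcmp fast path. This also suggests why the idea presumably conflicts with merging hash calculation into EOL scanning: if every line is hashed while scanning for EOLs anyway, skipping the hash later saves nothing.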

(Not sure if this is a worthwhile idea, but I thought I'd mention it.)

Cheers,
-- 
Johan
