Paul Eggert wrote: > > --speed-large-files is essential for users who need to reduce > > the running time from O(N²) to O(N). > > The running time is O(ND) where D is the number of differences. > One can come up with examples exhibiting O(N**2) behavior, but > they're rare enough in practice that it doesn't seem to be a problem.
I encountered such cases a couple of years ago. IIRC, the files were two mbox files that had diverged through different modifications on different machines. > As far as I know, --speed-large-files has never been implemented in > GNU diff (as -h was in the original Unix diff); it has always been > a no-op. When you pass --speed-large-files, src/diff.c sets speed_large_files to true. Then, src/analyze.c sets ctxt.heuristic to true. Then, gnulib/lib/diffseq.h function diag() applies your heuristic. It cannot be a no-op, as it has an effect on the run-time: In a contrived example (two mbox files, each ca. 18 MB large), $ time diff -u mbox1 mbox2 > /dev/null displays a run time of 3.81 seconds, whereas $ time diff -u --speed-large-files mbox1 mbox2 > /dev/null displays a run time of 3.98 seconds. Bruno
