Paul Eggert wrote:
> > --speed-large-files is essential for users who need to reduce
> > the running time from O(N²) to O(N).
> 
> The running time is O(ND) where D is the number of differences.
> One can come up with examples exhibiting O(N**2) behavior, but
> they're rare enough in practice that it doesn't seem to be a problem.

I encountered such cases a couple of years ago. IIRC, the files were
two mbox files that had diverged through different modifications on
different machines.

> As far as I know, --speed-large-files has never been implemented in
> GNU diff (as -h was in the original Unix diff); it has always been
> a no-op.

When you pass --speed-large-files, src/diff.c sets speed_large_files
to true. Then, src/analyze.c sets ctxt.heuristic to true. Then,
gnulib/lib/diffseq.h function diag() applies your heuristic.

It cannot be a no-op, as it has an effect on the run-time: In a contrived
example (two mbox files, each ca. 18 MB large),
  $ time diff -u mbox1 mbox2 > /dev/null
displays a run time of 3.81 seconds, whereas
  $ time diff -u --speed-large-files mbox1 mbox2 > /dev/null
displays a run time of 3.98 seconds.

Bruno


Reply via email to