[PATCH] Documentation/diff-options: explain different diff algorithms

Stefan Beller Mon, 23 Jul 2018 17:36:28 -0700

As a user I wondered what the diff algorithms are about. Offer at least
a basic explanation on the differences of the diff algorithms.


Signed-off-by: Stefan Beller <sbel...@google.com>
---
 Documentation/diff-options.txt | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972f..0d765482027 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,16 +94,34 @@ diff" algorithm internally.
        Choose a diff algorithm. The variants are as follows:
 +
 --
-`default`, `myers`;;
-       The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-       Spend extra time to make sure the smallest possible diff is
-       produced.
+       A diff as produced by the basic greedy algorithm described in
+       link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm 
and its Variations]
+`default`, `myers`;;
+       The same algorithm as `minimal`, extended with a heuristic to
+       reduce extensive searches. Currently, this is the default.
 `patience`;;
-       Use "patience diff" algorithm when generating patches.
+       Use "patience diff" algorithm when generating patches. This
+       matches the longest common subsequence of unique lines on
+       both sides, recursively. It obtained its name by the way the
+       longest subsequence is found, as that is a byproduct of the
+       patience sorting algorithm. If there are no unique lines left
+       it falls back to `myers`. Empirically this algorithm produces
+       a more readable output for code, but it does not garantuee
+       the shortest output.
 `histogram`;;
-       This algorithm extends the patience algorithm to "support
-       low-occurrence common elements".
+       This algorithm re-implements the `patience` algorithm with
+       "support of low-occurrence common elements" and only picks
+       one element of the LCS for the recursion. It is often the
+       fastest, but in cornercases (when there are many longest
+       common subsequences of the same length) it produces bad
+       results as seen in:
+
+               seq 1 100 >one
+               echo 99 > two
+               seq 1 2 98 >>two
+               git diff --no-index --histogram one two
+
 --
 +
 For instance, if you configured diff.algorithm variable to a
-- 
2.18.0.345.g5c9ce644c3-goog

[PATCH] Documentation/diff-options: explain different diff algorithms

Reply via email to