Hi Paul. I've seen this issue again now using diff 3.0 from Debian sid (see the attached log file output.
Any ideas? Cheers, Chris. On Fri, 2010-08-13 at 03:06 +0200, Paul Eggert wrote: > On 08/07/10 18:50, Christoph Anton Mitterer wrote: > > > I found a rather strange bug in diff (Debian version 1:3.0-1) when doing > > large backups of millions of files. > > First I've copied the data to an external HDD, then I've did an "diff -q > > -r >out 2> err" in order to find any errors. > > A wise precaution! I do that sort of thing a lot. > > > The strange unicode characters had their character codes inside which read > > as follows: > > 1CEE 1CAE > > 1CAE 1CEE > > ... > > Any ideas? > > My idea is that you've found and reported a bug that has been in GNU > diff ever since 2.8 was released back in 2002. Thanks very much for > reporting this: you've undoubtedly saved others some work and > hair-pulling. > > I've installed the following patch, which I hope fixes things. If you > can test it on your data, that'd be nice. I have verified this bug on > a much simpler test case, and this patch includes the test case to try > to make sure that a similar bug doesn't creep back in later. > > > > >From 29ac7f191c444400c6df067647e3158a1e188ddf Mon Sep 17 00:00:00 2001 > From: Paul Eggert <[email protected]> > Date: Thu, 12 Aug 2010 17:55:05 -0700 > Subject: [PATCH] diff: avoid spurious diffs when two distinct dir entries > compare equal > > Problem reported by Christoph Anton Mitterer in: > http://lists.gnu.org/archive/html/bug-diffutils/2010-08/msg00000.html > > * NEWS: Mention this bug fix. > * src/dir.c (compare_names_for_qsort): Fall back on file_name_cmp > if two distinct entries in the same directory compare equal. > (diff_dirs): Prefer a file_name_cmp match when available. > * tests/Makefile.am (TESTS): New test colliding-file-names. > * tests/colliding-file-names: New file. > --- > NEWS | 5 +++++ > src/dir.c | 41 +++++++++++++++++++++++++++++++++++++++-- > tests/Makefile.am | 1 + > tests/colliding-file-names | 20 ++++++++++++++++++++ > 4 files changed, 65 insertions(+), 2 deletions(-) > create mode 100644 tests/colliding-file-names > > diff --git a/NEWS b/NEWS > index 27b9886..1c41e9e 100644 > --- a/NEWS > +++ b/NEWS > @@ -2,6 +2,11 @@ GNU diffutils NEWS -*- > outline -*- > > * Noteworthy changes in release ?.? (????-??-??) [?] > > +** Bug fixes > + > + diff no longer reports spurious differences merely because two entries > + in the same directory have names that compare equal in the current > + locale, or compare equal because --ignore-file-name-case was given. > > * Noteworthy changes in release 3.0 (2010-05-03) [stable] > > diff --git a/src/dir.c b/src/dir.c > index 5b4eaec..1b078ab 100644 > --- a/src/dir.c > +++ b/src/dir.c > @@ -166,14 +166,16 @@ compare_names (char const *name1, char const *name2) > : file_name_cmp (name1, name2)); > } > > -/* A wrapper for compare_names suitable as an argument for qsort. */ > +/* Compare names FILE1 and FILE2 when sorting a directory. > + Prefer filtered comparison, breaking ties with file_name_cmp. */ > > static int > compare_names_for_qsort (void const *file1, void const *file2) > { > char const *const *f1 = file1; > char const *const *f2 = file2; > - return compare_names (*f1, *f2); > + int diff = compare_names (*f1, *f2); > + return diff ? diff : file_name_cmp (*f1, *f2); > } > > /* Compare the contents of two directories named in CMP. > @@ -253,6 +255,41 @@ diff_dirs (struct comparison const *cmp, > pretend the "next name" in that dir is very large. */ > int nameorder = (!*names[0] ? 1 : !*names[1] ? -1 > : compare_names (*names[0], *names[1])); > + > + /* Prefer a file_name_cmp match if available. This algorithm is > + O(N**2), where N is the number of names in a directory > + that compare_names says are all equal, but in practice N > + is so small it's not worth tuning. */ > + if (nameorder == 0) > + { > + int raw_order = file_name_cmp (*names[0], *names[1]); > + if (raw_order != 0) > + { > + int greater_side = raw_order < 0; > + int lesser_side = 1 - greater_side; > + char const **lesser = names[lesser_side]; > + char const *greater_name = *names[greater_side]; > + char const **p; > + > + for (p = lesser + 1; > + *p && compare_names (*p, greater_name) == 0; > + p++) > + { > + int cmp = file_name_cmp (*p, greater_name); > + if (0 <= cmp) > + { > + if (cmp == 0) > + { > + memmove (lesser + 1, lesser, > + (char *) p - (char *) lesser); > + *lesser = greater_name; > + } > + break; > + } > + } > + } > + } > + > int v1 = (*handle_file) (cmp, > 0 < nameorder ? 0 : *names[0]++, > nameorder < 0 ? 0 : *names[1]++); > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 6a4858c..242d501 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -1,6 +1,7 @@ > TESTS = \ > basic \ > binary \ > + colliding-file-names \ > help-version \ > function-line-vs-leading-space \ > label-vs-func \ > diff --git a/tests/colliding-file-names b/tests/colliding-file-names > new file mode 100644 > index 0000000..f14dce7 > --- /dev/null > +++ b/tests/colliding-file-names > @@ -0,0 +1,20 @@ > +#!/bin/sh > +# Check that diff responds well if a directory has multiple file names > +# that compare equal. > + > +: ${srcdir=.} > +. "$srcdir/init.sh"; path_prepend_ ../src > + > +mkdir d1 d2 || fail=1 > + > +for i in abc abC aBc aBC Abc AbC ABc ABC; do > + echo $i >d1/$i || fail=1 > +done > + > +for i in ABC ABc AbC Abc aBC aBc abC abc; do > + echo $i >d2/$i || fail=1 > +done > + > +diff -r --ignore-file-name-case d1 d2 || fail=1 > + > +Exit $fail
Files 1/foo/bar/baz/fileᳮjpeg and 2/foo/bar/baz/fileᲮjpeg differ Files 1/foo/bar/baz/fileᲮjpeg and 2/foo/bar/baz/fileᳮjpeg differ
smime.p7s
Description: S/MIME cryptographic signature
