Re: Test failures in t4034

2012-09-02 Thread Junio C Hamano
Ramsay Jones ram...@ramsay1.demon.co.uk writes:

 Yes, there was a net increase in the line count when I introduced
 die(), but the main program flow was less cluttered by error handling.
 The net result looked much better, so I thought it was worth it.

 What may not be too obvious, however, is that test-regex.c was written
 to be independent of git.

That part I was very aware of actually; it it is a bit tricky to
tell what the right thing to do, though.  Your test itself needs to
be pretty much portable without the portability help you would get
git-compat-util.h, but the point of this kind of test is to tell if
you want to define preprocessor macros that may affect the behaviour
of such compatibility layer ;-)

 Given that I'm now building it as part of git, I should have simply
 #included git-compat-util.h and used the die() routine from libgit.a
 (since I'm now *relying* on test-regex being linked with libgit.a).

OK.

 +int main(int argc, char **argv)
 +{
 +   char *pat = [^={} \t]+;
 +   char *str = ={}\nfred;
 +   regex_t r;
 +   regmatch_t m[1];
 +
 +   if (regcomp(r, pat, REG_EXTENDED | REG_NEWLINE))
 +   die(failed regcomp() for pattern '%s', pat);
 +   if (regexec(r, str, 1, m, 0))
 +   die(no match of pattern '%s' to string '%s', pat, str);
 +
 +   /* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
 +   if (m[0].rm_so == 3) /* matches '\n' when it should not */
 +   exit(1);
 
 This could be the third call site of die() that tells the user to
 build with NO_REGEX=1.  Then cd t  sh t0070-fundamental.sh -i -v would
 give that message directly to the user.

 Hmm, even without -i -v, it's *very* clear what is going on, but sure
 it wouldn't hurt either. (Also, I wanted to be able to distinguish an exit
 via die() from a test failed error return).

OK.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-09-01 Thread Ramsay Jones
Junio C Hamano wrote:
 Ramsay Jones ram...@ramsay1.demon.co.uk writes:
 

[snip]

 diff --git a/test-regex.c b/test-regex.c
 new file mode 100644
 index 000..9259985
 --- /dev/null
 +++ b/test-regex.c
 @@ -0,0 +1,35 @@
 +#include stdlib.h
 +#include stdio.h
 +#include stdarg.h
 +#include sys/types.h
 +#include regex.h
 +
 +static void die(const char *fmt, ...)
 +{
 +va_list p;
 +
 +va_start(p, fmt);
 +vfprintf(stderr, fmt, p);
 +va_end(p);
 +fputc('\n', stderr);
 +exit(128);
 +}
 
 Looks like a bit of overkill for only two call sites, whose output
 we would never see because it is behind the test, but OK.

Yes, there was a net increase in the line count when I introduced
die(), but the main program flow was less cluttered by error handling.
The net result looked much better, so I thought it was worth it.

What may not be too obvious, however, is that test-regex.c was written
to be independent of git. You should be able to compile the (single) file
on any POSIX system to determine if the system regex routines suffer this
problem. (It was also supposed to be quiet, unless it die()-ed, and
provide the result via the exit code).

Given that I'm now building it as part of git, I should have simply
#included git-compat-util.h and used the die() routine from libgit.a
(since I'm now *relying* on test-regex being linked with libgit.a).

 +int main(int argc, char **argv)
 +{
 +char *pat = [^={} \t]+;
 +char *str = ={}\nfred;
 +regex_t r;
 +regmatch_t m[1];
 +
 +if (regcomp(r, pat, REG_EXTENDED | REG_NEWLINE))
 +die(failed regcomp() for pattern '%s', pat);
 +if (regexec(r, str, 1, m, 0))
 +die(no match of pattern '%s' to string '%s', pat, str);
 +
 +/* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
 +if (m[0].rm_so == 3) /* matches '\n' when it should not */
 +exit(1);
 
 This could be the third call site of die() that tells the user to
 build with NO_REGEX=1.  Then cd t  sh t0070-fundamental.sh -i -v would
 give that message directly to the user.

Hmm, even without -i -v, it's *very* clear what is going on, but sure
it wouldn't hurt either. (Also, I wanted to be able to distinguish an exit
via die() from a test failed error return).

So, new (tested) version of the patch comming.

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-21 Thread Ramsay Jones
Junio C Hamano wrote:
 Ramsay Jones ram...@ramsay1.demon.co.uk writes:
 
 I had the same problem (or at least it *looks* like the same problem) on 
 Linux
 last year (May 2011), which turned out to be a bug in the regex routines in 
 an
 old version of glibc. 

 I don't know OS X at all, so this may not be relevent; does OS X use glibc?
 (I didn't think so, but ...)

 I sent some patches to the list which may be helpful. I can't get to gmane to
 look up a reference, but you need to search for:

 [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh 
 test failures

 sent on 3rd May 2011.
 
 Thanks; that's $gmane/172676 for people who prefer easier to read
 threading interface.
 
 Also, in the same thread, a reply to Jonathan Nieder on 7th May contains a
 test which checks whether your regex routines suffer this bug.

 These patches were not applied since I didn't think this would be a common
 problem. I simply set NO_REGEX=1 in my config.mak, since the compat/ regex
 routines don't suffer from this problem.
 
 You also said:
 
   This is an RFC because:
- A simple fix would be for me to put NO_REGEX=1 in my config.mak,
  since the compat/regex routines don't suffer this problem.
- I suspect this bug is old enough that it will not affect many users.
- I have not audited the other non-matching list expressions in
  userdiff.c
- blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
  flag, but get the regex from the user (eg from command line).
 
 I think:
 
  - the second this is old enough assumption was broken again by
Brian this week ;-)
 
  - the first Use NO_REGEX if your regexp library is broken is a
reasonable thing to do; is this something we may want to throw
into the platform specific section of the top-level Makefile?
 
  - among the fourth, blame and grep goes line by line, and even
though pickaxe is primarily meant to take multi-line pattern, I
do not think people give multi-line pattern when they use it in
the regexp mode.  So I do not think they pose a real issue even
though they get an arbitrary pattern from the user.
 
  - the third, combined with the fact that end user can define their
own pattern, is a killer.  We cannot really afford to let broken
regex library to break us.
 
 I think a sensible way to go in the longer term, while we wait these
 old regexp libraries die out, is to help people to avoid building
 git without NO_REGEX on platforms where they need it.

Agreed. Did you take a look at the second patch I mentioned above?
I've included a rebased version (onto v1.7.12) of that patch below.

NOTE: I have not even attempted to compile this version of the patch
and I can't remember how much testing I did last year, so this is
included *only* for discussion purposes ...

I think that, after some testing, this (or something like it) is the
best that we can do. What do you think?

ATB,
Ramsay Jones

-- 8 --
Subject: [PATCH] test-regex: Add a test to check for a bug in the regex routines

Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
---
 .gitignore |  1 +
 Makefile   |  1 +
 t/t0070-fundamental.sh |  5 +
 test-regex.c   | 35 +++
 4 files changed, 42 insertions(+)
 create mode 100644 test-regex.c

diff --git a/.gitignore b/.gitignore
index bb5c91e..68fe464 100644
--- a/.gitignore
+++ b/.gitignore
@@ -189,6 +189,7 @@
 /test-mktemp
 /test-parse-options
 /test-path-utils
+/test-regex
 /test-revision-walking
 /test-run-command
 /test-sha1
diff --git a/Makefile b/Makefile
index 6b0c961..3b760d3 100644
--- a/Makefile
+++ b/Makefile
@@ -496,6 +496,7 @@ TEST_PROGRAMS_NEED_X += test-mergesort
 TEST_PROGRAMS_NEED_X += test-mktemp
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-regex
 TEST_PROGRAMS_NEED_X += test-revision-walking
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 9bee8bf..da2c504 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -25,4 +25,9 @@ test_expect_success POSIXPERM 'mktemp to unwritable directory 
prints filename' '
grep cannotwrite/test err
 '
 
+test_expect_success 'check for a bug in the regex routines' '
+   # if this test fails, re-build git with NO_REGEX=1
+   test-regex
+'
+
 test_done
diff --git a/test-regex.c b/test-regex.c
new file mode 100644
index 000..9259985
--- /dev/null
+++ b/test-regex.c
@@ -0,0 +1,35 @@
+#include stdlib.h
+#include stdio.h
+#include stdarg.h
+#include sys/types.h
+#include regex.h
+
+static void die(const char *fmt, ...)
+{
+   va_list p;
+
+   va_start(p, fmt);
+   vfprintf(stderr, fmt, p);
+   va_end(p);
+   fputc('\n', stderr);
+   exit(128);
+}
+
+int main(int argc, char **argv)
+{
+   char *pat = [^={} \t]+;
+   char *str

Re: Test failures in t4034

2012-08-21 Thread Junio C Hamano
Ramsay Jones ram...@ramsay1.demon.co.uk writes:

 I think that, after some testing, this (or something like it) is the
 best that we can do. What do you think?

 ATB,
 Ramsay Jones

 -- 8 --
 Subject: [PATCH] test-regex: Add a test to check for a bug in the regex 
 routines

 Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
 ---
 diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
 index 9bee8bf..da2c504 100755
 --- a/t/t0070-fundamental.sh
 +++ b/t/t0070-fundamental.sh
 @@ -25,4 +25,9 @@ test_expect_success POSIXPERM 'mktemp to unwritable 
 directory prints filename' '
   grep cannotwrite/test err
  '
  
 +test_expect_success 'check for a bug in the regex routines' '
 + # if this test fails, re-build git with NO_REGEX=1
 + test-regex
 +'

OK.

  test_done
 diff --git a/test-regex.c b/test-regex.c
 new file mode 100644
 index 000..9259985
 --- /dev/null
 +++ b/test-regex.c
 @@ -0,0 +1,35 @@
 +#include stdlib.h
 +#include stdio.h
 +#include stdarg.h
 +#include sys/types.h
 +#include regex.h
 +
 +static void die(const char *fmt, ...)
 +{
 + va_list p;
 +
 + va_start(p, fmt);
 + vfprintf(stderr, fmt, p);
 + va_end(p);
 + fputc('\n', stderr);
 + exit(128);
 +}

Looks like a bit of overkill for only two call sites, whose output
we would never see because it is behind the test, but OK.

 +int main(int argc, char **argv)
 +{
 + char *pat = [^={} \t]+;
 + char *str = ={}\nfred;
 + regex_t r;
 + regmatch_t m[1];
 +
 + if (regcomp(r, pat, REG_EXTENDED | REG_NEWLINE))
 + die(failed regcomp() for pattern '%s', pat);
 + if (regexec(r, str, 1, m, 0))
 + die(no match of pattern '%s' to string '%s', pat, str);
 +
 + /* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
 + if (m[0].rm_so == 3) /* matches '\n' when it should not */
 + exit(1);

This could be the third call site of die() that tells the user to
build with NO_REGEX=1.  Then cd t  sh t0070-fundamental.sh -i -v would
give that message directly to the user.

 + exit(0);
 +}

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-19 Thread Junio C Hamano
Brian Gernhardt br...@gernhardtsoftware.com writes:

 I've been getting a couple of test failures and finally had the time to track 
 them down.

 t4034-diff-words fails tests 22 diff driver 'bibtex' and 26
 diff driver 'html'.  Bisecting shows that the file started giving
 me errors in commit 8d96e72 t4034: bulk verify builtin word regex
 sanity, which appears to introduce those tests.  I don't see
 anything obviously wrong with the tests and I'm not familiar with
 the diff-words code, so I'm not sure what's wrong.

 I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).

 Test results follow:

 -- 8 --

 expecting success: 
   cp $TEST_DIRECTORY/t4034/bibtex/pre \
   $TEST_DIRECTORY/t4034/bibtex/post \
   $TEST_DIRECTORY/t4034/bibtex/expect . 
   echo * diff=bibtex .gitattributes 
   word_diff --color-words
   
 --- expect2012-08-18 05:54:29.0 +
 +++ output.decrypted  2012-08-18 05:54:29.0 +
 @@ -8,8 +8,8 @@
author={Aldous, REDD.RESETGREENDavidRESET},
journal={Information Theory, IEEE Transactions on},RESET
volume={RED33RESETGREENBogus.RESET},
 -  number={RED2RESETGREEN4RESET},
 +  number={4},
pages={219--223},RESET
 -  year=GREEN1987,RESET
 -GREEN  note={This is in fact a rather funny read since ethernet works well 
 in practice. TheRESET {RED1987RESETGREEN\em pre} reference is the 
 right one, however.RESET}RED,RESET
 +  year=RED{1987},RESETGREEN1987,RESET
 +  note={This is in fact a rather funny read since ethernet works well in 
 practice. The {\em pre} reference is the right one, however.}
  }RESET
 not ok - 22 diff driver 'bibtex'

Thanks for a report.  Off the top of my head, there may be three
possibilities.

 (1) The compiled binary of Git is broken on your platform and not
 formatting the escape sequence correctly.  I somehow think it
 is very unlikely, as the code to do so is pretty much platform
 agonistic (color.c does not use anything fancy from system
 libraries).

 (2) The test script, the part that converts the escape sequence to
 human readable form, is broken---not written in a portable awk.

 (3) The implementation of awk on your platform was broken by your
 supplier, with the same infinite wisdom they broke the UTF-8
 pathnames on their filesystem implementation with ;-)

Can you help isolating the issue first to see if it is (1) or one of
the other two?

Run cd t  sh t4034-diff-words -i to force stop the test upon the
first breakage, and inspect the output before the awk script
test_decode_color munges it.  Does it show a red number 2 and green
number 4 on the line that begins with number= (or if you have an
access to a box on which this test passes, grab the raw output from
it by running this test, and make byte-for-byte comparison)?  
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-19 Thread Junio C Hamano
Brian Gernhardt mister.r...@gmail.com writes:

 I wonder if there is something non-portable about the bibtex
 filter?  (The HTML filter as well, since that errors on my machine
 too.)

Yeah, that I didn't think of, but is a possibility (part of (1)
above).

The HTML one is [^= \t]+ and
the Bibtex one is [={}\]|[^={}\ \t]+

and both will be used with |[^[:space:]]|[\xc0-\xff][\x80-\xbf]+
appended and given to regcomp with REG_EXTENDED|REG_NEWLINE.  

Nothing jumps at me that is common to these two but not shared by
other patterns.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-19 Thread Johannes Sixt
Am 19.08.2012 16:50, schrieb Ramsay Jones:
 Brian Gernhardt wrote:
 I've been getting a couple of test failures and finally had the
 time to track them down.
 
 t4034-diff-words fails tests 22 diff driver 'bibtex' and 26 diff
 driver 'html'.  Bisecting shows that the file started giving me
 errors in commit 8d96e72 t4034: bulk verify builtin word regex
 sanity, which appears to introduce those tests.  I don't see
 anything obviously wrong with the tests and I'm not familiar with
 the diff-words code, so I'm not sure what's wrong.
 
 I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).
 
 I had the same problem (or at least it *looks* like the same problem)
 on Linux last year (May 2011), which turned out to be a bug in the
 regex routines in an old version of glibc.

I also had the same problem, but did not remember why I don't have it
anymore. Now that you mention it: It was the same situation and I came
to the same conclusion (old glibc, bogus regex implementation). I worked
it around with NO_REGEX=YesPlease in config.mak.

-- Hannes
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Test failures in t4034

2012-08-18 Thread Brian Gernhardt
I've been getting a couple of test failures and finally had the time to track 
them down.

t4034-diff-words fails tests 22 diff driver 'bibtex' and 26 diff driver 
'html'.  Bisecting shows that the file started giving me errors in commit 
8d96e72 t4034: bulk verify builtin word regex sanity, which appears to 
introduce those tests.  I don't see anything obviously wrong with the tests and 
I'm not familiar with the diff-words code, so I'm not sure what's wrong.

I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).

Test results follow:

-- 8 --

expecting success: 
cp $TEST_DIRECTORY/t4034/bibtex/pre \
$TEST_DIRECTORY/t4034/bibtex/post \
$TEST_DIRECTORY/t4034/bibtex/expect . 
echo * diff=bibtex .gitattributes 
word_diff --color-words

--- expect  2012-08-18 05:54:29.0 +
+++ output.decrypted2012-08-18 05:54:29.0 +
@@ -8,8 +8,8 @@
   author={Aldous, REDD.RESETGREENDavidRESET},
   journal={Information Theory, IEEE Transactions on},RESET
   volume={RED33RESETGREENBogus.RESET},
-  number={RED2RESETGREEN4RESET},
+  number={4},
   pages={219--223},RESET
-  year=GREEN1987,RESET
-GREEN  note={This is in fact a rather funny read since ethernet works well 
in practice. TheRESET {RED1987RESETGREEN\em pre} reference is the right 
one, however.RESET}RED,RESET
+  year=RED{1987},RESETGREEN1987,RESET
+  note={This is in fact a rather funny read since ethernet works well in 
practice. The {\em pre} reference is the right one, however.}
 }RESET
not ok - 22 diff driver 'bibtex'

-- 8 --

expecting success: 
cp $TEST_DIRECTORY/t4034/html/pre \
$TEST_DIRECTORY/t4034/html/post \
$TEST_DIRECTORY/t4034/html/expect . 
echo * diff=html .gitattributes 
word_diff --color-words

--- expect  2012-08-18 05:54:29.0 +
+++ output.decrypted2012-08-18 05:54:29.0 +
@@ -4,5 +4,5 @@
 BOLD+++ b/postRESET
 CYAN@@ -1,3 +1,3 @@RESET
 tag GREENnewattr=newvalueRESETGREENaddedRESET content/tag
-tag 
attr=REDvalueRESETGREENnewvalueRESETREDcontentRESETGREENchangedRESET/tag
-REDtagRESETGREENnewtagRESETcontent 
REDentity;RESETGREENnewentity;RESETRED/tagRESETGREEN/newtagRESET
+tag attr=newvaluechanged/tag
+newtagcontent newentity;/newtag
not ok - 26 diff driver 'html'--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html