Jim Meyering wrote:
> Eric Blake wrote:
>> Jim Meyering <jim <at> meyering.net> writes:
>>> It adds a test to gl_REGEX that ensures that re_compile_pattern
>>> diagnoses [b-a] as invalid when using RE_SYNTAX_POSIX_EGREP.
>>
>> Where does POSIX state that this is invalid?
>
> Thanks for looking.
>
> I too verified (before embarking) that POSIX does not declare it invalid,
> merely unspecified.  However, since gnulib's regex has rejected such
> ranges for a long time and sed, awk, perl, etc. act that way, I think
> it's the way to go.
>
> Note also that glibc's code appears to try to implement the same
> behavior (though conditional upon RE_NO_EMPTY_RANGES, which nearly
> everyone uses), but somehow that code does not function properly:
>
>     start_collseq = lookup_collation_sequence_value (start_elem);
>     end_collseq = lookup_collation_sequence_value (end_elem);
>     /* Check start/end collation sequence values.  */
>     if (BE (start_collseq == UINT_MAX || end_collseq == UINT_MAX, 0))
>       return REG_ECOLLATE;
>     if (BE ((syntax & RE_NO_EMPTY_RANGES) && start_collseq > end_collseq,
>             0))
>       return REG_ERANGE;
>
> I've just filed this glibc bug:
>
>   http://sourceware.org/bugzilla/show_bug.cgi?id=11244

Andreas Schwab noticed that

    (RE_SYNTAX_POSIX_EGREP & RE_NO_EMPTY_RANGES) == 0

which explains the problem.

In regcomp.c, there are two build_range_exp functions.
One for _LIBC, and one for non-_LIBC.  The former contains the range
test above.  The latter, which is used by gnulib, does it this way
(regardless of the RE_NO_EMPTY_RANGES syntax bit):

    if (wcscoll (cmp_buf, cmp_buf + 4) > 0)
      return REG_ERANGE;

Since I want grep (and any other tool using gnulib's regex) to diagnose
out-of-order ranges consistently, not just on x86_64-based and non-glibc
systems, I can't leave this as-is.  Here's what I'm planning:

  - Revert this patch:
      ensure that the regexp [b-a] is diagnosed as invalid
  - Modify the !_LIBC build_range_exp function to take a new argument,
    syntax, and use that to guard the wcscoll test above.
  - Ensure that grep uses the RE_NO_EMPTY_RANGES syntax bit as needed.

-----
Bottom line: with the above, we'll continue to use glibc's regex
on non-x86_64.

Jim

P.S. I was a little dismayed to see that csplit, expr and nl all use
these regex syntax flags:

    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES

and thus do not diagnose empty ranges.
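
For illustration only, here is a minimal sketch of what the second planned
change (guarding the wcscoll test with a syntax argument) might look like.
It is not the actual regcomp.c code: sketch_check_range, the SKETCH_* bits
and the sketch_err codes are hypothetical stand-ins for build_range_exp,
RE_NO_EMPTY_RANGES and REG_ERANGE, and the comparison uses two one-character
strings rather than the real cmp_buf layout.

```c
#include <wchar.h>

/* Hypothetical stand-ins for this sketch only; the real names and values
   (RE_NO_EMPTY_RANGES, REG_ERANGE, REG_NOERROR) live in regex.h.  */
#define SKETCH_NO_EMPTY_RANGES 0x1UL
#define SKETCH_OTHER_BIT       0x2UL  /* stands for any unrelated syntax bit */

enum sketch_err { SKETCH_NOERROR = 0, SKETCH_ERANGE = 1 };

/* Return SKETCH_ERANGE for a reversed range such as [b-a], but only when
   the caller's syntax has the empty-range bit set.  */
enum sketch_err
sketch_check_range (unsigned long syntax, wchar_t start_wc, wchar_t end_wc)
{
  /* wcscoll compares strings, not single characters, so wrap each end
     point in a NUL-terminated one-character string.  */
  wchar_t start_str[2] = { start_wc, L'\0' };
  wchar_t end_str[2] = { end_wc, L'\0' };

  if ((syntax & SKETCH_NO_EMPTY_RANGES)
      && wcscoll (start_str, end_str) > 0)
    return SKETCH_ERANGE;

  return SKETCH_NOERROR;
}
```

In the default "C" locale, a call with the bit set and the end points
(L'b', L'a') returns SKETCH_ERANGE.  Masking the bit out of the syntax, in
the same way the csplit, expr and nl flags apply & ~RE_NO_EMPTY_RANGES,
lets the same reversed range through without a diagnostic.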