In article <mailman.1485.1337627992.855.bug-b...@gnu.org>,
Linda Walsh <b...@tlinx.org> wrote:
>Greg Wooledge wrote:
>
>> On Sun, May 20, 2012 at 11:36:35AM -0700, Linda Walsh wrote:
>
>> For instance, on HP-UX 10.20, in the en_US.iso88591 locale:
>> A a ... B b
>> Meanwhile, on Debian 6.0, in the en_US.iso88591 locale:
>> a A ... b B
>>
>> As you can see, the two en_US.iso88591 implementations are not the same.
>
>----
> Great!...
>
>So which is correct?

Both! Isn't this fun! Current POSIX leaves this up to the implementation.
I believe that the Debian order is what earlier POSIX required.

>Anyone wanting to reference an upper or lower case range
>[a-z] or [A-Z], is gonna hurt from this.

This is why I started the Campaign For Rational Range Interpretation,
now part of gawk and, I believe, the most recent grep as well. It returns
us to the sane days of yesteryear, where [a-z] got only lowercase letters
and [A-Z] got only uppercase ones.

>My OS uses "en_US.UTF-8".

I personally have had

	export LC_ALL=C

in my .profile / .bashrc for many years now, to keep the behavior G-d intended.

>You'd think unicode would have something to say about collation
>order that wouldn't allow such randomness, but maybe not.

It actually makes sense that it doesn't, since Unicode is more or less
a mapping of code points to glyphs, which is language independent.
The rules for collating depend upon the language.
-- 
Aharon (Arnold) Robbins		arnold AT skeeve DOT com
P.O. Box 354			Home Phone: +972 8 979-0381
Nof Ayalon			Cell Phone: +972 50 729-7545
D.N. Shimshon 99785		ISRAEL
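
As an illustration of the range problem discussed above, here is a minimal
Python sketch of why [a-z] can pick up uppercase letters when a range is
interpreted by collation order. It is a simplified model, not what any
particular libc or regex engine does: the names c_locale_key,
interleaved_key and range_members are hypothetical, and interleaved_key
only imitates the "a A ... b B" order quoted for Debian rather than reading
real locale tables.

```python
import string

def c_locale_key(ch):
    # C/POSIX locale: characters collate by raw code point.
    return ord(ch)

def interleaved_key(ch):
    # Assumed dictionary-style order: a A b B ... z Z
    # (lowercase before uppercase within each letter).
    return (ch.lower(), ch.isupper())

def range_members(lo, hi, key, alphabet=string.ascii_letters):
    """Characters whose collation key lies between those of lo and hi, inclusive."""
    hits = (c for c in alphabet if key(lo) <= key(c) <= key(hi))
    return "".join(sorted(hits, key=key))

# [a-z] interpreted by code point: lowercase letters only.
c_range = range_members("a", "z", c_locale_key)

# [a-z] interpreted by the interleaved order: every letter except 'Z',
# because 'Z' sorts after 'z' and so falls outside the range.
dict_range = range_members("a", "z", interleaved_key)

print(c_range)
print(dict_range)
```

Under this model the code-point interpretation yields exactly the 26
lowercase letters, while the interleaved interpretation yields 51 letters,
which is the kind of surprise that setting LC_ALL=C avoids.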