I was just looking at a bug reported to Fedora, where this command abort()s:

 $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]'

It stems from the fact that there are 56 upper and 59 lower chars in iso-8859-1.
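
As a rough cross-check of those counts, here is a minimal Python sketch. It assumes that the Unicode general categories Lu and Ll approximate the locale's [:upper:] and [:lower:] classes for iso-8859-1; the actual glibc locale tables may differ. The helper names are made up for illustration and are not anything in tr.

```python
import unicodedata


def latin1_class(category):
    """Return the ISO-8859-1 byte values whose Unicode general category matches."""
    return [b for b in range(256)
            if unicodedata.category(bytes([b]).decode("latin-1")) == category]


def has_latin1_upper(b):
    """True if the character's uppercase form is a single ISO-8859-1 character."""
    up = chr(b).upper()
    return len(up) == 1 and ord(up) < 256


upper = latin1_class("Lu")   # assumption: stands in for [:upper:]
lower = latin1_class("Ll")   # assumption: stands in for [:lower:]

# Lowercase characters with no single-byte uppercase partner in this charset.
unpaired = [b for b in lower if not has_latin1_upper(b)]

print(len(upper), len(lower), [hex(b) for b in unpaired])
```

Under that assumption, the lowercase characters left without a partner are the ones that make the two classes differ in length.
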
But I also noticed an anomaly that would affect the fix:
[:upper:] and [:lower:] are extended in string 2
when there are still characters to match in string 1.
I.e. 0xDE (the last upper char) is output from:

 $ echo "_ _" | LC_ALL=en_US ./src/tr '[:lower:] ' '[:upper:]'

That seems quite inconsistent given that other classes
are not allowed in string 2 when translating:

 $ echo "ab ." | LANG=en_US tr '[:digit:]' '[:alpha:]'
 tr: when translating, the only character classes that may appear in
 string2 are `upper' and `lower'

For consistency I think it would be better to keep the classes
in string 2 just for case mapping, and do something like:

 $ tr '[:upper:] ' '[:lower:]'
 tr: when not truncating set1, a character class can't be
 the last entity in string2

Note that BSD allows extending in the case above, but that's at least
consistent with it allowing any class in string 2.
I.e. this is disallowed by coreutils but OK on BSD:

 $ echo "1 2" | LC_ALL=en_US.iso-8859-1 tr ' ' '[:alpha:]'
 1A2

Is it OK to change tr like this?
I can't see anything that depends on the current behavior.

cheers,
Pádraig.
