FRIGN said:
> On Sat, 10 Jan 2015 02:52:09 +0100
> "Dmitrij D. Czarkoff" <czark...@gmail.com> wrote:
>
> > > +#define UPPER "A-Z"
> > > +#define LOWER "a-z"
> > > +#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
> >
> > These definitions hugely misrepresent corresponding character classes.
>
> I interpreted the character classes by default for the C locale. What do
> you mean by hugely misrepresenting? They are just fragments to build the
> classes later on.

No, you interpret the character classes for the C locale only, not just by default. Character classes are useless for the C locale ("A-Z" is easier to type than "[:upper:]" anyway); they only really make sense for scripts that are supposed to do The Right Thing™ for every locale. Also, defining ranges on systems with no locale-aware collation rules may be tricky.

As I gather, sbase is supposed to ignore POSIX locales, so there is no reasonable hope that "[A-Z]" would actually match the whole alphabets of languages based on Latin script. Thus the sanest default I see here is to use the isw* family of functions for matching characters against classes, delegating the problem to libc, where it actually belongs.

That said, the defines in your patch appear to be fully compatible with the GNU and BSD implementations of tr(1), so you may as well discourage the use of character classes in the manual, label them as legacy compatibility syntax and be done with it.

-- 
Dmitrij D. Czarkoff
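
P.S. To illustrate the isw* approach, here is a minimal sketch. It is not taken from the patch or from sbase. The helper name inclass() is hypothetical; wctype() and iswctype() are the standard <wctype.h> calls, and what they report depends on the LC_CTYPE locale the program has selected.

```c
#include <wctype.h>

/*
 * Hypothetical helper, not part of the patch: returns 1 if wc belongs
 * to the named class ("upper", "lower", "punct", ...) in the current
 * LC_CTYPE locale, and 0 otherwise. An unknown class name makes
 * wctype() return 0, which is treated as "no match".
 */
static int
inclass(wint_t wc, const char *name)
{
	wctype_t t = wctype(name);

	return t != (wctype_t)0 && iswctype(wc, t);
}
```

A caller would run setlocale(LC_CTYPE, "") once at startup so that libc consults the user's locale; without that call the program stays in the C locale and the helper only recognises ASCII letters and punctuation.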