On 6/15/17 11:22 AM, PePa wrote:
> On 15/06/2560 22:03, Chet Ramey wrote:
>> I don't know other languages well enough to point one out, but I can easily
>> imagine that a particular character is an "alphabetic" in, say, Mandarin,
>> but doesn't exist in someone's en_US character set.
>
> I thought you were referring to a character existing in both sets. This is the
> reason why I think you should only concern yourself with
> characters that already have an established semantic in bash. Don't get
> bogged down in distinguishing classes in myriads of character sets. Just
> allow anything that isn't ASCII (but IS UTF-8 -- I'm talking about
> UTF-8, otherwise this discussion becomes impossible).

Seriously: not everyone uses a UTF-8 locale. Something that uses an approach
along the lines of Eduardo's patch won't have the UTF-8-only problem. If I
undertake the effort to put this into bash, and commit to supporting it
forever (which is how these things go), I'm not going to orphan non-UTF-8
users.

And no matter which way we go here, I can't see any advantage in allowing
invalid multibyte sequences in identifier names.

The proposal to, essentially, use isw* functions instead of is* ctype
functions to determine whether a (now wide) character is a valid identifier
character is a straightforward enhancement. You have to look at every
character anyway no matter what.

>
>> I see a number of problems with using non-alphanumerics in shell
>> identifiers. The real advantage to allowing this is to allow users to
>> put alphabetics from their own locales into shell identifiers. There's
>> little reason to do it otherwise, and plenty of complications.
>
> What are those problems and complications??

Mostly portability across character sets and maintainability concerns
(which, admittedly, are nobody's problems but mine).

>> As for the implementation, it's much easier to use isalpha/isdigit (and
>> their wide character equivalents) than to try and keep track of a blacklist
>> of characters across different locales.
>
> I don't propose blacklists across locales, just blacklisting what
> already has an established meaning in bash, ie. ASCII. All the rest is
> just fair game, if someone insists on using a thumbs-up icon in a
> variable name, why restrict that?? The restricting and policing is going
> to make this costly in terms of developer time and CPU time.

You still have to look at every character. The world isn't all UTF-8: there
are character sets where multibyte characters include characters that are
valid ASCII (including, I suspect, `=').
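
To make the comparison concrete, here is a minimal sketch of the two rules
discussed above. It is in Python rather than bash's C, purely for
illustration. The names `is_identifier` and `bytewise_non_ascii_rule` are
hypothetical and are not taken from bash or from Eduardo's patch. Python's
`bytes.decode` and `str.isalpha`/`str.isalnum` stand in for multibyte
conversion and the isw* functions; they classify by Unicode rather than by
the current locale, so this only approximates what a C implementation
would do.

```python
def is_identifier(raw: bytes, encoding: str) -> bool:
    """Decode first, then classify whole characters (the isw*-style rule)."""
    try:
        text = raw.decode(encoding)  # invalid multibyte sequences fail here
    except UnicodeDecodeError:
        return False
    if not text:
        return False
    # First character: underscore or alphabetic; the rest may also be digits.
    first_ok = text[0] == "_" or text[0].isalpha()
    rest_ok = all(ch == "_" or ch.isalnum() for ch in text[1:])
    return first_ok and rest_ok


def bytewise_non_ascii_rule(raw: bytes) -> bool:
    """The 'anything that isn't ASCII is fair game' rule, applied per byte."""
    def ok(byte: int, first: bool) -> bool:
        if byte >= 0x80:
            return True  # assumption: every high byte is accepted unexamined
        ch = chr(byte)
        return ch == "_" or (ch.isalpha() if first else ch.isalnum())

    return bool(raw) and all(ok(b, i == 0) for i, b in enumerate(raw))


# KATAKANA LETTER SO (U+30BD) is alphabetic, but in Shift_JIS its second
# byte is 0x5C, the ASCII backslash.
so_sjis = "\u30bd".encode("shift_jis")

if __name__ == "__main__":
    print(so_sjis)                                   # raw Shift_JIS bytes
    print(is_identifier(so_sjis, "shift_jis"))       # character-wise rule
    print(bytewise_non_ascii_rule(so_sjis))          # byte-wise rule
    print(bytewise_non_ascii_rule(b"a\xff"))         # invalid UTF-8 input
    print(is_identifier(b"a\xff", "utf-8"))
```

In this sketch the byte-wise rule trips over the ASCII backslash hidden
inside a valid Shift_JIS alphabetic, and it lets an invalid UTF-8 byte
through, whereas the decode-then-classify rule handles both cases. Either
way every character has to be examined.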
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/