On Wed, Dec 18, 2019 at 11:15:46AM -0800, L A Walsh wrote:
> On 2019/12/16 08:39, Greg Wooledge wrote:
> > The problem is, it is *not possible* to extract the set of characters
> > out of an arbitrary locale.  The locale interfaces simply are not built
> > to allow it.
> >
> > You can do it in the C locale, simply because the C locale is a known,
> > fixed quantity that you can hard-code.  You can't do it in any other locale.
> You can do it in Perl, JavaScript, Python, Ruby C, C++ among others,
> [...]
> \p{L} or \p{Letter}: any kind of letter from any language.
> \p{Ll} or \p{Lowercase_Letter}: a lowercase letter
> that has an uppercase variant.

You misunderstood me, or perhaps I wasn't clear enough.

I agree that if you are GIVEN a character as input, you can determine
whether that character is a letter, or a lowercase letter (etc.) in the
current locale.

What you CANNOT do[1] is GENERATE all of the lowercase letters (etc.) in
the current locale.

To put it another way: you can write code that determines whether an
input character $c matches a glob or regex like [Z-a].  (Maybe.)  But you
CANNOT write code to generate all of the characters from Z to a.

Since this thread is about brace expansion, which must generate
characters, the feature you're looking for is simply impossible, to the
best of my knowledge.

(I'd be delighted for you to prove me wrong.  Show me how to generate all
of the :alpha: characters in the en_US.utf8 locale in perl, or python, or
any other language.)

[1] The only way I know to get that information would be to take as input
*every conceivable character*, and, one by one, check whether each of
those characters matches the :alpha: class.  Such a brute-force solution
is not in the spirit of the mission.  As such, I'll save you the time and
do that part myself.

wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s "$c"; fi; done; echo
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈ

Obviously I did not use *every conceivable character* as input -- just a
couple hundred, a completely arbitrary cut-off point, because this is
just a proof of concept.  Trawling the entire Unicode code point space is
left as an adventure for braver souls than mine, as is comparing the
different locales on a system, or the same locale between different
operating systems.
Sorting these characters is also possible, once they have been generated.
This is (I think!) what allows things like [Z-a] to work at all: you can
check whether $c is >= 'Z' and <= 'a' without knowing what all of the
characters in between are.  But you can't ask "what comes after Z".

wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort | tr -d \\n; echo
aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ

Again, this is only PART of the set, and is not intended to be a complete
enumeration of the :alpha: characters in my system's locale.
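Both of the operations described above as possible, classifying a GIVEN
character and checking whether it lies between two endpoints, can be
written without ever enumerating a set.  What follows is a minimal sketch,
assuming a reasonably modern bash (4.1 or later), where < and > inside
[[ ]] compare strings using the current locale's collation order.  The
function names is_lower and in_range are hypothetical, chosen only for
illustration.  Note that neither one can answer "what comes after Z",
which is the operation brace expansion would actually need.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of bash itself.

# Classify a GIVEN character: succeeds if $1 matches the current
# locale's [[:lower:]] class.  No enumeration of the class is needed.
is_lower() {
  [[ $1 = [[:lower:]] ]]
}

# Inclusive range test: succeeds if $1 collates at or after $2 and at or
# before $3.  Inside [[ ]], < and > use the current locale's collation,
# so this answers "is $c between Z and a?" without listing what is
# in between.
in_range() {
  [[ ! $1 < $2 && ! $1 > $3 ]]
}
```

Under LC_ALL=C, in_range '[' Z a succeeds, because '[' sits between 'Z'
and 'a' in ASCII order; in other locales the answer may differ, since
collation order is locale-specific.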