Hi Karl,

following your observations, i just rewrote the setlocale(3) manual
page and committed the new version.

Karl Williamson wrote on Mon, Mar 19, 2018 at 11:51:31AM -0600:

> But your man page doesn't describe any of this.  It doesn't say that
> UTF-8 is a legal locale, for example.

Fixed.

> It does say that LC_CTYPE is the
> only category that can be other than C or POSIX, but it doesn't say the
> only other possible one is UTF-8.

Fixed.

> I think it should.  If your replies
> to me were slightly repackaged and placed into the man page, that would
> help a lot.
>
> I still believe that in my program the setlocale() returning C for
> LC_ALL is a bug.

I agree, that is a bug in my code.  I don't have a patch yet, but
i will write one; it cannot be difficult.  I doubt that it will go
in before release, though.  The fact that the bug went undetected
for many months shows that it is not release-critical, and we are
now in a phase where we want to weed out critical bugs rather than
risk introducing new ones.

> I don't know what would happen if one were to call setlocale(LC_ALL,
> "ro_RO.UTF-8");

As expected, it sets the whole locale to "ro_RO.UTF-8"
and returns "ro_RO.UTF-8".

> BTW, There is some variance actually in real UTF-8 locales, which you
> may not have considered.  Unicode, contrary to their claims, is not
> completely locale-independent in LC_CTYPE.  Some Turkish locales that
> are UTF-8 use alternate casing rules for the dotless and dotted i
> characters.

I'm aware of that, but we will not support it: making the character
properties language-dependent is excessive complexity.  It is safer
and results in more predictable program behaviour if every character
has a well-defined, constant set of properties.  KISS and the principle
of least surprise are key in this respect.

> And some, especially earlier, UTF-8 locales consider various ASCII
> characters that are mandated by POSIX to be ispunct() to not be
> punctuation.

Not gonna happen on OpenBSD.  Over my dead body.
We won't change ASCII, and we *will* make sure that Unicode
is treated as a strict superset of ASCII.
What ASCII (or more precisely, the C locale) defines,
Unicode is not free to change.

Yours,
  Ingo
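
P.S.  For anyone who wants to check the two points above on their own
system, here is a minimal sketch.  The helper names count_ascii_punct()
and report_locale() are made up for this illustration and are not part
of any library.  The sketch assumes nothing beyond standard C: if the
requested locale is not available, setlocale(3) returns NULL and leaves
the current locale unchanged, so the output for "ro_RO.UTF-8" will
differ between systems.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/*
 * Count the ASCII code points (0 to 127) that ispunct(3) accepts
 * in the currently active locale.  The C locale defines exactly 32.
 */
int
count_ascii_punct(void)
{
	int c, n = 0;

	for (c = 0; c < 128; c++)
		if (ispunct(c))
			n++;
	return n;
}

/*
 * Hypothetical helper: try to select one locale for all categories,
 * print what setlocale(3) returned, and return the ASCII punctuation
 * count that is in effect afterwards.
 */
int
report_locale(const char *name)
{
	const char *res = setlocale(LC_ALL, name);

	printf("setlocale(LC_ALL, \"%s\") -> %s\n",
	    name, res != NULL ? res : "(null)");
	return count_ascii_punct();
}
```

Calling report_locale("C") and then report_locale("ro_RO.UTF-8") from
main() shows both the return value of setlocale(3) and whether the
ASCII punctuation count stays the same after switching; if ASCII is
treated as described above, it stays at 32.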