Hi Karl,

following your observations, i just rewrote the setlocale(3)
manual page and committed the new version.

Karl Williamson wrote on Mon, Mar 19, 2018 at 11:51:31AM -0600:

> But your man page doesn't describe any of this.  It doesn't say that 
> UTF-8 is a legal locale, for example.

Fixed.

> It does say that LC_CTYPE is the 
> only category that can be other than C or POSIX, but it doesn't say the 
> only other possible one is UTF-8.

Fixed.

> I think it should.  If your replies 
> to me were slightly repackaged and placed into the man page, that would 
> help a lot.
> 
> I still believe that in my program the setlocale() returning C for 
> LC_ALL is a bug.

I agree, that is a bug in my code.  I don't have a patch yet,
but i will write one; it cannot be difficult.  I doubt that it
will go in before release, though.  The fact that the bug went
undetected for many months shows that it is not release-critical,
and we are now in a phase where we want to weed out critical bugs
rather than risk introducing new ones.

> I don't know what would happen if one were to call setlocale(LC_ALL, 
> "ro_RO.UTF-8");

As expected, it sets the whole locale to "ro_RO.UTF-8"
and returns "ro_RO.UTF-8".
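
A minimal sketch of how a program could check this for itself.
The helper name set_whole_locale() is hypothetical and not part of
any libc.  It passes the requested name to setlocale(3) for LC_ALL
and compares the returned string with the request.  Which names
are accepted depends on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: try to set every locale category to `name`.
 * Returns 1 if setlocale(3) returns exactly the requested string,
 * 0 if the request is rejected or a different string comes back.
 */
int
set_whole_locale(const char *name)
{
	const char *got;

	assert(name != NULL);
	got = setlocale(LC_ALL, name);
	if (got == NULL) {
		/* Request rejected; the current locale is left unchanged. */
		fprintf(stderr, "setlocale(LC_ALL, \"%s\") failed\n", name);
		return 0;
	}
	printf("requested \"%s\", got \"%s\"\n", name, got);
	return strcmp(got, name) == 0;
}
```

Calling set_whole_locale("ro_RO.UTF-8") should then return 1 on a
system that behaves as described above.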

> BTW, There is some variance actually in real UTF-8 locales, which you 
> may not have considered.  Unicode, contrary to their claims, is not 
> completely locale-independent in LC_CTYPE.  Some Turkish locales that 
> are UTF-8 use alternate casing rules for the dotless and dotted i 
> characters.

I'm aware of that, but we will not support it: making the character
properties language-dependent is excessive complexity.  It is safer
and results in more predictable program behaviour if every character
has a well-defined, constant set of properties.  KISS and the
principle of least surprise are key in this respect.
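
To make the difference concrete: Turkish casing rules map "i" to
U+0130 (capital I with dot above) and "I" to U+0131 (dotless i),
whereas the language-independent mapping pairs "i" with "I".  The
following is only a sketch, and ascii_i_casing_is_default() is a
hypothetical name.  It reports whether a given LC_CTYPE locale uses
the language-independent mapping for these two letters.

```c
#include <assert.h>
#include <locale.h>
#include <wctype.h>

/*
 * Hypothetical helper: switch LC_CTYPE to `name` and test whether
 * the letters "i" and "I" are case-mapped to each other.
 * Returns 1 if they are, 0 if the locale maps them elsewhere
 * (as Turkish rules would), and -1 if the locale is not available.
 * Note that the LC_CTYPE setting stays changed afterwards.
 */
int
ascii_i_casing_is_default(const char *name)
{
	assert(name != NULL);
	if (setlocale(LC_CTYPE, name) == NULL)
		return -1;
	return towupper(L'i') == L'I' && towlower(L'I') == L'i';
}
```

With constant character properties, the result is 1 for every
locale the system accepts, whatever the language part of the name.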

> And some, especially earlier, UTF-8 locales consider various ASCII
> characters that are mandated by POSIX to be ispunct() to not be
> punctuation.

Not gonna happen on OpenBSD.  Over my dead body.  We won't change
ASCII, and we *will* make sure that Unicode is treated as a strict
superset of ASCII.  What ASCII (or more precisely, the C locale)
defines, Unicode is not free to change.
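
A sketch of a test for that guarantee, under the assumption that
the C locale classifies exactly the printable ASCII characters that
are neither space nor alphanumeric as punctuation.  The function
names are hypothetical.  The test compares ispunct(3) in a given
locale against that fixed table for all 128 ASCII codes.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Punctuation in the C locale: printable, non-space, non-alnum ASCII. */
static int
is_c_locale_punct(int c)
{
	return (c >= 33 && c <= 47) || (c >= 58 && c <= 64) ||
	    (c >= 91 && c <= 96) || (c >= 123 && c <= 126);
}

/*
 * Hypothetical helper: switch LC_CTYPE to `name` and count the ASCII
 * codes for which ispunct(3) disagrees with the C locale.
 * Returns -1 if the locale is not available.
 * Note that the LC_CTYPE setting stays changed afterwards.
 */
int
count_ascii_punct_mismatches(const char *name)
{
	int c, mismatches = 0;

	assert(name != NULL);
	if (setlocale(LC_CTYPE, name) == NULL)
		return -1;
	for (c = 0; c < 128; c++)
		if ((ispunct(c) != 0) != is_c_locale_punct(c))
			mismatches++;
	return mismatches;
}
```

If a UTF-8 locale is a strict superset of ASCII in this sense, the
count is 0 for that locale, just as it is for "C".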

Yours,
  Ingo
