RE: LC_CTYPE=UTF-8

Schwarz, Konrad Fri, 26 Jun 2020 00:29:34 -0700

> -----Original Message-----
> From: Ingo Schwarze <schwa...@usta.de>
> Sent: Thursday, June 25, 2020 21:25
> To: Alan Coopersmith <alan.coopersm...@oracle.com>
> Cc: Hans Åberg <haber...@telia.com>; Austin Group 
> <austin-group-l@opengroup.org>
> Subject: Re: LC_CTYPE=UTF-8
> 
> Hi Alan,
> 
> Alan Coopersmith wrote on Thu, Jun 25, 2020 at 12:13:33PM -0700:
> > On 6/25/20 8:31 AM, Ingo Schwarze wrote:
> 
> >> Whether to standardize only C.UTF-8 or both C.UTF-8 and POSIX.UTF-8
> >> as synonyms looks a bit like asking for the best colour of a bikeshed.
> >> Given that the standard already contains the redundancy of requiring
> >> both "C" and "POSIX", maybe it is more consistent to also require
> >> both "C.UTF-8" and "POSIX.UTF-8", but i don't think that matters
> >> greatly.
> 
> > The only thought I had along those lines was that I thought the "C"
> > locale came from the C standard, and might be best left to the C
> > committee to standardize, while this group controls the "POSIX"
> > locale definition.  I suspect those following the POSIX standards
> > would end up implementing both, regardless of which specification
> > defines each.


My impression is that the C standard shied away from all
concrete character-encoding issues, at least originally, when
alternatives such as EBCDIC were still quite relevant.
Although support for multibyte and wide characters was introduced,
this was done in a very abstract way;
I don't recall any mention of explicit encodings such as ASCII.

As such, I think it would be fine for POSIX to standardize
both POSIX.UTF-8 and C.UTF-8; I'd expect little
opposition from the C standard committee to such a move.

(Honestly, I don't know whether the Microsoft Visual C library
supports a C.UTF-8 locale at the moment -- I'm pretty
sure their system call level is still UTF-16.)
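
(For anyone who wants to check what a given C library accepts today,
a rough sketch along the following lines might do.  It is written in
Python only for brevity; as far as I know, its locale.setlocale() is
a thin wrapper around the C library's setlocale().  The helper name
locale_supported is made up for illustration, and the result says
only that the name is accepted, not what the locale actually does.)

```python
import locale


def locale_supported(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE.

    Hypothetical helper for illustration; not part of any standard API.
    """
    # Query (without changing) the current LC_CTYPE setting so it can be restored.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the underlying setlocale() call rejects the name.
        return False
    finally:
        # Put the process back into its previous LC_CTYPE locale.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # "C" and "POSIX" are required today; the ".UTF-8" names are the
    # candidates under discussion and may or may not be accepted.
    for name in ("C", "POSIX", "C.UTF-8", "POSIX.UTF-8"):
        verdict = "accepted" if locale_supported(name) else "rejected"
        print(f"{name}: {verdict}")
```

The output will differ from system to system, which is rather the
point of the discussion.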

TL;DR: for consistency, I'd prefer POSIX to define C.UTF-8
as well as POSIX.UTF-8, even without explicit blessing by
the C committee.  I don't think they reserved parts
of the locale namespace for themselves.

--
Konrad Schwarz
