On Jul 22 11:17, Eric Blake wrote:
> On 07/22/2013 02:12 AM, Corinna Vinschen wrote:
> 
> >>> However, please note that this behaviour, while being provided by glibc
> >>> and now by Cygwin, is *not* standards-compliant.  In the narrow sense
> >>> the characters beyond 0x7f are still invalid ASCII chars, and other
> >>> functions working with wchar_t strings won't be as forgiving when using
> >>> invalid input.
> >>>
> 
> > After some sleep, I think I now understand why the glibc devs made
> > regcomp work this way.  This behaviour is backward compatible with
> > non-locale-aware applications.  In the "C" locale, a char is just some
> > arbitrary byte between 0 and 255.  This pattern has always worked in
> > the "C" locale, so it makes sense that it continues to work there,
> > even though it won't when using other locales/codesets.
> 
> By the way, there is currently a big debate going on in the Austin Group
> (the people responsible for POSIX) on whether the "C" locale must be
> 8-bit clean (the way glibc behaves) or whether it was intended to allow
> UTF-8 encoding by default (the way musl libc wants to behave); and
> resolution of the debate will require input from the C standards
> committee.  There may be some interesting fallout, no matter which
> solution is finally reached.  http://austingroupbugs.net/view.php?id=663

Thanks for letting us know.  This really may get interesting...


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat