On Sat, 3 Dec 2022 19:28:10 +0900
Takashi Yano wrote:
> On Fri, 2 Dec 2022 19:40:30 -0800
> Ilya Zakharevich wrote:
> > On Wed, Nov 16, 2022 at 04:48:25AM -0800, I wrote:
> > > De-quoting (converting the Windows’ command-line into argc/argv) does
> > > not remove double quotes if characters not fit for 8-bit (?) are present.
> > > 
> > >   Broken in: CYGWIN_NT-6.1     Bu 3.3.4(0.341/5/3) 2022-01-31 19:35 x86_64 Cygwin
> > >   Works  in: CYGWIN_NT-6.1-WOW Bu 2.2.1(0.289/5/3) 2015-08-20 11:40 i686   Cygwin
> > > 
> > > To reproduce, do in CMD’s command line:
> > > 
> > >   D:\> D:\Programs\cygwin2022\bin\perl -wle "print for @ARGV" . "/i/" "/и/" .
> > >   .
> > >   /i/
> > >   "/и/"
> > >   .
> > 
> > I triple-checked
> >   • with a Win10 machine (and a version of cygwin given above),
> >   • with a fresh latest(=test)-cygwin-dll installation on a Win7 (as above) machine.
> > 
> > Same bug everywhere.
> 
> This certainly seems to be a problem of cygwin1.dll.
> 
> Though I am not sure this is the right thing, I have confirmed
> that the following patch solves the issue.
> 
> diff --git a/newlib/libc/locale/lctype.c b/newlib/libc/locale/lctype.c
> index 644669765..732d132e1 100644
> --- a/newlib/libc/locale/lctype.c
> +++ b/newlib/libc/locale/lctype.c
> @@ -25,11 +25,20 @@
>  
>  #define LCCTYPE_SIZE (sizeof(struct lc_ctype_T) / sizeof(char *))
>  
> +#ifdef __CYGWIN__
> +static char  numsix[] = { '\6', '\0'};
> +#else
>  static char  numone[] = { '\1', '\0'};
> +#endif
>  
>  const struct lc_ctype_T _C_ctype_locale = {
> +#ifdef __CYGWIN__
> +     "UTF-8",                        /* codeset */
> +     numsix                          /* mb_cur_max */
> +#else
>       "ASCII",                        /* codeset */
>       numone                          /* mb_cur_max */
> +#endif
>  #ifdef __HAVE_LOCALE_INFO_EXTENDED__
>       ,
>       { "0", "1", "2", "3", "4",      /* outdigits */

The patch above also affects __C_locale, because __C_locale refers to
the same _C_ctype_locale. The patch below changes only __global_locale
and should be more appropriate.

diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index e523d2366..7485ac292 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -244,6 +244,21 @@ const struct __locale_t __C_locale =
 };
 #endif /* _MB_CAPABLE */
 
+#ifdef __CYGWIN__
+static char    numsix[] = { '\6', '\0'};
+static const struct lc_ctype_T _C_UTF8_ctype_locale = {
+       "UTF-8",                        /* codeset */
+       numsix                          /* mb_cur_max */
+#ifdef __HAVE_LOCALE_INFO_EXTENDED__
+       ,
+       { "0", "1", "2", "3", "4",      /* outdigits */
+         "5", "6", "7", "8", "9" },
+       { L"0", L"1", L"2", L"3", L"4", /* woutdigits */
+         L"5", L"6", L"7", L"8", L"9" }
+#endif
+};
+#endif
+
 struct __locale_t __global_locale =
 {
   { "C", "C", DEFAULT_LOCALE, "C", "C", "C", "C", },
@@ -272,10 +287,11 @@ struct __locale_t __global_locale =
     { NULL, NULL },                    /* LC_ALL */
 #ifdef __CYGWIN__
     { &_C_collate_locale, NULL },      /* LC_COLLATE */
+    { &_C_UTF8_ctype_locale, NULL },   /* LC_CTYPE */
 #else
     { NULL, NULL },                    /* LC_COLLATE */
-#endif
     { &_C_ctype_locale, NULL },                /* LC_CTYPE */
+#endif
     { &_C_monetary_locale, NULL },     /* LC_MONETARY */
     { &_C_numeric_locale, NULL },      /* LC_NUMERIC */
     { &_C_time_locale, NULL },         /* LC_TIME */
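
One way to check what the patch changes is to look at the LC_CTYPE
settings a program sees before it calls setlocale(). The following is
only a minimal sketch: format_ctype_info() is a hypothetical helper,
not part of newlib or Cygwin, and it assumes that nl_langinfo(CODESET)
and MB_CUR_MAX reflect the codeset and mb_cur_max entries of the
startup locale.

```c
#include <langinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper for illustration; not part of newlib or Cygwin.
   Writes the LC_CTYPE settings currently in effect into buf and
   returns the snprintf() result. Call it before any setlocale() to
   see the startup values. */
static int format_ctype_info(char *buf, size_t size)
{
    /* Codeset name of the current LC_CTYPE locale (POSIX API). */
    const char *codeset = nl_langinfo(CODESET);

    /* MB_CUR_MAX: maximum bytes per character in the current locale. */
    return snprintf(buf, size, "codeset=%s mb_cur_max=%d",
                    codeset, (int) MB_CUR_MAX);
}
```

Calling this at the top of main() and printing the buffer, once with
the unpatched and once with the patched cygwin1.dll, should show
whether the startup codeset and mb_cur_max differ as intended.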

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>
