On Jun 14 22:18, IWAMURO Motonori wrote:
> 2009/6/13 Corinna Vinschen 
> > The problem appears to be that there is no standard for the handling
> > of ambiguous characters.
> 
> Yes, but the guideline exists.
> http://cygwin.com/ml/cygwin/2009-05/msg00444.html

A single mail on a single mailing list of a single project.  That's more
of a suggestion than a guideline...

> > > Ambiguous characters behave like wide or narrow characters depending
> > > on the context (language tag, script identification, associated
> > > font, source of data, or explicit markup; all can provide the
> > > context). If the context cannot be established reliably, they should
> > > be treated as narrow characters by default.
> 
> > Define the default for ja, ko, and zh to use width = 2, with a
> > @cjknarrow (or whatever) modifier to use width = 1.
> 
> I think it is a good idea.

If everybody agrees to this suggestion, here's the patch.  Tested
with various combinations like

  lang=ja_jp.ut...@cjknarrow
  lang=ja...@cjknarrow
  lang=ja.ut...@cjknarrow
  lang...@cjknarrow
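
For illustration only, the intended effect of the modifier can be sketched
as a standalone function.  This is not the newlib code: cjk_ambiguous_wide
is a hypothetical helper name, it looks only at the language prefix and the
modifier, and it skips the charset and territory validation that
loadlocale() performs.  The locale strings in the usage note are example
inputs, not the exact strings tested above.

```c
#include <string.h>

/* Hypothetical helper, not part of newlib: returns 1 if characters of
   ambiguous width should be treated as wide (2 columns) for the given
   locale string, 0 if they should be treated as narrow. */
static int
cjk_ambiguous_wide (const char *locale)
{
  const char *mod = strchr (locale, '@');

  /* An explicit "@cjknarrow" modifier always forces narrow. */
  if (mod && strcmp (mod + 1, "cjknarrow") == 0)
    return 0;
  /* Otherwise only the language prefix decides. */
  return strncmp (locale, "ja", 2) == 0
         || strncmp (locale, "ko", 2) == 0
         || strncmp (locale, "zh", 2) == 0;
}
```

Under these assumptions, "ja_JP.UTF-8" would select wide, while
"ja_JP.UTF-8@cjknarrow" and "en_US.UTF-8" would both select narrow.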


Corinna


        * libc/locale/locale.c (loadlocale): Add handling of "@cjknarrow"
        modifier on _MB_CAPABLE targets.  Add comment to explain.


Index: libc/locale/locale.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/locale/locale.c,v
retrieving revision 1.20
diff -u -p -r1.20 locale.c
--- libc/locale/locale.c        3 Jun 2009 19:28:22 -0000       1.20
+++ libc/locale/locale.c        15 Jun 2009 08:40:46 -0000
@@ -397,6 +397,9 @@ loadlocale(struct _reent *p, int categor
   int (*l_wctomb) (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
   int (*l_mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
                   const char *, mbstate_t *);
+#ifdef _MB_CAPABLE
+  int cjknarrow = 0;
+#endif
   
   /* "POSIX" is translated to "C", as on Linux. */
   if (!strcmp (locale, "POSIX"))
@@ -427,10 +430,14 @@ loadlocale(struct _reent *p, int categor
       if (c[0] == '.')
        {
          /* Charset */
-         strcpy (charset, c + 1);
-         if ((c = strchr (charset, '@')))
+         char *chp;
+
+         ++c;
+         strcpy (charset, c);
+         if ((chp = strchr (charset, '@')))
            /* Strip off modifier */
-           *c = '\0';
+           *chp = '\0';
+         c += strlen (charset);
        }
       else if (c[0] == '\0' || c[0] == '@')
        /* End of string or just a modifier */
@@ -442,6 +449,17 @@ loadlocale(struct _reent *p, int categor
       else
        /* Invalid string */
        return NULL;
+#ifdef _MB_CAPABLE
+      if (c[0] == '@')
+       {
+         /* Modifier */
+         /* Only one modifier is recognized right now.  "cjknarrow" is used
+            to modify the behaviour of wcwidth() for East Asian languages.
+            For details see the comment at the end of this function. */
+         if (!strcmp (c + 1, "cjknarrow"))
+           cjknarrow = 1;
+       }
+#endif
     }
   /* We only support this subset of charsets. */
   switch (charset[0])
@@ -604,13 +622,15 @@ loadlocale(struct _reent *p, int categor
       __mbtowc = l_mbtowc;
       __set_ctype (charset);
       /* Check for the language part of the locale specifier.  In case
-         of "ja", "ko", or "zh", assume the use of CJK fonts.  This is
-        stored in lc_ctype_cjk_lang and tested in wcwidth() to figure
-        out the width to return (1 or 2) for the "CJK Ambiguous Width"
-        category of characters. */
-      lc_ctype_cjk_lang = (strncmp (locale, "ja", 2) == 0
-                          || strncmp (locale, "ko", 2) == 0
-                          || strncmp (locale, "zh", 2) == 0);
+         of "ja", "ko", or "zh", assume the use of CJK fonts, unless the
+        "@cjknarrow" modifier has been specified.
+        The result is stored in lc_ctype_cjk_lang and tested in wcwidth()
+        to figure out the width to return (1 or 2) for the "CJK Ambiguous
+        Width" category of characters. */
+      lc_ctype_cjk_lang = !cjknarrow
+                         && ((strncmp (locale, "ja", 2) == 0
+                             || strncmp (locale, "ko", 2) == 0
+                             || strncmp (locale, "zh", 2) == 0));
 #endif
     }
   else if (category == LC_MESSAGES)
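
As a rough illustration (not part of the patch), this is how a width lookup
might consult the flag.  The names cjk_lang and sample_width are
hypothetical stand-ins for lc_ctype_cjk_lang and the real wcwidth(), and the
table holds only a few code points whose East Asian Width is "Ambiguous";
the actual wcwidth() uses a complete table and also handles wide, combining,
and control characters.

```c
#include <stddef.h>

/* Stand-in for the lc_ctype_cjk_lang flag set in loadlocale(). */
static int cjk_lang = 0;

/* A few code points with East Asian Width "Ambiguous"; not a full table. */
static const unsigned int sample_ambiguous[] =
  { 0x00A1, 0x00A7, 0x03B1, 0x2460 };

/* Hypothetical helper: column width for the sample code points only. */
static int
sample_width (unsigned int wc)
{
  size_t i;

  for (i = 0; i < sizeof sample_ambiguous / sizeof sample_ambiguous[0]; i++)
    if (sample_ambiguous[i] == wc)
      return cjk_lang ? 2 : 1;
  return 1; /* everything else is treated as narrow in this sketch */
}
```

With cjk_lang set to 0, as "@cjknarrow" would arrange, the section sign
U+00A7 is reported as one column; with cjk_lang set to 1 it is reported as
two.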


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
