Ingo Schwarze wrote on Wed, Mar 28, 2018 at 03:23:30PM +0200: > Karl Williamson wrote on Mon, Mar 19, 2018 at 11:51:31AM -0600:
>> I still believe that in my program the setlocale() returning C for >> LC_ALL is a bug. LC_CTYPE should have successfully been set to Romanian >> UTF-8, and so LC_ALL isn't C. Instead, it is a combination of C for all >> the other categories, and UTF-8 for LC_CTYPE. A return of just "C" >> doesn't reflect that complexity. There is a footnote in the ANSI/ISO >> 9899-1990 C standard that the returned string must support that >> heterogeneity, and that the return value be able to be used in a future >> setlocale to get back to the original state. Your setlocale violates >> the standard therefore, and harms your portability goal. > So, here is a patch to fix that, together with two other minor bugs > that i found while looking at my code, and two minor style cleanups. Oops, i caused a regression here. changegl() still sets the LC_ALL entry to "" rather than to NULL if the environment variable LC_ALL is not defined, but the rest of the code would now expect NULL. The simplest fix is to continue using "" everywhere to "mean LC_ALL is not set". See below for the fixed patch. > OK? > Ingo > > Rationale: > * chunk 39, first part (style): > There is no need for calloc(3) because all _LC_LAST entries > in newgl will be assigned to right away. > * chunk 39, second part (main bugfix): > The problem: > In the sequence > setlocale(LC_ALL, "A"); > setlocale(LC_CTYPE, "T"); > setlocale(LC_ALL, NULL); > the last call returns "A" but ought to return "A/T/A/A/A/A". > The root cause: > setlocale(LC_ALL, "A") sets the LC_ALL entry in global_locale, > but setlocale(LC_CTYPE, "T") neglects to clear it. > The solution: > dupgl() must never copy the LC_ALL entry. > When we are setting a category other than LC_ALL, or when we > are setting all categories with the LC_ALL, "/////" syntax, > the old LC_ALL entry becomes invalid. When we are setting > LC_ALL uniformly, it will be replaced anyway. > * chunk 92 (first additonal minor bug): > If strdup(locname) fails, newgl is leaked. > * chunk 136 (style): > Resolve code duplication. > * chunk 184 (second additional minor bug): > Consider the nonsensical > setlocale(LC_ALL, "C/C/fr_FR.UTF-8/C/C/C"); > Of course, it makes no sense to request UTF-8 messages > in ASCII encoding. But *if* that is requested, the code finds > the dot in global_locname and switches the _GlobalRuneLocale > to UTF-8, which is wrong, as "C" was requested for LC_CTYPE. > Fix this by looking at the LC_CTYPE entry only when deciding > what to use for _GlobalRuneLocale. > There is no risk associated with this fix because we ignore > LC_MESSAGES etc. anyway, so we know we wont actually try to > print UTF-8 into an ASCII output stream. Index: lib/libc/locale/setlocale.c =================================================================== RCS file: /cvs/src/lib/libc/locale/setlocale.c,v retrieving revision 1.27 diff -u -p -r1.27 setlocale.c --- lib/libc/locale/setlocale.c 5 Sep 2017 03:16:13 -0000 1.27 +++ lib/libc/locale/setlocale.c 28 Mar 2018 14:28:20 -0000 @@ -39,11 +39,11 @@ dupgl(char **oldgl) char **newgl; int ic; - if ((newgl = calloc(_LC_LAST, sizeof(*newgl))) == NULL) + if ((newgl = reallocarray(NULL, _LC_LAST, sizeof(*newgl))) == NULL) return NULL; for (ic = LC_ALL; ic < _LC_LAST; ic++) { - if ((newgl[ic] = strdup(oldgl != NULL ? - oldgl[ic] : ic == LC_ALL ? "" : "C")) == NULL) { + if ((newgl[ic] = strdup(ic == LC_ALL ? "" : + oldgl == NULL ? "C" : oldgl[ic])) == NULL) { freegl(newgl); return NULL; } @@ -92,8 +92,10 @@ setlocale(int category, const char *locn if (category == LC_ALL && strchr(locname, '/') != NULL) { /* One value for each category. */ - if ((firstname = strdup(locname)) == NULL) + if ((firstname = strdup(locname)) == NULL) { + freegl(newgl); return NULL; + } nextname = firstname; for (ic = 1; ic < _LC_LAST; ic++) if (nextname == NULL || changegl(ic, @@ -136,22 +138,14 @@ setlocale(int category, const char *locn goto done; } - /* Individual category. */ - if (category > LC_ALL) { + /* Individual category, or LC_ALL uniformly set. */ + if (category > LC_ALL || newgl[LC_ALL][0] != '\0') { if (strlcpy(global_locname, newgl[category], sizeof(global_locname)) >= sizeof(global_locname)) global_locname[0] = '\0'; goto done; } - /* LC_ALL overrides everything else. */ - if (newgl[LC_ALL][0] != '\0') { - if (strlcpy(global_locname, newgl[LC_ALL], - sizeof(global_locname)) >= sizeof(global_locname)) - global_locname[0] = '\0'; - goto done; - } - /* * Check whether all categories agree and return either * the single common name for all categories or a string @@ -184,7 +178,7 @@ done: global_locale = newgl; if (category == LC_ALL || category == LC_CTYPE) _GlobalRuneLocale = - strchr(global_locname, '.') == NULL ? + strchr(newgl[LC_CTYPE], '.') == NULL ? &_DefaultRuneLocale : _Utf8RuneLocale; } else { freegl(newgl); Index: regress/lib/libc/locale/setlocale/setlocale.c =================================================================== RCS file: /cvs/src/regress/lib/libc/locale/setlocale/setlocale.c,v retrieving revision 1.3 diff -u -p -r1.3 setlocale.c --- regress/lib/libc/locale/setlocale/setlocale.c 25 Feb 2017 07:28:32 -0000 1.3 +++ regress/lib/libc/locale/setlocale/setlocale.c 28 Mar 2018 14:28:20 -0000 @@ -75,7 +75,6 @@ main(int argc, char *argv[]) /* load from env */ /* NOTE: we don't support non-C locales for some categories */ - /*test_setlocale("fr_FR.UTF-8", LC_ALL, "");*/ /* set */ test_setlocale("fr_FR.UTF-8", LC_CTYPE, ""); /* set */ test_setlocale("fr_FR.UTF-8", LC_MESSAGES, ""); /* set */ test_MB_CUR_MAX(4); @@ -113,6 +112,7 @@ main(int argc, char *argv[]) test_setlocale("C", LC_ALL, "C"); /* reset */ test_setlocale("invalid.UTF-8", LC_CTYPE, "invalid.UTF-8"); /* set */ test_setlocale("invalid.UTF-8", LC_CTYPE, NULL); + test_setlocale("C/invalid.UTF-8/C/C/C/C", LC_ALL, NULL); test_MB_CUR_MAX(4); /* with invalid codeset (is an error) */