In perl.git, the branch smoke-me/khw-testlocale has been created
<http://perl5.git.perl.org/perl.git/commitdiff/1c3d31891c814370cc89f39700b3c3f2a23dc410?hp=0000000000000000000000000000000000000000>
at 1c3d31891c814370cc89f39700b3c3f2a23dc410 (commit)
- Log -----------------------------------------------------------------
commit 1c3d31891c814370cc89f39700b3c3f2a23dc410
Author: Karl Williamson <[email protected]>
Date: Fri Jul 5 07:50:22 2013 -0600
win32 smoke
M locale.c
commit e8080ae651deddca1e8f6ac536b619bf0a5637b0
Author: Karl Williamson <[email protected]>
Date: Wed Jun 19 21:00:53 2013 -0600
PATCH: [perl #112208]: Set utf8 flag on $! appropriately
This patch sets the utf8 flag on $! if the error string passes utf8
validity tests and has some bytes with the upper bit set. (If none
have that bit set, it is an ASCII string, and whether or not it is UTF-8
is irrelevant.) This is a heuristic that could fail, but as the
reference in the comments points out, this is unlikely.
One can reasonably assume that a UTF-8 locale will return a UTF-8
result. So another approach would be to look at that (but we wouldn't
want to turn the flag on for a purely ASCII string anyway, as that could
change the semantics from existing behavior by making the string follow
Unicode rules, whereas it didn't necessarily before.) To do this, we
could keep track of the utf8ness of the LC_MESSAGES locale. But until
the heuristic in this patch is shown to not be good enough, I don't see
the need to do this extra work.
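A minimal sketch of the heuristic described above, for illustration only.
It is not the code in mg.c: the function name looks_like_non_ascii_utf8
is hypothetical, and the validity check here is a strict Unicode one,
which may be narrower than what Perl's own utf8 validity test accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the len bytes at s are well-formed
 * UTF-8 AND at least one byte has its upper bit set; returns 0 for
 * malformed input and for pure ASCII (where the flag is irrelevant). */
int looks_like_non_ascii_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    int saw_high_bit = 0;

    assert(s != NULL);
    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being assembled */

        if (c < 0x80) { i++; continue; }   /* plain ASCII byte */
        saw_high_bit = 1;

        if (c >= 0xC2 && c <= 0xDF)      { need = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
        else return 0;  /* stray continuation, overlong C0/C1, or > F4 */

        if (len - i <= need) return 0;     /* sequence is truncated */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (need == 2 && cp < 0x800) return 0;
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;

        i += need + 1;
    }
    return saw_high_bit;
}
```

A caller would turn the flag on only when this returns 1, so a purely
ASCII message keeps its existing semantics.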
M lib/locale.t
M mg.c
commit 8fd37f337b1e771891c6e688cae6e391fdf8ca12
Author: Karl Williamson <[email protected]>
Date: Tue Jul 2 10:49:04 2013 -0600
locale.c: Further checks for utf8ness of a locale
In reality, the return value of setlocale() is documented to be opaque,
so using it to determine if a locale is UTF-8 or not may not work. It
is a char*, which we treat as a name. We can safely assume that if the
name contains UTF-8 (or slight variations thereof), that it is a UTF-8
locale. But if the name doesn't contain that, it still could be one.
In fact there are currently many locales on our dromedary machine that
fall into this category. Similarly, something containing 8859 is not
going to be UTF-8.
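The name-based reasoning above can be sketched as a three-way answer.
This is an illustration under assumptions, not the function in locale.c:
the enum, the function names, and the exact spellings matched ("UTF-8"
and "UTF8", case-insensitively) are all hypothetical choices.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum utf8_hint { HINT_NOT_UTF8 = 0, HINT_UTF8 = 1, HINT_UNKNOWN = 2 };

/* Case-insensitive substring search; returns 1 if needle occurs in hay. */
static int contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t k = 0;
        while (k < n && hay[k] != '\0'
               && tolower((unsigned char)hay[k])
                  == tolower((unsigned char)needle[k]))
            k++;
        if (k == n) return 1;
    }
    return n == 0;
}

/* Hypothetical helper: what does the locale name alone tell us? */
enum utf8_hint utf8_hint_from_name(const char *name)
{
    assert(name != NULL);
    if (contains_ci(name, "UTF-8") || contains_ci(name, "UTF8"))
        return HINT_UTF8;       /* name says so: safe to assume UTF-8 */
    if (strstr(name, "8859") != NULL)
        return HINT_NOT_UTF8;   /* an ISO 8859 locale is not UTF-8 */
    return HINT_UNKNOWN;        /* name is unhelpful; could be either */
}
```

The HINT_UNKNOWN case is the one the rest of this commit message is
about: the name gives no answer, so some other evidence is needed.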
This commit adds another test for cases where there is no nl_langinfo(),
and the locale name isn't helpful. It looks at the currency symbol,
which typically will be in the locale's script. If that is illegal
UTF-8, we know for sure that the locale isn't UTF-8 (or is corrupted).
If it is legal UTF-8 (but not ASCII) we can be pretty sure that the
locale is UTF-8. If it is ASCII, we still don't know one way or the
other, so we err on the side of it not being UTF-8.
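A rough sketch of the three outcomes just described, using the standard
localeconv() call to fetch the currency symbol. The enum and function
names are hypothetical, and the check is only structural (each lead byte
followed by the right number of continuation bytes), so it is looser
than a full UTF-8 validator and is not the code added to locale.c.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

enum utf8_guess { GUESS_NOT_UTF8 = 0, GUESS_UTF8 = 1, GUESS_UNKNOWN = 2 };

/* Hypothetical helper: classify a NUL-terminated symbol string. */
enum utf8_guess classify_symbol(const char *sym)
{
    const unsigned char *p = (const unsigned char *)sym;
    int non_ascii = 0;

    assert(sym != NULL);
    while (*p != '\0') {
        int need;   /* continuation bytes expected after this lead byte */

        if (*p < 0x80) { p++; continue; }
        non_ascii = 1;

        if (*p >= 0xC2 && *p <= 0xDF)      need = 1;
        else if (*p >= 0xE0 && *p <= 0xEF) need = 2;
        else if (*p >= 0xF0 && *p <= 0xF4) need = 3;
        else return GUESS_NOT_UTF8;        /* illegal lead byte */

        p++;
        while (need-- > 0) {
            /* A NUL here fails the test too, catching truncation. */
            if ((*p & 0xC0) != 0x80) return GUESS_NOT_UTF8;
            p++;
        }
    }
    /* Pure ASCII tells us nothing either way. */
    return non_ascii ? GUESS_UTF8 : GUESS_UNKNOWN;
}

/* Look at the currency symbol of the current LC_MONETARY locale. */
enum utf8_guess guess_from_currency_symbol(void)
{
    const struct lconv *lc = localeconv();

    if (lc == NULL || lc->currency_symbol == NULL)
        return GUESS_UNKNOWN;
    return classify_symbol(lc->currency_symbol);
}
```

A caller that must produce a yes/no answer would treat GUESS_UNKNOWN as
"not UTF-8", matching the conservative choice described above.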
Originally, I was going to use the locale's error message strings,
returned from strerror(), the source for $!, to check for this.
These are supposed to be in terms of the LC_MESSAGES locale. Chances
are vanishingly small that the locale is not UTF-8 if all the messages
pass a utf8ness test, provided that the messages aren't just ASCII.
However, on dromedary, the messages for many of the exotic locales
haven't been translated, and are still in English, which doesn't help at
all. I figure that this is quite likely to be the case generally, and
the currency symbol is much more likely to have been translated.
I left the code in though, commented out for possible future use.
Note that this test will run only on systems that don't have
nl_langinfo(). The test can also be turned off by setting the C compiler
flag -DNO_LOCALE_MONETARY (and -DNO_LOCALE_MESSAGES for the
commented-out part), corresponding to the way the other categories can
be turned off (none of which is documented).
M locale.c
M perl.h
M pod/perllocale.pod
commit 70f5d0bfe62f7a8c8d8be355d6974d9383158430
Author: Karl Williamson <[email protected]>
Date: Mon Jun 24 17:21:49 2013 -0600
locale.c: Extract out, fix, expand fcn to see if a locale is utf8
There was buggy code to see if the start-up locale is UTF-8. This
commit extracts it into a separate function.
The bugs involved looking at the name of the locale to see if that
implies a UTF-8 locale. Prior to this commit, the code looked at the
beginning of the locale name, whereas in reality the UTF-8 indicator is
at the end, as in "fr_FR.UTF8".
Also, it didn't look for the documented Windows name for UTF-8 locales
on those platforms.
The function is expanded to have an input category to find the utf8ness
of. Thus it now works on any non-LC_ALL category, not just LC_CTYPE.
It is possible for categories to be in different locales, so that
LC_CTYPE is in a UTF-8 locale, and LC_NUMERIC isn't. For the purposes
of PERL_UNICODE, the most applicable category is LC_CTYPE, so that is
the one used in its only current call.
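To illustrate why the category matters, here is a small sketch that asks
the C library for the locale of one category at a time. The wrapper name
category_locale_name is hypothetical; setlocale() with a NULL second
argument is the standard way to query without changing anything.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical wrapper: return the name of the locale currently in
 * effect for a single category. LC_ALL is refused because its result
 * may describe several different per-category locales at once. */
const char *category_locale_name(int category)
{
    assert(category != LC_ALL);
    return setlocale(category, NULL);   /* query only, no change */
}
```

Calling it with LC_CTYPE and then with LC_NUMERIC can return two
different names when the environment sets those categories separately,
which is why a utf8ness check has to be told which category it is about.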
M embed.fnc
M embed.h
M locale.c
M proto.h
commit 81980bc9224b5b25773b8338647259cc7cc4254c
Author: Karl Williamson <[email protected]>
Date: Thu Jun 27 14:25:43 2013 -0600
locale.c: Compare apples to apples
Prior to this patch, one parameter to strNE would have been through a
standardizing function, while the other had not. By standardizing both
before doing the compare, we avoid false positives.
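The idea can be sketched as follows. The normalization rule used here
(drop a leading "CATEGORY=" prefix and anything from the first newline
on) is an assumption made purely for illustration, as are the function
names; it is not a description of the standardizing function in locale.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical normalizer: copy the first line of raw into out, minus
 * any leading "CATEGORY=" prefix. Output is truncated to fit outsz. */
static void normalize_name(const char *raw, char *out, size_t outsz)
{
    const char *nl, *eq, *start;
    size_t line_len, n;

    assert(raw != NULL && out != NULL && outsz > 0);
    nl = strchr(raw, '\n');
    line_len = nl ? (size_t)(nl - raw) : strlen(raw);
    eq = memchr(raw, '=', line_len);     /* only look within line one */
    start = eq ? eq + 1 : raw;
    n = (size_t)((raw + line_len) - start);
    if (n >= outsz) n = outsz - 1;
    memcpy(out, start, n);
    out[n] = '\0';
}

/* Compare two locale names only after normalizing BOTH of them. */
int locale_names_differ(const char *a, const char *b)
{
    char na[128], nb[128];

    normalize_name(a, na, sizeof na);
    normalize_name(b, nb, sizeof nb);
    return strcmp(na, nb) != 0;
}
```

Comparing a normalized name against a raw one would report a difference
for "LC_CTYPE=en_US.UTF-8\n" versus "en_US.UTF-8" even though both name
the same locale; normalizing both sides removes that false positive.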
M locale.c
commit 1ad6ebfc6b6338c9edb24194020afe0fc1b28284
Author: Karl Williamson <[email protected]>
Date: Thu Jun 27 14:01:01 2013 -0600
perl.h, locale.c: White space only
This indents some nested #if's to clarify the program structure.
M locale.c
M perl.h
commit 89e06b10aa91fac8e223d1a9873c43cc039f3dd4
Author: Karl Williamson <[email protected]>
Date: Thu Jun 27 13:57:53 2013 -0600
locale.c: Add assert()
This function is static and currently isn't called with LC_ALL, nor
would it work properly if it were called with that parameter, so assert
against it.
M locale.c
commit aa2f6ea1edacc932c67279defdd546fe140a9c51
Author: Karl Williamson <[email protected]>
Date: Thu Jun 27 13:57:19 2013 -0600
locale.c: Add comments
M locale.c
commit 818695da755e03e2ea5be0c5b03bf2dc88b20e23
Author: Karl Williamson <[email protected]>
Date: Wed Jun 19 13:12:49 2013 -0600
sv.h: Add comment
M sv.h
-----------------------------------------------------------------------
--
Perl5 Master Repository