The tests are bogus: they are testing undefined behavior.  The characters in
question are not valid in the C locale, and there is no such thing as UTF-8 in
that locale. I haven't looked at the encoding, but my guess is that these are
encoded as multibyte characters.

Technically we ought not to do any case conversions for characters that are not
part of the repertoire of the locale's language, but people bitched when we had
that strict policy, so now we try to do case transforms and recognition even
for characters that make no sense in the current language.
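
For anyone who wants to see what the C library itself reports for a given byte, here is a minimal probe sketch. The names ctype_report, probe_byte and print_probe are made up for illustration, and I am assuming that Perl's "use locale" classification and case mapping for single-byte characters ultimately reflects what these libc functions return. The results for bytes above 0x7F depend on the platform's C locale, so no output is claimed for them here.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* What the current LC_CTYPE locale says about one byte value. */
struct ctype_report {
    int is_alpha;  /* 1 if isalpha() accepts the byte, else 0 */
    int upper;     /* value returned by toupper() */
    int lower;     /* value returned by tolower() */
};

/* Query the C library's classification and case mapping for byte c.
 * Taking unsigned char keeps the argument within the range that the
 * <ctype.h> functions are defined for. */
struct ctype_report probe_byte(unsigned char c)
{
    struct ctype_report r;
    r.is_alpha = isalpha(c) != 0;
    r.upper = toupper(c);
    r.lower = tolower(c);
    return r;
}

/* Print one line describing byte c under the current locale. */
void print_probe(unsigned char c)
{
    struct ctype_report r = probe_byte(c);
    printf("0x%02X: isalpha=%d toupper=0x%02X tolower=0x%02X\n",
           (unsigned)c, r.is_alpha, (unsigned)r.upper, (unsigned)r.lower);
}
```

To use it, call setlocale(LC_CTYPE, "C") and then print_probe(0xE0) and print_probe(0xFF), the Latin-1 code points for the two lowercase letters in the failing locale.t tests, and compare against print_probe('a').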

This was a case of user convenience trumping standards.  IMO it's insane to 
expect any meaningful interpretation of a Cyrillic character when operating in 
a Western European locale.  This can be said for many national character sets. 

I'd actually say the same for English letters in a locale that is not based on
Latin characters, except that there are strong historical precedents for special
treatment of English letters (ASCII or ISO 646), such that it would be difficult
or impossible to use a wide variety of UNIX tools and programming languages
without support for English character handling.


> On Jul 22, 2015, at 2:45 AM, Alexander Pyhalov <[email protected]> wrote:
> 
> Hello.
> I've started looking at updating the OI Hipster Perl version (we currently
> ship 5.16.3) to 5.22.
> 
> Configure options (for both Perl versions) are:
> 
>                        -de \
>                        -Dmksymlinks \
>                        -Ulocincpth= \
>                        -Uloclibpth= \
>                        -Dbin=/usr/perl5/$(PERL_VERSION)/bin \
>                        -Dcc="$(CC) $(CC_BITS)" \
>                        -Dcf_email="[email protected]" \
>                        -Dcf_by="perl-bugs" \
>                        -Dlibperl=libperl.so \
>                        -Dmyhostname="localhost" \
>                        -Dprefix=/usr/perl5/$(PERL_VERSION) \
>                        -Dprivlib=/usr/perl5/$(PERL_VERSION)/lib \
>                        -Dsitelib=/usr/perl5/site_perl/$(PERL_VERSION) \
>                        -Dsiteprefix=/usr/perl5/$(PERL_VERSION) \
>                        -Dvendorlib=/usr/perl5/vendor_perl/$(PERL_VERSION) \
>                        -Dvendorprefix=/usr/perl5/$(PERL_VERSION) \
>                        -Duse64bitint \
>                        -Duseshrplib \
>                        -Dusedtrace \
>                        -Uuselargefiles
> 
> But several tests fail on gmake check:
> 
> Failed 5 tests out of 2222, 99.77% okay.
>        ../ext/re/t/re_funcs_u.t
>        ../lib/locale.t
>        re/subst.t
>        re/substT.t
>        re/subst_wamp.t
> 
> For Perl 5.16, only the first two checks fail:
> Failed 2 tests out of 2188, 99.91% okay.
>        ../ext/re/t/re_funcs_u.t
>        ../lib/locale.t
> 
> Verbose output:
> 
> 1) from ext/re/t/re_funcs_u.t:
> 
> not ok 32
> # Failed test 32 - at ext/re/t/re_funcs_u.t line 125
> 
> this is the following test:
> 
> require POSIX;
> my $text = chr utf8::unicode_to_native(0xE4);
> my $current_locale = POSIX::setlocale( &POSIX::LC_CTYPE, 'C' );
> my $check;
> $check = $text =~ /(?l)\w/;
> ok( !$check );
> 
> 2) from lib/locale.t
> 
> $ env LD_LIBRARY_PATH=. ./perl -I. -wT -MTestInit lib/locale.t | grep 'not ok'
> 
> not ok 437 uc("à") in C locale (use locale; not encoded in utf8) should be "à", got "À"
> not ok 439 uc("à") in C locale (use locale; encoded in utf8) should be "à", got "À"
> not ok 441 uc("ÿ") in C locale (use locale; not encoded in utf8) should be "ÿ", got "x"
> not ok 443 uc("ÿ") in C locale (use locale; encoded in utf8) should be "ÿ", got "x"
> not ok 493 ucfirst("à") in C locale (use locale; not encoded in utf8) should be "à", got "À"
> not ok 495 ucfirst("à") in C locale (use locale; encoded in utf8) should be "à", got "À"
> not ok 497 ucfirst("ÿ") in C locale (use locale; not encoded in utf8) should be "ÿ", got "x"
> not ok 499 ucfirst("ÿ") in C locale (use locale; encoded in utf8) should be "ÿ", got "x"
> not ok 549 lc("À") in C locale (use locale; not encoded in utf8) should be "À", got "à"
> not ok 551 lc("À") in C locale (use locale; encoded in utf8) should be "À", got "à"
> not ok 589 lcfirst("À") in C locale (use locale; not encoded in utf8) should be "À", got "à"
> not ok 591 lcfirst("À") in C locale (use locale; encoded in utf8) should be "À", got "à"
> not ok 629 fc("À") in C locale (use locale; not encoded in utf8) should be "À", got "à"
> not ok 631 fc("À") in C locale (use locale; encoded in utf8) should be "À", got "à"
> 
> 3) from t/re/subst.t
> 4)  re/substT.t
> 5)  re/subst_wamp.t
> (the same test fails in all three; it is a new test, but it seems to fail on
> the old perl as well):
> 
> # Failed test 256 - \b matches Latin1 before string, mid, and end, /l at t/re/subst.t line 1046
> #      got "!\x{e1}!.!\x{e8}!"
> # expected "\x{e1}.\x{e8}"
> not ok 256 - \b matches Latin1 before string, mid, and end, /l
> 
> The following test fails:
> 
> require POSIX;
> POSIX->import("locale_h");
> setlocale(&POSIX::LC_ALL, "C");
> use locale;
> # use Test::More;
> my $a_acute = chr utf8::unicode_to_native(0xE1); # LATIN SMALL LETTER A WITH ACUTE
> my $egrave = chr utf8::unicode_to_native(0xE8);  # LATIN SMALL LETTER E WITH GRAVE
> is("$a_acute.$egrave" =~ s/\b/!/gr, "$a_acute.$egrave", '\\b matches Latin1 before string, mid, and end, /l');
> 
> Do you have any thoughts on this? Is the error somewhere in our locales?
> 
> --
> Best regards,
> Alexander Pyhalov,
> system administrator of Southern Federal University IT department
> 

