The tests are bogus because they are testing undefined behavior. The characters in question are not valid in the C locale, and there is no such thing as UTF-8 in that locale. I haven't looked at the encoding, but my guess is that these are encoded as multibyte characters.
Technically we ought not do any conversions of case for characters that are not part of the repertoire of the locale's language, but people bitched when we had that strict policy, so now we try to do case transforms and recognition even for characters that make no sense in the current language. This was a case of user convenience trumping standards. IMO it's insane to expect any meaningful interpretation of a Cyrillic character when operating in a Western European locale. The same can be said for many national character sets. I'd actually say the same for English letters in a locale that is not based on Latin characters, except that there are strong historical precedents for special treatment of English letters (ASCII or ISO 646): it would be difficult or impossible to use a wide variety of UNIX tools and programming languages without support for English character handling.

> On Jul 22, 2015, at 2:45 AM, Alexander Pyhalov <[email protected]> wrote:
>
> Hello.
> I've started looking at updating OI Hipster Perl version (we currently ship 5.16.3) to 5.22.
>
> Configure options are (for both perl versions)
>
> -de \
> -Dmksymlinks \
> -Ulocincpth= \
> -Uloclibpth= \
> -Dbin=/usr/perl5/$(PERL_VERSION)/bin \
> -Dcc="$(CC) $(CC_BITS)" \
> -Dcf_email="[email protected]" \
> -Dcf_by="perl-bugs" \
> -Dlibperl=libperl.so \
> -Dmyhostname="localhost" \
> -Dprefix=/usr/perl5/$(PERL_VERSION) \
> -Dprivlib=/usr/perl5/$(PERL_VERSION)/lib \
> -Dsitelib=/usr/perl5/site_perl/$(PERL_VERSION) \
> -Dsiteprefix=/usr/perl5/$(PERL_VERSION) \
> -Dvendorlib=/usr/perl5/vendor_perl/$(PERL_VERSION) \
> -Dvendorprefix=/usr/perl5/$(PERL_VERSION) \
> -Duse64bitint \
> -Duseshrplib \
> -Dusedtrace \
> -Uuselargefiles
>
> But several tests fail on gmake check:
>
> Failed 5 tests out of 2222, 99.77% okay.
> ../ext/re/t/re_funcs_u.t
> ../lib/locale.t
> re/subst.t
> re/substT.t
> re/subst_wamp.t
>
> For Perl 5.16 only first two checks fail:
> Failed 2 tests out of 2188, 99.91% okay.
> ../ext/re/t/re_funcs_u.t
> ../lib/locale.t
>
> Verbose output:
>
> 1) from ext/re/t/re_funcs_u.t:
>
> not ok 32
> # Failed test 32 - at ext/re/t/re_funcs_u.t line 125
>
> this is the following test:
>
> require POSIX;
> my $text = chr utf8::unicode_to_native(0xE4);
> my $current_locale = POSIX::setlocale( &POSIX::LC_CTYPE, 'C' );
> my $check;
> $check = $text =~ /(?l)\w/;
> ok( !$check );
>
> 2) from lib/locale.t
>
> $ env LD_LIBRARY_PATH=. ./perl -I. -wT -MTestInit lib/locale.t |grep 'not ok'
>
> not ok 437 uc("à") in C locale (use locale; not encoded in utf8) should be "à", got "À"
> not ok 439 uc("à") in C locale (use locale; encoded in utf8) should be "à", got "À"
> not ok 441 uc("ÿ") in C locale (use locale; not encoded in utf8) should be "ÿ", got "x"
> not ok 443 uc("ÿ") in C locale (use locale; encoded in utf8) should be "ÿ", got "x"
> not ok 493 ucfirst("à") in C locale (use locale; not encoded in utf8) should be "à", got "À"
> not ok 495 ucfirst("à") in C locale (use locale; encoded in utf8) should be "à", got "À"
> not ok 497 ucfirst("ÿ") in C locale (use locale; not encoded in utf8) should be "ÿ", got "x"
> not ok 499 ucfirst("ÿ") in C locale (use locale; encoded in utf8) should be "ÿ", got "x"
> not ok 549 lc("À") in C locale (use locale; not encoded in utf8) should be "À", got "à"
> not ok 551 lc("À") in C locale (use locale; encoded in utf8) should be "À", got "à"
> not ok 589 lcfirst("À") in C locale (use locale; not encoded in utf8) should be "À", got "à"
> not ok 591 lcfirst("À") in C locale (use locale; encoded in utf8) should be "À", got "à"
> not ok 629 fc("À") in C locale (use locale; not encoded in utf8) should be "À", got "à"
> not ok 631 fc("À") in C locale (use locale; encoded in utf8) should be "À", got "à"
>
> 3) from t/re/subst.t
> 4) re/substT.t
> 5) re/subst_wamp.t
> (the same test fails, this is new one, but it seems it fails also on old perl):
>
> # Failed test 256 - \b matches Latin1 before string, mid, and end, /l at t/re/subst.t line 1046
> # got "!\x{e1}!.!\x{e8}!"
> # expected "\x{e1}.\x{e8}"
> not ok 256 - \b matches Latin1 before string, mid, and end, /l
>
> The following test fails:
>
> require POSIX;
> POSIX->import("locale_h");
> setlocale(&POSIX::LC_ALL, "C");
> use locale;
> # use Test::More;
> my $a_acute = chr utf8::unicode_to_native(0xE1); # LATIN SMALL LETTER A WITH ACUTE
> my $egrave = chr utf8::unicode_to_native(0xE8); # LATIN SMALL LETTER E WITH GRAVE
> is("$a_acute.$egrave" =~ s/\b/!/gr, "$a_acute.$egrave", '\\b matches Latin1 before string, mid, and end, /l');
>
> Do you have any thoughts on this? Is error somewhere in our locales?
>
> --
> Best regards,
> Alexander Pyhalov,
> system administrator of Southern Federal University IT department

-------------------------------------------
illumos-discuss Archives: https://www.listbox.com/member/archive/182180/=now
