[Bug libstdc++/38411] [4.4 Regression] Revision 142439 caused 22_locale/locale/cons/7.cc execution test

2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #4 from paolo dot carlini at oracle dot com  2008-12-05 08:24 ---
Are you using the same glibc on x86_64 and ia64? The two failing testcases
(cons/7.cc and members/char/2.cc; the others are implied) are essentially the
same: something is different on that ia64 machine about the localedata having
to do with the numeric decimal point and grouping. I'll try to debug this
further (because 38368 is a real issue and we want a fix), but since I can't
reproduce it on my machines, I need to know the values of the various dp1, dp2,
etc. in the failing VERIFY (assert).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38411




2008-12-05 Thread jakub at gcc dot gnu dot org


--- Comment #5 from jakub at gcc dot gnu dot org  2008-12-05 08:36 ---
The only thing I remember was fr_FR locale changing some stuff and Fedora
backing out that change as it was done upstream too late in the cycle to get
feedback from the French community.



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #6 from paolo dot carlini at oracle dot com  2008-12-05 08:47 ---
Yes, though I'm not sure it is the same issue. Anyway, the problem can only be
in this idea:

  _M_data->_M_thousands_sep = *(__nl_langinfo_l(THOUSANDS_SEP, __cloc));
  ...

  if (_M_data->_M_thousands_sep == '\0')
    {
      _M_data->_M_thousands_sep = ',';

that is, we are trying to standardize on ',' (the same as we have for the C
locale) in case the localedata is \0 for the thousands separator. Apparently
this causes problems for some versions of glibc; I'm still trying to
disentangle the logic... Jakub, how does it sound to you?
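
The idea described in this comment can be sketched roughly as follows. This is only an illustration, not the libstdc++ source: NumpunctData and normalize_thousands_sep are hypothetical names standing in for the cached _M_data fields, and clearing the grouping alongside the fallback is an assumption based on the later remark that grouping is disabled when the separator is empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the cached numpunct data (not the libstdc++ type).
struct NumpunctData {
    char thousands_sep;
    std::string grouping;
};

// If the locale data reports '\0' as the thousands separator, fall back to
// ',' (the value the "C" locale facet reports) and disable grouping.
NumpunctData normalize_thousands_sep(char raw_sep, const std::string& raw_grouping) {
    NumpunctData d;
    d.thousands_sep = raw_sep;
    d.grouping = raw_grouping;
    if (d.thousands_sep == '\0') {
        d.thousands_sep = ',';
        d.grouping.clear();  // assumption: no separator means no grouping
    }
    return d;
}

// Small usage check: an empty separator becomes ',', a real one is kept.
void check_normalize_thousands_sep() {
    assert(normalize_thousands_sep('\0', "\3\3").thousands_sep == ',');
    assert(normalize_thousands_sep(' ', "\3").thousands_sep == ' ');
}
```

With this shape, a locale whose separator silently changes between '\0' and ' ' across glibc versions would flip between the two branches, which is consistent with the test failures discussed here.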



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #7 from paolo dot carlini at oracle dot com  2008-12-05 08:50 ---
By the way, yes the fr_FR locale is heavily used in those tests...



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #8 from paolo dot carlini at oracle dot com  2008-12-05 08:55 ---
I suspect that in the localedata of fr_FR, the thousands separator may have
changed in some glibc from ' ' (0x20) to '\0'. Jakub, can you confirm that?



2008-12-05 Thread jakub at gcc dot gnu dot org


--- Comment #9 from jakub at gcc dot gnu dot org  2008-12-05 09:10 ---
Forcing , thousands separator when none should be used is very weird.  Does
C++ standard mandate that behavior?  "" means thousands shouldn't be separated
by any separator.  In most cases such locales also have grouping 0;0 or -1, but
there are buggy? locales, e.g. bg_BG, that specify empty thousands_sep, yet
have grouping 3;3.  For empty thousands_sep glibc just forces no grouping:
  if ((wide && thousands_sepwc == L'\0')
      || (! wide && *thousands_sep == '\0'))
    grouping = NULL;

BTW, thousands_sep is a multibyte string, it can be multiple bytes (or none, as
discussed here).  Say ru_RU has:
<U2002>
which is 3 bytes:
<U2002> /xe2/x80/x82 EN SPACE
and a bunch of locales are using <U00A0>, which is 2 bytes:
<U00A0> /xc2/xa0 NO-BREAK SPACE
_NL_NUMERIC_THOUSANDS_SEP_WC is always just one wchar_t though.

thousands_sep is one of the things that changed in fr_FR this year, see
sourceware #6040.
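
The byte counts quoted above follow directly from the UTF-8 encoding rules. A minimal sketch, where encode_utf8 is a hypothetical helper written for this illustration (not a glibc or libstdc++ function) and surrogate checks are deliberately omitted:

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8. Sketch only: no surrogate or
// upper-range validation is performed.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage check against the charmap entries quoted in the comment.
void check_encode_utf8() {
    assert(encode_utf8(0x2002) == "\xe2\x80\x82");  // EN SPACE, 3 bytes
    assert(encode_utf8(0x00A0) == "\xc2\xa0");      // NO-BREAK SPACE, 2 bytes
}
```

Neither separator fits into a single char, which is why a numpunct<char> facet that returns exactly one char cannot represent them faithfully.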



2008-12-05 Thread jakub at gcc dot gnu dot org


--- Comment #10 from jakub at gcc dot gnu dot org  2008-12-05 09:13 ---
To reply to #c8, it changed the other way around, from "" to <U0020> in April
2008.  But as I said, Fedora 9, which shipped glibc 2.9, was backing out these
changes (Fedora 10 has them already).



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #11 from paolo dot carlini at oracle dot com  2008-12-05 09:16 ---
(In reply to comment #9)
> Forcing , thousands separator when none should be used is very weird.

It is the same as the C locale has. We only want consistency, see 38368.

> Does C++ standard mandate that behavior?  "" means thousands shouldn't be
> separated by any separator.  In most cases such locales also have grouping
> 0;0 or -1, but there are buggy?

I suppose so, because we have a comment in the code (resulting from feedback
Benjamin and/or I got from glibc people) clearly saying that '\0' implies no
grouping. We always worked under this hypothesis.

> locales, e.g. bg_BG, that specify empty thousands_sep, yet have
> grouping 3;3.  For empty thousands_sep glibc just forces no grouping:
>   if ((wide && thousands_sepwc == L'\0')
>       || (! wide && *thousands_sep == '\0'))
>     grouping = NULL;

Ok...

> BTW, thousands_sep is a multibyte string, it can be multiple bytes

This is just a C++ standard issue, unfortunately...

Paolo.



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #12 from paolo dot carlini at oracle dot com  2008-12-05 09:17 ---
(In reply to comment #10)
> To reply to #c8, it changed the other way around, from "" to <U0020> in April
> 2008.  But as I said, Fedora 9, which shipped glibc 2.9, was backing out these
> changes (Fedora 10 has them already).

Crazy. Anyway, that is the problem. I will change the tests to not rely on such
named locales.



2008-12-05 Thread jakub at gcc dot gnu dot org


--- Comment #13 from jakub at gcc dot gnu dot org  2008-12-05 09:38 ---
I see you already disable grouping if it is empty, good.

If _M_thousands_sep must be a single _CharT, then for char I guess you should
transliterate it if the string is longer than one character, either by using
glibc transliteration (more generic, but slower), or by hardcoding the few
multibyte strings that are used in glibc locales ATM.  That's just <U00A0> in
current glibc (transliterate to ' ') and in some older libcs was also that
<U2002> (also to ' ').  Maybe change everything longer than one byte to ' ' ;).
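
The last suggestion (map anything longer than one byte to ' ') is simple enough to sketch. narrow_thousands_sep is a hypothetical helper invented for this illustration, and returning '\0' for an empty string is an assumption about how the caller would signal "no grouping":

```cpp
#include <cassert>
#include <string>

// Reduce a locale's multibyte thousands_sep string to a single char.
char narrow_thousands_sep(const std::string& sep) {
    if (sep.empty())
        return '\0';   // no separator: the caller is expected to disable grouping
    if (sep.size() == 1)
        return sep[0]; // already representable as one char
    return ' ';        // <U00A0>, <U2002>, or anything else longer than one byte
}

// Usage check with the separators mentioned in this thread.
void check_narrow_thousands_sep() {
    assert(narrow_thousands_sep("\xc2\xa0") == ' ');      // NO-BREAK SPACE
    assert(narrow_thousands_sep("\xe2\x80\x82") == ' ');  // EN SPACE
    assert(narrow_thousands_sep(",") == ',');
}
```

The glibc transliteration route mentioned as the alternative would be more general, at the cost of speed; this sketch only covers the hardcoded variant.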



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #14 from paolo dot carlini at oracle dot com  2008-12-05 09:42 ---
(In reply to comment #13)
> If _M_thousands_sep must be a single _CharT, then for char I guess you should
> transliterate it if the string is longer than one character.
> Either by using glibc transliteration (more generic, but slower), or by
> hardcoding the few multibyte strings that are used in glibc locales ATM.
> That's just <U00A0> in current glibc (transliterate to ' ') and in some older
> libcs was also that <U2002> (also to ' ').  Maybe change everything longer
> than one byte to ' ' ;).

Ok, thanks. I think we have to think more about this issue, it's as old as v3
;) For 4.5 maybe...




2008-12-05 Thread paolo dot carlini at oracle dot com


-- 

paolo dot carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |paolo dot carlini at oracle
                   |dot org                     |dot com
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-05 09:43:10
               date|                            |



2008-12-05 Thread jakub at gcc dot gnu dot org


--- Comment #15 from jakub at gcc dot gnu dot org  2008-12-05 09:48 ---
Well, if you take first byte from a multibyte sequence, then it is IMNSHO
something that should be solved for 4.4 too.  For say ru_RU.UTF-8 that means
you emit invalid UTF-8 if you separate digits with say '\xc2',
\x33\xc2\x33\x33\x33 is invalid UTF-8.
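
The byte sequence above can be checked mechanically. The following is a sketch under stated assumptions: is_structurally_valid_utf8 and group_digits are hypothetical helpers written for this illustration, and the validator only checks lead/continuation byte structure (overlong forms and surrogates are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (10xxxxxx).
bool is_structurally_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (lead < 0x80) extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                           // stray continuation or bad lead
        if (extra > s.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

// Insert sep between groups of three digits, counted from the right.
std::string group_digits(const std::string& digits, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        if (i != 0 && (digits.size() - i) % 3 == 0) out += sep;
        out += digits[i];
    }
    return out;
}

// Usage check: only the first byte of <U00A0> versus the whole sequence.
void check_truncated_separator() {
    assert(!is_structurally_valid_utf8(group_digits("3333", "\xc2")));
    assert(is_structurally_valid_utf8(group_digits("3333", "\xc2\xa0")));
}
```

Grouping "3333" with the lone byte '\xc2' yields exactly \x33\xc2\x33\x33\x33, which the check rejects because \xc2 announces a continuation byte that never arrives.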



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #16 from paolo dot carlini at oracle dot com  2008-12-05 09:51 ---
(In reply to comment #15)
> Well, if you take first byte from a multibyte sequence, then it is IMNSHO
> something that should be solved for 4.4 too.  For say ru_RU.UTF-8 that means
> you emit invalid UTF-8 if you separate digits with say '\xc2',
> \x33\xc2\x33\x33\x33 is invalid UTF-8.

Maybe, but, as I should learn from you ;) this is definitely not a regression,
the time is short and the issue is tricky. Therefore, I don't think I will
tackle it directly, unless you are willing to help, of course!



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #17 from paolo dot carlini at oracle dot com  2008-12-05 09:54 ---
Created an attachment (id=16831)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16831&action=view)
Draft

Draft patch using is_IS instead of fr_FR.



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #18 from paolo dot carlini at oracle dot com  2008-12-05 09:55 ---
HJ, can you test it and report? Thanks!



2008-12-05 Thread tsyvarev at ispras dot ru


--- Comment #19 from tsyvarev at ispras dot ru  2008-12-05 11:08 ---
It seems that the C++ standard contains a contradiction about the thousands
separator in the C locale:

22.2.3.1, p1 says:

The instantiations required in Table 51 (22.1.1.1.1), namely numpunct<wchar_t>
and numpunct<char>, provide classic “C” numeric formats, i.e. they contain
information equivalent to that contained in the “C” locale or their wide
character counterparts as if obtained by a call to widen.

also, 22.2.3.1.2 p.2 says:

char_type do_thousands_sep() const;

Returns: A character for use as the digit group separator. The required
instantiations return ’,’ or L’,’.

It appears that, according to the C++ standard, the thousands separator for
the C locale is ','.

But according to the ISO standard of the C (POSIX) locale (Section 7.3, Locale
Definition), the thousands separator in this locale should be '\0', which means
N/A or not assigned.

Or is this reasoning wrong?
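
The two views can be compared directly on a given implementation. A minimal sketch, where the three function names are hypothetical and chosen for this illustration; note that c_locale_thousands_sep changes the global C numeric locale as a side effect:

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <string>

// Separator reported by the numpunct<char> facet of the classic C++ locale.
char cxx_classic_thousands_sep() {
    return std::use_facet<std::numpunct<char> >(std::locale::classic()).thousands_sep();
}

// Grouping string reported by the same facet.
std::string cxx_classic_grouping() {
    return std::use_facet<std::numpunct<char> >(std::locale::classic()).grouping();
}

// Separator string reported by the C library for the "C" locale.
// Side effect: sets the global LC_NUMERIC category to "C".
std::string c_locale_thousands_sep() {
    std::setlocale(LC_NUMERIC, "C");
    return std::localeconv()->thousands_sep;
}

// Usage check: C++ says ',', C says "", and both agree that nothing is grouped.
void check_classic_locale_views() {
    assert(cxx_classic_thousands_sep() == ',');
    assert(cxx_classic_grouping().empty());
    assert(c_locale_thousands_sep().empty());
}
```

Since the classic facet's grouping is empty, the ',' is never actually inserted when formatting numbers, so the difference is visible only when the separator itself is queried.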


-- 

tsyvarev at ispras dot ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tsyvarev at ispras dot ru



2008-12-05 Thread paolo at gcc dot gnu dot org


--- Comment #20 from paolo at gcc dot gnu dot org  2008-12-05 13:09 ---
Subject: Bug 38411

Author: paolo
Date: Fri Dec  5 13:07:53 2008
New Revision: 142472

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=142472
Log:
2008-12-05  Paolo Carlini  [EMAIL PROTECTED]

PR libstdc++/38411
* testsuite/22_locale/numpunct/members/char/2.cc: Use is_IS instead
of fr_FR.
* testsuite/22_locale/numpunct/members/wchar_t/2.cc: Likewise.
* testsuite/22_locale/locale/cons/7.cc: Likewise.


Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/testsuite/22_locale/locale/cons/7.cc
trunk/libstdc++-v3/testsuite/22_locale/numpunct/members/char/2.cc
trunk/libstdc++-v3/testsuite/22_locale/numpunct/members/wchar_t/2.cc



2008-12-05 Thread paolo dot carlini at oracle dot com


--- Comment #21 from paolo dot carlini at oracle dot com  2008-12-05 13:09 ---
Fixed.


-- 

paolo dot carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.4.0



2008-12-04 Thread hjl dot tools at gmail dot com


--- Comment #2 from hjl dot tools at gmail dot com  2008-12-05 05:25 ---
Revision 142439 is the cause.


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.4 Regression]            |[4.4 Regression] Revision
                   |22_locale/locale/cons/7.cc  |142439 caused
                   |execution test              |22_locale/locale/cons/7.cc
                   |                            |execution test



2008-12-04 Thread hjl dot tools at gmail dot com


--- Comment #3 from hjl dot tools at gmail dot com  2008-12-05 05:34 ---
Jakub, there was a similar problem with locale on Linux before. Do you
remember it?


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at redhat dot com

