[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator

2008-01-16 Thread Martin Sebor (JIRA)

[ 
https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559638#action_12559638
 ] 

Martin Sebor commented on STDCXX-499:
-

I'm tempted to close this as Won't Fix since it looks like a rare bug in the 
locale definition file. On recent Linux systems there's just one locale that 
suffers from this problem: bg_BG. I couldn't find any such locales on HP-UX. We 
might want to look to see how many others besides fr_FR.ISO8859-1 there are on 
Tru64, and check other platforms to see if it's more pervasive than just one or 
two locales.

For future reference, here's an inefficient shell script I used to find other 
such locales:

for l in `locale -a`; do
    LC_NUMERIC=$l locale -ck LC_NUMERIC | grep "thousands_sep=\"\"" > /dev/null
    if [ $? -eq 0 ]; then L="$L $l"; fi
done &&
for l in $L; do
    grp=`LC_NUMERIC=$l locale -ck LC_NUMERIC | grep grouping`
    echo "$l :  $grp"
done
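
The check the shell loop performs can also be sketched in C++ with setlocale() and localeconv(). This is only an illustration: the function names groups_without_separator and locale_groups_without_separator are made up for this example and are not part of stdcxx, and the sketch assumes that an empty thousands_sep string in struct lconv is what shows up as a NUL separator on the C++ side.

```cpp
#include <climits>
#include <clocale>
#include <string>

// True when the grouping asks for digit groups but the separator is empty.
// The arguments are the lconv members of the same names.
bool groups_without_separator (const char *grouping, const char *thousands_sep)
{
    // An empty grouping, or one starting with CHAR_MAX, means "no grouping".
    const bool groups = grouping && *grouping && *grouping != CHAR_MAX;
    const bool no_sep = !thousands_sep || !*thousands_sep;
    return groups && no_sep;
}

// Hypothetical helper: checks the LC_NUMERIC category of the named locale.
// Returns false when the locale is not installed.
bool locale_groups_without_separator (const char *name)
{
    // Remember the current setting so it can be restored afterwards.
    const char *cur = std::setlocale (LC_NUMERIC, 0);
    const std::string saved (cur ? cur : "C");

    bool result = false;
    if (std::setlocale (LC_NUMERIC, name)) {
        const std::lconv *lc = std::localeconv ();
        result = groups_without_separator (lc->grouping, lc->thousands_sep);
    }

    std::setlocale (LC_NUMERIC, saved.c_str ());
    return result;
}
```

Calling locale_groups_without_separator ("bg_BG") on a system where that locale is installed would report whether it is one of the affected locales; the result depends on the installed locale definition.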

 std::num_put inserts NUL thousand separator
 ---

 Key: STDCXX-499
 URL: https://issues.apache.org/jira/browse/STDCXX-499
 Project: C++ Standard Library
  Issue Type: Bug
  Components: 22. Localization
Affects Versions: 4.1.2, 4.1.3, 4.1.4
Reporter: Martin Sebor
Assignee: Martin Sebor
 Fix For: 4.2.1


 Moved from Rogue Wave Bugzilla: 
 http://bugzilla.cvo.roguewave.com/show_bug.cgi?id=1913
  Original Message 
 Subject: num_put and null-character thousand separator
 Date: Tue, 11 Jan 2005 16:10:23 -0500
 From: Boris Gubenko [EMAIL PROTECTED]
 Reply-To: Boris Gubenko [EMAIL PROTECTED]
 Organization: Hewlett-Packard Co.
 To: Martin Sebor [EMAIL PROTECTED]
   Another locale-related issue that we fixed in rw stdlib v3.0 (and in
   v2.0 also) is making sure, that num_put does not insert null thousand
   separator character into the stream. Here is the fix in _num_put.cc
   in v3.0 :
 template <class _CharT, class _OutputIter /* = ostreambuf_iterator<_CharT> */>
 _TYPENAME num_put<_CharT, _OutputIter>::iter_type
 num_put<_CharT, _OutputIter>::
 _C_put (iter_type __it, ios_base &__flags, char_type __fill, int __type,
         const void *__pval) const
 {
     const numpunct<char_type> &__np =
         _V3_USE_FACET (numpunct<char_type>, __flags.getloc ());
     // FIXME: adjust buffer dynamically as necessary
     char __buf [_RWSTD_DBL_MAX_10_EXP];
     char *__pbuf = __buf;
     const string __grouping = __np.grouping ();
     const char *__grp   = __grouping.c_str ();
     const int __prec    = __flags.precision ();
 #if defined(__VMS) && defined(__DECCXX) && !defined(__DECFIXCXXL1730)
     const char __nogrouping = _RWSTD_CHAR_MAX;
     if (!__np.thousands_sep())
         __grp = &__nogrouping;
 #endif
   Here is the test:
 cosf.zko.dec.com> setenv LANG fr_FR.ISO8859-1
 cosf.zko.dec.com> locale -k thousands_sep
 thousands_sep=
 cosf.zko.dec.com> cxx x.cxx && a.out
 null character thousand_sep was not inserted
 cosf.zko.dec.com> cxx x.cxx -D_RWSTD_USE_CONFIG -D_RWSTDDEBUG \
     -I/usr/cxx1/boris/CXXL_1886-2/stdlib-4.0/stdlib/include/ \
     -nocxxstd -L/usr/cxx1/boris/CXXL_1886-2/result/lib -lstd11s \
     && a.out
 null character thousand_sep was inserted
 cosf.zko.dec.com>
 x.cxx
 -
 #ifndef __USE_STD_IOSTREAM
 #define __USE_STD_IOSTREAM
 #endif
 #include <iostream>
 #include <sstream>
 #include <string>
 #include <locale>
 #include <locale.h>
 #ifdef __linux
 #define FRENCH_LOCALE "fr_FR"
 #else
 #define FRENCH_LOCALE "fr_FR.ISO8859-1"
 #endif
 using namespace std;
 int main()
 {
   ostringstream os;
   if (setlocale(LC_ALL,FRENCH_LOCALE))
   {
     setlocale(LC_ALL,"C");
     os.imbue(locale(FRENCH_LOCALE));
     os << (double) 1.1 << endl;
     if ( (os.str())[2] == '\0' )
       cout << "null character thousand_sep was inserted" << endl;
     else
       cout << "null character thousand_sep was not inserted" << endl;
   }
   return 0;
 }
 --- Additional Comments From [EMAIL PROTECTED] 2005-01-11 14:50:44 
  Original Message 
 Subject: Re: num_put and null-character thousand separator
 Date: Tue, 11 Jan 2005 15:50:06 -0700
 From: Martin Sebor [EMAIL PROTECTED]
 To: Boris Gubenko [EMAIL PROTECTED]
 References: [EMAIL PROTECTED]
 Boris Gubenko wrote:
 > Another locale-related issue that we fixed in rw stdlib v3.0 (and in
 > v2.0 also) is making sure, that num_put does not insert null thousand
 > separator character into the stream. Here is the fix in _num_put.cc
 > in v3.0 :
 I don't think this fix would be quite correct in general. NUL is
 a valid character that the locale library was specifically designed
 to be able to insert and extract just like any other. In addition,
 in the code below, operator==() need not be defined for the character
 type.
  
 ...
 > Here is the test:
 Thanks for the helpful test case.
 My feeling is that this case points out a fundamental design
 disconnect between 

[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator

2008-01-10 Thread Martin Sebor (JIRA)

[ 
https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557810#action_12557810
 ] 

Martin Sebor commented on STDCXX-499:
-

Here's a test case that reproduces the same behavior with the bg_BG locale on 
Linux:

$ cat t.cpp && make t && cat /etc/redhat-release && ./t | od -c
#include <cassert>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

int main ()
{
    std::stringstream strm;

    strm.imbue (std::locale ("bg_BG"));

    strm << 123456;

    const std::string s = strm.str ();

    std::cout << s << '\n';

    assert (s.npos == s.find ('\0'));
}
gcc -c -I/amd/devco/sebor/stdcxx/include/ansi -D_RWSTDDEBUG   -pthread 
-I/amd/devco/sebor/stdcxx/include -I/build/sebor/stdcxx-gcc-3.4.6_3-15D/include 
-I/amd/devco/sebor/stdcxx/examples/include  -pedantic -nostdinc++ -g   -W -Wall 
-Wcast-qual -Winline -Wshadow -Wwrite-strings -Wno-long-long -Wcast-align   
t.cpp
gcc t.o -o t -pthread  -L/build/sebor/stdcxx-gcc-3.4.6_3-15D/lib  
-Wl,-R/build/sebor/stdcxx-gcc-3.4.6_3-15D/lib -lstd15D -lsupc++ -lm 
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
t: t.cpp:19: int main(): Assertion `s.npos == s.find ('\0')' failed.
000   1   2   3  \0   4   5   6  \n
010
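
The same behavior can probably be reproduced without the bg_BG locale being installed by imbuing a stream with a custom facet. The sketch below is an illustration only: NulSepPunct and format_with_nul_sep are hypothetical names, not part of stdcxx, and the facet merely imitates a locale definition that has a non-empty grouping and a NUL thousands_sep.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not part of stdcxx): imitates a locale definition
// with a non-empty grouping but a NUL thousands separator.
struct NulSepPunct: std::numpunct<char>
{
protected:
    // The separator the locale "defines": the NUL character.
    virtual char do_thousands_sep () const { return '\0'; }

    // Group digits in threes, as bg_BG does in the test case above.
    virtual std::string do_grouping () const { return "\3"; }
};

// Formats val with a stream whose locale carries the facet above.
std::string format_with_nul_sep (long val)
{
    std::ostringstream strm;

    // The locale takes ownership of the facet and deletes it when done.
    strm.imbue (std::locale (std::locale::classic (), new NulSepPunct));

    strm << val;
    return strm.str ();
}
```

With an implementation that inserts the separator whenever the grouping is non-empty, format_with_nul_sep (123456) returns a seven-character string with a NUL at index 3, matching the od output above.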



[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator

2008-01-10 Thread Martin Sebor (JIRA)

[ 
https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557859#action_12557859
 ] 

Martin Sebor commented on STDCXX-499:
-

The question is whether this is our problem or a problem with the locale 
definition (such as the Bulgarian locale on Linux in the test case above). That 
is, is a locale that specifies a grouping but no thousands_sep valid?

Among our own locales there is only one that fits this description, suggesting 
it might be a bug in the locale definition:

$ (cd ~/stdcxx && for f in `grep -l "^grouping  *[1-9]" etc/nls/src/*`; do \
      grep -l "thousands_sep  *\"\"" $f; done)
etc/nls/src/bg_BG

The latest glibc bg_BG definition is the same:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/bg_BG?rev=1.7.2.2&content-type=text/x-cvsweb-markup&cvsroot=glibc

I opened a glibc issue to see if they agree it's a bug:
 http://sources.redhat.com/bugzilla/show_bug.cgi?id=5599

If we should decide to work around it I see two possible ways of handling it in 
punct.cpp, after retrieving the grouping and thousands_sep for the locale using 
localeconv():

When grouping is non-empty and valid and thousands_sep is NUL, either
a) set grouping to "", or
b) set thousands_sep to some non-NUL value.

Solution a) seems safer because it doesn't involve inventing a thousands_sep 
that's valid for the locale, but the downside is that it loses potentially 
useful information.

Solution b) leaves open the question of which thousands_sep is appropriate for 
the locale.
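
As a rough sketch of solutions a) and b), assuming the grouping and thousands_sep have already been retrieved with localeconv(): the PunctData struct and both function names below are invented for this illustration and do not correspond to actual code in punct.cpp. The sketch only tests for a non-empty grouping and leaves out the validity check.

```cpp
#include <string>

// Hypothetical holder for the values obtained from localeconv().
struct PunctData
{
    std::string grouping;        // lconv::grouping
    char        thousands_sep;   // first char of lconv::thousands_sep, or NUL
};

// Solution a): drop the grouping when there is no usable separator.
// No separator is invented, but the grouping information is lost.
PunctData drop_grouping_if_nul_sep (PunctData d)
{
    if (!d.grouping.empty () && d.thousands_sep == '\0')
        d.grouping.clear ();
    return d;
}

// Solution b): keep the grouping and substitute a separator.
// The fallback is supplied by the caller because the right choice
// for a given locale is an open question.
PunctData replace_nul_sep (PunctData d, char fallback)
{
    if (!d.grouping.empty () && d.thousands_sep == '\0')
        d.thousands_sep = fallback;
    return d;
}
```

Either function leaves the data untouched when the locale has no grouping or already has a non-NUL separator, so locales that are defined consistently would not be affected.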


[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator

2008-01-10 Thread Martin Sebor (JIRA)

[ 
https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557865#action_12557865
 ] 

Martin Sebor commented on STDCXX-499:
-

It looks like GNU libstdc++ implements solution a) above. I opened an issue for 
the mismatch between the libc grouping value and what libstdc++ returns: 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34733
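
One way to observe such a mismatch is to compare the grouping reported by localeconv() with the one returned by the numpunct facet of the same named locale. The following is a minimal sketch under that assumption; c_grouping, cxx_grouping and grouping_matches are hypothetical helper names, and what they report depends on the C++ library and the locales installed.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Grouping reported by the C library for the named locale.
// Returns an empty string when the locale is not installed.
std::string c_grouping (const char *name)
{
    // Remember the current setting so it can be restored afterwards.
    const char *cur = std::setlocale (LC_NUMERIC, 0);
    const std::string saved (cur ? cur : "C");

    std::string result;
    if (std::setlocale (LC_NUMERIC, name))
        result = std::localeconv ()->grouping;

    std::setlocale (LC_NUMERIC, saved.c_str ());
    return result;
}

// Grouping reported by the C++ library's numpunct facet.
// Returns an empty string when the locale cannot be constructed.
std::string cxx_grouping (const char *name)
{
    try {
        const std::locale loc (name);
        return std::use_facet<std::numpunct<char> >(loc).grouping ();
    }
    catch (const std::runtime_error&) {
        return std::string ();
    }
}

// True when both libraries agree on the grouping for the locale.
bool grouping_matches (const char *name)
{
    return c_grouping (name) == cxx_grouping (name);
}
```

For a library that implements solution a), grouping_matches would be expected to return false for a locale such as bg_BG, since the facet would report an empty grouping while the C library reports a non-empty one.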

 --- Additional Comments From [EMAIL PROTECTED] 2005-01-11 14:50:44 
  Original Message 
 Subject: Re: num_put and null-character thousand separator
 Date: Tue, 11 Jan 2005 15:50:06 -0700
 From: Martin Sebor [EMAIL PROTECTED]
 To: Boris Gubenko [EMAIL PROTECTED]
 References: [EMAIL PROTECTED]
 Boris Gubenko wrote:
 > Another locale-related issue that we fixed in rw stdlib v3.0 (and in
 > v2.0 also) is making sure, that num_put does not insert null thousand
 > separator character into the stream. Here is the fix in _num_put.cc
 > in v3.0 :
 I don't think this fix would be quite correct in general. NUL is
 a valid character that the locale library was specifically designed
 to be able to insert and extract just like any other. In addition,
 in the code below, operator==() need not be defined for the character
 type.
  
 ...
 > Here is the test:
 Thanks for the helpful test case.
 My feeling is that this case points out a fundamental design
 disconnect between the C and C++ locales. In C, NUL is not
 an ordinary character -- it's a special character that terminates
 strings. In addition, C formatted I/O is done in multibyte
 characters. In contrast, in C++, NUL is a character like any other
 and formatted I/O is always done in single chars (or wchar_t when
 char is not wide enough), but never in multibyte characters.
 In C, the thousand separator is a multibyte string so even if
 grouping is non-empty, inserting an empty string will be as good
 as inserting none at all. In C++ the