[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator
[ https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559638#action_12559638 ]

Martin Sebor commented on STDCXX-499:
-------------------------------------

I'm tempted to close this as Won't Fix since it looks like a rare bug in the locale definition file. On recent Linux systems there's just one locale that suffers from this problem: bg_BG. I couldn't find any such locales on HP-UX. We might want to look to see how many others besides fr_FR.ISO8859-1 there are on Tru64, and check other platforms to see if it's more pervasive than just one or two locales.

For future reference, here's an inefficient shell script I used to find other such locales:

    for l in `locale -a`; do
        LC_NUMERIC=$l locale -ck LC_NUMERIC | grep "thousands_sep=\"\"" >/dev/null
        if [ $? -eq 0 ]; then L="$L $l"; fi
    done
    for l in $L; do
        grp=`LC_NUMERIC=$l locale -ck LC_NUMERIC | grep grouping`
        echo $l : $grp
    done

std::num_put inserts NUL thousand separator
-------------------------------------------

Key: STDCXX-499
URL: https://issues.apache.org/jira/browse/STDCXX-499
Project: C++ Standard Library
Issue Type: Bug
Components: 22. Localization
Affects Versions: 4.1.2, 4.1.3, 4.1.4
Reporter: Martin Sebor
Assignee: Martin Sebor
Fix For: 4.2.1

Moved from Rogue Wave Bugzilla: http://bugzilla.cvo.roguewave.com/show_bug.cgi?id=1913

Original Message
Subject: num_put and null-character thousand separator
Date: Tue, 11 Jan 2005 16:10:23 -0500
From: Boris Gubenko [EMAIL PROTECTED]
Reply-To: Boris Gubenko [EMAIL PROTECTED]
Organization: Hewlett-Packard Co.
To: Martin Sebor [EMAIL PROTECTED]

Another locale-related issue that we fixed in rw stdlib v3.0 (and in v2.0 also) is making sure that num_put does not insert a null thousand separator character into the stream.

Here is the fix in _num_put.cc in v3.0:

    template <class _CharT, class _OutputIter /* = ostreambuf_iterator<_CharT> */>
    _TYPENAME num_put<_CharT, _OutputIter>::iter_type
    num_put<_CharT, _OutputIter>::
    _C_put (iter_type __it, ios_base &__flags, char_type __fill,
            int __type, const void *__pval) const
    {
        const numpunct<char_type> &__np =
            _V3_USE_FACET (numpunct<char_type>, __flags.getloc ());

        // FIXME: adjust buffer dynamically as necessary
        char __buf [_RWSTD_DBL_MAX_10_EXP];
        char *__pbuf = __buf;

        const string __grouping = __np.grouping ();
        const char *__grp = __grouping.c_str ();
        const int __prec = __flags.precision ();

    #if defined(__VMS) && defined(__DECCXX) && !defined(__DECFIXCXXL1730)
        const char __nogrouping = _RWSTD_CHAR_MAX;
        if (!__np.thousands_sep())
            __grp = &__nogrouping;
    #endif

Here is the test:

    cosf.zko.dec.com> setenv LANG fr_FR.ISO8859-1
    cosf.zko.dec.com> locale -k thousands_sep
    thousands_sep=""
    cosf.zko.dec.com> cxx x.cxx && a.out
    null character thousand_sep was not inserted
    cosf.zko.dec.com> cxx x.cxx -D_RWSTD_USE_CONFIG -D_RWSTDDEBUG \
        -I/usr/cxx1/boris/CXXL_1886-2/stdlib-4.0/stdlib/include/ \
        -nocxxstd -L/usr/cxx1/boris/CXXL_1886-2/result/lib -lstd11s \
        && a.out
    null character thousand_sep was inserted
    cosf.zko.dec.com>

x.cxx
-----

    #ifndef __USE_STD_IOSTREAM
    #define __USE_STD_IOSTREAM
    #endif

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <locale>
    #include <locale.h>

    #ifdef __linux
    #define FRENCH_LOCALE "fr_FR"
    #else
    #define FRENCH_LOCALE "fr_FR.ISO8859-1"
    #endif

    using namespace std;

    int main()
    {
      ostringstream os;

      if (setlocale(LC_ALL,FRENCH_LOCALE)) {
        setlocale(LC_ALL,"C");
        os.imbue(locale(FRENCH_LOCALE));
        os << (double) 1.1 << endl;
        if ( (os.str())[2] == '\0' )
          cout << "null character thousand_sep was inserted" << endl;
        else
          cout << "null character thousand_sep was not inserted" << endl;
      }
      return 0;
    }

--- Additional Comments From [EMAIL PROTECTED] 2005-01-11 14:50:44

Original Message
Subject: Re: num_put and null-character thousand separator
Date: Tue, 11 Jan 2005 15:50:06 -0700
From: Martin Sebor [EMAIL PROTECTED]
To: Boris Gubenko [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

Boris Gubenko wrote:
> Another locale-related issue that we fixed in rw stdlib v3.0 (and in
> v2.0 also) is making sure that num_put does not insert a null thousand
> separator character into the stream.
> Here is the fix in _num_put.cc in v3.0:

I don't think this fix would be quite correct in general. NUL is a valid character that the locale library was specifically designed to be able to insert and extract just like any other. In addition, in the code below, operator==() need not be defined for the character type.

...
> Here is the test:

Thanks for the helpful test case. My feeling is that this case points out a fundamental design disconnect between
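The locale scan done by the shell script in this comment could also be written against the C library directly. The following is only a sketch under assumptions: the helper name `grouping_without_separator` is hypothetical and not part of stdcxx, and it relies solely on the standard `setlocale()` and `localeconv()` calls to test one named locale for a usable grouping combined with an empty thousands_sep.

```cpp
#include <climits>   // CHAR_MAX
#include <clocale>   // std::setlocale, std::localeconv, std::lconv
#include <string>

// Hypothetical helper (not part of stdcxx): returns true when the named
// locale defines a usable grouping but an empty thousands_sep.
bool grouping_without_separator (const char *locale_name)
{
    // Remember the current LC_NUMERIC setting so it can be restored.
    const char *cur = std::setlocale (LC_NUMERIC, 0);
    const std::string saved (cur ? cur : "C");

    if (!std::setlocale (LC_NUMERIC, locale_name))
        return false;   // the locale is not available on this system

    const std::lconv *lc = std::localeconv ();

    // An empty grouping, or one starting with CHAR_MAX, means "no grouping".
    const bool has_grouping =
        lc->grouping [0] != '\0' && lc->grouping [0] != CHAR_MAX;
    const bool empty_sep = lc->thousands_sep [0] == '\0';

    // Restore the previous setting only after the values have been read.
    std::setlocale (LC_NUMERIC, saved.c_str ());

    return has_grouping && empty_sep;
}
```

Calling the helper once for each name printed by `locale -a` would give the same list the script collects in `$L`; for the "C" locale it reports false because that locale has no grouping.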
[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator
[ https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557810#action_12557810 ]

Martin Sebor commented on STDCXX-499:
-------------------------------------

Here's a test case that reproduces the same behavior with the bg_BG locale on Linux:

    $ cat t.cpp && make t && cat /etc/redhat-release && ./t | od -c
    #include <cassert>
    #include <iostream>
    #include <locale>
    #include <sstream>
    #include <string>

    int main ()
    {
        std::stringstream strm;
        strm.imbue (std::locale ("bg_BG"));
        strm << 123456;
        const std::string s = strm.str ();
        std::cout << s << '\n';
        assert (s.npos == s.find ('\0'));
    }
    gcc -c -I/amd/devco/sebor/stdcxx/include/ansi -D_RWSTDDEBUG -pthread -I/amd/devco/sebor/stdcxx/include -I/build/sebor/stdcxx-gcc-3.4.6_3-15D/include -I/amd/devco/sebor/stdcxx/examples/include -pedantic -nostdinc++ -g -W -Wall -Wcast-qual -Winline -Wshadow -Wwrite-strings -Wno-long-long -Wcast-align t.cpp
    gcc t.o -o t -pthread -L/build/sebor/stdcxx-gcc-3.4.6_3-15D/lib -Wl,-R/build/sebor/stdcxx-gcc-3.4.6_3-15D/lib -lstd15D -lsupc++ -lm
    Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
    t: t.cpp:19: int main(): Assertion `s.npos == s.find ('\0')' failed.
    000 1 2 3 \0 4 5 6 \n
    010
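The bg_BG test case in this message only fails where that locale is installed and defined this way. A locale-independent way to exercise the same path is to imbue a custom `std::numpunct<char>` facet. This is a sketch, not part of the stdcxx test suite: the names `NulSepPunct` and `format_with_nul_sep` are made up for illustration, and the facet simply reports a grouping of three with NUL as the separator, which is the combination the affected locales produce.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: groups digits in threes but
// reports NUL as the thousands separator, like the affected locales.
struct NulSepPunct: std::numpunct<char>
{
protected:
    virtual char do_thousands_sep () const { return '\0'; }
    virtual std::string do_grouping () const { return "\3"; }
};

// Formats value through a stream whose locale carries NulSepPunct.
std::string format_with_nul_sep (long value)
{
    std::ostringstream strm;

    // The locale takes ownership of the facet and deletes it when done.
    strm.imbue (std::locale (std::locale::classic (), new NulSepPunct));
    strm << value;

    return strm.str ();
}
```

With a conforming num_put, `format_with_nul_sep (123456)` should yield seven characters with a NUL between "123" and "456", matching the `od -c` output in the test case, because num_put inserts whatever character the facet reports.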
[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator
[ https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557859#action_12557859 ]

Martin Sebor commented on STDCXX-499:
-------------------------------------

The question is whether this is our problem or a problem with the locale definition (such as the Bulgarian locale on Linux in the test case above). I.e., is a locale that specifies a grouping but no thousands_sep valid? Among our own locales there is only one that fits this description, suggesting it might be a bug in the locale definition:

    $ (cd ~/stdcxx && for f in `grep -l "^grouping *[1-9]" etc/nls/src/*`; do grep -l "thousands_sep *\"\"" $f; done)
    etc/nls/src/bg_BG

The latest glibc bg_BG definition is the same:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/bg_BG?rev=1.7.2.2&content-type=text/x-cvsweb-markup&cvsroot=glibc

I opened a glibc issue to see if they agree it's a bug:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=5599

If we should decide to work around it I see two possible ways of handling it in punct.cpp, after retrieving the grouping and thousands_sep for the locale using localeconv(). When grouping is not empty and valid and thousands_sep is NUL, either

a) set grouping to "", or
b) set thousands_sep to some non-NUL value.

Solution a) seems safer because it doesn't involve inventing a thousands_sep that's valid for the locale, but the downside is that it loses potentially useful information. Solution b) leaves open the question of which thousands_sep is appropriate for the locale.
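Option a) from this comment can be illustrated outside the library with a user-defined facet. The sketch below is not the punct.cpp change itself: `SafePunct` and `format_grouped` are hypothetical names, and the separator and grouping passed to the constructor stand in for the values the library would obtain from localeconv(). It also checks only for a NUL separator and does not validate the grouping string.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: sep and grp stand in for the values a library
// would read from localeconv() for the locale.
class SafePunct: public std::numpunct<char>
{
    char        sep_;
    std::string grp_;

public:
    // Option a): when the separator is NUL, drop the grouping so that
    // num_put never has a reason to insert the separator.
    SafePunct (char sep, const std::string &grp)
        : sep_ (sep), grp_ (sep == '\0' ? std::string () : grp) { }

protected:
    virtual char do_thousands_sep () const { return sep_; }
    virtual std::string do_grouping () const { return grp_; }
};

// Formats value through a stream whose locale carries SafePunct.
std::string format_grouped (long value, char sep, const std::string &grp)
{
    std::ostringstream strm;

    // The locale takes ownership of the facet.
    strm.imbue (std::locale (std::locale::classic (),
                             new SafePunct (sep, grp)));
    strm << value;

    return strm.str ();
}
```

With a NUL separator the number comes out ungrouped, which is the information loss the comment mentions; with an ordinary separator such as a space the grouping is kept.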
[jira] Commented: (STDCXX-499) std::num_put inserts NUL thousand separator
[ https://issues.apache.org/jira/browse/STDCXX-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557865#action_12557865 ]

Martin Sebor commented on STDCXX-499:
-------------------------------------

It looks like GNU libstdc++ implements solution a) above. I opened an issue for the mismatch between the libc grouping value and what libstdc++ returns:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34733

--- Additional Comments From [EMAIL PROTECTED] 2005-01-11 14:50:44

Original Message
Subject: Re: num_put and null-character thousand separator
Date: Tue, 11 Jan 2005 15:50:06 -0700
From: Martin Sebor [EMAIL PROTECTED]
To: Boris Gubenko [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

Boris Gubenko wrote:
> Another locale-related issue that we fixed in rw stdlib v3.0 (and in
> v2.0 also) is making sure that num_put does not insert a null thousand
> separator character into the stream.
> Here is the fix in _num_put.cc in v3.0:

I don't think this fix would be quite correct in general. NUL is a valid character that the locale library was specifically designed to be able to insert and extract just like any other. In addition, in the code below, operator==() need not be defined for the character type.

...
> Here is the test:

Thanks for the helpful test case. My feeling is that this case points out a fundamental design disconnect between the C and C++ locales. In C, NUL is not an ordinary character -- it's a special character that terminates strings. In addition, C formatted I/O is done in multibyte characters. In contrast, in C++, NUL is a character like any other and formatted I/O is always done in single chars (or wchar_t when char is not wide enough), but never in multibyte characters.

In C, the thousand separator is a multibyte string so even if grouping is non-empty, inserting an empty string will be as good as inserting none at all. In C++ the