Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
On Fri, May 10, 2013 at 11:49:18AM -0400, Tom Lane wrote: > Patryk Kordylewski writes: > > SET lc_numeric TO 'de_DE.UTF-8'; > > SET > > > SELECT > >TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), > >TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), > > 'FM99G999G999G999G999G999G999D000'); > > to_char | to_number > > -+--- > > 123.456.789,123 | 123.456 > > (1 row) > > I looked into this, and find that the reason it misbehaves is that > NUM_numpart_from_char() will treat a '.' as being a decimal point > *without any regard to locale considerations*. So even if we have > a locale-dependent format string and a locale that says '.' is a > thousands separator, it does the wrong thing. > > It's a bit surprising nobody's complained of this before. > > I propose the attached patch. I'm slightly worried though about whether > this might break any existing applications that are (incorrectly) > depending on a D format specifier being able to match '.' regardless of > locale. Perhaps we should only apply this to HEAD and not back-patch? I never trust to_char() changes for backpatching. ;-) -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
On 13.05.2013 17:09, Tom Lane wrote: Heikki Linnakangas writes: Would it be possible to be lenient, and also accept . as the decimal separator, when there is no ambiguity? Ie. when . is not the thousands separator. I originally coded it that way, but concluded that it was probably a waste of code space. How many locales can you point to where neither the decimal point nor thousands_sep is "."? On my laptop, there are eight locales that use "," as the decimal separator and " " as the thousands separator. $ grep -l "^thousands_sep.*U00A0" /usr/share/i18n/locales/* | xargs grep -l "^decimal_point.*U002C" /usr/share/i18n/locales/cs_CZ /usr/share/i18n/locales/et_EE /usr/share/i18n/locales/fi_FI /usr/share/i18n/locales/lv_LV /usr/share/i18n/locales/nb_NO /usr/share/i18n/locales/ru_RU /usr/share/i18n/locales/sk_SK /usr/share/i18n/locales/uk_UA Out of these, ru_RU actually uses "." as the LC_MONETARY decimal point, even though it uses "," as the LC_NUMERIC decimal point. I think that strengthens the argument for accepting both. I don't speak Russian, but if you pass a monetary value to TO_NUMBER in ru_RU locale, using "." as the decimal separator, you probably would expect it to work. According to http://en.wikipedia.org/wiki/Decimal_separator#Examples_of_use, many countries accept either "1 234 567,89" or "1.234.567,89" style, but looking at the locale files installed on my system, the latter style is the one actually used (e.g Germany). - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
Heikki Linnakangas writes: > Would it be possible to be lenient, and also accept . as the decimal > separator, when there is no ambiguity? Ie. when . is not the thousands > separator. I originally coded it that way, but concluded that it was probably a waste of code space. How many locales can you point to where neither the decimal point nor thousands_sep is "."? It certainly wouldn't be enough to noticeably reduce the potential pain from this change, so I decided that it'd be better to keep the behavior straightforward and as-documented. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
On 11.05.2013 01:17, Euler Taveira wrote: On 10-05-2013 13:09, Thomas Kellerer wrote: Tom Lane wrote on 10.05.2013 17:49: I looked into this, and find that the reason it misbehaves is that NUM_numpart_from_char() will treat a '.' as being a decimal point *without any regard to locale considerations*. So even if we have a locale-dependent format string and a locale that says '.' is a thousands separator, it does the wrong thing. It's a bit surprising nobody's complained of this before. I propose the attached patch. I'm slightly worried though about whether this might break any existing applications that are (incorrectly) depending on a D format specifier being able to match '.' regardless of locale. Perhaps we should only apply this to HEAD and not back-patch? +1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't always produce the "wrong" answer (only in some specific situation) and (c) it has been like that for years without a complain. For those reasons, it is better to continue with this "wrong" behavior in back branches than prevent important security updates to be applied (without applying a patch to preserve the "wrong" answer). This argument is only valid for legacy closed-source apps but seems to have more weight than the bug scenario. +1 for HEAD-only. The Finnish language and locale uses comma (,) as the decimal separator, and it's a real pain in the ass. And if something goes wrong there, it can be *really* subtle. I once had to debug an application where all prices were suddenly rounded down to the nearest euro. And it only happened on some servers (those with locale set to Finnish). It was not a PostgreSQL application, but it turned out to be a bug in the JDBC driver of another DBMS. Would it be possible to be lenient, and also accept . as the decimal separator, when there is no ambiguity? Ie. when . is not the thousands separator. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
On 10-05-2013 13:09, Thomas Kellerer wrote: > Tom Lane wrote on 10.05.2013 17:49: >> I looked into this, and find that the reason it misbehaves is that >> NUM_numpart_from_char() will treat a '.' as being a decimal point >> *without any regard to locale considerations*. So even if we have >> a locale-dependent format string and a locale that says '.' is a >> thousands separator, it does the wrong thing. >> >> It's a bit surprising nobody's complained of this before. >> >> I propose the attached patch. I'm slightly worried though about whether >> this might break any existing applications that are (incorrectly) >> depending on a D format specifier being able to match '.' regardless of >> locale. Perhaps we should only apply this to HEAD and not back-patch? > +1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't always produce the "wrong" answer (only in some specific situation) and (c) it has been like that for years without a complain. For those reasons, it is better to continue with this "wrong" behavior in back branches than prevent important security updates to be applied (without applying a patch to preserve the "wrong" answer). This argument is only valid for legacy closed-source apps but seems to have more weight than the bug scenario. > The manual claims that 'D' is locale dependent (whereas '.' is not), so > _theoretically_ a back patch would make sense I guess. > I would consider a documentation bug in back branches because fix it means break apps. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
Tom Lane wrote on 10.05.2013 17:49: I looked into this, and find that the reason it misbehaves is that NUM_numpart_from_char() will treat a '.' as being a decimal point *without any regard to locale considerations*. So even if we have a locale-dependent format string and a locale that says '.' is a thousands separator, it does the wrong thing. It's a bit surprising nobody's complained of this before. I propose the attached patch. I'm slightly worried though about whether this might break any existing applications that are (incorrectly) depending on a D format specifier being able to match '.' regardless of locale. Perhaps we should only apply this to HEAD and not back-patch? The manual claims that 'D' is locale dependent (whereas '.' is not), so _theoretically_ a back patch would make sense I guess. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
Patryk Kordylewski writes: > SET lc_numeric TO 'de_DE.UTF-8'; > SET > SELECT >TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), >TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), > 'FM99G999G999G999G999G999G999D000'); > to_char | to_number > -+--- > 123.456.789,123 | 123.456 > (1 row) I looked into this, and find that the reason it misbehaves is that NUM_numpart_from_char() will treat a '.' as being a decimal point *without any regard to locale considerations*. So even if we have a locale-dependent format string and a locale that says '.' is a thousands separator, it does the wrong thing. It's a bit surprising nobody's complained of this before. I propose the attached patch. I'm slightly worried though about whether this might break any existing applications that are (incorrectly) depending on a D format specifier being able to match '.' regardless of locale. Perhaps we should only apply this to HEAD and not back-patch? regards, tom lane diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index db5dfca51d477d3e9b33b8d2c264495b3b2ec433..81e3329ef60ce4f835fedba50208b8d0f4b19d63 100644 *** a/src/backend/utils/adt/formatting.c --- b/src/backend/utils/adt/formatting.c *** NUM_numpart_from_char(NUMProc *Np, int i *** 4131,4137 #endif /* ! * read digit */ if (isdigit((unsigned char) *Np->inout_p)) { --- 4131,4137 #endif /* ! * read digit or decimal point */ if (isdigit((unsigned char) *Np->inout_p)) { *** NUM_numpart_from_char(NUMProc *Np, int i *** 4151,4190 #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "Read digit (%c)", *Np->inout_p); #endif - - /* - * read decimal point - */ } else if (IS_DECIMAL(Np->Num) && Np->read_dec == FALSE) { #ifdef DEBUG_TO_FROM_CHAR ! elog(DEBUG_elog_output, "Try read decimal point (%c)", *Np->inout_p); #endif ! if (*Np->inout_p == '.') { *Np->number_p = '.'; Np->number_p++; Np->read_dec = TRUE; isread = TRUE; } - else - { - int x = strlen(Np->decimal); - - #ifdef DEBUG_TO_FROM_CHAR - elog(DEBUG_elog_output, "Try read locale point (%c)", - *Np->inout_p); - #endif - if (x && AMOUNT_TEST(x) && strncmp(Np->inout_p, Np->decimal, x) == 0) - { - Np->inout_p += x - 1; - *Np->number_p = '.'; - Np->number_p++; - Np->read_dec = TRUE; - isread = TRUE; - } - } } if (OVERLOAD_TEST) --- 4151,4178 #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "Read digit (%c)", *Np->inout_p); #endif } else if (IS_DECIMAL(Np->Num) && Np->read_dec == FALSE) { + /* + * We need not test IS_LDECIMAL(Np->Num) explicitly here, because + * Np->decimal is always just "." if we don't have a D format token. + * So we just unconditionally match to Np->decimal. + */ + int x = strlen(Np->decimal); + #ifdef DEBUG_TO_FROM_CHAR ! elog(DEBUG_elog_output, "Try read decimal point (%c)", ! *Np->inout_p); #endif ! if (x && AMOUNT_TEST(x) && strncmp(Np->inout_p, Np->decimal, x) == 0) { + Np->inout_p += x - 1; *Np->number_p = '.'; Np->number_p++; Np->read_dec = TRUE; isread = TRUE; } } if (OVERLOAD_TEST) -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
Am 06.05.2013 16:26, schrieb Tom Lane: Patryk Kordylewski writes: Hi, i have the following test-case: http://psql.privatepaste.com/b3b431851a Just for the record: this is a completely unacceptable method of submitting a bug report. After a month from now, when that paste has expired, nobody looking at the PG archives will have any way to find out what you were talking about. Even without the archival consideration, you've added an extra step to fetch the test case for anyone reading your mail. Having said that, though, this does look wrong ... regards, tom lane Hmpf, you're right. I'm sorry. Here is the test-case: PostgreSQL 9.2.4 (also tested on 9.1 and 8.2) ### WORKS SET lc_numeric TO 'en_US.UTF-8'; SET SELECT TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 'FM99G999G999G999G999G999G999D000'); to_char | to_number -+--- 123,456,789.123 | 123456789.123 (1 row) ### DOES NOT WORK SET lc_numeric TO 'de_DE.UTF-8'; SET SELECT TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 'FM99G999G999G999G999G999G999D000'); to_char | to_number -+--- 123.456.789,123 | 123.456 (1 row) ### FORMAT STRING WITHOUT FM PREFIX AND EXACT NUMBER OF DIGITS WORKS SELECT TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), '999G999G999D000'); to_char | to_number -+--- 123.456.789,123 | 123456789.123 (1 row) Thank you. Best regards, Patryk -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
Patryk Kordylewski writes: > Hi, > i have the following test-case: > http://psql.privatepaste.com/b3b431851a Just for the record: this is a completely unacceptable method of submitting a bug report. After a month from now, when that paste has expired, nobody looking at the PG archives will have any way to find out what you were talking about. Even without the archival consideration, you've added an extra step to fetch the test case for anyone reading your mail. Having said that, though, this does look wrong ... regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
Hi, i have the following test-case: http://psql.privatepaste.com/b3b431851a We want to convert a TO_CHAR() formated numeric back to numeric using TO_NUMBER() with the same format string. This does not work with a german locale. By removing the FM prefix and using the exact amount of digits in the format string for TO_NUMBER() it starts to work. Switching the locale back to en_US does work too. Is this a bug or am i missing something? Thanks! Best regards, Patryk -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs