Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-16 Thread Bruce Momjian
On Fri, May 10, 2013 at 11:49:18AM -0400, Tom Lane wrote:
 Patryk Kordylewski p...@fooby.de writes:
  SET lc_numeric TO 'de_DE.UTF-8';
  SET
 
  SELECT
 TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
 TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 
  'FM99G999G999G999G999G999G999D000');
to_char | to_number
  -+---
123.456.789,123 |   123.456
  (1 row)
 
 I looked into this, and find that the reason it misbehaves is that
 NUM_numpart_from_char() will treat a '.' as being a decimal point
 *without any regard to locale considerations*.  So even if we have
 a locale-dependent format string and a locale that says '.' is a
 thousands separator, it does the wrong thing.
 
 It's a bit surprising nobody's complained of this before.
 
 I propose the attached patch.  I'm slightly worried though about whether
 this might break any existing applications that are (incorrectly)
 depending on a D format specifier being able to match '.' regardless of
 locale.  Perhaps we should only apply this to HEAD and not back-patch?

I never trust to_char() changes for backpatching.  ;-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-13 Thread Heikki Linnakangas

On 11.05.2013 01:17, Euler Taveira wrote:

On 10-05-2013 13:09, Thomas Kellerer wrote:

Tom Lane wrote on 10.05.2013 17:49:

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*.  So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch.  I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale.  Perhaps we should only apply this to HEAD and not back-patch?



+1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't
always produce the wrong answer (only in some specific situation) and
(c) it has been like that for years without a complain. For those
reasons, it is better to continue with this wrong behavior in back
branches than prevent important security updates to be applied (without
applying a patch to preserve the wrong answer). This argument is only
valid for legacy closed-source apps but seems to have more weight than
the bug scenario.


+1 for HEAD-only. The Finnish language and locale uses comma (,) as the 
decimal separator, and it's a real pain in the ass. And if something 
goes wrong there, it can be *really* subtle. I once had to debug an 
application where all prices were suddenly rounded down to the nearest 
euro. And it only happened on some servers (those with locale set to 
Finnish). It was not a PostgreSQL application, but it turned out to be a 
bug in the JDBC driver of another DBMS.


Would it be possible to be lenient, and also accept . as the decimal 
separator, when there is no ambiguity? Ie. when . is not the thousands 
separator.


- Heikki


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Would it be possible to be lenient, and also accept . as the decimal 
 separator, when there is no ambiguity? Ie. when . is not the thousands 
 separator.

I originally coded it that way, but concluded that it was probably a
waste of code space.  How many locales can you point to where neither
the decimal point nor thousands_sep is .?  It certainly wouldn't be
enough to noticeably reduce the potential pain from this change, so
I decided that it'd be better to keep the behavior straightforward
and as-documented.

regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-13 Thread Heikki Linnakangas

On 13.05.2013 17:09, Tom Lane wrote:

Heikki Linnakangashlinnakan...@vmware.com  writes:

Would it be possible to be lenient, and also accept . as the decimal
separator, when there is no ambiguity? Ie. when . is not the thousands
separator.


I originally coded it that way, but concluded that it was probably a
waste of code space.  How many locales can you point to where neither
the decimal point nor thousands_sep is .?


On my laptop, there are eight locales that use , as the decimal 
separator and   as the thousands separator.


$ grep -l ^thousands_sep.*U00A0 /usr/share/i18n/locales/* | xargs grep 
-l ^decimal_point.*U002C

/usr/share/i18n/locales/cs_CZ
/usr/share/i18n/locales/et_EE
/usr/share/i18n/locales/fi_FI
/usr/share/i18n/locales/lv_LV
/usr/share/i18n/locales/nb_NO
/usr/share/i18n/locales/ru_RU
/usr/share/i18n/locales/sk_SK
/usr/share/i18n/locales/uk_UA

Out of these, ru_RU actually uses . as the LC_MONETARY decimal point, 
even though it uses , as the LC_NUMERIC decimal point. I think that 
strengthens the argument for accepting both. I don't speak Russian, but 
if you pass a monetary value to TO_NUMBER in ru_RU locale, using . as 
the decimal separator, you probably would expect it to work.


According to 
http://en.wikipedia.org/wiki/Decimal_separator#Examples_of_use, many 
countries accept either 1 234 567,89 or 1.234.567,89 style, but 
looking at the locale files installed on my system, the latter style is 
the one actually used (e.g Germany).


- Heikki


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-10 Thread Tom Lane
Patryk Kordylewski p...@fooby.de writes:
 SET lc_numeric TO 'de_DE.UTF-8';
 SET

 SELECT
TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 
 'FM99G999G999G999G999G999G999D000');
   to_char | to_number
 -+---
   123.456.789,123 |   123.456
 (1 row)

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*.  So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch.  I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale.  Perhaps we should only apply this to HEAD and not back-patch?

regards, tom lane

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index db5dfca51d477d3e9b33b8d2c264495b3b2ec433..81e3329ef60ce4f835fedba50208b8d0f4b19d63 100644
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
*** NUM_numpart_from_char(NUMProc *Np, int i
*** 4131,4137 
  #endif
  
  	/*
! 	 * read digit
  	 */
  	if (isdigit((unsigned char) *Np-inout_p))
  	{
--- 4131,4137 
  #endif
  
  	/*
! 	 * read digit or decimal point
  	 */
  	if (isdigit((unsigned char) *Np-inout_p))
  	{
*** NUM_numpart_from_char(NUMProc *Np, int i
*** 4151,4190 
  #ifdef DEBUG_TO_FROM_CHAR
  		elog(DEBUG_elog_output, Read digit (%c), *Np-inout_p);
  #endif
- 
- 		/*
- 		 * read decimal point
- 		 */
  	}
  	else if (IS_DECIMAL(Np-Num)  Np-read_dec == FALSE)
  	{
  #ifdef DEBUG_TO_FROM_CHAR
! 		elog(DEBUG_elog_output, Try read decimal point (%c), *Np-inout_p);
  #endif
! 		if (*Np-inout_p == '.')
  		{
  			*Np-number_p = '.';
  			Np-number_p++;
  			Np-read_dec = TRUE;
  			isread = TRUE;
  		}
- 		else
- 		{
- 			int			x = strlen(Np-decimal);
- 
- #ifdef DEBUG_TO_FROM_CHAR
- 			elog(DEBUG_elog_output, Try read locale point (%c),
-  *Np-inout_p);
- #endif
- 			if (x  AMOUNT_TEST(x)  strncmp(Np-inout_p, Np-decimal, x) == 0)
- 			{
- Np-inout_p += x - 1;
- *Np-number_p = '.';
- Np-number_p++;
- Np-read_dec = TRUE;
- isread = TRUE;
- 			}
- 		}
  	}
  
  	if (OVERLOAD_TEST)
--- 4151,4178 
  #ifdef DEBUG_TO_FROM_CHAR
  		elog(DEBUG_elog_output, Read digit (%c), *Np-inout_p);
  #endif
  	}
  	else if (IS_DECIMAL(Np-Num)  Np-read_dec == FALSE)
  	{
+ 		/*
+ 		 * We need not test IS_LDECIMAL(Np-Num) explicitly here, because
+ 		 * Np-decimal is always just . if we don't have a D format token.
+ 		 * So we just unconditionally match to Np-decimal.
+ 		 */
+ 		int			x = strlen(Np-decimal);
+ 
  #ifdef DEBUG_TO_FROM_CHAR
! 		elog(DEBUG_elog_output, Try read decimal point (%c),
! 			 *Np-inout_p);
  #endif
! 		if (x  AMOUNT_TEST(x)  strncmp(Np-inout_p, Np-decimal, x) == 0)
  		{
+ 			Np-inout_p += x - 1;
  			*Np-number_p = '.';
  			Np-number_p++;
  			Np-read_dec = TRUE;
  			isread = TRUE;
  		}
  	}
  
  	if (OVERLOAD_TEST)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-10 Thread Thomas Kellerer

Tom Lane wrote on 10.05.2013 17:49:

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*.  So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch.  I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale.  Perhaps we should only apply this to HEAD and not back-patch?


The manual claims that 'D' is locale dependent (whereas '.' is not), so
_theoretically_ a back patch would make sense I guess.






--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-10 Thread Euler Taveira
On 10-05-2013 13:09, Thomas Kellerer wrote:
 Tom Lane wrote on 10.05.2013 17:49:
 I looked into this, and find that the reason it misbehaves is that
 NUM_numpart_from_char() will treat a '.' as being a decimal point
 *without any regard to locale considerations*.  So even if we have
 a locale-dependent format string and a locale that says '.' is a
 thousands separator, it does the wrong thing.

 It's a bit surprising nobody's complained of this before.

 I propose the attached patch.  I'm slightly worried though about whether
 this might break any existing applications that are (incorrectly)
 depending on a D format specifier being able to match '.' regardless of
 locale.  Perhaps we should only apply this to HEAD and not back-patch?
 
+1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't
always produce the wrong answer (only in some specific situation) and
(c) it has been like that for years without a complain. For those
reasons, it is better to continue with this wrong behavior in back
branches than prevent important security updates to be applied (without
applying a patch to preserve the wrong answer). This argument is only
valid for legacy closed-source apps but seems to have more weight than
the bug scenario.

 The manual claims that 'D' is locale dependent (whereas '.' is not), so
 _theoretically_ a back patch would make sense I guess.
 
I would consider a documentation bug in back branches because fix it
means break apps.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-06 Thread Patryk Kordylewski

Hi,

i have the following test-case:

http://psql.privatepaste.com/b3b431851a

We want to convert a TO_CHAR() formated numeric back to numeric using 
TO_NUMBER() with the same format string. This does not work with a 
german locale.


By removing the FM prefix and using the exact amount of digits in the 
format string for TO_NUMBER() it starts to work. Switching the locale 
back to en_US does work too.


Is this a bug or am i missing something?

Thanks!

Best regards,
Patryk


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-06 Thread Tom Lane
Patryk Kordylewski p...@fooby.de writes:
 Hi,
 i have the following test-case:

 http://psql.privatepaste.com/b3b431851a

Just for the record: this is a completely unacceptable method of
submitting a bug report.  After a month from now, when that paste has
expired, nobody looking at the PG archives will have any way to find out
what you were talking about.  Even without the archival consideration,
you've added an extra step to fetch the test case for anyone reading
your mail.

Having said that, though, this does look wrong ...

regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()

2013-05-06 Thread Patryk Kordylewski

Am 06.05.2013 16:26, schrieb Tom Lane:

Patryk Kordylewski p...@fooby.de writes:

Hi,
i have the following test-case:



http://psql.privatepaste.com/b3b431851a


Just for the record: this is a completely unacceptable method of
submitting a bug report.  After a month from now, when that paste has
expired, nobody looking at the PG archives will have any way to find out
what you were talking about.  Even without the archival consideration,
you've added an extra step to fetch the test case for anyone reading
your mail.

Having said that, though, this does look wrong ...

regards, tom lane


Hmpf, you're right. I'm sorry. Here is the test-case:

PostgreSQL 9.2.4 (also tested on 9.1 and 8.2)

### WORKS

SET lc_numeric TO 'en_US.UTF-8';
SET

SELECT
  TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
  TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 
'FM99G999G999G999G999G999G999D000');

 to_char |   to_number
-+---
 123,456,789.123 | 123456789.123
(1 row)

### DOES NOT WORK

SET lc_numeric TO 'de_DE.UTF-8';
SET

SELECT
  TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
  TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 
'FM99G999G999G999G999G999G999D000');

 to_char | to_number
-+---
 123.456.789,123 |   123.456
(1 row)


### FORMAT STRING WITHOUT FM PREFIX AND EXACT NUMBER OF DIGITS WORKS

SELECT
  TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
  TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'), 
'999G999G999D000');

 to_char |   to_number
-+---
 123.456.789,123 | 123456789.123
(1 row)

Thank you.

Best regards,
Patryk


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs