Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-04-02 Thread Bruce Momjian
On Wed, Apr  1, 2015 at 11:48:37AM -0400, Bruce Momjian wrote:
 This should return something like 16.000... (per Oracle
 output at the URL above, float4 has 6 significant digits on my compiler)
 but I can't seem to figure how to get printf() to round non-fractional
 parts.  I am afraid the only solution is to use printf's %e format and
 place the decimal point myself.

Hearing nothing, I went with the %e approach;  patch attached.  The new
output looks right:

test= SELECT to_char(float4 
'15.9123456789123456789000',
  repeat('9', 50) || '.' || repeat('9', 50));
to_char


  
16.00
(1 row)

 The fact I still don't have a complete solution suggests this is 9.6
 material but I still want to work on it so it is ready.

I will keep this patch for 9.6 unless I hear otherwise.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
new file mode 100644
index 40a353f..d283cd2
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
***
*** 113,125 
  #define DCH_MAX_ITEM_SIZ	   12		/* max localized day name		*/
  #define NUM_MAX_ITEM_SIZ		8		/* roman number (RN has 15 chars)	*/
  
- /* --
-  * More is in float.c
-  * --
-  */
- #define MAXFLOATWIDTH	60
- #define MAXDOUBLEWIDTH	500
- 
  
  /* --
   * Format parser structs
--- 113,118 
*** static DCHCacheEntry *DCH_cache_getnew(c
*** 989,994 
--- 982,989 
  static NUMCacheEntry *NUM_cache_search(char *str);
  static NUMCacheEntry *NUM_cache_getnew(char *str);
  static void NUM_cache_remove(NUMCacheEntry *ent);
+ static char *add_pre_zero_padding(char *num_str, int decimal_shift);
+ static char *add_post_zero_padding(char *num_str, int pad_digits);
  
  
  /* --
*** do { \
*** 5016,5021 
--- 5011,5103 
  	SET_VARSIZE(result, len + VARHDRSZ); \
  } while (0)
  
+ /*
+  * add_pre_zero_padding
+  *
+  * printf() can only round non-fractional parts of a number if the number
+  * is in exponential format, so this function takes a number in that format
+  * and adds pre-decimal zero padding.
+  */
+ static char *
+ add_pre_zero_padding(char *num_str, int decimal_shift)
+ {
+ 	/* one for decimal point, one for trailing null byte */
+ 	char *out = palloc(strlen(num_str) + decimal_shift + 1 + 1), *out_p = out;
+ 	char *num_str_p = num_str;
+ 	bool decimal_found = false;
+ 
+ 	/* copy the number before 'e' and shift the decimal point */
+ 	while (*num_str_p  *num_str_p != 'e')
+ 	{
+ 		if (*num_str_p == '.')
+ 		{
+ 			num_str_p++;
+ 			decimal_found = true;
+ 		}
+ 		else
+ 		{
+ 			*(out_p++) = *(num_str_p++);
+ 			if (decimal_found)
+ decimal_shift--;
+ 		}
+ 	}
+ 
+ 	/* add zero pad digits */
+ 	while (decimal_shift--  0)
+ 		*(out_p++) = '0';
+ 
+ 	*(out_p++) = '\0';
+ 
+ 	pfree(num_str);
+ 	return out;
+ }
+ 
+ /*
+  * add_post_zero_padding
+  *
+  * Some sprintf() implementations have a 512-digit precision limit, and we
+  * need sprintf() to round to the internal precision, so this function adds
+  * zero padding between the mantissa and exponent of an exponential-format
+  * number, or after the supplied string for non-exponent strings.
+  */
+ static char *
+ add_post_zero_padding(char *num_str, int pad_digits)
+ {
+ 	/* one for decimal point, one for trailing null byte */
+ 	char *out = palloc(strlen(num_str) + pad_digits + 1 + 1), *out_p = out;
+ 	char *num_str_p = num_str;
+ 	bool decimal_found = false;
+ 
+ 	/* copy the number before 'e', or the entire string if no 'e' */
+ 	while (*num_str_p  *num_str_p != 'e')
+ 	{
+ 		if (*num_str_p == '.')
+ 			decimal_found = true;
+ 		*(out_p++) = *(num_str_p++);
+ 	}
+ 
+ 	/* add zero pad digits */
+ 	while (pad_digits--  0)
+ 	{
+ 		if (!decimal_found)
+ 		{
+ 			*(out_p++) = '.';
+ 			decimal_found = true;
+ 		}
+ 
+ 		*(out_p++) = '0';
+ 	}
+ 
+ 	/* copy 'e' and everything after */
+ 	while (*num_str_p)
+ 		*(out_p++) = *(num_str_p++);
+ 
+ 	*(out_p++) = '\0';
+ 
+ 	pfree(num_str);
+ 	return out;	
+ }
+ 
  /* ---
   * NUMERIC to_number() (convert string to numeric)
   * ---
*** int4_to_char(PG_FUNCTION_ARGS)
*** 5214,5221 
  		/* we can do it easily because float8 won't lose any precision */
  		float8		val = (float8) value;
  
! 		orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
! 		snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val);
  
  		/*
  		 * Swap a leading positive sign for a space.
--- 5296,5307 
  		

Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-04-01 Thread Bruce Momjian
On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
 On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
  On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
   On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
This junk digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, 
'9.9') as x from dual;
--
1.12345678912345680

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', 
'9.9');
--
1.12345678912345000
 
   These outputs show Oracle treating 17 digits as significant while 
   PostgreSQL
   treats 15 digits as significant.  Should we match Oracle in this respect 
   while
   we're breaking compatibility anyway?  I tend to think yes.
  
  Uh, I am hesistant to adjust our precision to match Oracle as I don't
  know what they are using internally.
 
 http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
 float8 and 9 digits for float4.

I was able to get proper rounding with the attached patch.

test= SELECT to_char(float8 '1.123456789123456789123456789', 
'9.9');
 to_char
--
  1.12345678912346000
(1 row)

Handling rounding for exponent-format values turned out to be simple. 
What has me stuck now is how to do rounding in the non-decimal part of
the number, e.g.

test= SELECT to_char(float4 
'15.9123456789123456789000',
  repeat('9', 50) || '.' || repeat('9', 50));
to_char


  
1555753984.00
(1 row)

This should return something like 16.000... (per Oracle
output at the URL above, float4 has 6 significant digits on my compiler)
but I can't seem to figure how to get printf() to round non-fractional
parts.  I am afraid the only solution is to use printf's %e format and
place the decimal point myself.

The fact I still don't have a complete solution suggests this is 9.6
material but I still want to work on it so it is ready.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
new file mode 100644
index 40a353f..2b5a440
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
***
*** 113,125 
  #define DCH_MAX_ITEM_SIZ	   12		/* max localized day name		*/
  #define NUM_MAX_ITEM_SIZ		8		/* roman number (RN has 15 chars)	*/
  
- /* --
-  * More is in float.c
-  * --
-  */
- #define MAXFLOATWIDTH	60
- #define MAXDOUBLEWIDTH	500
- 
  
  /* --
   * Format parser structs
--- 113,118 
*** static DCHCacheEntry *DCH_cache_getnew(c
*** 989,994 
--- 982,988 
  static NUMCacheEntry *NUM_cache_search(char *str);
  static NUMCacheEntry *NUM_cache_getnew(char *str);
  static void NUM_cache_remove(NUMCacheEntry *ent);
+ static char *add_zero_padding(char *num_str, int pad_digits);
  
  
  /* --
*** do { \
*** 5016,5021 
--- 5010,5056 
  	SET_VARSIZE(result, len + VARHDRSZ); \
  } while (0)
  
+ /*
+  * add_zero_padding
+  *
+  * Some sprintf() implementations have a 512-digit precision limit, and we
+  * need sprintf() to round to the internal precision, so this function adds
+  * zero padding between the mantissa and exponent of an exponential-format
+  * number, or after the supplied string for non-exponent strings.
+  */
+ static char *
+ add_zero_padding(char *num_str, int pad_digits)
+ {
+ 	/* one for decimal point, one for trailing null byte */
+ 	char *out = palloc(strlen(num_str) + pad_digits + 1 + 1), *out_p = out;
+ 	char *num_str_p = num_str;
+ 	bool found_decimal = false;
+ 
+ 	/* copy the number before 'e', or the entire string if no 'e' */
+ 	while (*num_str_p  *num_str_p != 'e'  *num_str_p != 'E')
+ 	{
+ 		if (*num_str_p == '.')
+ 			found_decimal = true;
+ 		*(out_p++) = *(num_str_p++);
+ 	}
+ 
+ 	if (!found_decimal)
+ 		*(out_p++) = '.';
+ 
+ 	/* add zero pad digits */
+ 	while (pad_digits--  0)
+ 		*(out_p++) = '0';
+ 
+ 	/* copy 'e' and everything after */
+ 	while (*num_str_p)
+ 		*(out_p++) = *(num_str_p++);
+ 
+ 	*(out_p++) = '\0';
+ 
+ 	pfree(num_str);
+ 	return out;	
+ }
+ 
  /* ---
   * NUMERIC to_number() (convert string to numeric)
   * 

Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Noah Misch
On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
 On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
  On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
   This junk digit zeroing matches the Oracle behavior:
   
 SELECT to_char(1.123456789123456789123456789d, 
   '9.9') as x from dual;
 --
 1.12345678912345680
   
   Our output with the patch would be:
   
 SELECT to_char(float8 '1.123456789123456789123456789', 
   '9.9');
 --
 1.12345678912345000

  These outputs show Oracle treating 17 digits as significant while PostgreSQL
  treats 15 digits as significant.  Should we match Oracle in this respect 
  while
  we're breaking compatibility anyway?  I tend to think yes.
 
 Uh, I am hesistant to adjust our precision to match Oracle as I don't
 know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Bruce Momjian
On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
 On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
  On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
   On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
This junk digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, 
'9.9') as x from dual;
--
1.12345678912345680

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', 
'9.9');
--
1.12345678912345000
 
   These outputs show Oracle treating 17 digits as significant while 
   PostgreSQL
   treats 15 digits as significant.  Should we match Oracle in this respect 
   while
   we're breaking compatibility anyway?  I tend to think yes.
  
  Uh, I am hesistant to adjust our precision to match Oracle as I don't
  know what they are using internally.
 
 http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
 float8 and 9 digits for float4.

OK, I am fine in using those values if you can find them as compiler
defines, but I don't see how we can grab those values from a user test
on Oracle.

There are some invisible float digits that don't appear in %f but can
be shown if desired --- I think we used to do that in the regression
tests, but found they added too much platform-specific randomness.  Do
we want to go in that direction?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Noah Misch
On Tue, Mar 24, 2015 at 03:08:26PM -0400, Bruce Momjian wrote:
 On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:
  On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
   On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
 On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
  This junk digit zeroing matches the Oracle behavior:
  
  SELECT to_char(1.123456789123456789123456789d, 
  '9.9') as x from dual;
  --
  1.12345678912345680
  
  Our output with the patch would be:
  
  SELECT to_char(float8 '1.123456789123456789123456789', 
  '9.9');
  --
  1.12345678912345000
   
 These outputs show Oracle treating 17 digits as significant while 
 PostgreSQL
 treats 15 digits as significant.  Should we match Oracle in this 
 respect while
 we're breaking compatibility anyway?  I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.
   
   http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits 
   for
   float8 and 9 digits for float4.
  
  OK, I am fine in using those values if you can find them as compiler
  defines, but I don't see how we can grab those values from a user test
  on Oracle.

We encounter no authority higher than the test results, so it would be wrong
to seek out and use a define that just happens to match a test result.  Adding
#define TO_CHAR_DBL_DIG 17 and #define TO_CHAR_FLT_DIG 9 is good.

  There are some invisible float digits that don't appear in %f but can
  be shown if desired --- I think we used to do that in the regression
  tests, but found they added too much platform-specific randomness.  Do
  we want to go in that direction?

Bare %f simply prints all digits before the decimal point and exactly six
digits after the decimal point.  Whether implementation-defined digits appear
in that output depends on the number's magnitude.  However, float8out and
float4out do behave along the lines of your description.

I do recommend pushing TO_CHAR in that direction, to make it more like Oracle
while we're already breaking compatibility with PostgreSQL 9.4.

 How about if we have to_char() honor our extra_float_digits GUC, so
 users who want those digits can get them?

It's not my first choice; for one thing, no value of extra_float_digits would
yield 17 digits for float8 and 9 digits for float4.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Noah Misch
On Tue, Mar 24, 2015 at 09:26:10AM -0700, Jeff Anton wrote:
 The issue of significant (decimal) digits to and from floating point
 representation is a complex one.
 
 What 'significant' means may depend upon the intent.

True.  I meant simply that Oracle TO_CHAR emits no more than 17 nonzero digits
for a BINARY_DOUBLE and no more than 9 nonzero digits for a BINARY_FLOAT.

 BTW:  This is my first posting to this list.  I should introduce myself.
 I'm Jeff Anton.  I was the first Postgres project lead programmer working
 for Michael Stonebraker at U.C. Berkeley a very long time ago.
 The first version was never released.  I've since worked for several db
 companies.

Welcome back.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Andrew Gierth
 Jeff == Jeff Anton antonpg...@hesiod.org writes:

 Jeff Postgresql seems to be using the first interpretation and
 Jeff reporting fewer digits.  I've noticed this with pg_dump.  That a
 Jeff dump and restore of floating point values does not produce the
 Jeff same floating point values.  To me, that is inexcusable.  Using
 Jeff the -Fc format, real values are preserved.  I have a large
 Jeff database of security prices.  I want accuracy above all.

-Fc doesn't do anything different with floats than text dumps do.

pg_dump (in all modes) uses the extra_float_digits setting to try and
get a value that restores exactly. There have been issues with this in
the past (the limit of extra_float_digits had to be raised from 2 to 3,
iirc), but if it's happening now then that should probably be considered
a bug.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Jeff Anton
The issue of significant (decimal) digits to and from floating point 
representation is a complex one.


What 'significant' means may depend upon the intent.

There are (at least) two different tests which may need to be used.

* How many digits can be stored and then accurately returned?
or
* How many decimal digits are needed to recreate a floating point value? 
 Or in longer form, if you have a floating point value, you may want to 
print it in decimal form and then later scan that to recreate the exact 
bit pattern from the original floating point value.  How many decimal 
digits do you need?


The first question produces a smaller number of digits then the second one!

The idea of zero padding is, IMO, a bad idea all together.  It makes 
people feel better, but it adds inaccuracy.  I've lost this 
interpretation so many times now that I only mention it for the real 
number geeks out there.


Postgresql seems to be using the first interpretation and reporting 
fewer digits.  I've noticed this with pg_dump.  That a dump and restore 
of floating point values does not produce the same floating point 
values.  To me, that is inexcusable.  Using the -Fc format, real values 
are preserved.  I have a large database of security prices.  I want 
accuracy above all.


I do not have time right now to produce the needed evidence for all 
these cases of floating point values.  If there is interest I can 
produce this in a day or so.


Jeff Anton

BTW:  This is my first posting to this list.  I should introduce myself.
I'm Jeff Anton.  I was the first Postgres project lead programmer 
working for Michael Stonebraker at U.C. Berkeley a very long time ago.
The first version was never released.  I've since worked for several db 
companies.



On 03/24/15 06:47, Noah Misch wrote:

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This junk digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, 
'9.9') as x from dual;
--
1.12345678912345680

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', 
'9.9');
--
1.12345678912345000



These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant.  Should we match Oracle in this respect while
we're breaking compatibility anyway?  I tend to think yes.


Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.


http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Bruce Momjian
On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:
 On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
  On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
   On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
 This junk digit zeroing matches the Oracle behavior:
 
   SELECT to_char(1.123456789123456789123456789d, 
 '9.9') as x from dual;
   --
   1.12345678912345680
 
 Our output with the patch would be:
 
   SELECT to_char(float8 '1.123456789123456789123456789', 
 '9.9');
   --
   1.12345678912345000
  
These outputs show Oracle treating 17 digits as significant while 
PostgreSQL
treats 15 digits as significant.  Should we match Oracle in this 
respect while
we're breaking compatibility anyway?  I tend to think yes.
   
   Uh, I am hesistant to adjust our precision to match Oracle as I don't
   know what they are using internally.
  
  http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
  float8 and 9 digits for float4.
 
 OK, I am fine in using those values if you can find them as compiler
 defines, but I don't see how we can grab those values from a user test
 on Oracle.
 
 There are some invisible float digits that don't appear in %f but can
 be shown if desired --- I think we used to do that in the regression
 tests, but found they added too much platform-specific randomness.  Do
 we want to go in that direction?

How about if we have to_char() honor our extra_float_digits GUC, so
users who want those digits can get them?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-24 Thread Baker, Keith [OCDUS Non-JJ]
I like the idea of using extra_float_digits  (which I always set this to 3 to 
ensure I don't lose precision).
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1251
Theorem 15
When a binary IEEE single precision number is converted to the closest 
eight digit decimal number, it is not always possible to uniquely recover the 
binary number from the decimal one. However, if nine decimal digits are used, 
then converting the decimal number to the closest binary number will recover 
the original floating-point number.

Keith Baker

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Bruce Momjian
 Sent: Tuesday, March 24, 2015 3:08 PM
 To: Noah Misch
 Cc: PostgreSQL-development
 Subject: Re: [HACKERS] Zero-padding and zero-masking fixes for
 to_char(float)
 
 On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:
  On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
   On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
 On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
  This junk digit zeroing matches the Oracle behavior:
 
  SELECT to_char(1.123456789123456789123456789d,
 '9.9') as x from dual;
  --
  1.12345678912345680
 
  Our output with the patch would be:
 
  SELECT to_char(float8 '1.123456789123456789123456789',
 '9.9');
  --
  1.12345678912345000
  
 These outputs show Oracle treating 17 digits as significant
 while PostgreSQL treats 15 digits as significant.  Should we
 match Oracle in this respect while we're breaking compatibility
 anyway?  I tend to think yes.
   
Uh, I am hesistant to adjust our precision to match Oracle as I
don't know what they are using internally.
  
   http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant
   digits for
   float8 and 9 digits for float4.
 
  OK, I am fine in using those values if you can find them as compiler
  defines, but I don't see how we can grab those values from a user test
  on Oracle.
 
  There are some invisible float digits that don't appear in %f but
  can be shown if desired --- I think we used to do that in the
  regression tests, but found they added too much platform-specific
  randomness.  Do we want to go in that direction?
 
 How about if we have to_char() honor our extra_float_digits GUC, so users
 who want those digits can get them?
 
 --
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + Everyone has their own god. +
 
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
 changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-23 Thread Bruce Momjian
On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
  I recommend adding a configure test to use our snprintf.c replacements if
  sprintf(%.*f, 65536, 999.0) gives unexpected output.
 
  Do we really want to go to our /port snprintf just to handle 512+
  digits?
 
 I'd rather not go that direction (that is, to using a configure test).
 It assumes that all copies of libc on a particular platform behave the
 same, which seems like a bad bet to me.  I think we'd be better off to
 avoid asking libc to do anything that might not work everywhere.
 
 On the other hand, this line of thought might end up having you
 reimplement in formatting.c the same logic I put into snprintf.c
 recently, which seems a bit silly.

If we can't trust libc for 512+ precision (and I don't think a startup
check is warranted), I think we should either use our internal snprintf
for to_char() in all cases, or in cases where the precision is 512+.  Of
course, this hinges on the assumption that only to_char() cares about
512+ digits --- if not, a startup check seems a requirement.

However, even if we have a working snprintf, the bigger problem is
having to do rounding, e.g. this is the worst case:

SELECT to_char(float4 '5.', '9D' || 
repeat('9', 1000) || '');

What we have to do here is to round to the specified precision, i.e. we
can't just pad with zeros, which was my original approach.  I am afraid
that means we have to keep the odd coding where we call snprintf with
zero decimal digits, get its length, then figure out the decimal
precision, then call snprintf again to format the string with rounding. 
Then, we need to separate the mantissa from the exponent and add the
desired number of zeros and copy it into a new longer string.

So, instead of the code being cleaner, it will be even more messy, and
will  duplicate some of what we already do in our port/snprintf.c. 
Unfortunately, there is no way to call snprintf and tell it we want a
specific number of decimal digits _and_ pad the rest with zeros.  I
think trying to do the rounding on the output string will never work
well, e.g. 9. rounded to one decimal digit is 10.0.

Yuck.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Bruce Momjian
On Sun, Mar 22, 2015 at 01:42:56AM -0400, Tom Lane wrote:
 David Rowley dgrowle...@gmail.com writes:
  This seems to have broken jacana.  Looks like MSVC by default has a 3 digit
  exponent.
 
 jacana was broken before this patch; but some other Windows critters
 are now unhappy as well.
 
  Going by this:
  https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
  that it can quite easily be set back to 2.
 
  I've attached a patch which seems to fix the issue.
 
 That seems likely to have side-effects far beyond what's appropriate.
 We have gone out of our way to accommodate 3-digit exponents in other
 tests.  What I want to know is why this patch created a 3-digit output
 where there was none before.

I was wondering the same thing too, but when I grep'ed the regression
output files looking for exponents, I found float4-exp-three-digits.out
and int8-exp-three-digits.out.  I think this means we have had this
issue before, and we are going to have to either create a
numeric-exp-three-digits.out alternate output file, move the test to one
of the existing exp-three-digits files, or remove the tests.  Sorry, I
didn't expect any problems with this patch as it was so small and
localized.

What has me more concerned is the Solaris 10 failure.  This query:

SELECT to_char(float8 '999', 'D' || repeat('9', 
1000));

expects this:

999.000...

but on Solaris 10 gets this:

.00

Yes, the nines are gone, and only this query is failing.  Oddly, this
query did not fail, though the only difference is fewer decimal digits:

SELECT to_char(float8 '999', 'D');

This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Noah Misch
On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote:
 What has me more concerned is the Solaris 10 failure.  This query:
 
   SELECT to_char(float8 '999', 'D' || repeat('9', 
 1000));
 
 expects this:
 
   999.000...
 
 but on Solaris 10 gets this:
 
   .00
 
 Yes, the nines are gone, and only this query is failing.  Oddly, this
 query did not fail, though the only difference is fewer decimal digits:
 
   SELECT to_char(float8 '999', 'D');
 
 This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

Use of the f conversion specifier with precision greater than 512 is not
portable; I get a similar diff on AIX 7.  Until this patch, PostgreSQL would
not use arbitrarily-large precisions on that conversion specifier.  (Who would
guess, but the e conversion specifier is not affected.)

I recommend adding a configure test to use our snprintf.c replacements if
sprintf(%.*f, 65536, 999.0) gives unexpected output.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Noah Misch
When you posted this, I made a note to review it.

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
 This junk digit zeroing matches the Oracle behavior:
 
   SELECT to_char(1.123456789123456789123456789d, 
 '9.9') as x from dual;
   --
   1.12345678912345680
 
 Our output with the patch would be:
 
   SELECT to_char(float8 '1.123456789123456789123456789', 
 '9.9');
   --
   1.12345678912345000
 
 which is pretty close.

PostgreSQL 9.4 returns 1.12345678912346.  Your patch truncates digits past
DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG
digits.  PostgreSQL must continue to round.

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant.  Should we match Oracle in this respect while
we're breaking compatibility anyway?  I tend to think yes.

 *** int4_to_char(PG_FUNCTION_ARGS)
 *** 5214,5221 
   /* we can do it easily because float8 won't lose any precision 
 */
   float8  val = (float8) value;
   
 ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
 ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val);
   
   /*
* Swap a leading positive sign for a space.
 --- 5207,5213 
   /* we can do it easily because float8 won't lose any precision 
 */
   float8  val = (float8) value;
   
 ! orgnum = psprintf(%+.*e, Num.post, val);

Neither the submission notes nor the commit messages mentioned it, but this
improves behavior for to_char(int4, text).  Specifically, it helps 
formats with more than about 500 decimal digits:

  SELECT length(to_char(1, '9D' || repeat('9', 800) || ''));


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Bruce Momjian
On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
 On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote:
  What has me more concerned is the Solaris 10 failure.  This query:
  
  SELECT to_char(float8 '999', 'D' || repeat('9', 
  1000));
  
  expects this:
  
  999.000...
  
  but on Solaris 10 gets this:
  
  .00
  
  Yes, the nines are gone, and only this query is failing.  Oddly, this
  query did not fail, though the only difference is fewer decimal digits:
  
  SELECT to_char(float8 '999', 'D');
  
  This smells like a libc bug, e.g. OmniOS 5.11 passed the test.
 
 Use of the f conversion specifier with precision greater than 512 is not
 portable; I get a similar diff on AIX 7.  Until this patch, PostgreSQL would
 not use arbitrarily-large precisions on that conversion specifier.  (Who would
 guess, but the e conversion specifier is not affected.)

Yeah.

 I recommend adding a configure test to use our snprintf.c replacements if
 sprintf(%.*f, 65536, 999.0) gives unexpected output.

Do we really want to go to our /port snprintf just to handle 512+
digits?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Bruce Momjian
On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
 When you posted this, I made a note to review it.
 
 On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
  This junk digit zeroing matches the Oracle behavior:
  
  SELECT to_char(1.123456789123456789123456789d, 
  '9.9') as x from dual;
  --
  1.12345678912345680
  
  Our output with the patch would be:
  
  SELECT to_char(float8 '1.123456789123456789123456789', 
  '9.9');
  --
  1.12345678912345000
  
  which is pretty close.
 
 PostgreSQL 9.4 returns 1.12345678912346.  Your patch truncates digits past
 DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG
 digits.  PostgreSQL must continue to round.

Ah, that rounding is a big problem.  I can't just round because if the
digit to be rounded up is '9', I have to set that to zero and increment
the previous digit, and that could cascade and shift the entire string
one over.  I think I have to go back to the original code that does the
weird computations to figure out how many digits are on the left of the
decimal point, then set the format string after.  I was hoping my patch
could clean that all up, but we now need snprintf to do that rounding
for us.  :-(

 These outputs show Oracle treating 17 digits as significant while PostgreSQL
 treats 15 digits as significant.  Should we match Oracle in this respect while
 we're breaking compatibility anyway?  I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

  *** int4_to_char(PG_FUNCTION_ARGS)
  *** 5214,5221 
  /* we can do it easily because float8 won't lose any precision 
  */
  float8  val = (float8) value;

  !   orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
  !   snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val);

  /*
   * Swap a leading positive sign for a space.
  --- 5207,5213 
  /* we can do it easily because float8 won't lose any precision 
  */
  float8  val = (float8) value;

  !   orgnum = psprintf(%+.*e, Num.post, val);
 
 Neither the submission notes nor the commit messages mentioned it, but this
 improves behavior for to_char(int4, text).  Specifically, it helps 
 formats with more than about 500 decimal digits:
 
   SELECT length(to_char(1, '9D' || repeat('9', 800) || ''));

Wow, that is surprising.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Bruce Momjian
On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
   *** int4_to_char(PG_FUNCTION_ARGS)
   *** 5214,5221 
 /* we can do it easily because float8 won't lose any 
   precision */
 float8  val = (float8) value;
 
   ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
   ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, 
   val);
 
 /*
  * Swap a leading positive sign for a space.
   --- 5207,5213 
 /* we can do it easily because float8 won't lose any 
   precision */
 float8  val = (float8) value;
 
   ! orgnum = psprintf(%+.*e, Num.post, val);
  
  Neither the submission notes nor the commit messages mentioned it, but this
  improves behavior for to_char(int4, text).  Specifically, it helps 
  formats with more than about 500 decimal digits:
  
SELECT length(to_char(1, '9D' || repeat('9', 800) || ''));
 
 Wow, that is surprising.

Ah, yes.  There was a problem where were clipping off int4  output
to MAXDOUBLEWIDTH, which made no sense, so there was the fix for that as
well.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-22 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
 I recommend adding a configure test to use our snprintf.c replacements if
 sprintf(%.*f, 65536, 999.0) gives unexpected output.

 Do we really want to go to our /port snprintf just to handle 512+
 digits?

I'd rather not go that direction (that is, to using a configure test).
It assumes that all copies of libc on a particular platform behave the
same, which seems like a bad bet to me.  I think we'd be better off to
avoid asking libc to do anything that might not work everywhere.

On the other hand, this line of thought might end up having you
reimplement in formatting.c the same logic I put into snprintf.c
recently, which seems a bit silly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-21 Thread Bruce Momjian
On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
 In September, while researching the to_char() buffer overflow bugs fixed
 in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
 inconsistency in how to_char() does zero-padding for float4/8 values. 
 Now that 9.4.1 is released and I am home for a while, I am ready to
 address this.
...
 float4/8 are padding to the internal precision, while int4/numeric are
 padding based on the requested precision.  This is inconsistent.
 
 The first attached patch fixes this, and also zeros the junk digits
 which exceed the precision of the underlying type:

Patch applied.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-21 Thread David Rowley
On 22 March 2015 at 14:46, Bruce Momjian br...@momjian.us wrote:

 On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
  In September, while researching the to_char() buffer overflow bugs fixed
  in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
  inconsistency in how to_char() does zero-padding for float4/8 values.
  Now that 9.4.1 is released and I am home for a while, I am ready to
  address this.
 ...
  float4/8 are padding to the internal precision, while int4/numeric are
  padding based on the requested precision.  This is inconsistent.
 
  The first attached patch fixes this, and also zeros the junk digits
  which exceed the precision of the underlying type:

 Patch applied.


This seems to have broken jacana.  Looks like MSVC by default has a 3 digit
exponent.

Going by this:
https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
that it can quite easily be set back to 2.

I've attached a patch which seems to fix the issue.

Regards

David Rowley
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 2f07a58..51f0884 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -262,8 +262,16 @@ startup_hacks(const char *progname)
/* In case of general protection fault, don't show GUI popup 
box */
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
}
+
 #endif   /* WIN32 */
 
+#ifdef _MSC_VER
+
+   /* By default MSVC has a 3 digit exponent. */
+   _set_output_format(_TWO_DIGIT_EXPONENT);
+
+#endif   /* _MSC_VER */
+
/*
 * Initialize dummy_spinlock, in case we are on a platform where we have
 * to use the fallback implementation of pg_memory_barrier().

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)

2015-03-21 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 This seems to have broken jacana.  Looks like MSVC by default has a 3 digit
 exponent.

jacana was broken before this patch; but some other Windows critters
are now unhappy as well.

 Going by this:
 https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
 that it can quite easily be set back to 2.

 I've attached a patch which seems to fix the issue.

That seems likely to have side-effects far beyond what's appropriate.
We have gone out of our way to accommodate 3-digit exponents in other
tests.  What I want to know is why this patch created a 3-digit output
where there was none before.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers