Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Wed, Apr 1, 2015 at 11:48:37AM -0400, Bruce Momjian wrote: This should return something like 16.000... (per Oracle output at the URL above, float4 has 6 significant digits on my compiler) but I can't seem to figure how to get printf() to round non-fractional parts. I am afraid the only solution is to use printf's %e format and place the decimal point myself. Hearing nothing, I went with the %e approach; patch attached. The new output looks right: test= SELECT to_char(float4 '15.9123456789123456789000', repeat('9', 50) || '.' || repeat('9', 50)); to_char 16.00 (1 row) The fact I still don't have a complete solution suggests this is 9.6 material but I still want to work on it so it is ready. I will keep this patch for 9.6 unless I hear otherwise. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c new file mode 100644 index 40a353f..d283cd2 *** a/src/backend/utils/adt/formatting.c --- b/src/backend/utils/adt/formatting.c *** *** 113,125 #define DCH_MAX_ITEM_SIZ 12 /* max localized day name */ #define NUM_MAX_ITEM_SIZ 8 /* roman number (RN has 15 chars) */ - /* -- - * More is in float.c - * -- - */ - #define MAXFLOATWIDTH 60 - #define MAXDOUBLEWIDTH 500 - /* -- * Format parser structs --- 113,118 *** static DCHCacheEntry *DCH_cache_getnew(c *** 989,994 --- 982,989 static NUMCacheEntry *NUM_cache_search(char *str); static NUMCacheEntry *NUM_cache_getnew(char *str); static void NUM_cache_remove(NUMCacheEntry *ent); + static char *add_pre_zero_padding(char *num_str, int decimal_shift); + static char *add_post_zero_padding(char *num_str, int pad_digits); /* -- *** do { \ *** 5016,5021 --- 5011,5103 SET_VARSIZE(result, len + VARHDRSZ); \ } while (0) + /* + * add_pre_zero_padding + * + * printf() can only round non-fractional parts of a number if the number + * is in exponential format, so this function takes a number in that format + * and adds pre-decimal zero padding. + */ + static char * + add_pre_zero_padding(char *num_str, int decimal_shift) + { + /* one for decimal point, one for trailing null byte */ + char *out = palloc(strlen(num_str) + decimal_shift + 1 + 1), *out_p = out; + char *num_str_p = num_str; + bool decimal_found = false; + + /* copy the number before 'e' and shift the decimal point */ + while (*num_str_p *num_str_p != 'e') + { + if (*num_str_p == '.') + { + num_str_p++; + decimal_found = true; + } + else + { + *(out_p++) = *(num_str_p++); + if (decimal_found) + decimal_shift--; + } + } + + /* add zero pad digits */ + while (decimal_shift-- 0) + *(out_p++) = '0'; + + *(out_p++) = '\0'; + + pfree(num_str); + return out; + } + + /* + * add_post_zero_padding + * + * Some sprintf() implementations have a 512-digit precision limit, and we + * need sprintf() to round to the internal precision, so this function adds + * zero padding between the mantissa and exponent of an exponential-format + * number, or after the supplied string for non-exponent strings. + */ + static char * + add_post_zero_padding(char *num_str, int pad_digits) + { + /* one for decimal point, one for trailing null byte */ + char *out = palloc(strlen(num_str) + pad_digits + 1 + 1), *out_p = out; + char *num_str_p = num_str; + bool decimal_found = false; + + /* copy the number before 'e', or the entire string if no 'e' */ + while (*num_str_p *num_str_p != 'e') + { + if (*num_str_p == '.') + decimal_found = true; + *(out_p++) = *(num_str_p++); + } + + /* add zero pad digits */ + while (pad_digits-- 0) + { + if (!decimal_found) + { + *(out_p++) = '.'; + decimal_found = true; + } + + *(out_p++) = '0'; + } + + /* copy 'e' and everything after */ + while (*num_str_p) + *(out_p++) = *(num_str_p++); + + *(out_p++) = '\0'; + + pfree(num_str); + return out; + } + /* --- * NUMERIC to_number() (convert string to numeric) * --- *** int4_to_char(PG_FUNCTION_ARGS) *** 5214,5221 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val); /* * Swap a leading positive sign for a space. --- 5296,5307
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. I was able to get proper rounding with the attached patch. test= SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); to_char -- 1.12345678912346000 (1 row) Handling rounding for exponent-format values turned out to be simple. What has me stuck now is how to do rounding in the non-decimal part of the number, e.g. test= SELECT to_char(float4 '15.9123456789123456789000', repeat('9', 50) || '.' || repeat('9', 50)); to_char 1555753984.00 (1 row) This should return something like 16.000... (per Oracle output at the URL above, float4 has 6 significant digits on my compiler) but I can't seem to figure how to get printf() to round non-fractional parts. I am afraid the only solution is to use printf's %e format and place the decimal point myself. The fact I still don't have a complete solution suggests this is 9.6 material but I still want to work on it so it is ready. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c new file mode 100644 index 40a353f..2b5a440 *** a/src/backend/utils/adt/formatting.c --- b/src/backend/utils/adt/formatting.c *** *** 113,125 #define DCH_MAX_ITEM_SIZ 12 /* max localized day name */ #define NUM_MAX_ITEM_SIZ 8 /* roman number (RN has 15 chars) */ - /* -- - * More is in float.c - * -- - */ - #define MAXFLOATWIDTH 60 - #define MAXDOUBLEWIDTH 500 - /* -- * Format parser structs --- 113,118 *** static DCHCacheEntry *DCH_cache_getnew(c *** 989,994 --- 982,988 static NUMCacheEntry *NUM_cache_search(char *str); static NUMCacheEntry *NUM_cache_getnew(char *str); static void NUM_cache_remove(NUMCacheEntry *ent); + static char *add_zero_padding(char *num_str, int pad_digits); /* -- *** do { \ *** 5016,5021 --- 5010,5056 SET_VARSIZE(result, len + VARHDRSZ); \ } while (0) + /* + * add_zero_padding + * + * Some sprintf() implementations have a 512-digit precision limit, and we + * need sprintf() to round to the internal precision, so this function adds + * zero padding between the mantissa and exponent of an exponential-format + * number, or after the supplied string for non-exponent strings. + */ + static char * + add_zero_padding(char *num_str, int pad_digits) + { + /* one for decimal point, one for trailing null byte */ + char *out = palloc(strlen(num_str) + pad_digits + 1 + 1), *out_p = out; + char *num_str_p = num_str; + bool found_decimal = false; + + /* copy the number before 'e', or the entire string if no 'e' */ + while (*num_str_p *num_str_p != 'e' *num_str_p != 'E') + { + if (*num_str_p == '.') + found_decimal = true; + *(out_p++) = *(num_str_p++); + } + + if (!found_decimal) + *(out_p++) = '.'; + + /* add zero pad digits */ + while (pad_digits-- 0) + *(out_p++) = '0'; + + /* copy 'e' and everything after */ + while (*num_str_p) + *(out_p++) = *(num_str_p++); + + *(out_p++) = '\0'; + + pfree(num_str); + return out; + } + /* --- * NUMERIC to_number() (convert string to numeric) *
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. OK, I am fine in using those values if you can find them as compiler defines, but I don't see how we can grab those values from a user test on Oracle. There are some invisible float digits that don't appear in %f but can be shown if desired --- I think we used to do that in the regression tests, but found they added too much platform-specific randomness. Do we want to go in that direction? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Tue, Mar 24, 2015 at 03:08:26PM -0400, Bruce Momjian wrote: On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote: On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. OK, I am fine in using those values if you can find them as compiler defines, but I don't see how we can grab those values from a user test on Oracle. We encounter no authority higher than the test results, so it would be wrong to seek out and use a define that just happens to match a test result. Adding #define TO_CHAR_DBL_DIG 17 and #define TO_CHAR_FLT_DIG 9 is good. There are some invisible float digits that don't appear in %f but can be shown if desired --- I think we used to do that in the regression tests, but found they added too much platform-specific randomness. Do we want to go in that direction? Bare %f simply prints all digits before the decimal point and exactly six digits after the decimal point. Whether implementation-defined digits appear in that output depends on the number's magnitude. However, float8out and float4out do behave along the lines of your description. I do recommend pushing TO_CHAR in that direction, to make it more like Oracle while we're already breaking compatibility with PostgreSQL 9.4. How about if we have to_char() honor our extra_float_digits GUC, so users who want those digits can get them? It's not my first choice; for one thing, no value of extra_float_digits would yield 17 digits for float8 and 9 digits for float4. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Tue, Mar 24, 2015 at 09:26:10AM -0700, Jeff Anton wrote: The issue of significant (decimal) digits to and from floating point representation is a complex one. What 'significant' means may depend upon the intent. True. I meant simply that Oracle TO_CHAR emits no more than 17 nonzero digits for a BINARY_DOUBLE and no more than 9 nonzero digits for a BINARY_FLOAT. BTW: This is my first posting to this list. I should introduce myself. I'm Jeff Anton. I was the first Postgres project lead programmer working for Michael Stonebraker at U.C. Berkeley a very long time ago. The first version was never released. I've since worked for several db companies. Welcome back. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
Jeff == Jeff Anton antonpg...@hesiod.org writes: Jeff Postgresql seems to be using the first interpretation and Jeff reporting fewer digits. I've noticed this with pg_dump. That a Jeff dump and restore of floating point values does not produce the Jeff same floating point values. To me, that is inexcusable. Using Jeff the -Fc format, real values are preserved. I have a large Jeff database of security prices. I want accuracy above all. -Fc doesn't do anything different with floats than text dumps do. pg_dump (in all modes) uses the extra_float_digits setting to try and get a value that restores exactly. There have been issues with this in the past (the limit of extra_float_digits had to be raised from 2 to 3, iirc), but if it's happening now then that should probably be considered a bug. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
The issue of significant (decimal) digits to and from floating point representation is a complex one. What 'significant' means may depend upon the intent. There are (at least) two different tests which may need to be used. * How many digits can be stored and then accurately returned? or * How many decimal digits are needed to recreate a floating point value? Or in longer form, if you have a floating point value, you may want to print it in decimal form and then later scan that to recreate the exact bit pattern from the original floating point value. How many decimal digits do you need? The first question produces a smaller number of digits then the second one! The idea of zero padding is, IMO, a bad idea all together. It makes people feel better, but it adds inaccuracy. I've lost this interpretation so many times now that I only mention it for the real number geeks out there. Postgresql seems to be using the first interpretation and reporting fewer digits. I've noticed this with pg_dump. That a dump and restore of floating point values does not produce the same floating point values. To me, that is inexcusable. Using the -Fc format, real values are preserved. I have a large database of security prices. I want accuracy above all. I do not have time right now to produce the needed evidence for all these cases of floating point values. If there is interest I can produce this in a day or so. Jeff Anton BTW: This is my first posting to this list. I should introduce myself. I'm Jeff Anton. I was the first Postgres project lead programmer working for Michael Stonebraker at U.C. Berkeley a very long time ago. The first version was never released. I've since worked for several db companies. On 03/24/15 06:47, Noah Misch wrote: On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote: On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. OK, I am fine in using those values if you can find them as compiler defines, but I don't see how we can grab those values from a user test on Oracle. There are some invisible float digits that don't appear in %f but can be shown if desired --- I think we used to do that in the regression tests, but found they added too much platform-specific randomness. Do we want to go in that direction? How about if we have to_char() honor our extra_float_digits GUC, so users who want those digits can get them? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
I like the idea of using extra_float_digits (which I always set this to 3 to ensure I don't lose precision). http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1251 Theorem 15 When a binary IEEE single precision number is converted to the closest eight digit decimal number, it is not always possible to uniquely recover the binary number from the decimal one. However, if nine decimal digits are used, then converting the decimal number to the closest binary number will recover the original floating-point number. Keith Baker -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- ow...@postgresql.org] On Behalf Of Bruce Momjian Sent: Tuesday, March 24, 2015 3:08 PM To: Noah Misch Cc: PostgreSQL-development Subject: Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float) On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote: On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4. OK, I am fine in using those values if you can find them as compiler defines, but I don't see how we can grab those values from a user test on Oracle. There are some invisible float digits that don't appear in %f but can be shown if desired --- I think we used to do that in the regression tests, but found they added too much platform-specific randomness. Do we want to go in that direction? How about if we have to_char() honor our extra_float_digits GUC, so users who want those digits can get them? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: I recommend adding a configure test to use our snprintf.c replacements if sprintf(%.*f, 65536, 999.0) gives unexpected output. Do we really want to go to our /port snprintf just to handle 512+ digits? I'd rather not go that direction (that is, to using a configure test). It assumes that all copies of libc on a particular platform behave the same, which seems like a bad bet to me. I think we'd be better off to avoid asking libc to do anything that might not work everywhere. On the other hand, this line of thought might end up having you reimplement in formatting.c the same logic I put into snprintf.c recently, which seems a bit silly. If we can't trust libc for 512+ precision (and I don't think a startup check is warranted), I think we should either use our internal snprintf for to_char() in all cases, or in cases where the precision is 512+. Of course, this hinges on the assumption that only to_char() cares about 512+ digits --- if not, a startup check seems a requirement. However, even if we have a working snprintf, the bigger problem is having to do rounding, e.g. this is the worst case: SELECT to_char(float4 '5.', '9D' || repeat('9', 1000) || ''); What we have to do here is to round to the specified precision, i.e. we can't just pad with zeros, which was my original approach. I am afraid that means we have to keep the odd coding where we call snprintf with zero decimal digits, get its length, then figure out the decimal precision, then call snprintf again to format the string with rounding. Then, we need to separate the mantissa from the exponent and add the desired number of zeros and copy it into a new longer string. So, instead of the code being cleaner, it will be even more messy, and will duplicate some of what we already do in our port/snprintf.c. Unfortunately, there is no way to call snprintf and tell it we want a specific number of decimal digits _and_ pad the rest with zeros. I think trying to do the rounding on the output string will never work well, e.g. 9. rounded to one decimal digit is 10.0. Yuck. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Sun, Mar 22, 2015 at 01:42:56AM -0400, Tom Lane wrote: David Rowley dgrowle...@gmail.com writes: This seems to have broken jacana. Looks like MSVC by default has a 3 digit exponent. jacana was broken before this patch; but some other Windows critters are now unhappy as well. Going by this: https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems that it can quite easily be set back to 2. I've attached a patch which seems to fix the issue. That seems likely to have side-effects far beyond what's appropriate. We have gone out of our way to accommodate 3-digit exponents in other tests. What I want to know is why this patch created a 3-digit output where there was none before. I was wondering the same thing too, but when I grep'ed the regression output files looking for exponents, I found float4-exp-three-digits.out and int8-exp-three-digits.out. I think this means we have had this issue before, and we are going to have to either create a numeric-exp-three-digits.out alternate output file, move the test to one of the existing exp-three-digits files, or remove the tests. Sorry, I didn't expect any problems with this patch as it was so small and localized. What has me more concerned is the Solaris 10 failure. This query: SELECT to_char(float8 '999', 'D' || repeat('9', 1000)); expects this: 999.000... but on Solaris 10 gets this: .00 Yes, the nines are gone, and only this query is failing. Oddly, this query did not fail, though the only difference is fewer decimal digits: SELECT to_char(float8 '999', 'D'); This smells like a libc bug, e.g. OmniOS 5.11 passed the test. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote: What has me more concerned is the Solaris 10 failure. This query: SELECT to_char(float8 '999', 'D' || repeat('9', 1000)); expects this: 999.000... but on Solaris 10 gets this: .00 Yes, the nines are gone, and only this query is failing. Oddly, this query did not fail, though the only difference is fewer decimal digits: SELECT to_char(float8 '999', 'D'); This smells like a libc bug, e.g. OmniOS 5.11 passed the test. Use of the f conversion specifier with precision greater than 512 is not portable; I get a similar diff on AIX 7. Until this patch, PostgreSQL would not use arbitrarily-large precisions on that conversion specifier. (Who would guess, but the e conversion specifier is not affected.) I recommend adding a configure test to use our snprintf.c replacements if sprintf(%.*f, 65536, 999.0) gives unexpected output. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
When you posted this, I made a note to review it. On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 which is pretty close. PostgreSQL 9.4 returns 1.12345678912346. Your patch truncates digits past DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG digits. PostgreSQL must continue to round. These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. *** int4_to_char(PG_FUNCTION_ARGS) *** 5214,5221 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val); /* * Swap a leading positive sign for a space. --- 5207,5213 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = psprintf(%+.*e, Num.post, val); Neither the submission notes nor the commit messages mentioned it, but this improves behavior for to_char(int4, text). Specifically, it helps formats with more than about 500 decimal digits: SELECT length(to_char(1, '9D' || repeat('9', 800) || '')); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote: What has me more concerned is the Solaris 10 failure. This query: SELECT to_char(float8 '999', 'D' || repeat('9', 1000)); expects this: 999.000... but on Solaris 10 gets this: .00 Yes, the nines are gone, and only this query is failing. Oddly, this query did not fail, though the only difference is fewer decimal digits: SELECT to_char(float8 '999', 'D'); This smells like a libc bug, e.g. OmniOS 5.11 passed the test. Use of the f conversion specifier with precision greater than 512 is not portable; I get a similar diff on AIX 7. Until this patch, PostgreSQL would not use arbitrarily-large precisions on that conversion specifier. (Who would guess, but the e conversion specifier is not affected.) Yeah. I recommend adding a configure test to use our snprintf.c replacements if sprintf(%.*f, 65536, 999.0) gives unexpected output. Do we really want to go to our /port snprintf just to handle 512+ digits? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: When you posted this, I made a note to review it. On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: This junk digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9') as x from dual; -- 1.12345678912345680 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9'); -- 1.12345678912345000 which is pretty close. PostgreSQL 9.4 returns 1.12345678912346. Your patch truncates digits past DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG digits. PostgreSQL must continue to round. Ah, that rounding is a big problem. I can't just round because if the digit to be rounded up is '9', I have to set that to zero and increment the previous digit, and that could cascade and shift the entire string one over. I think I have to go back to the original code that does the weird computations to figure out how many digits are on the left of the decimal point, then set the format string after. I was hoping my patch could clean that all up, but we now need snprintf to do that rounding for us. :-( These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. *** int4_to_char(PG_FUNCTION_ARGS) *** 5214,5221 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val); /* * Swap a leading positive sign for a space. --- 5207,5213 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = psprintf(%+.*e, Num.post, val); Neither the submission notes nor the commit messages mentioned it, but this improves behavior for to_char(int4, text). Specifically, it helps formats with more than about 500 decimal digits: SELECT length(to_char(1, '9D' || repeat('9', 800) || '')); Wow, that is surprising. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: *** int4_to_char(PG_FUNCTION_ARGS) *** 5214,5221 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, %+.*e, Num.post, val); /* * Swap a leading positive sign for a space. --- 5207,5213 /* we can do it easily because float8 won't lose any precision */ float8 val = (float8) value; ! orgnum = psprintf(%+.*e, Num.post, val); Neither the submission notes nor the commit messages mentioned it, but this improves behavior for to_char(int4, text). Specifically, it helps formats with more than about 500 decimal digits: SELECT length(to_char(1, '9D' || repeat('9', 800) || '')); Wow, that is surprising. Ah, yes. There was a problem where were clipping off int4 output to MAXDOUBLEWIDTH, which made no sense, so there was the fix for that as well. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
Bruce Momjian br...@momjian.us writes: On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: I recommend adding a configure test to use our snprintf.c replacements if sprintf(%.*f, 65536, 999.0) gives unexpected output. Do we really want to go to our /port snprintf just to handle 512+ digits? I'd rather not go that direction (that is, to using a configure test). It assumes that all copies of libc on a particular platform behave the same, which seems like a bad bet to me. I think we'd be better off to avoid asking libc to do anything that might not work everywhere. On the other hand, this line of thought might end up having you reimplement in formatting.c the same logic I put into snprintf.c recently, which seems a bit silly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: In September, while researching the to_char() buffer overflow bugs fixed in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an inconsistency in how to_char() does zero-padding for float4/8 values. Now that 9.4.1 is released and I am home for a while, I am ready to address this. ... float4/8 are padding to the internal precision, while int4/numeric are padding based on the requested precision. This is inconsistent. The first attached patch fixes this, and also zeros the junk digits which exceed the precision of the underlying type: Patch applied. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
On 22 March 2015 at 14:46, Bruce Momjian br...@momjian.us wrote: On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: In September, while researching the to_char() buffer overflow bugs fixed in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an inconsistency in how to_char() does zero-padding for float4/8 values. Now that 9.4.1 is released and I am home for a while, I am ready to address this. ... float4/8 are padding to the internal precision, while int4/numeric are padding based on the requested precision. This is inconsistent. The first attached patch fixes this, and also zeros the junk digits which exceed the precision of the underlying type: Patch applied. This seems to have broken jacana. Looks like MSVC by default has a 3 digit exponent. Going by this: https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems that it can quite easily be set back to 2. I've attached a patch which seems to fix the issue. Regards David Rowley diff --git a/src/backend/main/main.c b/src/backend/main/main.c index 2f07a58..51f0884 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -262,8 +262,16 @@ startup_hacks(const char *progname) /* In case of general protection fault, don't show GUI popup box */ SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); } + #endif /* WIN32 */ +#ifdef _MSC_VER + + /* By default MSVC has a 3 digit exponent. */ + _set_output_format(_TWO_DIGIT_EXPONENT); + +#endif /* _MSC_VER */ + /* * Initialize dummy_spinlock, in case we are on a platform where we have * to use the fallback implementation of pg_memory_barrier(). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Zero-padding and zero-masking fixes for to_char(float)
David Rowley dgrowle...@gmail.com writes: This seems to have broken jacana. Looks like MSVC by default has a 3 digit exponent. jacana was broken before this patch; but some other Windows critters are now unhappy as well. Going by this: https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems that it can quite easily be set back to 2. I've attached a patch which seems to fix the issue. That seems likely to have side-effects far beyond what's appropriate. We have gone out of our way to accommodate 3-digit exponents in other tests. What I want to know is why this patch created a 3-digit output where there was none before. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers