Re: [HACKERS] Fix number skipping in to_number
On Monday, 25 September 2017, Nathan Wagner wrote: > On Thu, Aug 17, 2017 at 12:33:02PM +0100, Oliver Ford wrote: > > > Ok I've made that change in the attached v3. I'm not sure as I'm on > > en_US.UTF-8 locale too. Maybe something Windows specific? > > This patch applies against master (8485a25a), compiles, and > passes a make check. > > I tested both on my mac laptop, and my linux server. > > If we want this patch, I'd say it's ready for committer. We may want > (and I can't believe I'm saying this) more discussion as to exactly what > the strategy for to_number() (and friends) is. Do we want to duplicate > Oracle's functionality, or do we want a similar function to do similar > things, without necessarily having a goal of identical behavior to > oracle? > > For myself, I pretty much never use the to_date, to_number, or > to_timestamp functions except when porting oracle code. I do use the > to_char functions on occasion. If strftime were available, I probably > wouldn't use them. > > I would commit this patch and update the TODO with a goal of making > to_number as Oracle compatible as is reasonable. > > Thanks for your review. The issue is that Oracle throws errors on many more input cases than Postgres does, so making it exactly like Oracle could break a lot of existing users. E.g. to_number ('123,000', '999') returns '123' on Postgres, but throws an error on Oracle. So making it exactly Oracle-like could break existing users who might rely on the current behavior. My view is that we shouldn't deliberately introduce errors in order to be exactly like Oracle if we don't currently and there's a sane use case for current behavior. Do you have any examples of results that are different between Oracle and Postgres, and you think the Oracle result makes more sense?
Re: [HACKERS] Add Roman numeral conversion to to_number
I'll fix the brace, but there are two other patches in the first email for tests and docs. For some reason the commitfest app didn't pick them up. On Friday, 15 September 2017, Doug Doole wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: not tested > Documentation:not tested > > Code looks fine, but one niggly complaint at line 146 of the patch file > ("while (*cp) {"). A K&R style brace slipped in, which doesn't match the > formatting of the file. > > Given that this is providing new formatting options, there should be new > tests added that validate the options and error handling. > > There's also the "do we want this?" debate from the discussion thread that > still needs to be resolved. (I don't have an opinion either way.) > > I'm sending this back to the author to address the first two issues. > > The new status of this patch is: Waiting on Author > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org > ) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Fix number skipping in to_number
On Thu, Aug 17, 2017 at 11:55 AM, Thomas Munro wrote: > Hmm. Just in case my macOS laptop (CC=Apple's clang, > LANG=en_NZ.UTF-8) was unduly affected by cosmic rays I tried on a > couple of nearby servers running FreeBSD 11.1 (CC=clang, LANG=) > and CentOS 7.3 (CC=gcc, LANG=en_US.utf-8) and got the same result: > int8 has lost some whitespace in to_char output. > > It looks a bit like it has lost a leading space for every ',' that was > present in the format string but which didn't finish up getting output > (because the number was too small). It looks a bit like that doesn't > happen for 'G', so let's compare the NUM_COMMA and NUM_G cases. Ahh.. > I'm not sure, but I think you might have a close-brace in the wrong > place here: > > @@ -4879,6 +4883,10 @@ NUM_processor(FormatNode *node, NUMDesc *Num, > char *inout, > continue; > } > }<--- this guy here might > need to move after "noadd = true"? > + if > (strncmp(Np->inout_p, Np->L_thousands_sep, separator_len) == 0) > + Np->inout_p += > separator_len - 1; > + else > + noadd = true; > break; > > case NUM_G: > > With that change the test passes for me. But why do we get different results? > > -- > Thomas Munro > http://www.enterprisedb.com Ok I've made that change in the attached v3. I'm not sure as I'm on en_US.UTF-8 locale too. Maybe something Windows specific? 0001-apply-number-v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix number skipping in to_number
> The tests you added pass for me but the int8 test now fails with the > following (this is from regression.diff after running > 'PG_REGRESS_DIFF_OPTS="-dU10" make check'). It looks like some new > whitespace is appearing on the right in the output of to_char(). I > didn't try to understand why. > > @@ -453,34 +453,34 @@ > --+-- > 4567890123456789 | 4567890123456789 > (1 row) > > -- TO_CHAR() > -- > SELECT '' AS to_char_1, to_char(q1, '9G999G999G999G999G999'), > to_char(q2, '9,999,999,999,999,999') > FROM INT8_TBL; > to_char_1 |to_char |to_char > ---++ > - |123 |456 > + |123 | 456 > |123 | 4,567,890,123,456,789 > - | 4,567,890,123,456,789 |123 > + | 4,567,890,123,456,789 | 123 > | 4,567,890,123,456,789 | 4,567,890,123,456,789 > | 4,567,890,123,456,789 | -4,567,890,123,456,789 > (5 rows) > > SELECT '' AS to_char_2, to_char(q1, '9G999G999G999G999G999D999G999'), > to_char(q2, '9,999,999,999,999,999.999,999') > FROM INT8_TBL; > to_char_2 |to_char |to_char > ---++ > - |123.000,000 |456.000,000 > + |123.000,000 | 456.000,000 > |123.000,000 | 4,567,890,123,456,789.000,000 > - | 4,567,890,123,456,789.000,000 |123.000,000 > + | 4,567,890,123,456,789.000,000 | 123.000,000 > | 4,567,890,123,456,789.000,000 | 4,567,890,123,456,789.000,000 > | 4,567,890,123,456,789.000,000 | -4,567,890,123,456,789.000,000 > (5 rows) > > SELECT '' AS to_char_3, to_char( (q1 * -1), 'PR'), > to_char( (q2 * -1), '.999PR') > FROM INT8_TBL; > to_char_3 | to_char |to_char > ---++ > | <123> | <456.000> > | <123> | <4567890123456789.000> > That's strange, I can't replicate that issue on my Windows build. I've tried with 'PG_REGRESS_DIFF_OPTS="-dU10" make check' and all the tests (including int8) pass fine. The spacing in the results is perfectly normal. I wonder if something else on your build is causing this? I've also tried several "make check" options for different locales mentioned in the docs and they pass fine. > > One superficial comment after first glimpse at the patch: > > +if(!strncmp(Np->inout_p, Np->L_thousands_sep, separator_len)) > > I believe the usual coding around here would be if (strncmp(...) == 0) > Yes you're right, that is the coding standard. I've changed it to that in the attached v2. 0001-apply-number-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix number skipping in to_number
Prevents an issue where numbers can be skipped in the to_number() function when the format mask contains a "G" or a "," but the input string doesn't contain a separator. This resolves the TODO item "Fix to_number() handling for values not matching the format string". == Change == Currently, if the format mask in to_number() has a "G" or a ",", it will assume that the input string contains a separator character in the same place. If however a number is there instead, that number will be silently skipped and not appear in the output. So we get: select to_number('34,50','999,99'); to_number --- 340 (1 row) This patch checks the input string when it encounters a "G" or "," in the format mask. If the separator character is found, the code moves over it as normal. If it's not found, then the code no longer increments the relevant pointer so as not to skip the character. After the patch, we get the correct result: select to_number('34,50','999,99'); to_number --- 3450 (1 row) This is in line with Oracle's result. == Rationale == This patch is a small change, which leaves PostgreSQL behavior different from Oracle behavior in related cases. Oracle's implementation seems to read from right-to-left, and raises an "ORA-01722: invalid number" error if there are digits in the input string which don't have corresponding characters in the format mask. I have chosen not to throw such errors, because there are use cases for only returning part of a number string. For example, the following is an error on Oracle: select to_number('123,000', '999G') from dual; Error report - SQL Error: ORA-01722: invalid number But if you wanted to only take the characters before the comma, and discard the thousands part, you can do so on PostgreSQL with: select to_number('123,000', '999G'); to_number --- 123 (1 row) This is the current behavior. Which is why I think it makes more sense to do what PostgreSQL currently does and read from left-to-right. The only change, as mentioned above, is that the current behavior can skip a digit: select to_number('123456', '999G999'); to_number --- 12356 (1 row) After the patch, this returns all the digits: select to_number('123456', '999G999'); to_number --- 123456 (1 row) == Testing == Tested on Windows with MinGW using the latest checkout from master. Added regression tests to check for this new behavior. All existing tests pass. 0001-apply-number-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add Roman numeral conversion to to_number
On Thursday, 3 August 2017, Robert Haas wrote: > On Thu, Aug 3, 2017 at 9:25 AM, Oliver Ford > wrote: > > Adds to the to_number() function the ability to convert Roman numerals > > to a number. This feature is on the formatting.c TODO list. It is not > > currently implemented in either Oracle, MSSQL or MySQL so gives > > PostgreSQL an edge :-) > > I kind of put my head in my hands when I saw this. I'm not really > sure it's worth complicating the code for something that has so little > practical utility, but maybe other people will feel differently. I > can't deny the profound advantages associated with having a leg up on > Oracle. The formatting.c file specifies it as a TODO, so I thought implementing it would be worthwhile. As there is a to_roman conversion having it the other way is good for completeness. > > The error reporting is a little wonky, although maybe no wonkier than > anything else about these conversion routines. > > rhaas=# select to_number('q', 'rn'); > ERROR: invalid character "q" > > (hmm, no position) > > rhaas=# select to_number('dd', 'rn'); > ERROR: invalid character "D" at position 1 > > (now i get a position, but it's not really the right position; and the > problem isn't really that the character is invalid but that you don't > like me including it twice, and I said 'd' not 'D') > > rhaas=# select to_number('à', 'rn'); > ERROR: invalid character "?" > > (eh?) > > How much call is there for a format that can only represent values up to > 3999? > > The existing int_to_roman code goes up to 3999 so this patch is consistent with that. I could extend both to handle Unicode values for large numbers? > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >