Re: to_date() and to_timestamp() with negative years

2021-11-04 Thread Adrian Klaver

On 11/4/21 10:29 AM, Bryn Llewellyn wrote:

/adrian.kla...@aklaver.com  wrote:/

Not sure how this can be handled in a totally predictable way given 
the unpredictable ways in which datetime strings are formatted?


The only thing I can say it is it points out that when working with 
datetimes settling on a standard format is your best defense against 
unpredictable results.


Thank you very much, again, for your help with my seemingly endless 
nitpicking questions on this matter, Adrian. Here's the most favorable 
conclusion that I can draw:




3. The rules are hard to understand and the PG doc gives insufficient 
detail to allow the outcomes in corner cases like you just showed us to 
be predicted confidently. Some users seek to understand the rules by 
reading PG's source code.


I would say that is because datetimes in string formats are often hard 
to understand as a result of folks inventing their own formats.


As an example a commit message of mine from some years ago:

"
Fix issue with date parsing of Javascript dates coming from browser
on Windows. This occurred in both Firefox and Chrome. The issue being
that the date had a timezone of Pacific Standard Time instead of PST 
like it does from a Linux machine. dateutils choked on that timezone and 
therefore the date was not parsed. The fix was to add fuzzy=True to the 
dateutils.parse(). This allows dateutils to skip over anything it does 
not understand and parse the rest. This works as the date does include 
the correct tz offset. For the record the date format of concern is-Tue 
Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)

"

Can't remember what version of Windows this was. The dates ended up in a 
Postgres database via Python code in a Django app. This fix is one of 
the reasons I really like the Python dateutils library. The solution being:


from dateutil.parser import parse

parse('Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)', 
fuzzy=True)

datetime.datetime(2015, 12, 8, 0, 0, tzinfo=tzoffset(None, 28800))

To see what it is doing:

parse('Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)', 
fuzzy_with_tokens=True)


(datetime.datetime(2015, 12, 8, 0, 0, tzinfo=tzoffset(None, 28800)),
 (' ', ' ', ' ', ' (Pacific Standard Time)'))

where everything after the datetime are tokens it ignored.

I include this mainly as illustration that data clean up maybe more 
practical before it ever hits the database and in a library that is 
specialized for the task at hand.




4. Certainly, the rules set a significant parsing challenge. You hint 
that they might even prevent a complete practical solution to be 
implemented.


Yes that would depend on someone coding an AI that can fully understand 
people and what there intentions where from a string value. Given the 
answers I get when asking people directly what they intended, I'm not 
holding my breath.




5. None of this matters when the app designer has the freedom to define 
how date-time values will be provided, as text values, by user 
interfaces or external systems. In these cases, the complexity can be 
controlled by edict and correct solutions can be painlessly implemented 
and tested. Not a day goes by that I don't have to enter a date value at 
a UI. And in every case, a UI gadget constrains my input and makes its 
individual fields available to the programmer without the need for 
parsing—so there's reason to be optimistic. The programmer can easily 
build the text representation of the date-time value to conform to the 
simple rules that the overall application design specified.


Yep, constraining the imagination of the end user solves a lot of problems.



6. In some cases text data that's to be converted arrives in a format 
that cannot be influenced. And it might contain locutions like we've 
been discussing ("zero" meaning "1 BC", unfavorable paradigms for 
separating tokens, and the like). In these cases, the diligent 
programmer might, just, be able to use the full arsenal of available 
tools to implement a scheme that faultlessly parses the input. But the 
likelihood of bugs seems to be pretty big.


From my experience that pretty much defines all aspects of programming.


I'll say "case closed, now" — from my side, at least.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: to_date() and to_timestamp() with negative years

2021-11-04 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
> Not sure how this can be handled in a totally predictable way given the 
> unpredictable ways in which datetime strings are formatted?
> 
> The only thing I can say it is it points out that when working with datetimes 
> settling on a standard format is your best defense against unpredictable 
> results.

Thank you very much, again, for your help with my seemingly endless nitpicking 
questions on this matter, Adrian. Here's the most favorable conclusion that I 
can draw:

1. A precedent has been set by The SQL Standard folks together with the 
PostgreSQL implementation and other implementations like Oracle Database. All 
this stretches back a long time—to more than four decades ago.

2. This has brought us in PG to the rules that "Table 9.26. Template Patterns 
for Date/Time Formatting", "Table 9.27. Template Pattern Modifiers for 
Date/Time Formatting", and "Usage notes for date/time formatting" set out and 
interpret.

3. The rules are hard to understand and the PG doc gives insufficient detail to 
allow the outcomes in corner cases like you just showed us to be predicted 
confidently. Some users seek to understand the rules by reading PG's source 
code.

4. Certainly, the rules set a significant parsing challenge. You hint that they 
might even prevent a complete practical solution to be implemented.

5. None of this matters when the app designer has the freedom to define how 
date-time values will be provided, as text values, by user interfaces or 
external systems. In these cases, the complexity can be controlled by edict and 
correct solutions can be painlessly implemented and tested. Not a day goes by 
that I don't have to enter a date value at a UI. And in every case, a UI gadget 
constrains my input and makes its individual fields available to the programmer 
without the need for parsing—so there's reason to be optimistic. The programmer 
can easily build the text representation of the date-time value to conform to 
the simple rules that the overall application design specified.

6. In some cases text data that's to be converted arrives in a format that 
cannot be influenced. And it might contain locutions like we've been discussing 
("zero" meaning "1 BC", unfavorable paradigms for separating tokens, and the 
like). In these cases, the diligent programmer might, just, be able to use the 
full arsenal of available tools to implement a scheme that faultlessly parses 
the input. But the likelihood of bugs seems to be pretty big.

I'll say "case closed, now" — from my side, at least.

Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Adrian Klaver

On 11/3/21 19:12, Bryn Llewellyn wrote:

/adrian.kla...@aklaver.com  wrote:/




Back to the point about separators, the "Current" doc has this bullet:

«
A separator (a space or non-letter/non-digit character) in the template 
string of to_timestamp and to_date matches any single separator in 
the input string or is skipped, unless the FX option is used...

»

(There's no such explanation in the Version 11 doc—but never mind that.) 
I read this to mean that a space IS a viable separator. And yet Tom's 
"nope, the space doesn't count [as a separator]" claims the opposite. 
The bullet's wording, by the way, says that the minus sign is a 
separator. But yet it can also be taken to confer the meaning "BC" to a 
year. No wonder I'm confused.


Elsewhere the "Current" doc says that runs of two or more spaces have 
the same effect as a single space (in the absence of FX or FM complexity).


No wonder that examples like I showed sometimes produce the wrong 
results, even after more than one round of tinkering with the C 
implementation. Try this test (the to-be-converted text has runs of five 
spaces, and the template has runs of ten spaces):


It comes down to determining what is a separator and what is the 
negative sign.




select to_date('     1950     02     14', '          MM          DD');
select to_date('     -1950     02     14', '          MM          DD');


The row above returns:

select to_date(' -1950 02 14', '  MM  DD');
to_date
---
 1950-02-14 BC

As you state below.

Change it to:

select to_date(' -1950 02 14', '   MM  DD');
  to_date

 1950-02-14

and you don't get the BC as -1950 is taken as separator(-)1950 not 
negative(-)1950.




select to_date('     14     02     1950', 'DD          MM          ');
select to_date('     14     02     -1950', 'DD          MM          ');

The above returns:

select to_date(' 14 02 -1950', 'DD  MM  ');
  to_date

 1950-02-14

Change it to:

select to_date(' 14 02 -1950', 'DD  MM');
to_date
---
 1950-02-14 BC

and you get BC as -1950 is taken as negative(-)1950 not separator(-)1950.


Not sure how this can be handled in a totally predictable way given the 
unpredictable ways in which datetime strings are formatted?


The only thing I can say it is it points out that when working with 
datetimes settling on a standard format is your best defense against 
unpredictable results.




"-1950" is taken as "BC" in the second to_date() but it is not so taken 
in the fourth to_date().





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
> On 11/3/21 17:00, Adrian Klaver wrote:
>> On 11/3/21 15:56, Bryn Llewellyn wrote:
>>>> t...@sss.pgh.pa.us wrote:
>>>> 
> 
>>> And you have to have some kind of separator between the years substring and 
>>> the adjacent one(s) even to succeed with years that have more than four 
>>> digits. Another usage note stresses that while this is OK:
>>> 
>>> select to_date('12340101', 'MMDD');
>>> 
>>> this isn't:
>>> 
>>> select to_date('123450101', 'MMDD');
>> It does with a tweak:
>> select to_date('12345 0101', 'FMMMDD');
>>to_date
>> -
>>  12345-01-01
> 
> Well that was just plain wrong. I was not paying attention.

I'm missing your point, Adrian. But I must confess that I'm guilty of not 
expressing myself clearly with my reference to years with more than four 
digits. I meant that reference only to make the point that, in that use-case, a 
space CAN act as a separator.

By the way, I'd been studying the "Usage notes for date/time formatting" 
section in the Version 11 PG docs because that's the PG version that YugabyteDB 
uses. It has eleven bullets. The corresponding section in the "Current" PG docs 
has just eight bullets and the overall wording of the section is quite 
different. However, the bullet, in the Version 11 docs, that started me on the 
testing that spurred my first email in this thread is identically worded in 
both versions. It says this:

«
In to_timestamp and to_date, negative years are treated as signifying BC. If 
you write both a negative year and an explicit BC field, you get AD again. An 
input of year zero is treated as 1 BC.
»

Back to the point about separators, the "Current" doc has this bullet:

«
A separator (a space or non-letter/non-digit character) in the template string 
of to_timestamp and to_date matches any single separator in the input string or 
is skipped, unless the FX option is used...
»

(There's no such explanation in the Version 11 doc—but never mind that.) I read 
this to mean that a space IS a viable separator. And yet Tom's "nope, the space 
doesn't count [as a separator]" claims the opposite. The bullet's wording, by 
the way, says that the minus sign is a separator. But yet it can also be taken 
to confer the meaning "BC" to a year. No wonder I'm confused.

Elsewhere the "Current" doc says that runs of two or more spaces have the same 
effect as a single space (in the absence of FX or FM complexity).

No wonder that examples like I showed sometimes produce the wrong results, even 
after more than one round of tinkering with the C implementation. Try this test 
(the to-be-converted text has runs of five spaces, and the template has runs of 
ten spaces):

select to_date(' 1950 02 14', '  MM  DD');
select to_date(' -1950 02 14', '  MM  DD');

select to_date(' 14 02 1950', 'DD  MM  ');
select to_date(' 14 02 -1950', 'DD  MM  ');

"-1950" is taken as "BC" in the second to_date() but it is not so taken in the 
fourth to_date().



Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Adrian Klaver

On 11/3/21 17:00, Adrian Klaver wrote:

On 11/3/21 15:56, Bryn Llewellyn wrote:

t...@sss.pgh.pa.us wrote:



And you have to have some kind of separator between the years 
substring and the adjacent one(s) even to succeed with years that have 
more than four digits. Another usage note stresses that while this is OK:


select to_date('12340101', 'MMDD');

this isn't:

select to_date('123450101', 'MMDD');


It does with a tweak:

select to_date('12345 0101', 'FMMMDD');
    to_date
-
  12345-01-01


Well that was just plain wrong. I was not paying attention.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Adrian Klaver

On 11/3/21 15:56, Bryn Llewellyn wrote:

t...@sss.pgh.pa.us wrote:

Adrian Klaver  writes:

Haven't had time to work through what the above is actually doing.


I think the first two are explained by 489c9c340:

Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD.  This is how
the negative-century case works, so it seems sane to do likewise.

The last two look like a parsing issue: with no field separator (nope, the 
space doesn't count), the code is taking the dash as a field separator.


Thanks, Adrian, for trying those tests. So if PG is aiming for one consistent 
story for years that are less than one, in all three APIs (to_date() and its 
cousins, make_timestamp[tz](), and text-to-date-time typecasts), then work 
still remains.

The parsing problem seems to be a separable annoyance. I assume that Tom’s 
"nope, the space doesn't count” is a narrow comment on this corner of the 
implementation. It definitely counts here:

select to_date('12345 01 01', ' MM DD');

And you have to have some kind of separator between the years substring and the 
adjacent one(s) even to succeed with years that have more than four digits. 
Another usage note stresses that while this is OK:

select to_date('12340101', 'MMDD');

this isn't:

select to_date('123450101', 'MMDD');


It does with a tweak:

select to_date('12345 0101', 'FMMMDD');
   to_date
-
 12345-01-01




and nor is anything else that you might dream up that does not have a separator 
as mentioned above.

Tom asked about Oracle Database. I have a still-working Version 18.0 in a VM on my 
laptop. The query can't be so terse there because there's no implicit typecast from date 
to text. And there's the famous annoyance of "dual".


If you go here:

https://www.postgresql.org/docs/current/functions-formatting.html

and search on Oracle you will see that there are other exceptions. Like 
most things there is not complete agreement on how closely to follow 
someone else's code.




I tried this first:

select to_char(to_date('1234/01/01', '/MM/DD'), '/MM/DD') from dual;

It gets the same output back as the input you gave. So far so good. Then I 
changed the input to '-1234/01/01'. It caused this error:

ORA-01841: (full) year must be between -4713 and +, and not be 0

So it seems that Oracle Database has its own problems. But at least the wording 
"must… not be 0" is clear—and not what PG wants to support.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
> Adrian Klaver  writes:
>> Haven't had time to work through what the above is actually doing.
> 
> I think the first two are explained by 489c9c340:
> 
>Also, arrange for the combination of a negative year and an
>explicit "BC" marker to cancel out and produce AD.  This is how
>the negative-century case works, so it seems sane to do likewise.
> 
> The last two look like a parsing issue: with no field separator (nope, the 
> space doesn't count), the code is taking the dash as a field separator.

Thanks, Adrian, for trying those tests. So if PG is aiming for one consistent 
story for years that are less than one, in all three APIs (to_date() and its 
cousins, make_timestamp[tz](), and text-to-date-time typecasts), then work 
still remains.

The parsing problem seems to be a separable annoyance. I assume that Tom’s 
"nope, the space doesn't count” is a narrow comment on this corner of the 
implementation. It definitely counts here:

select to_date('12345 01 01', ' MM DD');

And you have to have some kind of separator between the years substring and the 
adjacent one(s) even to succeed with years that have more than four digits. 
Another usage note stresses that while this is OK:

select to_date('12340101', 'MMDD');

this isn't:

select to_date('123450101', 'MMDD');

and nor is anything else that you might dream up that does not have a separator 
as mentioned above.

Tom asked about Oracle Database. I have a still-working Version 18.0 in a VM on 
my laptop. The query can't be so terse there because there's no implicit 
typecast from date to text. And there's the famous annoyance of "dual".

I tried this first:

select to_char(to_date('1234/01/01', '/MM/DD'), '/MM/DD') from dual;

It gets the same output back as the input you gave. So far so good. Then I 
changed the input to '-1234/01/01'. It caused this error:

ORA-01841: (full) year must be between -4713 and +, and not be 0

So it seems that Oracle Database has its own problems. But at least the wording 
"must… not be 0" is clear—and not what PG wants to support.





Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Tom Lane
Adrian Klaver  writes:
> Haven't had time to work through what the above is actually doing.

I think the first two are explained by 489c9c340:

Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD.  This is how
the negative-century case works, so it seems sane to do likewise.

The last two look like a parsing issue: with no field separator
(nope, the space doesn't count), the code is taking the dash
as a field separator.

regards, tom lane




Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Adrian Klaver

On 11/3/21 1:39 PM, Bryn Llewellyn wrote:

/adrian.kla...@aklaver.com  wrote:/


/Bryn wrote:/




In what PG version did you run your test?
I'm not surprised that grandfathered-in "solution" that I described 
will never go away—despite its questionable conceptual basis.


Thanks, Adrian. What results do these queries give in Version 14? 
(Forgive me, I'll make time to install v14 very soon.)


 select version();
  version 



 PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 
7.5.0, 64-bit




select make_timestamp(0,1,1,0,0,0);

ERROR:  date field value out of range: 0-01-01


select make_timestamp(-0,1,1,0,0,0);

ERROR:  date field value out of range: 0-01-01



select '-0001-01-01'::date; -- and similar for typecasts to timestamp 
and timestamptz

ERROR:  invalid input syntax for type date: "-0001-01-01"
LINE 1: select '-0001-01-01'::date;



select '-01-01'::date;

ERROR:  date/time field value out of range: "-01-01"
LINE 1: select '-01-01'::date;



select '--01-01'::date;

ERROR:  invalid input syntax for type date: "--01-01"
LINE 1: select '--01-01'::date;



select
   to_date( '15/06/-2021',    'DD/MM/'    ) as a1,
   to_date( '15/06/-2021 BC', 'DD/MM//AD' ) as a2,
   ''                                           as "-",
   to_date( '15 06 -2021',    'DD MM '    ) as b1,
   to_date( '15 06 -2021 BC', 'DD MM  AD' ) as b2;



  a1   | a2 | - | b1 |  b2
---++---++---
 2021-06-15 BC | 2021-06-15 |   | 2021-06-15 | 2021-06-15 BC

Haven't had time to work through what the above is actually doing.

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Bryn Llewellyn
adrian.kla...@aklaver.com wrote:

> Bryn wrote:
> 
>>> t...@sss.pgh.pa.us wrote:
>>> 
>>> Bryn Llewellyn  writes:
 Is there any chance that you might be bold and simply make negative "year" 
 values illegal in "to_date()" and "to_timestamp()" — just as they already 
 are in "make_timestamp()", "make_timestamptz()", and the "from text" 
 typecasts to date-time moment values?
>>> 
>>> Uh, what?
>>> 
>>> regression=# select make_timestamp(-44,3,15,0,0,0);
>>> make_timestamp
>>> 
>>> 0044-03-15 00:00:00 BC
>>> (1 row)
>>> 
>>> The other stuff you are talking about looks like confusion around which
>>> characters are minus signs in the data and which ones are field
>>> separators.  Given the very squishy definitions of to_date/to_timestamp,
>>> I'm not surprised if that works only with carefully chosen field
>>> layouts --- but it does work for me with all of these cases:
>>> 
>>> regression=# select to_date('-0044-03-15', '-MM-DD');
>>>to_date
>>> ---
>>> 0044-03-15 BC
>>> (1 row)
>>> 
>>> regression=# select to_date('03-15--0044', 'MM-DD-');
>>>to_date
>>> ---
>>> 0044-03-15 BC
>>> (1 row)
>>> 
>>> regression=# select to_date('03/15/-0044', 'MM/DD/');
>>>to_date
>>> ---
>>> 0044-03-15 BC
>>> (1 row)
>>> 
>>> I'd be the first to agree that that code is a mess and could stand to
>>> be rewritten --- but I seriously doubt that we'd take a patch that
>>> intentionally breaks cases that work fine today.  There's also the
>>> angle that these are supposed to be Oracle-compatible, so I wonder
>>> what Oracle does with such input.
>> I just ran your test:
>> select make_timestamp(-44,3,15,0,0,0);
>> in each of the three environments that I mentioned—but especially, 
>> therefore, in PG 13.4 (on macOS).
>> It cased the error that I mentioned:
>> ERROR:  22008: date field value out of range: -44-03-15
>> It's the same with "make_timestamp()".
> 
> https://www.postgresql.org/docs/14/release-14.html#id-1.11.6.5.3
> 
> "
> Allow make_timestamp()/make_timestamptz() to accept negative years (Peter 
> Eisentraut)
> 
> Negative values are interpreted as BC years.
> "
> 
>> In what PG version did you run your test?
>> I'm not surprised that grandfathered-in "solution" that I described will 
>> never go away—despite its questionable conceptual basis.

Thanks, Adrian. What results do these queries give in Version 14? (Forgive me, 
I'll make time to install v14 very soon.)

select make_timestamp(0,1,1,0,0,0);

select make_timestamp(-0,1,1,0,0,0);

select '-0001-01-01'::date; -- and similar for typecasts to timestamp and 
timestamptz

select '-01-01'::date;

select '--01-01'::date;

select
  to_date( '15/06/-2021','DD/MM/') as a1,
  to_date( '15/06/-2021 BC', 'DD/MM//AD' ) as a2,
  ''   as "-",
  to_date( '15 06 -2021','DD MM ') as b1,
  to_date( '15 06 -2021 BC', 'DD MM  AD' ) as b2;



Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Adrian Klaver

On 11/3/21 11:18, Bryn Llewellyn wrote:

t...@sss.pgh.pa.us wrote:

Bryn Llewellyn  writes:

Is there any chance that you might be bold and simply make negative "year" values illegal in "to_date()" and 
"to_timestamp()" — just as they already are in "make_timestamp()", "make_timestamptz()", and the "from 
text" typecasts to date-time moment values?


Uh, what?

regression=# select make_timestamp(-44,3,15,0,0,0);
 make_timestamp

0044-03-15 00:00:00 BC
(1 row)

The other stuff you are talking about looks like confusion around which
characters are minus signs in the data and which ones are field
separators.  Given the very squishy definitions of to_date/to_timestamp,
I'm not surprised if that works only with carefully chosen field
layouts --- but it does work for me with all of these cases:

regression=# select to_date('-0044-03-15', '-MM-DD');
to_date
---
0044-03-15 BC
(1 row)

regression=# select to_date('03-15--0044', 'MM-DD-');
to_date
---
0044-03-15 BC
(1 row)

regression=# select to_date('03/15/-0044', 'MM/DD/');
to_date
---
0044-03-15 BC
(1 row)

I'd be the first to agree that that code is a mess and could stand to
be rewritten --- but I seriously doubt that we'd take a patch that
intentionally breaks cases that work fine today.  There's also the
angle that these are supposed to be Oracle-compatible, so I wonder
what Oracle does with such input.


I just ran your test:

select make_timestamp(-44,3,15,0,0,0);

in each of the three environments that I mentioned—but especially, therefore, 
in PG 13.4 (on macOS).

It cased the error that I mentioned:

ERROR:  22008: date field value out of range: -44-03-15

It's the same with "make_timestamp()".


https://www.postgresql.org/docs/14/release-14.html#id-1.11.6.5.3

"
Allow make_timestamp()/make_timestamptz() to accept negative years 
(Peter Eisentraut)


Negative values are interpreted as BC years.
"


In what PG version did you run your test?

I'm not surprised that grandfathered-in "solution" that I described will never 
go away—despite its questionable conceptual basis.







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
> Bryn Llewellyn  writes:
>> Is there any chance that you might be bold and simply make negative "year" 
>> values illegal in "to_date()" and "to_timestamp()" — just as they already 
>> are in "make_timestamp()", "make_timestamptz()", and the "from text" 
>> typecasts to date-time moment values?
> 
> Uh, what?
> 
> regression=# select make_timestamp(-44,3,15,0,0,0);
> make_timestamp 
> 
> 0044-03-15 00:00:00 BC
> (1 row)
> 
> The other stuff you are talking about looks like confusion around which
> characters are minus signs in the data and which ones are field
> separators.  Given the very squishy definitions of to_date/to_timestamp,
> I'm not surprised if that works only with carefully chosen field
> layouts --- but it does work for me with all of these cases:
> 
> regression=# select to_date('-0044-03-15', '-MM-DD');
>to_date
> ---
> 0044-03-15 BC
> (1 row)
> 
> regression=# select to_date('03-15--0044', 'MM-DD-');
>to_date
> ---
> 0044-03-15 BC
> (1 row)
> 
> regression=# select to_date('03/15/-0044', 'MM/DD/');
>to_date
> ---
> 0044-03-15 BC
> (1 row)
> 
> I'd be the first to agree that that code is a mess and could stand to
> be rewritten --- but I seriously doubt that we'd take a patch that
> intentionally breaks cases that work fine today.  There's also the
> angle that these are supposed to be Oracle-compatible, so I wonder
> what Oracle does with such input.

I just ran your test:

select make_timestamp(-44,3,15,0,0,0);

in each of the three environments that I mentioned—but especially, therefore, 
in PG 13.4 (on macOS).

It cased the error that I mentioned:

ERROR:  22008: date field value out of range: -44-03-15

It's the same with "make_timestamp()".

In what PG version did you run your test?

I'm not surprised that grandfathered-in "solution" that I described will never 
go away—despite its questionable conceptual basis.






Re: to_date() and to_timestamp() with negative years

2021-11-03 Thread Tom Lane
Bryn Llewellyn  writes:
> Is there any chance that you might be bold and simply make negative "year" 
> values illegal in "to_date()" and "to_timestamp()" — just as they already are 
> in "make_timestamp()", "make_timestamptz()", and the "from text" typecasts to 
> date-time moment values?

Uh, what?

regression=# select make_timestamp(-44,3,15,0,0,0);
 make_timestamp 

 0044-03-15 00:00:00 BC
(1 row)

The other stuff you are talking about looks like confusion around which
characters are minus signs in the data and which ones are field
separators.  Given the very squishy definitions of to_date/to_timestamp,
I'm not surprised if that works only with carefully chosen field
layouts --- but it does work for me with all of these cases:

regression=# select to_date('-0044-03-15', '-MM-DD');
to_date
---
 0044-03-15 BC
(1 row)

regression=# select to_date('03-15--0044', 'MM-DD-');
to_date
---
 0044-03-15 BC
(1 row)

regression=# select to_date('03/15/-0044', 'MM/DD/');
to_date
---
 0044-03-15 BC
(1 row)

I'd be the first to agree that that code is a mess and could stand to
be rewritten --- but I seriously doubt that we'd take a patch that
intentionally breaks cases that work fine today.  There's also the
angle that these are supposed to be Oracle-compatible, so I wonder
what Oracle does with such input.

regards, tom lane




to_date() and to_timestamp() with negative years

2021-11-03 Thread Bryn Llewellyn
I'm quoting here from "Usage notes for date/time formatting" just below "Table 
9.25. Template Pattern Modifiers for Date/Time Formatting" here:

https://www.postgresql.org/docs/current/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIMEMOD-TABLE

on the page "9.8. Data Type Formatting Functions". Find this:

«
In to_timestamp and to_date, negative years are treated as signifying BC. If 
you write both a negative year and an explicit BC field, you get AD again. An 
input of year zero is treated as 1 BC.
»

This seems to be a suspect solution looking for a problem for these reasons:

1. Nobody ever talks about dates by saying "the year minus 42". It's always "42 
BC". Nor do they talk about "year zero" in the Gregorian calendar 'cos there's 
no such year.

2. If you invoke "make_timestamp()" or "make_timestamptz()" with a negative (or 
zero) argument for "year", then you get the "22008" error.

3. The text-to-date typecast « '-2021-06-15'::date » gets the "22009" error, as 
it does with "".

4. The unary "minus" operator is shorthand for "subtract the operand from zero" 
— i.e. "-x" means "0 - x". But there is no year zero. And anyway, the 
difference between two "date" values is an "integer" value; and the difference 
between two "timestamp[tz]" values is an "interval" value.

Finally, the implementation seems to be buggy (as I observed it in PG 13.4 on 
my MacBook with up-to-date macOS). Try this:

\x on
select
  to_date( '15/06/-2021','DD/MM/') as a1,
  to_date( '15/06/-2021 BC', 'DD/MM//AD' ) as a2,
  ''   as "-",
  to_date( '15 06 -2021','DD MM ') as b1,
  to_date( '15 06 -2021 BC', 'DD MM  AD' ) as b2;
\x off

Notice that the difference between the first two expressions (that produce the 
values "a1" and "a2") and the second two expressions (that produce the values 
"b1" and "b2") is how the to-be-converted substrings for "DD", "MM", and "" 
are separated. Otherwise, they express the same intention. So "b1" should be 
the same as "a1" and "b2" should be the same as "a2".

This is the result:

a3 | 2021-06-15 BC
a4 | 2021-06-15
-  | 
b3 | 2021-06-15
b4 | 2021-06-15 BC

The "a" values are right (by the rule in the PG doc)—and the "b" values are 
wrong.

If you express the same intention without the complication of using "-2021", 
then you get the right results throughout (and for any other way that you care 
to separate the to-be-interpreted substrings).

Moreover, I have an old PG 11.9 in a Ubuntu VM. The same test gets these 
results:

a1 | 2022-06-15 BC
a2 | 2022-06-15 BC
-  | 
b1 | 2022-06-15 BC
b2 | 2022-06-15 BC

Also buggy. But in a different way. And moreover (again) — I tride the test in 
our current latest YugabyteDB. This uses the PG 11.2 C code as is. (You'll have 
to trust me when I say that the test executes entirely here and goes nowhere 
near our own code,) This is the result (again on my MacBook).

a1 | 2022-06-15 BC
a2 | 2022-06-15 BC
-  | 
b1 | 2021-06-15
b2 | 2021-06-15 BC

Also buggy. But in YET a different way.

It looks like you have a tranche of baroque C code that wrongly implements a 
misguided idea. And that it's been tinkered around with between at least a 
couple of release boundaries without proper testing.

Is there any chance that you might be bold and simply make negative "year" 
values illegal in "to_date()" and "to_timestamp()" — just as they already are 
in "make_timestamp()", "make_timestamptz()", and the "from text" typecasts to 
date-time moment values?