Re: Date-Time dangling unit fix

2023-03-16 Thread Tom Lane
Hearing no further comments on this, I adjusted DecodeTimeOnly to
look more like DecodeDateTime as I recommended upthread, and pushed.

regards, tom lane




Re: Date-Time dangling unit fix

2023-03-10 Thread Tom Lane
Alexander Lakhin  writes:
> I also wonder how the units affect time zone parsing.
> With the patch:
> SELECT time with time zone '010203m+3';
> ERROR:  invalid input syntax for type time with time zone: "010203m+3"
> But without the patch:
> SELECT time with time zone '010203m+3';
>   01:02:03+03

Yeah, I think this is the ptype-still-set-at-end-of-loop check.
I'm fine with throwing an error for that.

> Though with "non-unit" spec:
> SELECT time with time zone '010203mmm+3';
>   01:02:03-03
> (With or without the patch.)
> It seems like "units" were just ignored in a time zone specification,
> but now they are rejected.

I think it's reading "mmm+3" as a POSIX timezone spec.  From memory,
POSIX allows any sequence of 3 or more letters as a zone abbreviation.
It looks like we're being lax and not enforcing the "3 or more" part:

regression=# set time zone 'foobar+3';
SET
regression=# select timeofday();
   timeofday

 Fri Mar 10 12:08:24.484853 2023 FOOBAR
(1 row)

regression=# set time zone 'fo+3';
SET
regression=# select timeofday();
 timeofday  

 Fri Mar 10 12:08:38.207311 2023 FO
(1 row)

regards, tom lane




Re: Date-Time dangling unit fix

2023-03-10 Thread Peter Eisentraut

On 04.03.23 22:05, Tom Lane wrote:

Joseph Koshakow  writes:

On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow  wrote:

I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.

Attached is the described patch.

I started to look at this, and soon noticed that while we have test cases
matching this sort of date input, there is no documentation for it.  The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals (durations).
But I don't have a copy of ISO 8601, and some googling fails to find any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1],
but nothing that looks like this.


There are additional formats in (the lesser known) ISO 8601-2, one of 
which looks like this:


'1985Y4M12D', calendar year 1985, April 12th

But that is entirely incompatible with the above example, because it has 
the units after the numbers.


Even more reason not to support the earlier example.





Re: Date-Time dangling unit fix

2023-03-09 Thread Alexander Lakhin

10.03.2023 03:26, Tom Lane wrote:

Joseph Koshakow  writes:

Also I removed some dead code from the previous patch.

That's a little weird, or maybe even a lot weird, but it's not
inherently nonsensical so I'm hesitant to stop accepting it.
However, if UNITS acts that way, then why is ISOTIME different?
So I'm inclined to remove ISOTIME's lookahead check

 if (i >= nf - 1 ||
 (ftype[i + 1] != DTK_NUMBER &&
  ftype[i + 1] != DTK_TIME &&
  ftype[i + 1] != DTK_DATE))
 return DTERR_BAD_FORMAT;

and rely on the ptype-still-set error at the bottom of the loop
to complain about nonsensical cases.


I also wonder how the units affect time zone parsing.
With the patch:
SELECT time with time zone '010203m+3';
ERROR:  invalid input syntax for type time with time zone: "010203m+3"
But without the patch:
SELECT time with time zone '010203m+3';
 01:02:03+03

Though with "non-unit" spec:
SELECT time with time zone '010203mmm+3';
 01:02:03-03
(With or without the patch.)
It seems like "units" were just ignored in a time zone specification,
but now they are rejected.

At the same time, I see that the time zone specification allows for any
letters with the +/- sign following:
SELECT time with time zone '010203anyletters+3';
 01:02:03-03

It's definitely a separate issue, I just want to note a new erroneous
condition.

Best regards,
Alexander




Re: Date-Time dangling unit fix

2023-03-09 Thread Tom Lane
Joseph Koshakow  writes:
> Also I removed some dead code from the previous patch.

This needs a rebase over bcc704b52, so I did that and made a
couple of other adjustments.

I'm inclined to think that you removed too much from DecodeTimeOnly.
That does accept date specs (at least for timetz), and I see no very
good reason for it not to accept a Julian date spec.  I also wonder
why you removed the UNITS: case there.  Seems like we want these
functions to accept the same syntax as much as possible.

I think the code is still a bit schizophrenic about placement of
ptype specs.  In the UNITS: case, we don't insist that a unit
apply to exactly the very next field; instead it applies to the next
one where it disambiguates.  So for instead this is accepted:

regression=# select 'J PM 1234567 1:23'::timestamp;
   timestamp

 1333-01-11 13:23:00 BC

That's a little weird, or maybe even a lot weird, but it's not
inherently nonsensical so I'm hesitant to stop accepting it.
However, if UNITS acts that way, then why is ISOTIME different?
So I'm inclined to remove ISOTIME's lookahead check

if (i >= nf - 1 ||
(ftype[i + 1] != DTK_NUMBER &&
 ftype[i + 1] != DTK_TIME &&
 ftype[i + 1] != DTK_DATE))
return DTERR_BAD_FORMAT;

and rely on the ptype-still-set error at the bottom of the loop
to complain about nonsensical cases.

Also, if we do keep the lookahead checks, the one in DecodeTimeOnly
could be simplified --- it's accepting some cases that actually
aren't supported there.

regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index a7558d39a0..13856b06d1 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO y2001m02d04 format */
+	int			ptype = 0;		/* "prefix type" for ISO and Julian formats */
 	int			i;
 	int			val;
 	int			dterr;
@@ -1071,6 +1071,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	{
 		char	   *cp;
 
+		/*
+		 * Allow a preceding "t" field, but no other units.
+		 */
 		if (ptype != 0)
 		{
 			/* Sanity check; should not fail this test */
@@ -1175,8 +1178,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
 			case DTK_NUMBER:
 
 /*
- * Was this an "ISO date" with embedded field labels? An
- * example is "y2001m02d04" - thomas 2001-02-04
+ * Deal with cases where previous field labeled this one
  */
 if (ptype != 0)
 {
@@ -1187,85 +1189,11 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	value = strtoint(field[i], &cp, 10);
 	if (errno == ERANGE)
 		return DTERR_FIELD_OVERFLOW;
-
-	/*
-	 * only a few kinds are allowed to have an embedded
-	 * decimal
-	 */
-	if (*cp == '.')
-		switch (ptype)
-		{
-			case DTK_JULIAN:
-			case DTK_TIME:
-			case DTK_SECOND:
-break;
-			default:
-return DTERR_BAD_FORMAT;
-break;
-		}
-	else if (*cp != '\0')
+	if (*cp != '.' && *cp != '\0')
 		return DTERR_BAD_FORMAT;
 
 	switch (ptype)
 	{
-		case DTK_YEAR:
-			tm->tm_year = value;
-			tmask = DTK_M(YEAR);
-			break;
-
-		case DTK_MONTH:
-
-			/*
-			 * already have a month and hour? then assume
-			 * minutes
-			 */
-			if ((fmask & DTK_M(MONTH)) != 0 &&
-(fmask & DTK_M(HOUR)) != 0)
-			{
-tm->tm_min = value;
-tmask = DTK_M(MINUTE);
-			}
-			else
-			{
-tm->tm_mon = value;
-tmask = DTK_M(MONTH);
-			}
-			break;
-
-		case DTK_DAY:
-			tm->tm_mday = value;
-			tmask = DTK_M(DAY);
-			break;
-
-		case DTK_HOUR:
-			tm->tm_hour = value;
-			tmask = DTK_M(HOUR);
-			break;
-
-		case DTK_MINUTE:
-			tm->tm_min = value;
-			tmask = DTK_M(MINUTE);
-			break;
-
-		case DTK_SECOND:
-			tm->tm_sec = value;
-			tmask = DTK_M(SECOND);
-			if (*cp == '.')
-			{
-dterr = ParseFractionalSecond(cp, fsec);
-if (dterr)
-	return dterr;
-tmask = DTK_ALL_SECS_M;
-			}
-			break;
-
-		case DTK_TZ:
-			tmask = DTK_M(TZ);
-			dterr = DecodeTimezone(field[i], tzp);
-			if (dterr)
-return dterr;
-			break;
-
 		case DTK_JULIAN:
 			/* previous field was a label for "julian date" */
 			if (value < 0)
@@ -1519,6 +1447,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* reject consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1576,6 +1507,10 @@ DecodeDateTime(char **field, int

Re: Date-Time dangling unit fix

2023-03-05 Thread Joseph Koshakow
Also I removed some dead code from the previous patch.

- Joe Koshakow
From 2ff08d729bca87992514d0651fdb62455e43cd8a Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Remove unknown ISO format, handle dandling units

This commit removes the date format of "y2001m02d04" and the time
format of "h04mm05s06". These were never documented and don't seem to
be valid ISO formats.

Additionally this commit handles repeated and dangling julian units
in DecodeDateTime.
---
 src/backend/utils/adt/datetime.c   | 219 ++---
 src/test/regress/expected/horology.out |  41 ++---
 src/test/regress/sql/horology.sql  |   4 +
 3 files changed, 36 insertions(+), 228 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d166613895..bf7cb94b52 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO y2001m02d04 format */
+	int			ptype = 0;		/* "prefix type" for ISO and Julian formats */
 	int			i;
 	int			val;
 	int			dterr;
@@ -1174,10 +1174,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 			case DTK_NUMBER:
 
-/*
- * Was this an "ISO date" with embedded field labels? An
- * example is "y2001m02d04" - thomas 2001-02-04
- */
 if (ptype != 0)
 {
 	char	   *cp;
@@ -1188,84 +1184,11 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	if (errno == ERANGE)
 		return DTERR_FIELD_OVERFLOW;
 
-	/*
-	 * only a few kinds are allowed to have an embedded
-	 * decimal
-	 */
-	if (*cp == '.')
-		switch (ptype)
-		{
-			case DTK_JULIAN:
-			case DTK_TIME:
-			case DTK_SECOND:
-break;
-			default:
-return DTERR_BAD_FORMAT;
-break;
-		}
-	else if (*cp != '\0')
+	if (*cp != '.' && *cp != '\0')
 		return DTERR_BAD_FORMAT;
 
 	switch (ptype)
 	{
-		case DTK_YEAR:
-			tm->tm_year = value;
-			tmask = DTK_M(YEAR);
-			break;
-
-		case DTK_MONTH:
-
-			/*
-			 * already have a month and hour? then assume
-			 * minutes
-			 */
-			if ((fmask & DTK_M(MONTH)) != 0 &&
-(fmask & DTK_M(HOUR)) != 0)
-			{
-tm->tm_min = value;
-tmask = DTK_M(MINUTE);
-			}
-			else
-			{
-tm->tm_mon = value;
-tmask = DTK_M(MONTH);
-			}
-			break;
-
-		case DTK_DAY:
-			tm->tm_mday = value;
-			tmask = DTK_M(DAY);
-			break;
-
-		case DTK_HOUR:
-			tm->tm_hour = value;
-			tmask = DTK_M(HOUR);
-			break;
-
-		case DTK_MINUTE:
-			tm->tm_min = value;
-			tmask = DTK_M(MINUTE);
-			break;
-
-		case DTK_SECOND:
-			tm->tm_sec = value;
-			tmask = DTK_M(SECOND);
-			if (*cp == '.')
-			{
-dterr = ParseFractionalSecond(cp, fsec);
-if (dterr)
-	return dterr;
-tmask = DTK_ALL_SECS_M;
-			}
-			break;
-
-		case DTK_TZ:
-			tmask = DTK_M(TZ);
-			dterr = DecodeTimezone(field[i], tzp);
-			if (dterr)
-return dterr;
-			break;
-
 		case DTK_JULIAN:
 			/* previous field was a label for "julian date" */
 			if (value < 0)
@@ -1510,6 +1433,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1536,7 +1462,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 			 ftype[i + 1] != DTK_TIME &&
 			 ftype[i + 1] != DTK_DATE))
 			return DTERR_BAD_FORMAT;
-
 		ptype = val;
 		break;
 
@@ -1567,6 +1492,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -1933,7 +1862,7 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO h04mm05s06 format */
+	int			ptype = 0;		/* "prefix type" for ISO format */
 	int			i;
 	int			val;
 	int			dterr;
@@ -2060,133 +1989,12 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 			case DTK_NUMBER:
 
 /*
- * Was this an "ISO time" with embedded field labels? An
- * example is "h04mm05s06" - thomas 2001-02-04
+ * Was this an "ISO time" An example is "T040506.789"
  */
 if (ptype != 0)
 {
-	char	   *cp;
-	int			value;
-
-	/* Only accept a date under limited circumstances */
 	switch (ptype)
 	{
-		case DTK_JULIAN:
-		case DTK_YEAR:
-		case DTK_MONTH:
-			

Re: Date-Time dangling unit fix

2023-03-05 Thread Joseph Koshakow
On Sun, Mar 5, 2023 at 12:54 PM Tom Lane  wrote:
>
> We do accept this:
>
> => select '12:34'::time;
>time
> --
>  12:34:00
> (1 row)
>
> so that must be going through a different code path, which I didn't
> try to identify yet.

That query will contain a single field of "12:34" with ftype DTK_TIME.
That will call into DecodeTime(), which calls into DecodeTimeCommon(),
where we have:

*tmask = DTK_TIME_M;

- Joe Koshakow


Re: Date-Time dangling unit fix

2023-03-05 Thread Tom Lane
[ I removed Lockhart, because he's taken no part in Postgres work for
  more than twenty years; if that address even still works, you're
  just bugging him ]

Alexander Lakhin  writes:
> In fact,
> SELECT time 'h04mm05s06';
> doesn't work for many years, but
> SELECT time 'h04mm05s06.0';
> still does.

I traced that down to this in DecodeTimeOnly:

if ((fmask & DTK_TIME_M) != DTK_TIME_M)
return DTERR_BAD_FORMAT;

where we have

#define DTK_ALL_SECS_M  (DTK_M(SECOND) | DTK_M(MILLISECOND) | 
DTK_M(MICROSECOND))
#define DTK_TIME_M  (DTK_M(HOUR) | DTK_M(MINUTE) | DTK_ALL_SECS_M)

So in other words, this test insists on seeing hour, minute, second,
*and* fractional-second fields.  That seems obviously too picky.
It might not matter if we rip out this syntax, but I see other similar
tests so I suspect some of them will still be reachable.

Personally I'd say that hh:mm is a plenty complete enough time, and
whether you write seconds is optional, let alone fractional seconds.
We do accept this:

=> select '12:34'::time;
   time   
--
 12:34:00
(1 row)

so that must be going through a different code path, which I didn't
try to identify yet.

regards, tom lane




Re: Date-Time dangling unit fix

2023-03-05 Thread Joseph Koshakow
Attached is a patch for removing the discussed format of date-times.
From f35284762c02ed466496e4e562b5f95a884b5ef1 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Remove unknown ISO format, handle dandling units

This commit removes the date format of "y2001m02d04" and the time
format of "h04mm05s06". These were never documented and don't seem to
be valid ISO formats.

Additionally this commit handles repeated and dangling julian units
in DecodeDateTime.
---
 src/backend/utils/adt/datetime.c   | 210 ++---
 src/test/regress/expected/horology.out |  41 ++---
 src/test/regress/sql/horology.sql  |   4 +
 3 files changed, 37 insertions(+), 218 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d166613895..51b72ad6c2 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO y2001m02d04 format */
+	int			ptype = 0;		/* "prefix type" for ISO and Julian formats */
 	int			i;
 	int			val;
 	int			dterr;
@@ -1174,10 +1174,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 			case DTK_NUMBER:
 
-/*
- * Was this an "ISO date" with embedded field labels? An
- * example is "y2001m02d04" - thomas 2001-02-04
- */
 if (ptype != 0)
 {
 	char	   *cp;
@@ -1188,84 +1184,11 @@ DecodeDateTime(char **field, int *ftype, int nf,
 	if (errno == ERANGE)
 		return DTERR_FIELD_OVERFLOW;
 
-	/*
-	 * only a few kinds are allowed to have an embedded
-	 * decimal
-	 */
-	if (*cp == '.')
-		switch (ptype)
-		{
-			case DTK_JULIAN:
-			case DTK_TIME:
-			case DTK_SECOND:
-break;
-			default:
-return DTERR_BAD_FORMAT;
-break;
-		}
-	else if (*cp != '\0')
+	if (*cp != '.' && *cp != '\0')
 		return DTERR_BAD_FORMAT;
 
 	switch (ptype)
 	{
-		case DTK_YEAR:
-			tm->tm_year = value;
-			tmask = DTK_M(YEAR);
-			break;
-
-		case DTK_MONTH:
-
-			/*
-			 * already have a month and hour? then assume
-			 * minutes
-			 */
-			if ((fmask & DTK_M(MONTH)) != 0 &&
-(fmask & DTK_M(HOUR)) != 0)
-			{
-tm->tm_min = value;
-tmask = DTK_M(MINUTE);
-			}
-			else
-			{
-tm->tm_mon = value;
-tmask = DTK_M(MONTH);
-			}
-			break;
-
-		case DTK_DAY:
-			tm->tm_mday = value;
-			tmask = DTK_M(DAY);
-			break;
-
-		case DTK_HOUR:
-			tm->tm_hour = value;
-			tmask = DTK_M(HOUR);
-			break;
-
-		case DTK_MINUTE:
-			tm->tm_min = value;
-			tmask = DTK_M(MINUTE);
-			break;
-
-		case DTK_SECOND:
-			tm->tm_sec = value;
-			tmask = DTK_M(SECOND);
-			if (*cp == '.')
-			{
-dterr = ParseFractionalSecond(cp, fsec);
-if (dterr)
-	return dterr;
-tmask = DTK_ALL_SECS_M;
-			}
-			break;
-
-		case DTK_TZ:
-			tmask = DTK_M(TZ);
-			dterr = DecodeTimezone(field[i], tzp);
-			if (dterr)
-return dterr;
-			break;
-
 		case DTK_JULIAN:
 			/* previous field was a label for "julian date" */
 			if (value < 0)
@@ -1510,6 +1433,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1536,7 +1462,6 @@ DecodeDateTime(char **field, int *ftype, int nf,
 			 ftype[i + 1] != DTK_TIME &&
 			 ftype[i + 1] != DTK_DATE))
 			return DTERR_BAD_FORMAT;
-
 		ptype = val;
 		break;
 
@@ -1567,6 +1492,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -1933,7 +1862,7 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 	int			fmask = 0,
 tmask,
 type;
-	int			ptype = 0;		/* "prefix type" for ISO h04mm05s06 format */
+	int			ptype = 0;		/* "prefix type" for ISO format */
 	int			i;
 	int			val;
 	int			dterr;
@@ -2060,133 +1989,23 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 			case DTK_NUMBER:
 
 /*
- * Was this an "ISO time" with embedded field labels? An
- * example is "h04mm05s06" - thomas 2001-02-04
+ * Was this an "ISO time" An example is "T040506.789"
  */
 if (ptype != 0)
 {
 	char	   *cp;
 	int			value;
 
-	/* Only accept a date under limited circumstances */
-	switch (ptype)
-	{
-		case DTK_JULIAN:
-		case DTK_YEAR:
-		case DTK_MONTH:
-	

Re: Date-Time dangling unit fix

2023-03-05 Thread Alexander Lakhin

Hello,

05.03.2023 02:31, Joseph Koshakow wrote:

I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c

/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/

which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment).


I've also seen another interesting comment in datetime.c:
    /*
 * Was this an "ISO time" with embedded field labels? An
 * example is "h04mm05s06" - thomas 2001-02-04
 */
In fact,
SELECT time 'h04mm05s06';
doesn't work for many years, but
SELECT time 'h04mm05s06.0';
still does.

I've just found that I mentioned it some time ago:
https://www.postgresql.org/message-id/dff75442-2468-f74f-568c-6006e141062f%40gmail.com

Best regards,
Alexander

Re: Date-Time dangling unit fix

2023-03-04 Thread Joseph Koshakow
On Sat, Mar 4, 2023 at 4:05 PM Tom Lane  wrote:
>
>I started to look at this, and soon noticed that while we have test
cases
>matching this sort of date input, there is no documentation for it.
The
>code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
>because it looks a lot like the ISO 8601 format for intervals
(durations).
>But I don't have a copy of ISO 8601, and some googling fails to find
any
>indication that anybody else believes this is a valid datetime format.
>Wikipedia for example documents a lot of variants of ISO 8601 [1],
>but nothing that looks like this.
>
>I wonder if we should just rip this code out instead of fixing it.
>I suspect its real-world usage is not different from zero.  We'd
>have to keep the "Jnnn" Julian-date case, though, so maybe there's
>little to be saved.
>
>If we do keep it, there's documentation work to be done.  But the
>first bit of doco I'd want to see is a pointer to a standard.

I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c

/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/

which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment). I've CC'ed Thomas in
case the email still exists and they happen to remember. The commit
message mentions ISO, but not the variant mentioned in the comment.
The mailing list thread can be found here [2], but it doesn't provide
much more information. I also found the following thread [3], which
happens to have you in it in case you remember it, which seemed to be
the motivation for commit [1]. It only contains the following line
about ISO:

> o support for "ISO variants" on input, including embedded "T" preceeding
the time fields

All that seems to imply the "y2001m02d04" ISO variant was never really
discussed in much detail and it's probably fine to remove it. Though,
it has been around for 22 years which makes it a bit scary to remove.

- Joe Koshakow

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d
[2]
https://www.postgresql.org/message-id/flat/3BB433D5.3CB4164E%40fourpalms.org
[3]
https://www.postgresql.org/message-id/flat/3B970FF8.B9990807%40fourpalms.org#c57d83c80d295bfa19887c92122369c3


Re: Date-Time dangling unit fix

2023-03-04 Thread Tom Lane
Joseph Koshakow  writes:
> On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow  wrote:
>> I just found another class of this bug that the submitted patch does
>> not fix. If the units are at the beginning of the string, then they are
>> also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
>> think the fix here is to check and make sure that ptype is 0 before
>> reassigning the value to a non-zero number. I'll send an updated patch
>> with this tonight.

> Attached is the described patch.

I started to look at this, and soon noticed that while we have test cases
matching this sort of date input, there is no documentation for it.  The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals (durations).
But I don't have a copy of ISO 8601, and some googling fails to find any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1],
but nothing that looks like this.

I wonder if we should just rip this code out instead of fixing it.
I suspect its real-world usage is not different from zero.  We'd
have to keep the "Jnnn" Julian-date case, though, so maybe there's
little to be saved.

If we do keep it, there's documentation work to be done.  But the
first bit of doco I'd want to see is a pointer to a standard.

regards, tom lane

[1] https://en.wikipedia.org/wiki/ISO_8601




Re: Date-Time dangling unit fix

2022-12-12 Thread Joseph Koshakow
On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow  wrote:
>
> I just found another class of this bug that the submitted patch does
> not fix. If the units are at the beginning of the string, then they are
> also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
> think the fix here is to check and make sure that ptype is 0 before
> reassigning the value to a non-zero number. I'll send an updated patch
> with this tonight.

Attached is the described patch.

- Joe Koshakow
From af72736bb4149afa629281e27da2141635a93cac Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Handle dangling units in date-time input

DecodeDateTime and DecodeTimeOnly allowed dangling unit types at the
beginning and end of inputs without returning an error. For example,
`date '1995-08-06 m y d'` and
`timestamp 'y m s d y2001m02d04 h04mm17s34'` were considered a valid
date and the dangling units were ignored. This commit fixes this issue
so an error is returned instead.
---
 src/backend/utils/adt/datetime.c  | 21 -
 src/test/regress/expected/date.out| 10 ++
 src/test/regress/expected/time.out|  5 +
 src/test/regress/expected/timestamp.out   | 10 ++
 src/test/regress/expected/timestamptz.out | 10 ++
 src/test/regress/expected/timetz.out  |  5 +
 src/test/regress/sql/date.sql |  5 +
 src/test/regress/sql/time.sql |  3 +++
 src/test/regress/sql/timestamp.sql|  5 +
 src/test/regress/sql/timestamptz.sql  |  5 +
 src/test/regress/sql/timetz.sql   |  3 +++
 11 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b5b117a8ca..ebd7caff08 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1509,6 +1509,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1535,7 +1538,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 			 ftype[i + 1] != DTK_TIME &&
 			 ftype[i + 1] != DTK_DATE))
 			return DTERR_BAD_FORMAT;
-
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -1566,6 +1571,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -2367,6 +2376,9 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 
 	case UNITS:
 		tmask = 0;
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -2385,6 +2397,9 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 			 ftype[i + 1] != DTK_DATE))
 			return DTERR_BAD_FORMAT;
 
+		/* prevent consecutive unhandled units */
+		if (ptype != 0)
+			return DTERR_BAD_FORMAT;
 		ptype = val;
 		break;
 
@@ -2415,6 +2430,10 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f8f83e40e9..f4239a5402 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1526,3 +1526,13 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test error on dangling units
+SELECT date '1995-08-06 m';
+ERROR:  invalid input syntax for type date: "1995-08-06 m"
+LINE 1: SELECT date '1995-08-06 m';
+^
+SET datestyle = ISO;
+SELECT date 'y m s d y2001m02d04';
+ERROR:  invalid input syntax for type date: "y m s d y2001m02d04"
+LINE 1: SELECT date 'y m s d y2001m02d04';
+^
diff --git a/src/test/regress/expected/time.out b/src/test/regress/expected/time.out
index a44caededd..5f7058eca8 100644
--- a/src/test/regress/expected/time.out
+++ b/src/test/regress/expected/time.out
@@ -229,3 +229,8 @@ SELECT date_part('epoch',   TIME '2020-05-26 13:30:25.575401');
  48625.575401
 (1 row)
 
+-- test error on dangling units
+SELECT time '12:30:15 d';
+ERROR:  invalid input syntax for type time: "12:30:15 d"
+LINE 1: SELECT time '12:30:15 d';
+^
diff --git a/src/test/regress/expec

Re: Date-Time dangling unit fix

2022-12-12 Thread Joseph Koshakow
I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.


Re: Date-Time dangling unit fix

2022-12-11 Thread Joseph Koshakow
On Sun, Dec 11, 2022 at 10:29 AM Joseph Koshakow  wrote:
>
> Hi all,
>
> Attached is a patch to fix a parsing error for date-time types that
> allow dangling units in the input. For example,
> `date '1995-08-06 m y d'` was considered a valid date and the dangling
> units were ignored.
>
> Intervals also suffer from a similar issue, but the attached patch
> doesn't fix that issue. For example,
> `interval '1 day second month 6 hours days years ago'` is parsed as a
> valid interval with -1 days and -6 hours. I'm hoping to fix that in a
> later patch, but it will likely be more complicated than the other
> date-time fixes.
>
> - Joe Koshakow

I think I sent that to the wrong email address.
From fbcf39211fc7a379ea021160298604694383d56c Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Handle dangling units in date-time input

DecodeDateTime and DecodeTimeOnly allowed dangling unit types on input
without returning an error. For example, `date '1995-08-06 m y d'` was
considered a valid date and the dangling units were ignored. This
commit fixes this issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c  | 8 
 src/test/regress/expected/date.out| 5 +
 src/test/regress/expected/time.out| 5 +
 src/test/regress/expected/timestamp.out   | 5 +
 src/test/regress/expected/timestamptz.out | 5 +
 src/test/regress/expected/timetz.out  | 5 +
 src/test/regress/sql/date.sql | 3 +++
 src/test/regress/sql/time.sql | 3 +++
 src/test/regress/sql/timestamp.sql| 3 +++
 src/test/regress/sql/timestamptz.sql  | 3 +++
 src/test/regress/sql/timetz.sql   | 3 +++
 11 files changed, 48 insertions(+)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b5b117a8ca..a985d1b6ea 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1566,6 +1566,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -2415,6 +2419,10 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f8f83e40e9..fec466a594 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1526,3 +1526,8 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test error on dangling units
+SELECT date '1995-08-06 m';
+ERROR:  invalid input syntax for type date: "1995-08-06 m"
+LINE 1: SELECT date '1995-08-06 m';
+^
diff --git a/src/test/regress/expected/time.out b/src/test/regress/expected/time.out
index a44caededd..5f7058eca8 100644
--- a/src/test/regress/expected/time.out
+++ b/src/test/regress/expected/time.out
@@ -229,3 +229,8 @@ SELECT date_part('epoch',   TIME '2020-05-26 13:30:25.575401');
  48625.575401
 (1 row)
 
+-- test error on dangling units
+SELECT time '12:30:15 d';
+ERROR:  invalid input syntax for type time: "12:30:15 d"
+LINE 1: SELECT time '12:30:15 d';
+^
diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index be66274738..6fce7319eb 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -2110,3 +2110,8 @@ select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
 ERROR:  step size cannot equal zero
+-- test error on dangling units
+SELECT timestamp '1995-08-06 12:30:15 y';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 12:30:15 y"
+LINE 1: SELECT timestamp '1995-08-06 12:30:15 y';
+ ^
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index fb06acbccc..565c5595ea 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -3085,3 +3085,8 @@ select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';
  Tue Jan 17 16:00:00 2017 PST
 (1 row)
 
+-- test error on dangling units
+SELECT timestamptz '1995-08-06 12:30:15 m';
+ERROR:  invalid input syntax for type timestamp with time zone: "1995-08-06 12:30:15 m"
+LINE 1: SELECT timestamptz '1995-08-06 1

Date-Time dangling unit fix

2022-12-11 Thread Joseph Koshakow
Hi all,

Attached is a patch to fix a parsing error for date-time types that
allow dangling units in the input. For example,
`date '1995-08-06 m y d'` was considered a valid date and the dangling
units were ignored.

Intervals also suffer from a similar issue, but the attached patch
doesn't fix that issue. For example,
`interval '1 day second month 6 hours days years ago'` is parsed as a
valid interval with -1 days and -6 hours. I'm hoping to fix that in a
later patch, but it will likely be more complicated than the other
date-time fixes.

- Joe Koshakow
From fbcf39211fc7a379ea021160298604694383d56c Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Handle dangling units in date-time input

DecodeDateTime and DecodeTimeOnly allowed dangling unit types on input
without returning an error. For example, `date '1995-08-06 m y d'` was
considered a valid date and the dangling units were ignored. This
commit fixes this issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c  | 8 
 src/test/regress/expected/date.out| 5 +
 src/test/regress/expected/time.out| 5 +
 src/test/regress/expected/timestamp.out   | 5 +
 src/test/regress/expected/timestamptz.out | 5 +
 src/test/regress/expected/timetz.out  | 5 +
 src/test/regress/sql/date.sql | 3 +++
 src/test/regress/sql/time.sql | 3 +++
 src/test/regress/sql/timestamp.sql| 3 +++
 src/test/regress/sql/timestamptz.sql  | 3 +++
 src/test/regress/sql/timetz.sql   | 3 +++
 11 files changed, 48 insertions(+)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b5b117a8ca..a985d1b6ea 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1566,6 +1566,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -2415,6 +2419,10 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f8f83e40e9..fec466a594 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1526,3 +1526,8 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test error on dangling units
+SELECT date '1995-08-06 m';
+ERROR:  invalid input syntax for type date: "1995-08-06 m"
+LINE 1: SELECT date '1995-08-06 m';
+^
diff --git a/src/test/regress/expected/time.out b/src/test/regress/expected/time.out
index a44caededd..5f7058eca8 100644
--- a/src/test/regress/expected/time.out
+++ b/src/test/regress/expected/time.out
@@ -229,3 +229,8 @@ SELECT date_part('epoch',   TIME '2020-05-26 13:30:25.575401');
  48625.575401
 (1 row)
 
+-- test error on dangling units
+SELECT time '12:30:15 d';
+ERROR:  invalid input syntax for type time: "12:30:15 d"
+LINE 1: SELECT time '12:30:15 d';
+^
diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index be66274738..6fce7319eb 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -2110,3 +2110,8 @@ select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
 ERROR:  step size cannot equal zero
+-- test error on dangling units
+SELECT timestamp '1995-08-06 12:30:15 y';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 12:30:15 y"
+LINE 1: SELECT timestamp '1995-08-06 12:30:15 y';
+ ^
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index fb06acbccc..565c5595ea 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -3085,3 +3085,8 @@ select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';
  Tue Jan 17 16:00:00 2017 PST
 (1 row)
 
+-- test error on dangling units
+SELECT timestamptz '1995-08-06 12:30:15 m';
+ERROR:  invalid input syntax for type timestamp with time zone: "1995-08-06 12:30:15 m"
+LINE 1: SELECT timestamptz '1995-08-06 12:30:15 m';
+   ^
diff --git a/src/test/regress/expected/timetz.out b/src/test/regress/expected/timetz.out
inde