Re: [HACKERS] Rectifying wrong Date outputs

Piyush Newe Sun, 20 Mar 2011 22:54:40 -0700

On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane <[email protected]> wrote:

> Robert Haas <[email protected]> writes:
> > On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
> > <[email protected]> wrote:
> >> Keep in mind that the datetime stuff was abandoned by the maintainer
> >> some years ago with quite some rough edges.  Some of it has been fixed,
> >> but a lot of bugs remain.  Looks like this is one of those places and it
> >> seems appropriate to spend some time fixing it.  Since it would involve
> >> a behavior change, it should only go to 9.2, of course.
>
> > I wouldn't object to fixing the problem with # of digits > # of Ys in
> > 9.1, if the fix is simple and clear-cut.  I think we are still
> > accepting patches to make minor tweaks, like the tab-completion patch
> > I committed yesterday.  It also doesn't bother me tremendously if we
> > push it off, but I don't think that anyone's going to be too sad if
> > TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
> > sensible than 3010-01-01.
>
> Agreed, it's certainly not too late for bug fixes in 9.1.  I agree
> that this isn't something we would want to tweak in released branches,
> but 9.1 isn't there yet.
>
>
I feel the patch for the same would be easier and was attached in the
initial mail of this mail thread. For your ready reference, I am attaching
the same patch here again.

> Having said that, it's not entirely clear to me what sane behavior is
> here.  Personally I would expect that an n-Ys format spec would consume
> at most n digits from the input.  Otherwise how are you going to use
> to_date to pick apart strings that don't have any separators?  So
> I think the problem is actually upstream of the behavior complained of
> here.  However, what we should first do is see what Oracle does in such
> cases, because the main driving factor for these functions is Oracle
> compatibility not what might seem sane in a vacuum.
>
>
Following is the extended chart which is comparing the behavior of Oracle,
PG & EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10',  'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10',  'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
Oracle is throwing error in 1st case since the Format ('Y') is lesser than
the actual value ('10'). But PostgreSQL is ignoring this case and throwing
whatever is input. The output is might not be the same was user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067',  'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001',  'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, just last case was correct in PG. Rest other cases are not in
sync with Oracle, rather the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010',  'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010',  'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010',  'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
the format is lesser than the actual value. It seems this rule is not
applicable for 2nd case in Oracle.

In all above mentioned cases, the observation is, If the # Ys are lesser
than the # of digits,, then it should throw an error. Only in case of 'YY',
its not correct, unless the year is later than 9999. In this way, we can fix
the wrong outputs in PG.

>                        regards, tom lane
>

-- 
-- 
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index aba1145..ad42126 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -989,6 +989,7 @@ static DCHCacheEntry *DCH_cache_getnew(char *str);
 static NUMCacheEntry *NUM_cache_search(char *str);
 static NUMCacheEntry *NUM_cache_getnew(char *str);
 static void NUM_cache_remove(NUMCacheEntry *ent);
+static int DecideCentury(int *in, int len);
 
 
 /* ----------
@@ -2733,21 +2734,14 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 			case DCH_IYYY:
 				from_char_parse_int(&out->year, &s, n);
 				out->yysz = 4;
+				DecideCentury(&out->year, out->yysz);
 				s += SKIP_THth(n->suffix);
 				break;
 			case DCH_YYY:
 			case DCH_IYY:
 				from_char_parse_int(&out->year, &s, n);
 				out->yysz = 3;
-
-				/*
-				 * 3-digit year: '100' ... '999' = 1100 ... 1999 '000' ...
-				 * '099' = 2000 ... 2099
-				 */
-				if (out->year >= 100)
-					out->year += 1000;
-				else
-					out->year += 2000;
+				DecideCentury(&out->year, out->yysz);
 				s += SKIP_THth(n->suffix);
 				break;
 			case DCH_YY:
@@ -2755,14 +2749,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				from_char_parse_int(&out->year, &s, n);
 				out->yysz = 2;
 
-				/*
-				 * 2-digit year: '00' ... '69'	= 2000 ... 2069 '70' ... '99'
-				 * = 1970 ... 1999
-				 */
-				if (out->year < 70)
-					out->year += 2000;
-				else
-					out->year += 1900;
+				DecideCentury(&out->year, out->yysz);
 				s += SKIP_THth(n->suffix);
 				break;
 			case DCH_Y:
@@ -2770,10 +2757,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				from_char_parse_int(&out->year, &s, n);
 				out->yysz = 1;
 
-				/*
-				 * 1-digit year: always +2000
-				 */
-				out->year += 2000;
+				DecideCentury(&out->year, out->yysz);
 				s += SKIP_THth(n->suffix);
 				break;
 			case DCH_RM:
@@ -5161,3 +5145,63 @@ float8_to_char(PG_FUNCTION_ARGS)
 	NUM_TOCHAR_finish;
 	PG_RETURN_TEXT_P(result);
 }
+
+/*
+ * In case of different Inputs & different Year formats, engine itself
+ * has to decide the century for the given year. This function will take
+ * care of the input & accordingly append the century their if not provided.
+ *
+ *   Input 	=	Transformed output
+ * 0 ... 9 		= 2000 ... 2009
+ * 00 ... 69 	= 2000 ... 2069
+ * 70 ... 99 	= 1900 ... 1999
+ * 000 ... 999 	= 2000 ... 2999
+ *
+ * In case input is greater than 9999 it throws an error. The basic rule of
+ * the function is " If the format is lesser than the actual value, then it
+ * should throw an error. Only in case of 'YY', we need not throw an error,
+ * unless the year is later than 9999".
+ */
+
+static int DecideCentury(int *in, int len)
+{
+	if (len == 1)
+	{
+		if (*in > 0 && *in < 9)
+			*in += 2000;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("date format picture ends before converting entire input string %d ", len)));
+	}
+	else if (len == 2)
+	{
+		if (*in > 0 && *in < 69)
+			*in += 2000;
+		else if (*in > 70 && *in < 99)
+			*in += 1900;
+		else if (*in > 9999)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("date format picture ends before converting entire input string %d ", len)));
+
+	}
+	else if (len == 3)
+	{
+		if (*in > 0 && *in < 999)
+			*in += 2000;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("date format picture ends before converting entire input string %d ", len)));
+	}
+	else if (len == 4)
+	{
+		if (*in > 9999)
+			ereport(ERROR,
+				(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+				 errmsg("date format picture ends before converting entire input string %d ", len)));
+	}
+
+	return 0;
+}

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rectifying wrong Date outputs

Reply via email to