I am +1 on returning NULL for seemingly "invalid" dates/timestamps.
Invalid may not be the most appropriate term since the parsing adheres to
JDK APIs but indeed the results may seem surprising.

I guess we can merge this PR when all the comments are addressed if nobody
raises a concern in the meantime.

Best,
Stamatis

On Tue, Jul 20, 2021 at 5:04 PM Sankar Hariappan <
sankar.hariap...@microsoft.com> wrote:

> +1
>
> Thanks Ashish for the comparison!
>
> I talked to few Hive users (HDInsight) and they supported returning NULL
> for invalid date/timestamp inputs instead of returning incorrect results or
> exception.
>
> Can others pls share your thoughts?
>
>
>
> Thanks,
>
> Sankar
>
>
>
> *From:* Ashish Sharma <ashishkumarsharm...@gmail.com>
> *Sent:* 20 July 2021 14:02
> *To:* dev@hive.apache.org
> *Cc:* sankar.hariap...@microsoft.com.invalid; sank...@apache.org;
> u...@hive.apache.org; David <dam6...@gmail.com>
> *Subject:* Re: [EXTERNAL] Re: Move Date and Timestamp parsing from
> ResolverStyle.LENIENT to ResolverStyle.STRICT
>
>
>
> Hi all,
>
>
>
> I also feel that adding more config doesn't make sense in this as we are
> tightening the date and timestamp format. We should decide upon a single
> solution even if it break the compatibility. Below the comparison of HIVE
> 1.2, HIVE 3.2, MYSQL, PostgreSQL, Oracle
>
>
>
>
>
> *Query*
>
> *Hive 1.2*
>
> *Hive 3.2*
>
> *Mysql*
>
> *PostgreSQL*
>
> *ORACLE *
>
> select cast('2020-20-20' as date);
>
> NULL
>
> 2021-08-20
>
> NULL
>
> date/time field value out of range: "2020-20-20"
>
> not a valid month
>
> select cast(null as date);
>
> NULL
>
> NULL
>
> NULL
>
> NULL
>
> NULL
>
> select cast('2020-02-31' as date);
>
> 2020-03-02
>
> 2020-03-02
>
> NULL
>
> date/time field value out of range: "2020-02-31"
>
> date format picture ends before converting entire input string
>
> select cast('2020/02/20' as date);
>
> NULL
>
> NULL
>
> 2020-02-20
>
> 2020-02-20
>
> literal does not match format string
>
> select cast('0000-00-00' as date);
>
> NULL
>
> 0002-11-30
>
> NULL
>
> date/time field value out of range: "0000-00-00"
>
> literal does not match format string
>
>
>
>
>
> From the comparison it is quite clear that date and timestamp formatting
> was much tighter in older versions of HIVE. For most of the wrong date
> input *NULL *was the standard response instead of Exception.
>
>
>
> Also when I went through the code I found that. While doing the Vector
> implementation of some of the date related UDF like datediff etc. MySql was
> taken as the gold standard
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHIVE-15338%3FfocusedCommentId%3D15727553%26page%3Dcom.atlassian.jira.plugin.system.issuetabpanels%253Acomment-tabpanel%23comment-15727553&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537409104%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=UjkFmuO8PHHkcojm1f%2FPJipguQ1JYMkbl%2F0XzcrvGBg%3D&reserved=0>.
> So it make more sense that  we should comply with MySql as we already refer
> MySql as gold standard and returning NULL as result for wrong dates in cast
> is also documented
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fhive%2Flanguagemanual%2Btypes%23LanguageManualTypes-CastingDates&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537419060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=69O6Ct8q%2FK8l31D4yK4eF1fQRFjtl3jAiooP3FWJTJ4%3D&reserved=0>
>
>
>
>
> *So I propose to make NULL as the standard response for all parsing
> errors.*
>
>
>
> Thanks
>
> Ashish Sharma
>
>
>
> On Tue, Jul 13, 2021 at 9:52 PM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
>
> Hi all,
>
> Thanks for pushing this forward Ashish!
>
> Actually I am not in favor of creating a flag for this. Either we decide
> consciously to break backward compatibility in the hope that we are
> improving the expected results or we keep the current behavior.
> Adding another flag means that we maintain and support two variants that
> makes the problem of test coverage brought by David even worse.
>
> I second David's idea to run some tests over some well adopted DBMS (MySQL,
> Oracle, MSSQL, Postgres) to see what they return.
> I think Ashish already did some tests over MySQL and MSSQL but personally I
> would like to see some more (dates + engines) in order to express
> a preference.
> We shouldn't forget that since Hive is implemented in Java, having
> functions that are inline with the Java APIs is not such a bad idea.
> The last comment is slightly supportive of the current behavior.
>
> I am including user@ list in the discussion since we should definitely
> consider the feedback of people that are using Hive for real.
>
> Best,
> Stamatis
>
> On Tue, Jul 13, 2021 at 4:31 PM David <dam6...@gmail.com> wrote:
>
> > Hello,
> >
> > Is anyone able to try out a few different vendor RDBMS to see how they
> > handle invalid dates, or provide links to documentation, both for invalid
> > formatting and things like mm-dd-yyy 12-40-2021?
> >
> > Thanks.
> >
> > On Tue, Jul 13, 2021 at 5:14 AM Sankar Hariappan
> > <sankar.hariap...@microsoft.com.invalid> wrote:
> >
> >> I'm supporting this change to return "NULL" for invalid date/timestamp.
> >> In the interest of backward compatibility, can we make all these changes
> >> under a flag which can be enabled by default?
> >>
> >>
> >> Thanks,
> >> Sankar
> >> -----Original Message-----
> >> From: David <dam6...@gmail.com>
> >> Sent: 10 July 2021 07:35
> >> To: dev <dev@hive.apache.org>
> >> Cc: sank...@apache.org; Stamatis Zampetakis <zabe...@gmail.com>
> >> Subject: [EXTERNAL] Re: Move Date and Timestamp parsing from
> >> ResolverStyle.LENIENT to ResolverStyle.STRICT
> >>
> >> Hello,
> >>
> >> I too would be in favor of this. It drastically cuts down on the test
> >> matrix for Hive if we can clamp down on timestamp formats. With that
> being
> >> said, I've tried this and it's a big effort.  I put it down without
> getting
> >> consensus or buy-in or engagement on the effort. Please check out my
> work
> >> here:
> >>
> >>
> >>
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fplugins%2Fservlet%2Fmobile%23issue%2FHIVE-24814&amp;data=04%7C01%7CSankar.Hariappan%40microsoft.com%7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=mvYgaG7liJOwUZmMvgwlo%2B1HvcUsrnzXA3Ltfz5yEYE%3D&amp;reserved=0
> >>
> >>
> >> On Fri, Jul 9, 2021, 9:49 PM Ashish Sharma <
> ashishkumarsharm...@gmail.com
> >> >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > When casting incorrect date or timestamp literals to DATE or TIMESTAMP
> >> > data type hive returns wrong values
> >> >
> >> > hive> select cast('2020-20-20' as date);
> >> >
> >> > OK
> >> >
> >> > 2021-08-20
> >> >
> >> > Time taken: 4.436 seconds, Fetched: 1 row(s)
> >> >
> >> >
> >> > I have created a solution draft. Please review the draft and provide
> >> > your valuable feedback on the same.
> >> >
> >> >
> >> >
> >> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
> >> > .google.com
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgoogle.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537419060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=buVSLhJVTluXbYd0iy491oHq2LU2e%2FfKfKYRks1co30%3D&reserved=0>
> %2Fdocument%2Fd%2F1YTTPlNq3qyzlKfYVkSl3EFhVQ6-wa9WFRdkdIeCo
> >> > c1Y%2Fedit%3Fusp%3Dsharing&amp;data=04%7C01%7CSankar.Hariappan%40micro
> >> > soft.com
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsoft.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537429016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=no6XEmDCYcHNdjQduJdynmsEJpLww56lKSEze4LZ8Qo%3D&reserved=0>
> %7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7c
> >> > d011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> >> > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;
> >> > sdata=iihK9wJC%2B1uPktHSE9BpXADvbal1UT7vZ3rwigkgkIY%3D&amp;reserved=0
> >> >
> >> >
> >> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissu
> >> > es.apache.org
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fes.apache.org%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537429016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=W4mMKQSceHXb1NFvkYwtBvy02B81WoX9mAyPHd0F77s%3D&reserved=0>
> %2Fjira%2Fbrowse%2FHIVE-25306&amp;data=04%7C01%7CSankar.H
> >> > ariappan%40microsoft.com
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2F40microsoft.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537438972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=a8wpFdw26oKHEDLE1XqTGVBCIgkHs%2Fj0r9BMnMkszOA%3D&reserved=0>
> %7Cd47432b9d7654d66a46908d943472338%7C72f988bf
> >> > 86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFp
> >> > bGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn
> >> > 0%3D%7C3000&amp;sdata=nW%2Bw%2B0AYn%2BbvOqRLrXghfH0FG%2B1cQW%2BzdrpT%2
> >> > B9R%2B6rA%3D&amp;reserved=0
> >> >
> >> >
> >> > Thank you
> >> >
> >> > Ashish Sharma
> >> >
> >>
> >
>
>

Reply via email to