MartinKolbAtWork commented on PR #12227: URL: https://github.com/apache/datafusion/pull/12227#issuecomment-2323958125
> I wonder what the cost is for using string_to_datetime_formatted vs a more limited string_to_date_formatted. There should be slightly less parsing involved for that.

Hi @Omega359, thanks for asking. This is a good question. Let me throw in two aspects.

1) The current implementation also uses a “complete” parsing, so the proposed change does not make anything worse. The change just uses milliseconds instead of nanoseconds as the intermediate value, which avoids the upper bound of year 2262 that comes with nanoseconds.

2) Simply parsing the values for year, month, and day out of a timestamp will not always yield the correct value for a date. This is an example of an existing test case in DataFusion: https://github.com/apache/datafusion/blob/dd3208943d728d845497d6a12ce4c0eacc061dcd/datafusion/sqllogictest/test_files/dates.slt#L127-L129

The correct day for the timestamp `01-14-2023 01:01:30+05:30` is `13`, although the “parsed day” is `14`. This is because the timestamp refers to a time zone where the local day is already the 14th, while the UTC day is still the 13th. When parsing to a “timestamp”, the “day” is expected to be “14”, because the timestamp object can be asked for its time zone, so the UTC date (13th) can be calculated from that information. A simple “date” object, however, does not carry time-zone information. Therefore the date must refer to UTC, which is the 13th.

And, as a brief reminder… I just intended to fix the issue that the date calculation has an upper limit of 2262. Both the full timestamp parsing and the UTC handling are already in the code base. I introduced neither of them.
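To make the two points above concrete, here is a minimal standalone sketch using only the Rust standard library. It is not the DataFusion code: every function name in it (`days_from_civil`, `utc_day_number`, `day_to_nanos`, `day_to_millis`) is hypothetical and only serves the illustration, and it assumes a proleptic Gregorian calendar and a fixed UTC offset.

```rust
// Illustrative sketch only; NOT the DataFusion implementation.
// All function names below are hypothetical.

const SECS_PER_DAY: i64 = 86_400;

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known "days from civil" algorithm by Howard Hinnant).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y - era * 400; // 0..=399
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1; // 0..=365
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// UTC day number (days since the epoch) for a local wall-clock time
/// given together with its UTC offset in seconds (e.g. +05:30 = 19_800).
fn utc_day_number(
    year: i64,
    month: i64,
    day: i64,
    hour: i64,
    minute: i64,
    second: i64,
    offset_secs: i64,
) -> i64 {
    let local_secs =
        days_from_civil(year, month, day) * SECS_PER_DAY + hour * 3_600 + minute * 60 + second;
    // Subtracting the offset converts local wall-clock time to UTC.
    (local_secs - offset_secs).div_euclid(SECS_PER_DAY)
}

/// Midnight of a day number as nanoseconds since the epoch,
/// or `None` if the value does not fit into an `i64`.
fn day_to_nanos(day_number: i64) -> Option<i64> {
    day_number.checked_mul(SECS_PER_DAY * 1_000_000_000)
}

/// Midnight of a day number as milliseconds since the epoch,
/// or `None` if the value does not fit into an `i64`.
fn day_to_millis(day_number: i64) -> Option<i64> {
    day_number.checked_mul(SECS_PER_DAY * 1_000)
}
```

With these helpers, `utc_day_number(2023, 1, 14, 1, 1, 30, 19_800)` equals `days_from_civil(2023, 1, 13)`, which is the UTC date described above. Likewise, `day_to_nanos` still succeeds for 2262-04-11 but returns `None` for 2262-04-12, while `day_to_millis` handles that date and dates far beyond it.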