MartinKolbAtWork commented on PR #12227:
URL: https://github.com/apache/datafusion/pull/12227#issuecomment-2323958125

   > I wonder what the cost is for using string_to_datetime_formatted vs a more 
limited string_to_date_formatted. There should be slightly less parsing 
involved for that.
   
   Hi @Omega359,
   thanks for asking. This is a good question. Let me add two points.
   
   1) The current implementation also parses the complete timestamp, so the 
proposed change does not make anything worse. The change just uses milliseconds 
instead of nanoseconds as the intermediate value, which avoids the upper bound 
of year 2262 that nanoseconds impose.
   
   2) Simply parsing the values for year, month, and day out of a timestamp 
will not always produce a correct value for a date. Here is an existing test 
case in DataFusion: 
https://github.com/apache/datafusion/blob/dd3208943d728d845497d6a12ce4c0eacc061dcd/datafusion/sqllogictest/test_files/dates.slt#L127-L129
   The correct day of the timestamp `01-14-2023 01:01:30+05:30` is `13`, 
although the “parsed day” is `14`. This is because the timestamp refers to a 
time zone where the local day is already the 14th, but the UTC day is still 
the 13th.
   When parsing to a “timestamp”, a “day” of “14” is expected, because the 
timestamp object carries its time zone, so the UTC date (13th) can be 
calculated from it. A plain “date” object, however, does not carry time-zone 
information. Therefore the date must refer to UTC, which is the 13th.
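
   The difference between the “parsed day” and the UTC day can be reproduced 
outside DataFusion. The following is only a sketch in plain Python, not the 
DataFusion (Rust) code; the format string and the helper name 
`parsed_day_and_utc_date` are assumptions made for this illustration.

```python
from datetime import datetime, timezone

# Assumed format for this sketch: month-day-year, time, then a UTC offset.
FMT = "%m-%d-%Y %H:%M:%S%z"


def parsed_day_and_utc_date(text):
    """Return (day as written in the string, calendar date in UTC)."""
    local = datetime.strptime(text, FMT)    # offset-aware datetime
    utc = local.astimezone(timezone.utc)    # same instant, expressed in UTC
    return local.day, utc.date()


day, utc_date = parsed_day_and_utc_date("01-14-2023 01:01:30+05:30")
print(day)       # day field as written in the input
print(utc_date)  # date of the same instant in UTC
```

   Subtracting the `+05:30` offset from `01:01:30` crosses midnight, so the 
UTC date falls on the previous day even though the string says `14`.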
   
   And, as a brief reminder… I only intended to fix the issue that the date 
calculation has an upper limit of 2262. Both the full timestamp parsing and 
the UTC handling are already in the codebase. I introduced neither of them.
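
   For reference, the 2262 limit follows from simple arithmetic if one assumes 
the intermediate value is a signed 64-bit count since 1970-01-01 UTC (an 
assumption of this sketch; the helper names are hypothetical and this is not 
the DataFusion code).

```python
from datetime import datetime, timedelta, timezone

I64_MAX = 2**63 - 1                                  # largest signed 64-bit value
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365 * 86_400 + 86_400 // 4        # ~365.25-day year


def last_instant_ns():
    """Latest whole second that fits in an i64 count of nanoseconds."""
    return EPOCH + timedelta(seconds=I64_MAX // 1_000_000_000)


def approx_years_covered(units_per_second):
    """Rough number of years after 1970 that fit at the given resolution."""
    return (I64_MAX // units_per_second) // SECONDS_PER_YEAR


print(last_instant_ns().year)               # 2262
print(approx_years_covered(1_000_000_000))  # nanoseconds: 292
print(approx_years_covered(1_000))          # milliseconds: hundreds of millions
```

   With milliseconds the representable range is about a million times larger, 
so the intermediate value no longer limits the dates that can be parsed.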
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

