alamb commented on issue #765: URL: https://github.com/apache/arrow-datafusion/issues/765#issuecomment-884415600
The core of the problem is that `TimestampNanosecondArray` is defined to have type `Timestamp(Nanosecond, None)`. According to the Arrow spec ([reference to Schema.fbs](https://github.com/apache/arrow/blob/master/format/Schema.fbs#L251-L270)), a second argument of `None` means the following:

```
/// * If the time zone is null or an empty string, the data is a local date-time
///   and does not represent a single moment in time. Instead it represents a wall clock
///   time and care should be taken to avoid interpreting it semantically as an instant.
```

So that certainly suggests we should not apply any normalization to timestamps when no specific timezone is set. Instead, we should return the raw "naive" timestamp (which I think corresponds to the Arrow semantics for `Timestamp(_, None)`).

This leaves open the question of what to do when the timestamp has an explicit timezone in it, for example `2021-07-20 23:28:50-05:00`.

If the desired output timezone is `UTC`, then it makes sense to convert this to UTC 👍. However, if the desired output timezone is `None`, then what?

I feel this is very similar to the question that @velvia was getting at in https://github.com/apache/arrow-datafusion/issues/686, so I will continue the conversation there.
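To make the open question concrete, here is a minimal sketch using Python's standard `datetime` module, whose naive/aware distinction happens to mirror the `None` versus explicit-timezone distinction in Arrow. It is only an illustration of the candidate behaviors, not DataFusion's implementation; the variable names are made up for this example, and neither of the two naive results is claimed to be the correct answer.

```python
from datetime import datetime, timezone

# The example literal from the comment, carrying an explicit -05:00 offset.
parsed = datetime.fromisoformat("2021-07-20 23:28:50-05:00")

# Output timezone UTC: normalize to UTC, which identifies a single instant.
as_utc = parsed.astimezone(timezone.utc)

# Output timezone None, candidate A (assumption): drop the offset and keep
# the wall clock fields exactly as written.
as_wall_clock = parsed.replace(tzinfo=None)

# Output timezone None, candidate B (assumption): convert to UTC first,
# then drop the timezone.
as_utc_naive = as_utc.replace(tzinfo=None)

print(as_utc.isoformat())         # 2021-07-21T04:28:50+00:00
print(as_wall_clock.isoformat())  # 2021-07-20T23:28:50
print(as_utc_naive.isoformat())   # 2021-07-21T04:28:50
```

The two naive candidates differ by exactly the five-hour offset, so the stored integer value would differ depending on which one is chosen.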
