Hello Imran,

Thank you for bringing this problem up. I faced the same issue of handling timestamps and dates when I implemented date/timestamp parsing in the CSV/JSON datasources:
https://github.com/apache/spark/pull/23150
https://github.com/apache/spark/pull/23196
Maxim Gekk
Technical Solutions Lead
Databricks B.V. <http://databricks.com/>

On Fri, Dec 7, 2018 at 8:33 PM Li Jin <[email protected]> wrote:

> Imran,
>
> Thanks for sharing this. When working on interop between Spark and
> Pandas/Arrow in the past, we also faced some issues due to the different
> definitions of timestamp in Spark and Pandas/Arrow: a Spark timestamp
> has Instant semantics, while a Pandas/Arrow timestamp has either
> LocalDateTime or OffsetDateTime semantics. (Detailed discussion is in
> the PR: https://github.com/apache/spark/pull/18664#issuecomment-316554156.)
>
> For one, I am excited to see this effort going, but I would also love to
> see Python interop included/considered in the picture. I don't think it
> adds much to what has already been proposed, because Python timestamps
> are basically LocalDateTime or OffsetDateTime.
>
> Li
>
> On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid <[email protected]> wrote:
>
>> Hi,
>>
>> I'd like to discuss the future of timestamp support in Spark, in
>> particular with respect to handling timezones in different SQL types.
>> In a nutshell:
>>
>> * There are at least 3 different ways of handling the timestamp type
>>   across timezone changes.
>> * We'd like Spark to clearly distinguish the 3 types (it currently
>>   implements 1 of them), in a way that is backwards compatible and also
>>   compliant with the SQL standard.
>> * We'll get agreement across Spark, Hive, and Impala.
>>
>> Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a detailed
>> doc describing the problem in more detail, the state of various SQL
>> engines, and how we can get to a better state without breaking any
>> current use cases. The proposal is good for Spark by itself. We're also
>> going to the Hive & Impala communities with this proposal, as it's
>> better for everyone if everything is compatible.
>>
>> Note that this isn't proposing a specific implementation in Spark as
>> yet, just a description of the overall problem and our end goal. We're
>> going to each community to get agreement on the overall direction; then
>> each community can figure out specifics as they see fit. (I don't think
>> there are any technical hurdles with this approach, e.g. in deciding
>> whether it would even be possible in Spark.)
>>
>> Here's a link to the doc Zoltan has put together. It is a bit long, but
>> it explains how such a seemingly simple concept has become such a mess,
>> and how we can get to a better state.
>>
>> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky
>>
>> Please review the proposal and let us know your opinions, concerns and
>> suggestions.
>>
>> thanks,
>> Imran
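As a small sketch of the three timestamp semantics mentioned in the thread above (Instant, LocalDateTime, OffsetDateTime), here is how the distinction looks with plain Python `datetime` objects. This is an illustration only, not the actual Spark or Pandas API, and the variable names are my own:

```python
from datetime import datetime, timezone, timedelta

# 1. Instant semantics: a fixed point on the global timeline. Rendering it
#    in another zone changes the wall-clock digits, not the instant itself.
instant = datetime(2018, 12, 7, 20, 33, tzinfo=timezone.utc)
in_pst = instant.astimezone(timezone(timedelta(hours=-8)))
assert in_pst == instant             # same point in time...
assert in_pst.hour != instant.hour   # ...different wall-clock reading

# 2. LocalDateTime semantics: wall-clock digits with no zone attached; the
#    same value denotes different instants depending on the assumed zone.
local = datetime(2018, 12, 7, 20, 33)  # naive datetime
as_utc = local.replace(tzinfo=timezone.utc)
as_pst = local.replace(tzinfo=timezone(timedelta(hours=-8)))
assert as_utc != as_pst  # same digits, different instants

# 3. OffsetDateTime semantics: digits plus an explicit UTC offset, so both
#    the local rendering and the underlying instant are preserved.
offset_dt = datetime(2018, 12, 7, 12, 33,
                     tzinfo=timezone(timedelta(hours=-8)))
assert offset_dt == instant                         # same instant
assert offset_dt.utcoffset() == timedelta(hours=-8) # offset retained
```

A change of session timezone only affects how an Instant is displayed, but it silently changes *which* instant a LocalDateTime refers to, which is exactly why the proposal asks engines to distinguish the types explicitly.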
