Hello Imran,

Thank you for bringing this problem up. I faced the same issue of handling timestamps and dates when I implemented date/timestamp parsing in the CSV/JSON datasources:
https://github.com/apache/spark/pull/23150
https://github.com/apache/spark/pull/23196
Maxim Gekk
Technical Solutions Lead
Databricks B.V. <http://databricks.com/>

On Fri, Dec 7, 2018 at 8:33 PM Li Jin <[email protected]> wrote:

> Imran,
>
> Thanks for sharing this. When working on interop between Spark and
> Pandas/Arrow in the past, we also faced some issues due to the different
> definitions of timestamp in Spark and Pandas/Arrow: a Spark timestamp
> has Instant semantics, while a Pandas/Arrow timestamp has either
> LocalDateTime or OffsetDateTime semantics. (Detailed discussion is in
> the PR: https://github.com/apache/spark/pull/18664#issuecomment-316554156.)
>
> For one, I am excited to see this effort going, but I would also love to
> see Python interop included/considered in the picture. I don't think it
> adds much to what has already been proposed, because Python timestamps
> are basically LocalDateTime or OffsetDateTime.
>
> Li
>
> On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid <[email protected]> wrote:
>
>> Hi,
>>
>> I'd like to discuss the future of timestamp support in Spark, in
>> particular with respect to handling timezones in different SQL types.
>> In a nutshell:
>>
>> * There are at least 3 different ways of handling the timestamp type
>>   across timezone changes.
>> * We'd like Spark to clearly distinguish the 3 types (it currently
>>   implements 1 of them), in a way that is backwards compatible and also
>>   compliant with the SQL standard.
>> * We'll get agreement across Spark, Hive, and Impala.
>>
>> Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a detailed
>> doc describing the problem in more detail, the state of various SQL
>> engines, and how we can get to a better state without breaking any
>> current use cases. The proposal is good for Spark by itself. We're also
>> going to the Hive & Impala communities with this proposal, as it's
>> better for everyone if everything is compatible.
>>
>> Note that this isn't proposing a specific implementation in Spark as
>> yet, just a description of the overall problem and our end goal. We're
>> going to each community to get agreement on the overall direction; then
>> each community can figure out specifics as they see fit. (I don't think
>> there are any technical hurdles with this approach, e.g. in deciding
>> whether it would even be possible in Spark.)
>>
>> Here's a link to the doc Zoltan has put together. It is a bit long, but
>> it explains how such a seemingly simple concept has become such a mess,
>> and how we can get to a better state.
>>
>> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky
>>
>> Please review the proposal and let us know your opinions, concerns and
>> suggestions.
>>
>> thanks,
>> Imran
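As a small sketch of the three timestamp semantics mentioned in the thread above (Instant, LocalDateTime, OffsetDateTime), here is how the distinction looks with plain Python `datetime` objects. This is an illustration only, not the actual Spark or Pandas API, and the variable names are my own:

```python
from datetime import datetime, timezone, timedelta

# 1. Instant semantics: a fixed point on the global timeline. Rendering it
#    in another zone changes the wall-clock digits, not the instant itself.
instant = datetime(2018, 12, 7, 20, 33, tzinfo=timezone.utc)
in_pst = instant.astimezone(timezone(timedelta(hours=-8)))
assert in_pst == instant             # same point in time...
assert in_pst.hour != instant.hour   # ...different wall-clock reading

# 2. LocalDateTime semantics: wall-clock digits with no zone attached; the
#    same value denotes different instants depending on the assumed zone.
local = datetime(2018, 12, 7, 20, 33)  # naive datetime
as_utc = local.replace(tzinfo=timezone.utc)
as_pst = local.replace(tzinfo=timezone(timedelta(hours=-8)))
assert as_utc != as_pst  # same digits, different instants

# 3. OffsetDateTime semantics: digits plus an explicit UTC offset, so both
#    the local rendering and the underlying instant are preserved.
offset_dt = datetime(2018, 12, 7, 12, 33,
                     tzinfo=timezone(timedelta(hours=-8)))
assert offset_dt == instant                         # same instant
assert offset_dt.utcoffset() == timedelta(hours=-8) # offset retained
```

A change of session timezone only affects how an Instant is displayed, but it silently changes *which* instant a LocalDateTime refers to, which is exactly why the proposal asks engines to distinguish the types explicitly.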
