Of course. I added some comments in the doc.

On Tue, Dec 11, 2018 at 12:01 PM Imran Rashid <[email protected]> wrote:
> Hi Li,
>
> thanks for the comments! I admit I had not thought very much about python
> support; it's a good point. But I'd actually like to clarify one thing
> about the doc -- though it discusses java types, the point is actually
> about having support for these logical types at the SQL level. The doc
> uses java names instead of SQL names just because there is so much
> confusion around the SQL names, as they haven't been implemented
> consistently. Once there is support for the additional logical types,
> we'd absolutely want to get the same support in python.
>
> It's great to hear there are existing python types we can map each
> behavior to. Could you add a comment on the doc for each of the types,
> mentioning the equivalent in python?
>
> thanks,
> Imran
>
> On Fri, Dec 7, 2018 at 1:33 PM Li Jin <[email protected]> wrote:
>
>> Imran,
>>
>> Thanks for sharing this. When working on interop between Spark and
>> Pandas/Arrow in the past, we also faced issues due to the different
>> definitions of timestamp in Spark and Pandas/Arrow: Spark's timestamp
>> has Instant semantics, while a Pandas/Arrow timestamp has either
>> LocalDateTime or OffsetDateTime semantics. (Detailed discussion is in
>> the PR:
>> https://github.com/apache/spark/pull/18664#issuecomment-316554156.)
>>
>> For one, I am excited to see this effort going, but I would also love
>> to see Python interop included/considered in the picture. I don't
>> think it adds much to what has already been proposed, because Python
>> timestamps are basically LocalDateTime or OffsetDateTime.
>>
>> Li
>>
>> On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'd like to discuss the future of timestamp support in Spark, in
>>> particular with respect to handling timezones in different SQL
>>> types. In a nutshell:
>>>
>>> * There are at least 3 different ways of handling the timestamp
>>> type across timezone changes.
>>> * We'd like Spark to clearly distinguish the 3 types (it currently
>>> implements 1 of them), in a way that is backwards compatible and
>>> also compliant with the SQL standard.
>>> * We'll get agreement across Spark, Hive, and Impala.
>>>
>>> Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a
>>> detailed doc describing the problem in more detail, the state of
>>> various SQL engines, and how we can get to a better state without
>>> breaking any current use cases. The proposal is good for Spark by
>>> itself. We're also going to the Hive & Impala communities with this
>>> proposal, as it's better for everyone if everything is compatible.
>>>
>>> Note that this isn't proposing a specific implementation in Spark
>>> yet, just a description of the overall problem and our end goal.
>>> We're going to each community to get agreement on the overall
>>> direction. Then each community can figure out specifics as they see
>>> fit. (I don't think there are any technical hurdles with this
>>> approach, e.g. whether this would even be possible in Spark.)
>>>
>>> Here's a link to the doc Zoltan has put together. It is a bit long,
>>> but it explains how such a seemingly simple concept has become such
>>> a mess and how we can get to a better state:
>>>
>>> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky
>>>
>>> Please review the proposal and let us know your opinions, concerns
>>> and suggestions.
>>>
>>> thanks,
>>> Imran
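[Editor's note: the thread above distinguishes three timestamp semantics by their java.time names. As a rough sketch of how those semantics map onto Python, which Li Jin alludes to, the following uses only the standard-library `datetime` module with fixed UTC offsets (the specific dates and offsets are illustrative, not from the thread):]

```python
from datetime import datetime, timezone, timedelta

# 1. Instant semantics (what Spark's TIMESTAMP implements today): the
#    value denotes a fixed point on the timeline; how it is rendered as
#    a wall-clock time depends on the (session) timezone.
instant = datetime(2018, 12, 11, 17, 0, tzinfo=timezone.utc)
as_pst = instant.astimezone(timezone(timedelta(hours=-8)))  # e.g. US Pacific
as_est = instant.astimezone(timezone(timedelta(hours=-5)))  # e.g. US Eastern
assert as_pst == as_est            # same instant...
assert as_pst.hour != as_est.hour  # ...different wall-clock rendering

# 2. LocalDateTime semantics (SQL TIMESTAMP WITHOUT TIME ZONE): a bare
#    wall-clock value tied to no particular instant. Python's "naive"
#    datetime behaves this way.
local = datetime(2018, 12, 11, 12, 0)
assert local.tzinfo is None

# 3. OffsetDateTime semantics (SQL TIMESTAMP WITH TIME ZONE): wall-clock
#    time plus an explicit UTC offset stored with the value. Python's
#    "aware" datetime behaves this way.
offset = datetime(2018, 12, 11, 12, 0, tzinfo=timezone(timedelta(hours=-8)))
assert offset.utcoffset() == timedelta(hours=-8)
```

The same split shows up in Pandas/Arrow: a timestamp column without a timezone behaves like case 2, and one with a timezone attached like case 3, which is the mismatch with Spark's Instant semantics discussed in the linked PR.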
