Of course. I added some comments in the doc.

On Tue, Dec 11, 2018 at 12:01 PM Imran Rashid <[email protected]> wrote:
> Hi Li,
>
> thanks for the comments! I admit I had not thought very much about python
> support; it's a good point. But I'd actually like to clarify one thing
> about the doc -- though it discusses java types, the point is actually
> about having support for these logical types at the SQL level. The doc
> uses java names instead of SQL names just because there is so much
> confusion around the SQL names, as they haven't been implemented
> consistently. Once there is support for the additional logical types,
> we'd absolutely want to get the same support in python.
>
> It's great to hear there are existing python types we can map each
> behavior to. Could you add a comment on the doc for each of the types,
> mentioning the equivalent in python?
>
> thanks,
> Imran
>
> On Fri, Dec 7, 2018 at 1:33 PM Li Jin <[email protected]> wrote:
>
>> Imran,
>>
>> Thanks for sharing this. When working on interop between Spark and
>> Pandas/Arrow in the past, we also faced issues due to the different
>> definitions of timestamp in Spark and Pandas/Arrow: Spark's timestamp
>> has Instant semantics, while a Pandas/Arrow timestamp has either
>> LocalDateTime or OffsetDateTime semantics. (Detailed discussion is in
>> the PR:
>> https://github.com/apache/spark/pull/18664#issuecomment-316554156.)
>>
>> For one, I am excited to see this effort going, but I would also love
>> to see Python interop included/considered in the picture. I don't
>> think it adds much to what has already been proposed, because Python
>> timestamps are basically LocalDateTime or OffsetDateTime.
>>
>> Li
>>
>> On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'd like to discuss the future of timestamp support in Spark, in
>>> particular with respect to handling timezones in different SQL
>>> types. In a nutshell:
>>>
>>> * There are at least 3 different ways of handling the timestamp
>>> type across timezone changes.
>>> * We'd like Spark to clearly distinguish the 3 types (it currently
>>> implements 1 of them), in a way that is backwards compatible and
>>> also compliant with the SQL standard.
>>> * We'll get agreement across Spark, Hive, and Impala.
>>>
>>> Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a
>>> detailed doc describing the problem in more detail, the state of
>>> various SQL engines, and how we can get to a better state without
>>> breaking any current use cases. The proposal is good for Spark by
>>> itself. We're also going to the Hive & Impala communities with this
>>> proposal, as it's better for everyone if everything is compatible.
>>>
>>> Note that this isn't proposing a specific implementation in Spark
>>> yet, just a description of the overall problem and our end goal.
>>> We're going to each community to get agreement on the overall
>>> direction. Then each community can figure out specifics as they see
>>> fit. (I don't think there are any technical hurdles with this
>>> approach, e.g. whether this would even be possible in Spark.)
>>>
>>> Here's a link to the doc Zoltan has put together. It is a bit long,
>>> but it explains how such a seemingly simple concept has become such
>>> a mess and how we can get to a better state:
>>>
>>> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky
>>>
>>> Please review the proposal and let us know your opinions, concerns
>>> and suggestions.
>>>
>>> thanks,
>>> Imran
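[Editor's note: the thread above distinguishes three timestamp semantics by their java.time names. As a rough sketch of how those semantics map onto Python, which Li Jin alludes to, the following uses only the standard-library `datetime` module with fixed UTC offsets (the specific dates and offsets are illustrative, not from the thread):]

```python
from datetime import datetime, timezone, timedelta

# 1. Instant semantics (what Spark's TIMESTAMP implements today): the
#    value denotes a fixed point on the timeline; how it is rendered as
#    a wall-clock time depends on the (session) timezone.
instant = datetime(2018, 12, 11, 17, 0, tzinfo=timezone.utc)
as_pst = instant.astimezone(timezone(timedelta(hours=-8)))  # e.g. US Pacific
as_est = instant.astimezone(timezone(timedelta(hours=-5)))  # e.g. US Eastern
assert as_pst == as_est            # same instant...
assert as_pst.hour != as_est.hour  # ...different wall-clock rendering

# 2. LocalDateTime semantics (SQL TIMESTAMP WITHOUT TIME ZONE): a bare
#    wall-clock value tied to no particular instant. Python's "naive"
#    datetime behaves this way.
local = datetime(2018, 12, 11, 12, 0)
assert local.tzinfo is None

# 3. OffsetDateTime semantics (SQL TIMESTAMP WITH TIME ZONE): wall-clock
#    time plus an explicit UTC offset stored with the value. Python's
#    "aware" datetime behaves this way.
offset = datetime(2018, 12, 11, 12, 0, tzinfo=timezone(timedelta(hours=-8)))
assert offset.utcoffset() == timedelta(hours=-8)
```

The same split shows up in Pandas/Arrow: a timestamp column without a timezone behaves like case 2, and one with a timezone attached like case 3, which is the mismatch with Spark's Instant semantics discussed in the linked PR.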
