I don't know about removal, but you could probably ignore the timezone
string, and it's not clear the resulting issues would be that significant.

If Rust never produces a non-null non-UTC timestamp then I don't see
that as an issue.

If you are consuming data with a timezone string other than UTC, it
isn't really clear what information that timezone string is supposed
to convey anyway.  Are you supposed to extract fields as if you were
in that time zone?  Or does it indicate the time zone the data was
captured in?  PostgreSQL, etc. do not support this concept.  Probably
the safest thing to do would be to reject the data.
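
As a rough sketch of what "reject the data" could look like at an
ingestion boundary (plain Python, not arrow-rs or pyarrow code; the
function name and the set of spellings treated as UTC are assumptions
made up for illustration):

```python
from typing import Optional

# Assumption for this sketch: these spellings all mean UTC.
UTC_SPELLINGS = {"UTC", "+00:00"}


def check_timezone(tz: Optional[str]) -> Optional[str]:
    """Accept a missing or UTC timezone string; reject anything else."""
    if tz is None or tz in UTC_SPELLINGS:
        return tz
    raise ValueError(f"unsupported timezone string: {tz!r}")
```

With this, a field typed like pa.timestamp("ms", "+00:10") from the
quoted example would be refused rather than silently reinterpreted.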

There still remains the question of whether or not you need to
distinguish between local times and instant times.  Or, in Python
terms, naive vs. aware.  Or, in Parquet terms, whether you need to
worry about the isAdjustedToUTC flag.  Or, in Postgres terms, whether
you need to distinguish between "timestamp with time zone" and
"timestamp without time zone".
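
To make the distinction concrete, here is a small standard-library
Python sketch (the helper name can_compare is made up for
illustration).  Python refuses to order a naive datetime against an
aware one, which is the kind of constraint the two timestamp flavors
can enforce:

```python
from datetime import datetime, timezone

# Naive: a wall-clock reading with no time zone attached (a "local time").
naive = datetime(2021, 7, 7, 12, 0, 0)
# Aware: the same fields pinned to UTC (an "instant").
aware = datetime(2021, 7, 7, 12, 0, 0, tzinfo=timezone.utc)


def can_compare(a: datetime, b: datetime) -> bool:
    """Return True if Python allows ordering a against b."""
    try:
        a < b
        return True
    except TypeError:
        # Raised when one operand is naive and the other is aware.
        return False
```

Here can_compare(naive, aware) is False, while comparing two naive or
two aware values is allowed.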

This boils down to whether or not you want to support the constraints
implied by these semantic hints from the user, for example forbidding
comparison between the two types of timestamps or altering how you
display them.  If those features are not important, then Rust could
ignore the time zone field completely.  That could cause an
interoperability issue though (e.g. data going into Rust with timezone
UTC comes back out with no timezone even though nothing changed).
Ideally Rust could ignore the time zone string but leave it unchanged.
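
A minimal sketch of "ignore but leave unchanged", written in Python
for brevity.  The types and the add_offset kernel are hypothetical
stand-ins, not the arrow-rs or pyarrow API:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimestampType:
    """Hypothetical stand-in for a timestamp type: unit plus optional tz."""
    unit: str
    tz: Optional[str] = None


@dataclass
class TimestampArray:
    """Hypothetical stand-in for a timestamp array."""
    dtype: TimestampType
    values: List[int]


def add_offset(arr: TimestampArray, delta: int) -> TimestampArray:
    """Shift every value by delta, in the array's own unit.

    The kernel never inspects arr.dtype.tz, but it carries the dtype
    through so the timezone string survives the round trip.
    """
    return TimestampArray(arr.dtype, [v + delta for v in arr.values])
```

The point is only that the output reuses the input's type: an array
that arrives tagged "UTC" leaves tagged "UTC", even though the kernel
did nothing with the tag.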

On Wed, Jul 7, 2021 at 6:58 AM Joris Van den Bossche
<jorisvandenboss...@gmail.com> wrote:
>
> On Wed, 7 Jul 2021 at 18:46, Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> wrote:
>
> > Hi,
> >
> > AFAIK timezone is part of the spec.
>
>
> And for reference, the current spec (Schema flatbuffer file) for timestamp
> is at
> https://github.com/apache/arrow/blob/6c8d30ea82222fd2750b999840872d3f6cbdc8f8/format/Schema.fbs#L217-L247.
>
>
>
> > In Python, that would be [1]
> >
> > import pyarrow as pa
> > dt1 = pa.timestamp("ms", "+00:10")
> > dt2 = pa.timestamp("ms")
> >
> > arrow-rs is not very consistent with how it handles it. imo that is an
> > artifact of being currently difficult (API wise) to create an array with a
> > timezone, which has caused people to not use it much (and thus not
> > implement kernels with it / test it properly).
> >
> > I do not see how removing it would be compatible with the Arrow spec,
> > though.
> >
> > Best,
> > Jorge
> >
> > [1] https://arrow.apache.org/docs/python/generated/pyarrow.timestamp.html
> >
> >
> >
> > On Wed, Jul 7, 2021 at 6:37 PM Evan Chan <e...@urbanlogiq.com> wrote:
> >
> > > Hi folks,
> > >
> > > Some of us are having a discussion about a direction change for Rust
> > Arrow
> > > timestamp types, which currently support both a resolution field (Ns,
> > Micros,
> > > Ms, Seconds) similar to the other language implementations, but also
> > > optionally a timezone string field.   I believe the timezone field is
> > > unique to the Rust implementation, as I don’t find it in the C/C++ and
> > > Python docs.   At the same time, in reality if the timezone field is non
> > > null, this is not well supported at all in the current code.  Functions
> > > returning timestamps pretty much all return a null timezone, for example,
> > > and don’t allow the timezone to be specified.
> > >
> > > The proposal would be to eliminate the timezone field and bring the Rust
> > > Arrow timestamp type in line with that of the other language
> > > implementations, also simplifying implementation.   It seems this is in
> > > line with direction of other projects (Parquet, Spark, and most DBs have
> > > timestamp types which do not have explicit timezones or are implicitly
> > UTC).
> > >
> > > Please feel free to see
> > > https://github.com/apache/arrow-datafusion/issues/686 <
> > > https://github.com/apache/arrow-datafusion/issues/686>
> > > (Or would it be better to discuss here in mailing list?)
> > >
> > > Cheers!
> > > Evan
> >
