Timestamps with different precision / Timedeltas

Uwe Korn Tue, 21 Jun 2016 13:41:19 -0700

Hello,

in addition to categoricals, we are also currently missing a conversion from Timestamps in Pandas/NumPy to Arrow. Currently we only have two (exact) resolutions for them: DATE for days and TIMESTAMP for milliseconds. As https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html notes, there are several more. We do not need to cater for all of them, but at least for some. Therefore I have the following questions, which I would like to have resolved in some form before implementing:


 * Do we want to cater for other resolutions?
 * If we do not provide, e.g. nanosecond resolution (sadly the default
   in Pandas), do we cast with precision loss to the nearest match? Or
   should we force the user to do it?
 * Not so important for me at the moment: Do we want to support time zones?
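
To make the second question concrete, here is a minimal sketch in plain NumPy (not an Arrow API) of the two behaviours: a strict cast that refuses to drop sub-millisecond digits, and a lossy cast the user has to opt into. The helper names `loses_precision` and `cast_ns_to_ms` and the `allow_truncation` flag are hypothetical and only serve as illustration.

```python
import numpy as np


def loses_precision(values):
    """Return True if casting datetime64[ns] values to ms drops information."""
    values = np.asarray(values, dtype="datetime64[ns]")
    # Cast down to milliseconds and back up again; any difference means
    # that sub-millisecond digits were discarded on the way.
    roundtrip = values.astype("datetime64[ms]").astype("datetime64[ns]")
    # NaT never compares equal to itself, so exclude it from the check.
    changed = (roundtrip != values) & ~np.isnat(values)
    return bool(changed.any())


def cast_ns_to_ms(values, allow_truncation=False):
    """Cast to datetime64[ms]; raise unless the cast is exact or opted into."""
    values = np.asarray(values, dtype="datetime64[ns]")
    if not allow_truncation and loses_precision(values):
        raise ValueError(
            "cast to millisecond resolution would lose precision; "
            "pass allow_truncation=True to accept this"
        )
    return values.astype("datetime64[ms]")


exact = np.array(["2016-06-21T13:41:19.123"], dtype="datetime64[ns]")
lossy = np.array(["2016-06-21T13:41:19.123456789"], dtype="datetime64[ns]")

millis_exact = cast_ns_to_ms(exact)  # fine, nothing is dropped
millis_lossy = cast_ns_to_ms(lossy, allow_truncation=True)  # explicit opt-in
```

Calling `cast_ns_to_ms(lossy)` without the flag raises, which corresponds to forcing the user to do the cast themselves.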

My current objective is to have them for Parquet file writing. Sadly, this has the same limitations. So the two main options seem to be:


 * "roundtrip will only yield correct timezone and logical type if we
   read with Arrow/Pandas again (as we use "proprietary" metadata to
   encode it)"
 * "we restrict ourselves to milliseconds and days as resolutions" (for
   this option, we need to decide how graceful we want to be in the
   Pandas<->Arrow conversion).

A further datatype that we do not yet have in Arrow, but partly have in Parquet, is timedelta (or INTERVAL in Parquet). We probably need to add another logical type to Arrow to implement it. I am open to suggestions here, too.
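
As a rough illustration of why the mapping is only partial, here is a sketch that packs a Python `timedelta` into the 12-byte layout that, as far as I understand the Parquet format specification, INTERVAL uses: three little-endian unsigned 32-bit integers for months, days and milliseconds. The function name `timedelta_to_parquet_interval` is hypothetical; this is neither an Arrow nor a Parquet API.

```python
import struct
from datetime import timedelta


def timedelta_to_parquet_interval(delta):
    """Pack a timedelta as (months, days, milliseconds), little-endian uint32."""
    if delta < timedelta(0):
        # The three fields are unsigned, so negative durations do not fit.
        raise ValueError("negative timedeltas cannot be represented")
    # timedelta normalises itself to days, seconds and microseconds;
    # anything below one millisecond is dropped here.
    millis = delta.seconds * 1000 + delta.microseconds // 1000
    # A timedelta carries no month component, so months is always 0.
    return struct.pack("<III", 0, delta.days, millis)


encoded = timedelta_to_parquet_interval(
    timedelta(days=2, seconds=3, microseconds=4500)
)
```

Under these assumptions, negative durations and sub-millisecond precision are both lost.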

Also, in the Arrow spec there is TIME, which seems to be the same as TIMESTAMP (as far as the comments in the C++ code go). Is there maybe some distinction I'm missing?


Cheers

Uwe
