hi Roman, I agree with you that it is not a small change, because of the new union-based logical type representation and the need for compatibility with old Parquet files (as well as an option to write "old" metadata for compatibility with old Parquet readers).
- Wes

On Tue, Nov 13, 2018 at 10:13 AM Roman Karlstetter <roman.karlstet...@gmail.com> wrote:
>
> Hi,
>
> that sounds like the task might not be ideally suited for someone new to
> the implementations of both arrow and parquet, especially since all those
> compatibility issues should be handled correctly.
> I think it does not make sense for me to continue with this implementation,
> unless there are some further specifications on how this should be
> implemented.
>
> Roman
>
> From: Wes McKinney
> Sent: Monday, 12 November 2018 16:50
> To: dev@arrow.apache.org
> Subject: Re: Support for TIMESTAMP_NANOS in parquet-cpp
>
> hi Roman,
>
> For nanosecond Arrow timestamps, the relevant code path for this is here:
>
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.cc#L607
>
> You'll also have to modify some code in parquet/types.*,
> parquet/schema.*, parquet/arrow/schema.cc to handle the additional
> metadata. If you aren't dealing with Arrow at all, then it should be
> sufficient just to modify the handling of the logical types metadata
> in parquet/types.*.
>
> So there is a significant complication that I didn't think about yet:
> we aren't handling the new logical types union in parquet-cpp yet, so
> there's quite a lot of work beyond just dealing with the nanosecond
> metadata. I am also not sure what the implications are for backwards
> compatibility and haven't had time to look in detail at what needs to
> be done since the new metadata structure was added to the Thrift
> definition.
>
> - Wes
> On Mon, Nov 12, 2018 at 4:31 AM Roman Karlstetter
> <roman.karlstet...@gmail.com> wrote:
> >
> > I've had the chance to look into this.
> > There is one issue that came up which I don't know how to handle.
> > Previously, int96 seems to have been used for nanosecond precision, but
> > this is somewhat deprecated, as far as I understand it.
> > So, how should we handle nanoseconds and int96 vs int64 in a) reading from
> > and b) writing to parquet?
> > There seem to be some writer settings, all related to timestamp precision
> > properties. Is there any advice any of you can give me in that regard?
> >
> > Thanks,
> > Roman
> >
> > From: Roman Karlstetter
> > Sent: Friday, 9 November 2018 08:38
> > To: dev@arrow.apache.org
> > Subject: Re: Support for TIMESTAMP_NANOS in parquet-cpp
> >
> > I would be willing to implement that. I’ll probably need some advice on my
> > patch though, as I’m fairly new to the parquet code.
> >
> > Roman
> >
> > From: Wes McKinney
> > Sent: Thursday, 8 November 2018 23:22
> > To: dev@arrow.apache.org
> > Subject: Re: Support for TIMESTAMP_NANOS in parquet-cpp
> >
> > I opened an issue here
> > https://issues.apache.org/jira/browse/ARROW-3729. Patches would be
> > welcome.
> > On Sat, Oct 20, 2018 at 12:55 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > hi Roman,
> > >
> > > We would welcome adding such a document to the Arrow wiki
> > > https://cwiki.apache.org/confluence/display/ARROW. As to your other
> > > questions, it really depends on whether there is a member of the
> > > Parquet community who will do the work. Patches that implement any
> > > released functionality in the Parquet format specification are
> > > welcome.
> > >
> > > Thanks
> > > Wes
> > > On Thu, Oct 18, 2018 at 10:59 AM Roman Karlstetter
> > > <roman.karlstet...@gmail.com> wrote:
> > > >
> > > > Hi everyone,
> > > > in parquet-format, there is now support for TIMESTAMP_NANOS:
> > > > https://github.com/apache/parquet-format/pull/102
> > > > For parquet-cpp, this is not yet supported. I have a few questions now:
> > > > • is there an overview of which release of parquet-format is currently
> > > > fully supported in parquet-cpp (something like a feature support matrix)?
> > > > • how fast are new features in parquet-format adopted?
> > > > I think having a document describing the current completeness of
> > > > implementation of the spec would be very helpful for users of the
> > > > parquet-cpp library.
> > > > Thanks,
> > > > Roman
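
For reference on the int96 vs int64 question raised in the thread, here is a minimal C++ sketch of how a legacy INT96 timestamp could map to int64 nanoseconds since the Unix epoch. It is not parquet-cpp code. It assumes the commonly used INT96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and a little-endian host. The names Int96Timestamp, Int96ToUnixNanos and UnixNanosToInt96 are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not the parquet-cpp API.
constexpr int64_t kJulianDayOfUnixEpoch = 2440588;  // Julian day number of 1970-01-01
constexpr int64_t kNanosPerDay = 86400LL * 1000 * 1000 * 1000;

// Assumed legacy INT96 layout (little-endian):
//   bytes 0-7  : nanoseconds elapsed within the day
//   bytes 8-11 : Julian day number
struct Int96Timestamp {
  uint8_t bytes[12];
};

// Decode INT96 into int64 nanoseconds since the Unix epoch.
// Assumes a little-endian host and a value that fits in int64.
inline int64_t Int96ToUnixNanos(const Int96Timestamp& ts) {
  int64_t nanos_of_day;
  int32_t julian_day;
  std::memcpy(&nanos_of_day, ts.bytes, sizeof(nanos_of_day));
  std::memcpy(&julian_day, ts.bytes + 8, sizeof(julian_day));
  return (julian_day - kJulianDayOfUnixEpoch) * kNanosPerDay + nanos_of_day;
}

// Encode int64 nanoseconds since the Unix epoch as INT96.
inline Int96Timestamp UnixNanosToInt96(int64_t unix_nanos) {
  int64_t days = unix_nanos / kNanosPerDay;
  int64_t nanos_of_day = unix_nanos % kNanosPerDay;
  // C++ division truncates toward zero; shift pre-epoch values so that
  // nanos_of_day is always in [0, kNanosPerDay).
  if (nanos_of_day < 0) {
    nanos_of_day += kNanosPerDay;
    --days;
  }
  const int32_t julian_day = static_cast<int32_t>(days + kJulianDayOfUnixEpoch);
  Int96Timestamp ts;
  std::memcpy(ts.bytes, &nanos_of_day, sizeof(nanos_of_day));
  std::memcpy(ts.bytes + 8, &julian_day, sizeof(julian_day));
  return ts;
}
```

An int64 nanosecond count covers only roughly the years 1677 to 2262, so an INT96 value outside that range would overflow in Int96ToUnixNanos. A reader that maps int96 to int64 nanoseconds would need to decide how to handle such values.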