tustvold commented on issue #7220:
URL: https://github.com/apache/arrow-rs/issues/7220#issuecomment-2708161150

   > mentioned there is a similar thing in arrow-cpp: https://github.com/apache/arrow/blob/784aa6faf69f5cf135e09976a281dea9ebf58166/cpp/src/parquet/arrow/schema_internal.cc#L205-L206
   
   This appears only to influence which TimeUnit the INT96 values are coerced to, e.g. milliseconds, nanoseconds, etc.
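
   For context, here is a minimal sketch of the decode path under discussion. It assumes the Impala/Hive INT96 layout (8 little-endian bytes of nanoseconds-of-day followed by a 4-byte little-endian Julian day number); the `Unit` enum and `decode_int96` are illustrative names, not arrow-rs APIs:

   ```rust
   /// Target units for coercion; illustrative, not the arrow TimeUnit enum.
   #[derive(Clone, Copy)]
   enum Unit {
       Millis,
       Micros,
       Nanos,
   }

   /// Julian Day Number of the Unix epoch, 1970-01-01.
   const UNIX_EPOCH_JDN: i64 = 2_440_588;

   /// Decode one INT96 value into a timestamp at the requested unit.
   /// Returns None on overflow, which is how nanosecond coercion fails for
   /// dates past ~2262 while millisecond/microsecond coercion still succeeds.
   fn decode_int96(raw: [u8; 12], unit: Unit) -> Option<i64> {
       let nanos_of_day = i64::from_le_bytes(raw[..8].try_into().unwrap());
       let julian_day = i64::from(u32::from_le_bytes(raw[8..].try_into().unwrap()));
       // Work in i128 so the intermediate nanosecond count cannot overflow.
       let days = julian_day - UNIX_EPOCH_JDN;
       let nanos = i128::from(days) * 86_400_000_000_000 + i128::from(nanos_of_day);
       let scaled = match unit {
           // Truncating division for brevity; real code may want flooring
           // division for negative (pre-epoch) values.
           Unit::Millis => nanos / 1_000_000,
           Unit::Micros => nanos / 1_000,
           Unit::Nanos => nanos,
       };
       i64::try_from(scaled).ok()
   }

   fn main() {
       // 1970-01-02T00:00:00.000000001, i.e. JDN 2440589 plus 1ns into the day.
       let mut raw = [0u8; 12];
       raw[..8].copy_from_slice(&1i64.to_le_bytes());
       raw[8..].copy_from_slice(&2_440_589u32.to_le_bytes());
       assert_eq!(decode_int96(raw, Unit::Nanos), Some(86_400_000_000_001));
       assert_eq!(decode_int96(raw, Unit::Millis), Some(86_400_000));
   }
   ```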
   
   > An opt-in feature that allows INT96 to pass unmodified bytes for each value, perhaps as FixedSizedBinary(12).
   
   My 2 cents is that whilst this is possible, it results in an unfortunate UX. IMO we should support Int96 to the best of our ability, rather than forcing every downstream project to reproduce this logic. Whilst it may be somewhat depressing that Spark is STILL writing a type that has been deprecated for almost a decade, it is where we are, and we should support it.
   
   That being said, I would suggest we split this issue into two parts:
   
   * Support influencing the precision used, similar to arrow-cpp
   * Support legacy rebase modes for timestamps before 1900 written by Spark versions before 3.x - see [here](https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes); a sketch of the calendar arithmetic involved follows this list
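
   For the second part, the crux is that Spark 2.x computed the stored day number via the hybrid Julian/Gregorian calendar while Spark 3.x uses the proleptic Gregorian calendar, so the same wall-clock date before 1582-10-15 maps to a different stored value. Below is a minimal, day-granularity sketch of the arithmetic a rebase would need; it ignores the sub-day, timezone-dependent offsets the linked article describes for pre-1900 timestamps, uses a simplified cutover check, and all names are illustrative:

   ```rust
   /// Julian Day Number of Y-M-D read as a proleptic Gregorian date
   /// (Fliegel-Van Flandern formula, C-style truncating division).
   fn gregorian_jdn(y: i64, m: i64, d: i64) -> i64 {
       (1461 * (y + 4800 + (m - 14) / 12)) / 4
           + (367 * (m - 2 - 12 * ((m - 14) / 12))) / 12
           - (3 * ((y + 4900 + (m - 14) / 12) / 100)) / 4
           + d
           - 32075
   }

   /// Julian Day Number of the same fields read as a Julian-calendar date.
   fn julian_jdn(y: i64, m: i64, d: i64) -> i64 {
       367 * y - (7 * (y + 5001 + (m - 9) / 7)) / 4 + (275 * m) / 9 + d + 1_729_777
   }

   /// Day number under the hybrid calendar: Julian before the 1582-10-15
   /// cutover (JDN 2299161), Gregorian from it onward. Simplified: it decides
   /// the cutover from the Gregorian reading of the fields.
   fn hybrid_jdn(y: i64, m: i64, d: i64) -> i64 {
       let g = gregorian_jdn(y, m, d);
       if g >= 2_299_161 { g } else { julian_jdn(y, m, d) }
   }

   fn main() {
       // Rebase shift for 1500-01-01: the two readings disagree by 9 days
       // here (10 days from 1500-03-01, because Julian 1500 has a Feb 29).
       assert_eq!(hybrid_jdn(1500, 1, 1) - gregorian_jdn(1500, 1, 1), 9);
       // From the Gregorian reform onward the readings agree, so no shift.
       assert_eq!(hybrid_jdn(2000, 6, 1), gregorian_jdn(2000, 6, 1));
   }
   ```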
   
   I suspect most users only actually care about the first of these - the number of people writing dates pre-1900 is likely small, and the number doing so with a half-decade-old version of Spark or Hive is likely even smaller. We can likely leave the second as an issue for someone to pick up if they have a use-case for it.
   

