tustvold commented on issue #7220: URL: https://github.com/apache/arrow-rs/issues/7220#issuecomment-2708161150

> mentioned there is a similar thing in arrow-cpp: https://github.com/apache/arrow/blob/784aa6faf69f5cf135e09976a281dea9ebf58166/cpp/src/parquet/arrow/schema_internal.cc#L205-L206

This looks to just influence which TimeUnit it coerces to, e.g. milliseconds, nanoseconds, etc.

> An opt-in feature that allows INT96 to pass unmodified bytes for each value, perhaps as FixedSizedBinary(12).

My 2 cents is that whilst possible, this results in an unfortunate UX. IMO we should support Int96 to the best of our ability, rather than forcing every downstream to reproduce this logic. Whilst it may be somewhat depressing that Spark is STILL writing a type that has been deprecated for almost a decade, it is where we are, and we should support it.

That being said, I would suggest we split this issue into two parts:

* Support influencing the precision used, similar to arrow-cpp (see the sketch after this list)
* Support legacy rebase modes for timestamps before 1900 written by Spark versions before 3.x - see [here](https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes)

I suspect most users only actually care about the first of these: the number of people writing dates pre-1900 is likely small, and the number doing so with a half-decade-old version of Spark or Hive is likely even smaller. We can leave that part as an issue for someone to pick up if they have a use case for it.
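To make the first point concrete, here is a minimal sketch of what unit-coercing Int96 decoding could look like. The function name and the standalone `TimeUnit` enum are hypothetical, not the arrow-rs API; what is standard is the INT96 encoding itself (8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day, with Julian day 2,440,588 corresponding to the Unix epoch):

```rust
// Hypothetical sketch (not the arrow-rs API): decode a Parquet INT96
// timestamp into a Unix timestamp in a caller-chosen unit.
// INT96 layout, little-endian: bytes 0..8 = nanoseconds within the day,
// bytes 8..12 = Julian day number.

const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588; // Julian day of 1970-01-01

#[derive(Clone, Copy)]
enum TimeUnit {
    Millisecond,
    Microsecond,
    Nanosecond,
}

fn int96_to_timestamp(raw: [u8; 12], unit: TimeUnit) -> i64 {
    let nanos_of_day = i64::from_le_bytes(raw[0..8].try_into().unwrap());
    let julian_day = i64::from(u32::from_le_bytes(raw[8..12].try_into().unwrap()));
    let days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH;
    // Scaling per unit *before* combining is the point of the coercion:
    // a nanosecond i64 overflows outside roughly 1677..2262, whereas
    // milliseconds comfortably cover the pre-1900 dates Spark emits.
    match unit {
        TimeUnit::Millisecond => days * 86_400_000 + nanos_of_day / 1_000_000,
        TimeUnit::Microsecond => days * 86_400_000_000 + nanos_of_day / 1_000,
        TimeUnit::Nanosecond => days * 86_400_000_000_000 + nanos_of_day,
    }
}

fn main() {
    // 1970-01-02T00:00:00Z: Julian day 2_440_589, zero nanoseconds of day.
    let mut raw = [0u8; 12];
    raw[8..12].copy_from_slice(&2_440_589u32.to_le_bytes());
    assert_eq!(int96_to_timestamp(raw, TimeUnit::Millisecond), 86_400_000);
}
```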
> mentioned there is a similar thing in arrow-cpp: https://github.com/apache/arrow/blob/784aa6faf69f5cf135e09976a281dea9ebf58166/cpp/src/parquet/arrow/schema_internal.cc#L205-L206 This looks to just influence what TimeUnit it coerces to, e.g. milliseconds, nanoseconds, etc... > An opt-in feature that allows INT96 to pass unmodified bytes for each value, perhaps as FixedSizedBinary(12). My 2 cents is that whilst possible, this results in an unfortunate UX. IMO we should support Int96 to the best of our ability, rather than forcing every downstream to reproduce this logic. Whilst it may be somewhat depressing that Spark is STILL writing a type that has been deprecated for almost a decade, it is where we are at and we should support it. That being said I would suggest we split this issue into two parts: * Support influencing the precision used, similar to arrow-cpp * Support legacy rebase modes for timestamps before 1900 written by Spark versions before 3.x - see [here](https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes) I suspect most users only actually care about the first of these - the number of people writing dates pre-1900 is likely small, and the people doing so with a half decade old version of Spark or Hive is likely even smaller, we can likely leave it as an issue for someone to pick up if they have a use-case for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org