tustvold commented on issue #7220:
URL: https://github.com/apache/arrow-rs/issues/7220#issuecomment-2708161150

   > mentioned there is a similar thing in arrow-cpp: https://github.com/apache/arrow/blob/784aa6faf69f5cf135e09976a281dea9ebf58166/cpp/src/parquet/arrow/schema_internal.cc#L205-L206
   
   This appears only to influence which TimeUnit the INT96 values are coerced to, e.g. milliseconds, nanoseconds, etc.
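
   For context, here is a minimal sketch of the decode path under discussion. It assumes the Impala/Hive INT96 layout (8 little-endian bytes of nanoseconds-of-day followed by a 4-byte little-endian Julian day number); the `Unit` enum and `decode_int96` are illustrative names, not arrow-rs APIs:

   ```rust
   /// Target units for coercion; illustrative, not the arrow TimeUnit enum.
   #[derive(Clone, Copy)]
   enum Unit {
       Millis,
       Micros,
       Nanos,
   }

   /// Julian Day Number of the Unix epoch, 1970-01-01.
   const UNIX_EPOCH_JDN: i64 = 2_440_588;

   /// Decode one INT96 value into a timestamp at the requested unit.
   /// Returns None on overflow, which is how nanosecond coercion fails for
   /// dates past ~2262 while millisecond/microsecond coercion still succeeds.
   fn decode_int96(raw: [u8; 12], unit: Unit) -> Option<i64> {
       let nanos_of_day = i64::from_le_bytes(raw[..8].try_into().unwrap());
       let julian_day = i64::from(u32::from_le_bytes(raw[8..].try_into().unwrap()));
       // Work in i128 so the intermediate nanosecond count cannot overflow.
       let days = julian_day - UNIX_EPOCH_JDN;
       let nanos = i128::from(days) * 86_400_000_000_000 + i128::from(nanos_of_day);
       let scaled = match unit {
           // Truncating division for brevity; real code may want flooring
           // division for negative (pre-epoch) values.
           Unit::Millis => nanos / 1_000_000,
           Unit::Micros => nanos / 1_000,
           Unit::Nanos => nanos,
       };
       i64::try_from(scaled).ok()
   }

   fn main() {
       // 1970-01-02T00:00:00.000000001, i.e. JDN 2440589 plus 1ns into the day.
       let mut raw = [0u8; 12];
       raw[..8].copy_from_slice(&1i64.to_le_bytes());
       raw[8..].copy_from_slice(&2_440_589u32.to_le_bytes());
       assert_eq!(decode_int96(raw, Unit::Nanos), Some(86_400_000_000_001));
       assert_eq!(decode_int96(raw, Unit::Millis), Some(86_400_000));
   }
   ```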
   
   > An opt-in feature that allows INT96 to pass unmodified bytes for each value, perhaps as FixedSizedBinary(12).
   
   My 2 cents is that whilst this is possible, it results in an unfortunate UX. IMO we should support Int96 to the best of our ability, rather than forcing every downstream project to reproduce this logic. Whilst it may be somewhat depressing that Spark is STILL writing a type that has been deprecated for almost a decade, it is where we are, and we should support it.
   
   That being said, I would suggest we split this issue into two parts:
   
   * Support influencing the precision used, similar to arrow-cpp
   * Support legacy rebase modes for timestamps before 1900 written by Spark versions before 3.x - see [here](https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes); a sketch of the calendar arithmetic involved follows this list
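
   For the second part, the crux is that Spark 2.x computed the stored day number via the hybrid Julian/Gregorian calendar while Spark 3.x uses the proleptic Gregorian calendar, so the same wall-clock date before 1582-10-15 maps to a different stored value. Below is a minimal, day-granularity sketch of the arithmetic a rebase would need; it ignores the sub-day, timezone-dependent offsets the linked article describes for pre-1900 timestamps, uses a simplified cutover check, and all names are illustrative:

   ```rust
   /// Julian Day Number of Y-M-D read as a proleptic Gregorian date
   /// (Fliegel-Van Flandern formula, C-style truncating division).
   fn gregorian_jdn(y: i64, m: i64, d: i64) -> i64 {
       (1461 * (y + 4800 + (m - 14) / 12)) / 4
           + (367 * (m - 2 - 12 * ((m - 14) / 12))) / 12
           - (3 * ((y + 4900 + (m - 14) / 12) / 100)) / 4
           + d
           - 32075
   }

   /// Julian Day Number of the same fields read as a Julian-calendar date.
   fn julian_jdn(y: i64, m: i64, d: i64) -> i64 {
       367 * y - (7 * (y + 5001 + (m - 9) / 7)) / 4 + (275 * m) / 9 + d + 1_729_777
   }

   /// Day number under the hybrid calendar: Julian before the 1582-10-15
   /// cutover (JDN 2299161), Gregorian from it onward. Simplified: it decides
   /// the cutover from the Gregorian reading of the fields.
   fn hybrid_jdn(y: i64, m: i64, d: i64) -> i64 {
       let g = gregorian_jdn(y, m, d);
       if g >= 2_299_161 { g } else { julian_jdn(y, m, d) }
   }

   fn main() {
       // Rebase shift for 1500-01-01: the two readings disagree by 9 days
       // here (10 days from 1500-03-01, because Julian 1500 has a Feb 29).
       assert_eq!(hybrid_jdn(1500, 1, 1) - gregorian_jdn(1500, 1, 1), 9);
       // From the Gregorian reform onward the readings agree, so no shift.
       assert_eq!(hybrid_jdn(2000, 6, 1), gregorian_jdn(2000, 6, 1));
   }
   ```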
   
   I suspect most users only actually care about the first of these - the number of people writing dates pre-1900 is likely small, and the number doing so with a half-decade-old version of Spark or Hive is likely even smaller. We can likely leave the second as an issue for someone to pick up if they have a use-case for it.
   

