Github user rtreffer commented on the pull request: https://github.com/apache/spark/pull/6796#issuecomment-114835713

@liancheng I'll rebase on your branch; I really like the way you cleaned up toPrimitiveDataType by using a fluent Types interface. That will make this patch much easier.

Talking about testing/compatibility/interoperability, I have added a Hive-generated Parquet file that I'd like to turn into a test case: https://github.com/rtreffer/spark/tree/spark-4176-store-large-decimal-in-parquet/sql/core/src/test/resources/hive-decimal-parquet There are also some Parquet files attached to tickets in JIRA. Do you plan to convert those into tests?

Regarding FIXED_LEN_BYTE_ARRAY: the relative overhead of BINARY shrinks as the value grows. Its length prefix would add <10% overhead around DECIMAL(100) and <25% around DECIMAL(40) (pre-compression), and I'd expect a DECIMAL(40) to use its full precision only occasionally. But yeah, I had overlooked the 4-byte length prefix documented at https://github.com/Parquet/parquet-format/blob/master/Encodings.md and assumed the overhead would be smaller, so FIXED_LEN_BYTE_ARRAY should be good for now (until someone complains).
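For the curious, here is a back-of-the-envelope sketch of where those percentages come from (my own illustration, not code from the patch; `minBytesForPrecision` is a hypothetical helper): a plain-encoded BINARY value carries a 4-byte length prefix, so the relative overhead is 4 bytes divided by the minimum two's-complement size of the unscaled decimal value.

```scala
object DecimalBinaryOverhead {
  // Minimum bytes for a two's-complement unscaled value of the given
  // decimal precision: bits needed for 10^precision - 1, plus a sign bit,
  // rounded up to whole bytes. (Hypothetical helper, for illustration only.)
  def minBytesForPrecision(precision: Int): Int = {
    val bits = math.ceil(precision * math.log(10) / math.log(2)).toInt + 1
    (bits + 7) / 8
  }

  def main(args: Array[String]): Unit = {
    for (p <- Seq(40, 100)) {
      val bytes = minBytesForPrecision(p)
      val overheadPct = 4.0 / bytes * 100
      println(f"DECIMAL($p) needs $bytes bytes; 4-byte BINARY prefix adds ~$overheadPct%.1f%%")
    }
  }
}
```

Under those assumptions the numbers line up with the estimates above: DECIMAL(40) needs 17 bytes, so the prefix adds roughly 23.5% (<25%), while DECIMAL(100) needs 42 bytes, about 9.5% (<10%).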