GitHub user rtreffer commented on the pull request:

    https://github.com/apache/spark/pull/6796#issuecomment-114835713
  
    @liancheng I'll rebase on your branch. I really like the way you cleaned up 
toPrimitiveDataType by using a fluent Types interface. This will make this 
patch way easier.
    
    Talking about testing/compatibility/interoperability, I have added a 
hive-generated parquet file that I'd like to turn into a test case:
    
https://github.com/rtreffer/spark/tree/spark-4176-store-large-decimal-in-parquet/sql/core/src/test/resources/hive-decimal-parquet
 
    There are some parquet files attached to tickets in jira, too.
    Do you plan to convert those into tests?
    
    Regarding FIXED_LENGTH_BYTE_ARRAY: the overhead of BINARY would decrease 
relative to the value size. BINARY overhead would be <10% from ~DECIMAL(100) 
and <25% from ~DECIMAL(40) (pre-compression). I'd expect DECIMAL(40) to use the 
full precision only from time to time. But yeah, I overlooked the 4-byte 
overhead described at 
https://github.com/Parquet/parquet-format/blob/master/Encodings.md and assumed 
it would be less. FIXED_LENGTH_BYTE_ARRAY should be good for now (until someone 
complains).
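
    The overhead figures above can be checked with a bit of arithmetic. The
Python sketch below is not part of the patch. It assumes the unscaled decimal
value is stored as a signed two's-complement integer in the fewest bytes that
fit the full precision, and that BINARY adds the 4-byte overhead mentioned
above to every value. The function names are made up for illustration.

```python
# Assumption: BINARY stores a 4-byte length in front of every value,
# while FIXED_LENGTH_BYTE_ARRAY stores no per-value length.
BINARY_LENGTH_PREFIX_BYTES = 4


def min_bytes_for_precision(precision: int) -> int:
    """Fewest bytes whose signed two's-complement range holds any
    unscaled value with `precision` decimal digits."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    max_unscaled = 10 ** precision - 1
    # bit_length() covers the magnitude; one more bit is needed for the sign.
    bits = max_unscaled.bit_length() + 1
    return (bits + 7) // 8  # round up to whole bytes


def binary_overhead(precision: int) -> float:
    """Length overhead as a fraction of the value size, assuming every
    value uses the full width for this precision."""
    return BINARY_LENGTH_PREFIX_BYTES / min_bytes_for_precision(precision)


if __name__ == "__main__":
    for p in (18, 40, 100):
        size = min_bytes_for_precision(p)
        print(f"DECIMAL({p}): {size} bytes, overhead {binary_overhead(p):.1%}")
```

    Under these assumptions DECIMAL(40) needs 17 bytes, so 4 extra bytes are
about 23.5%, and DECIMAL(100) needs 42 bytes, so the overhead is about 9.5%.
Values that do not use the full precision would be shorter in BINARY, which
this sketch does not model.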


