Ryan Blue created PARQUET-172:
---------------------------------

             Summary: Add support for non-String binary in parquet-thrift
                 Key: PARQUET-172
                 URL: https://issues.apache.org/jira/browse/PARQUET-172
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.5.0
            Reporter: Ryan Blue


Thrift [considers binary a "special" type|https://thrift.apache.org/docs/types] 
that isn't in the official spec but is "to provide better interoperability with 
java". The parquet-thrift side doesn't currently support binary because Thrift 
String fields are converted to UTF8-annotated binary. The result is that binary 
fields get mangled when stored in Parquet because Parquet assumes they are UTF8.
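
A minimal sketch of this kind of mangling, using only the JDK and not the actual parquet-thrift code path: it assumes the bytes are decoded as UTF-8 text somewhere along the way and then re-encoded. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {
    // Hypothetical helper: decode the bytes as UTF-8 text and re-encode them,
    // which is what a path that assumes every binary value is a String would do.
    public static byte[] roundTripAsUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Valid UTF-8 text survives the round trip unchanged.
        byte[] text = "parquet".getBytes(StandardCharsets.UTF_8);
        // Arbitrary binary: 0xFF, 0xFE and a lone 0x80 are not valid UTF-8,
        // so the decoder substitutes U+FFFD for each of them.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, (byte) 0x80};

        System.out.println(Arrays.equals(text, roundTripAsUtf8(text)));     // true
        System.out.println(Arrays.equals(binary, roundTripAsUtf8(binary))); // false
    }
}
```

Where exactly the UTF-8 assumption bites in parquet-thrift is not shown here; the point is only that non-UTF8 bytes cannot be recovered once they have been treated as a String.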

I think some storage layer in Java Thrift must know about binary and pass the 
unencoded bytes through, but Parquet hasn't implemented a similar hack. (The [type 
conversion|https://github.com/apache/incubator-parquet-mr/blob/master/parquet-thrift/src/main/java/parquet/thrift/ThriftSchemaConverter.java#L86]
 code, at least, has no entry for binary.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
