Ryan Blue created PARQUET-172:
---------------------------------
Summary: Add support for non-String binary in parquet-thrift
Key: PARQUET-172
URL: https://issues.apache.org/jira/browse/PARQUET-172
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.5.0
Reporter: Ryan Blue
Thrift [considers binary a "special" type|https://thrift.apache.org/docs/types]
that isn't in the official spec but was added "to provide better interoperability
with Java". parquet-thrift doesn't currently support binary, because Thrift
String fields are converted to UTF8-annotated binary. As a result, binary
fields get mangled when stored in Parquet, because Parquet assumes they are UTF8.
I think some storage layer in Java Thrift must know about binary and pass the
unencoded bytes through, but Parquet hasn't implemented a similar hack. (The [type
conversion|https://github.com/apache/incubator-parquet-mr/blob/master/parquet-thrift/src/main/java/parquet/thrift/ThriftSchemaConverter.java#L86]
code, at least, has no entry for binary.)
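
To illustrate the kind of mangling described above, here is a minimal standalone sketch. It uses only the JDK, not parquet-thrift, and it assumes the corruption comes from raw bytes being decoded as UTF-8 and encoded again, which this issue does not confirm. The class and method names (BinaryAsUtf8Demo, roundTripThroughString) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryAsUtf8Demo {

    // Hypothetical helper: treat raw bytes as UTF-8 text, then encode them
    // back, the way a String-typed code path would handle the field.
    static byte[] roundTripThroughString(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in well-formed UTF-8.
        byte[] payload = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] result = roundTripThroughString(payload);

        // The invalid bytes are replaced during decoding, so the original
        // payload cannot be recovered from the String.
        System.out.println("unchanged: " + Arrays.equals(payload, result));
        System.out.println("before: " + Arrays.toString(payload));
        System.out.println("after:  " + Arrays.toString(result));
    }
}
```

Bytes that happen to be valid UTF-8 (plain ASCII, for example) survive this round trip, so a problem like this may only show up with genuinely binary payloads.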