[ https://issues.apache.org/jira/browse/PARQUET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355914#comment-14355914 ]
Ryan Blue commented on PARQUET-172:
-----------------------------------
Added tests to validate binary support in
[PR #145|https://github.com/apache/incubator-parquet-mr/pull/145].
> Add support for non-String binary in parquet-thrift
> ---------------------------------------------------
>
> Key: PARQUET-172
> URL: https://issues.apache.org/jira/browse/PARQUET-172
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.5.0
> Reporter: Ryan Blue
> Assignee: Ryan Blue
>
> Thrift [considers binary a "special"
> type|https://thrift.apache.org/docs/types] that isn't in the official spec
> but exists "to provide better interoperability with java". parquet-thrift
> doesn't currently support binary: Thrift String fields are converted to
> UTF8-annotated binary, and binary fields take the same path. As a result,
> binary fields get mangled when stored in Parquet because Parquet assumes
> they are UTF8.
> I think some storage layer in Java Thrift must know about binary and pass the
> unencoded bytes, but Parquet hasn't implemented a similar hack. (The
> [type
> conversion|https://github.com/apache/incubator-parquet-mr/blob/master/parquet-thrift/src/main/java/parquet/thrift/ThriftSchemaConverter.java#L86]
> code, at least, has no entry for binary.)
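
The mangling described above can be illustrated with a minimal sketch in plain Java. It is not the parquet-thrift code path; the class name Utf8RoundTripSketch and the helper roundTripAsUtf8 are hypothetical, and the sketch assumes only that arbitrary bytes are decoded as UTF-8 text and later re-encoded, as a String-only path would do.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of parquet-thrift or Thrift.
final class Utf8RoundTripSketch {

    // Treat the bytes as UTF-8 text, then encode the text back to bytes.
    // Malformed sequences are replaced with U+FFFD during decoding.
    static byte[] roundTripAsUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Valid UTF-8 text survives the round trip unchanged.
        byte[] text = "parquet".getBytes(StandardCharsets.UTF_8);
        // 0xFF and 0xFE can never appear in valid UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x01};

        System.out.println(Arrays.equals(text, roundTripAsUtf8(text)));     // true
        System.out.println(Arrays.equals(binary, roundTripAsUtf8(binary))); // false
    }
}
```

Each invalid byte comes back as the three-byte encoding of U+FFFD, so the original binary payload cannot be recovered once it has passed through a String.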