[ 
https://issues.apache.org/jira/browse/PARQUET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355914#comment-14355914
 ] 

Ryan Blue commented on PARQUET-172:
-----------------------------------

Added tests to validate binary support in
[PR #145|https://github.com/apache/incubator-parquet-mr/pull/145].

> Add support for non-String binary in parquet-thrift
> ---------------------------------------------------
>
>                 Key: PARQUET-172
>                 URL: https://issues.apache.org/jira/browse/PARQUET-172
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.5.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>
> Thrift
> [considers binary a "special" type|https://thrift.apache.org/docs/types]
> that isn't in the official spec but exists "to provide better
> interoperability with java". The parquet-thrift side doesn't currently
> support binary because Thrift String fields are converted to UTF8-annotated
> binary. The result is that binary fields get mangled when stored in Parquet
> because Parquet assumes they are UTF8.
> I think some storage layer in Java Thrift must know about binary and pass the
> unencoded bytes, but Parquet hasn't implemented a similar hack. (The
> [type conversion|https://github.com/apache/incubator-parquet-mr/blob/master/parquet-thrift/src/main/java/parquet/thrift/ThriftSchemaConverter.java#L86]
> code at least has no entry for binary.)
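
To illustrate the mangling described above, here is a minimal Java sketch. It is not parquet-thrift code: the class BinaryAsUtf8 and the method roundTripAsUtf8 are hypothetical names, and the sketch assumes the damage comes from raw bytes being decoded as a UTF-8 String and encoded again somewhere along the write path. Bytes that are not valid UTF-8 do not survive that round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryAsUtf8 {
    // Hypothetical helper: decode bytes as UTF-8, then encode them again,
    // as a code path that treats every value as a string would.
    static byte[] roundTripAsUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in well-formed UTF-8.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] mangled = roundTripAsUtf8(raw);

        System.out.println("original:   " + Arrays.toString(raw));
        System.out.println("round trip: " + Arrays.toString(mangled));
        // Each invalid byte is replaced by U+FFFD, so the arrays differ.
        System.out.println("equal: " + Arrays.equals(raw, mangled)); // prints "equal: false"
    }
}
```

Plain ASCII input passes through unchanged, which is why String fields are unaffected while arbitrary binary payloads are not.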



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
