[jira] [Commented] (ARROW-785) possible issue on writing parquet via pyarrow, subsequently read in Hive

Wes McKinney (JIRA) Wed, 12 Apr 2017 20:30:07 -0700

    [ 
https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967059#comment-15967059
 ]


Wes McKinney commented on ARROW-785:
------------------------------------

If I decode the byte strings in the WORD column as UTF-8, so that they become unicode strings, then the problem goes away:

{code}
df['WORD'] = df['WORD'].str.decode('utf8')
{code}
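
The one-line fix above could be generalized along these lines. This is only a sketch: it assumes the affected columns hold Python bytes objects, the sample frame is reconstructed from the YEAR/WORD values printed by parquet-tools, and the helper name decode_bytes_columns is hypothetical (it is not part of pandas or pyarrow).

```python
import pandas as pd


def decode_bytes_columns(df, encoding="utf8"):
    """Return a copy of df with bytes-valued object columns decoded to str.

    Hypothetical helper for illustration; not a pandas or pyarrow API.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Only object-dtype columns can hold raw Python bytes values.
        if col.dtype != object:
            continue
        non_null = col.dropna()
        # Decode only when every non-null value is bytes, so columns that
        # already hold str (or mixed) values are left untouched.
        if len(non_null) > 0 and non_null.map(lambda v: isinstance(v, bytes)).all():
            out[name] = col.str.decode(encoding)
    return out


# Sample data assumed from the parquet-tools output in this comment.
df = pd.DataFrame({"YEAR": [2017, 2018], "WORD": [b"Word 1", b"Word 2"]})
fixed = decode_bytes_columns(df)
```

The decoded frame can then be converted to an Arrow table and written to Parquet as before. Returning a copy rather than mutating in place keeps the original bytes column available for comparison.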

The file then reads correctly in parquet-mr and Spark:

{code}
java -jar target/parquet-tools-1.9.0.jar test2.parq 
YEAR = 2017
WORD = Word 1

YEAR = 2018
WORD = Word 2
{code}

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
>                 Key: ARROW-785
>                 URL: https://issues.apache.org/jira/browse/ARROW-785
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jeff Reback
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> details here: 
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas 
> (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive, however.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
