[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967059#comment-15967059 ]
Wes McKinney commented on ARROW-785:
------------------------------------

If I convert the strings to UTF8, then the problem goes away:

{code}
df['WORD'] = df['WORD'].str.decode('utf8')
{code}

Then, in parquet-mr and Spark:

{code}
java -jar target/parquet-tools-1.9.0.jar test2.parq
YEAR = 2017
WORD = Word 1

YEAR = 2018
WORD = Word 2
{code}

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
>                 Key: ARROW-785
>                 URL: https://issues.apache.org/jira/browse/ARROW-785
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jeff Reback
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> details here:
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas
> (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive, however.
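
For reference, the decode step from the comment above can be sketched roughly as follows. This assumes the WORD column held Python bytes objects, which the `.str.decode` call implies but the thread does not state. The helper name `decode_bytes_column` and the two-row sample frame are hypothetical, built only from the values shown in the parquet-tools output. Values that are already str are left untouched.

```python
import pandas as pd


def decode_bytes_column(df, column, encoding="utf8"):
    """Return a copy of df with bytes values in `column` decoded to str.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    out = df.copy()
    out[column] = out[column].map(
        lambda v: v.decode(encoding) if isinstance(v, bytes) else v
    )
    return out


# Assumed sample data: WORD stored as bytes rather than unicode strings.
df = pd.DataFrame({"YEAR": [2017, 2018], "WORD": [b"Word 1", b"Word 2"]})

fixed = decode_bytes_column(df, "WORD")
print(fixed["WORD"].map(type).tolist())
```

The resulting frame would then be written to Parquet as before; only the column's element type changes, from bytes to str.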