[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961999#comment-15961999 ]
Ashima Sood commented on ARROW-785:
-----------------------------------

Since there is an option to explicitly provide a schema, I updated the code as below:

table = pa.Table.from_pandas(dataFrame, schema=dfschema)

where

dfschema = pa.Schema.from_fields([
    pa.Field.from_py('YEAR', pa.int64()),
    pa.Field.from_py('WORD', pa.string())
])

Regardless, I am getting the output below:

dfschema::
YEAR: int64
WORD: string

table schema::
YEAR: int64
WORD: binary

Since the WORD datatype remains binary, I am still not able to get the string value during query/read. I also see that the string dtype uses object references/pointers. Is there a way to get the value instead of the reference?

Also, when we use CAST(SUBSTR) in the select query, the value is shown. Is there a right way to create the Hive table in that case?

Thank you!

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
>                 Key: ARROW-785
>                 URL: https://issues.apache.org/jira/browse/ARROW-785
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jeff Reback
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> details here:
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas
> (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive however.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)