[ 
https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961999#comment-15961999
 ] 

Ashima Sood commented on ARROW-785:
-----------------------------------

Since there is an option to provide a schema explicitly, I updated the code as 
follows:

table = pa.Table.from_pandas(dataFrame, schema=dfschema)

where

dfschema = pa.Schema.from_fields([
    pa.Field.from_py('YEAR', pa.int64()),
    pa.Field.from_py('WORD', pa.string()),
])

Regardless, I get the output below:

dfschema::
YEAR: int64
WORD: string

table schema::
YEAR: int64
WORD: binary

As a result, since the WORD datatype remains binary, I am still not able to get 
the string value during query/read.
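
One thing that may be worth checking, purely as an assumption on my side, is 
whether the WORD values are byte strings rather than unicode text, since that 
could explain why the column ends up as binary. A minimal sketch of decoding 
such values before conversion (decode_column is a hypothetical helper name, 
and UTF-8 is an assumed encoding, neither comes from pyarrow):

```python
def decode_column(values, encoding="utf-8"):
    """Return a new list with every bytes value decoded to text.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    decoded = []
    for value in values:
        if isinstance(value, bytes):
            # Byte strings are turned into unicode text.
            decoded.append(value.decode(encoding))
        else:
            # Text, None and other objects are left untouched.
            decoded.append(value)
    return decoded


words = [b"apple", "banana", None]
print(decode_column(words))
```

The decoded list could then be assigned back to the WORD column of the 
DataFrame before calling Table.from_pandas. I have not confirmed that this 
changes the resulting table schema.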

I also see that the string dtype uses object references/pointers. Is there a 
way to get the value instead of the reference?
Also, when we use CAST(SUBSTR) in the select query, the value is shown. Is 
there a correct way to create the Hive table in that case?

Thank you!

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
>                 Key: ARROW-785
>                 URL: https://issues.apache.org/jira/browse/ARROW-785
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jeff Reback
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> details here: 
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas 
> (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive, however.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
