[ 
https://issues.apache.org/jira/browse/ARROW-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923401#comment-16923401
 ] 

Uwe L. Korn commented on ARROW-6277:
------------------------------------

This could be interesting for date columns when working together with pandas. 
To correctly round-trip date columns in the cycle Parquet -> Arrow -> pandas -> 
Arrow -> Parquet you need to use object columns in pandas with datetime.date 
objects. These can be quite repetitive and thus using dictionary encoding helps 
a lot here. Otherwise I would see the same use case for float columns, but that 
is something I haven't used yet, mostly because pandas doesn't really work 
well with float categories.
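
To illustrate why repetitive date columns benefit, here is a minimal 
pure-Python sketch of the idea behind dictionary encoding. It is not how 
Arrow or Parquet implement it; the helper names dictionary_encode and 
dictionary_decode are hypothetical and exist only for this example. Each 
distinct datetime.date is stored once, and the column itself becomes a list 
of small integer indices.

```python
import datetime


def dictionary_encode(values):
    """Split values into (dictionary, indices); None is kept as a null index."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> its index in `dictionary`
    indices = []
    for value in values:
        if value is None:
            indices.append(None)
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original list of values from the dictionary and indices."""
    return [None if i is None else dictionary[i] for i in indices]


# A repetitive date column: two distinct days repeated 1000 times, plus a null.
days = [datetime.date(2019, 9, 1), datetime.date(2019, 9, 2)]
column = [days[i % 2] for i in range(1000)] + [None]

dictionary, indices = dictionary_encode(column)
roundtrip = dictionary_decode(dictionary, indices)
```

Only two date objects end up in the dictionary, while the 1001 entries of 
the column are plain integers (or a null), and decoding gives back the 
original values unchanged.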

> [C++][Parquet] Support reading/writing other Parquet primitive types to 
> DictionaryArray
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-6277
>                 URL: https://issues.apache.org/jira/browse/ARROW-6277
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.15.0
>
>
> As follow up to ARROW-3246, we should support direct read/write of the other 
> Parquet primitive types. Currently only BYTE_ARRAY is implemented as it 
> provides the most performance benefit.


