[ https://issues.apache.org/jira/browse/ARROW-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923401#comment-16923401 ]
Uwe L. Korn commented on ARROW-6277:
------------------------------------

This could be interesting for date columns when working together with pandas. To correctly round-trip date columns in the cycle Parquet -> Arrow -> pandas -> Arrow -> Parquet, you need to use object columns in pandas that hold datetime.date objects. These can be quite repetitive, so dictionary encoding helps a lot here.

Otherwise I would see the same use case for float columns, but that is something I haven't used yet, mostly because pandas doesn't really work well with float categories.

> [C++][Parquet] Support reading/writing other Parquet primitive types to
> DictionaryArray
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-6277
>                 URL: https://issues.apache.org/jira/browse/ARROW-6277
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.15.0
>
>
> As follow up to ARROW-3246, we should support direct read/write of the other
> Parquet primitive types. Currently only BYTE_ARRAY is implemented as it
> provides the most performance benefit.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
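
To illustrate the point in the comment about repetitive datetime.date values, here is a minimal conceptual sketch of dictionary encoding in plain Python. It is not Arrow's or Parquet's implementation. The helper names `dictionary_encode` and `dictionary_decode` and the sample data are hypothetical, chosen only for illustration.

```python
import datetime


def dictionary_encode(values):
    """Split a sequence of hashable values into (dictionary, indices).

    `dictionary` holds each distinct value once, in first-seen order;
    `indices` holds one small integer per input value.
    """
    dictionary = []   # distinct values
    positions = {}    # value -> its index in `dictionary`
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original sequence from the encoded form."""
    return [dictionary[i] for i in indices]


# Hypothetical sample column: three distinct dates repeated 1000 times,
# similar in shape to a pandas object column of datetime.date values.
dates = [datetime.date(2019, 9, day) for day in (1, 2, 3)] * 1000

dictionary, indices = dictionary_encode(dates)
restored = dictionary_decode(dictionary, indices)
```

In this sketch the 3000-element column reduces to a 3-entry dictionary plus 3000 integer indices, and decoding restores the original values exactly, which is why a repetitive date column is a plausible candidate for this encoding.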