[ https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761825#comment-16761825 ]
Hatem Helal commented on ARROW-3769:
------------------------------------

Made a start on the unit tests here: [https://github.com/mathworks/arrow/pull/12]

[~wesmckinn], could you take a look and let me know if this is heading in the right direction?

> [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-3769
>                 URL: https://issues.apache.org/jira/browse/ARROW-3769
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Hatem Helal
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.13.0
>
>
> If the goal is to hash this data into a categorical-type array anyway, it would be better to offer the option to "push down" the hashing into the Parquet read hot path, rather than first fully materializing a dense vector of {{ByteArray}} values, which could use a lot of memory after decompression.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
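The "push down" idea in the issue description can be illustrated with a small standalone sketch. It does not use Arrow or Parquet at all: the class `DictionaryAccumulator` and its methods are hypothetical names chosen for illustration, not part of either C++ API. Each decoded binary value is hashed as it arrives. Only values not seen before are copied into the dictionary, and each row stores just a 32-bit index. Because no dense vector of every decompressed value is built first, memory grows with the number of distinct values rather than with the total number of values.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical sketch (not the Arrow/Parquet API): dictionary-encodes binary
// values one at a time, as a decoder would emit them in the read loop.
class DictionaryAccumulator {
 public:
  // Called once per decoded value. The caller's buffer only needs to stay
  // valid for the duration of this call; the bytes are copied only when the
  // value has not been seen before.
  void Append(std::string_view value) {
    // try_emplace inserts (value -> next index) only if the key is new.
    auto [it, inserted] = index_of_.try_emplace(
        std::string(value), static_cast<int32_t>(dictionary_.size()));
    if (inserted) {
      dictionary_.emplace_back(value);  // first occurrence: extend dictionary
    }
    indices_.push_back(it->second);     // every row stores only an index
  }

  // Distinct values in first-seen order (the "dictionary" of the result).
  const std::vector<std::string>& dictionary() const { return dictionary_; }

  // One index into dictionary() per appended value.
  const std::vector<int32_t>& indices() const { return indices_; }

 private:
  std::unordered_map<std::string, int32_t> index_of_;  // value -> index
  std::vector<std::string> dictionary_;
  std::vector<int32_t> indices_;
};
```

For example, appending `"a"`, `"b"`, `"a"`, `"c"`, `"b"` yields the dictionary `{"a", "b", "c"}` and the indices `{0, 1, 0, 2, 1}`. The sketch keeps a second copy of each distinct key inside the hash map for simplicity; a real reader would likely avoid that duplication.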