[ https://issues.apache.org/jira/browse/ARROW-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-38:
----------------------------
External issue URL: https://github.com/apache/arrow/issues/15411
> C++: Algorithms for using nested types in a hash table context
> --------------------------------------------------------------
>
> Key: ARROW-38
> URL: https://issues.apache.org/jira/browse/ARROW-38
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
>
> Computing hash values (and performing equality comparisons) for top-level
> slots in nested-type data (for example, computing DISTINCT on a
> {{List<List<Int32>>}}, related: ARROW-32) can be fairly complex.
> Additionally, value slots at any level of the type tree can be null.
> We should explore various algorithms for their performance and memory use in
> practical settings. For example, one can compute a contiguous "record" / byte
> array resulting from a depth-first traversal of a single value slot for the
> purposes of computing a hash value or comparing with another slot. If anyone
> has other ideas from past experience, I would be keen to learn more.
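
The "contiguous record" idea in the description can be sketched roughly as
follows. This is only an illustration under stated assumptions, not Arrow's
implementation: the Value struct, the helper names (Null, Int, ListOf,
EncodeSlot, Encode, CountDistinct) and the tag/length byte layout are all
hypothetical and do not correspond to the Arrow C++ API or memory format. Each
slot is walked depth-first and written as a tagged, length-prefixed byte string,
so that nulls, empty lists and zero values all encode differently and byte
equality of two records coincides with equality of the two slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical in-memory model of one nested value slot (NOT Arrow's layout).
struct Value {
  enum class Kind : std::uint8_t { Null = 0, Int32 = 1, List = 2 };
  Kind kind = Kind::Null;
  std::int32_t i32 = 0;          // used only when kind == Int32
  std::vector<Value> children;   // used only when kind == List
};

Value Null() { return Value{}; }
Value Int(std::int32_t v) {
  Value x;
  x.kind = Value::Kind::Int32;
  x.i32 = v;
  return x;
}
Value ListOf(std::vector<Value> items) {
  Value x;
  x.kind = Value::Kind::List;
  x.children = std::move(items);
  return x;
}

// Append a 32-bit unsigned integer in little-endian byte order.
void AppendU32(std::uint32_t bits, std::string* out) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<char>((bits >> shift) & 0xFF));
  }
}

// Depth-first traversal of a single slot into a contiguous byte record.
// Every node writes a one-byte tag; leaves write a fixed-width payload and
// lists write their length before their children, so the encoding is
// unambiguous (a null, an empty list, and the integer 0 all differ).
void EncodeSlot(const Value& v, std::string* out) {
  out->push_back(static_cast<char>(v.kind));
  switch (v.kind) {
    case Value::Kind::Null:
      break;
    case Value::Kind::Int32:
      AppendU32(static_cast<std::uint32_t>(v.i32), out);
      break;
    case Value::Kind::List:
      assert(v.children.size() <= std::numeric_limits<std::uint32_t>::max());
      AppendU32(static_cast<std::uint32_t>(v.children.size()), out);
      for (const Value& child : v.children) EncodeSlot(child, out);
      break;
  }
}

std::string Encode(const Value& v) {
  std::string record;
  EncodeSlot(v, &record);
  return record;
}

// DISTINCT over top-level slots: hash and compare the encoded records.
std::size_t CountDistinct(const std::vector<Value>& slots) {
  std::unordered_set<std::string> seen;
  for (const Value& slot : slots) seen.insert(Encode(slot));
  return seen.size();
}
```

With this sketch, two List<List<Int32>> slots holding [[1, 2], null] produce
identical records, while [[1], [2]] produces a different one. The cost is one
temporary buffer per slot, which is the kind of memory/performance trade-off the
description suggests measuring against alternatives.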