[ https://issues.apache.org/jira/browse/ARROW-12301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-12301:
-------------------------------
    Description:
When calculating unique for chunked DictionaryArrays we currently run through all chunks and unify their dictionaries and then collect chunk indices. We could avoid the dictionary unification by using a generic hash.
[See discussion here|https://github.com/apache/arrow/pull/9683] and [here|https://issues.apache.org/jira/browse/ARROW-10403]

  was:
When calculating unique for chunked DictionaryArrays we currently run through all chunks and unify their dictionaries and then collect chunk indices. We could avoid the dictionary unification by using a generic hash.
[See discussion here|https://github.com/apache/arrow/pull/9683] and [here|#ARROW-10403]

> [C++][Compute] Use generic hash-aggregate for DictionaryArrays
> --------------------------------------------------------------
>
>                 Key: ARROW-12301
>                 URL: https://issues.apache.org/jira/browse/ARROW-12301
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Rok Mihevc
>            Priority: Major
>
> When calculating unique for chunked DictionaryArrays we currently run through
> all chunks and unify their dictionaries and then collect chunk indices. We
> could avoid the dictionary unification by using a generic hash.
> [See discussion here|https://github.com/apache/arrow/pull/9683] and
> [here|https://issues.apache.org/jira/browse/ARROW-10403]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
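To make the two strategies in the description concrete, here is a minimal, Arrow-free sketch under simplifying assumptions: each chunk is modeled by a hypothetical `DictChunk` struct (a `std::string` dictionary plus `int32_t` indices, no nulls), and `UniqueViaUnification` / `UniqueViaGenericHash` are illustrative names, not Arrow C++ APIs. The first function roughly mirrors the described current behavior (unify every chunk's dictionary, then collect indices through per-chunk transposition tables); the second hashes the decoded values directly, so no unified dictionary is ever built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one chunk of a dictionary-encoded array:
// indices[i] refers to an entry of `dictionary`. Nulls are not modeled.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int32_t> indices;
};

// Strategy 1 (sketch of the unification approach): first unify all chunk
// dictionaries, recording a chunk-local -> unified index table per chunk,
// then walk every chunk's indices and keep the first occurrence of each
// unified index.
std::vector<std::string> UniqueViaUnification(const std::vector<DictChunk>& chunks) {
  std::vector<std::string> unified;                 // unified dictionary
  std::unordered_map<std::string, int32_t> lookup;  // value -> unified index
  std::vector<std::vector<int32_t>> transposes;     // one table per chunk

  // Pass 1: dictionary unification (touches every dictionary entry,
  // including entries no index ever refers to).
  for (const auto& chunk : chunks) {
    std::vector<int32_t> transpose(chunk.dictionary.size());
    for (std::size_t j = 0; j < chunk.dictionary.size(); ++j) {
      auto [it, inserted] = lookup.emplace(chunk.dictionary[j],
                                           static_cast<int32_t>(unified.size()));
      if (inserted) unified.push_back(chunk.dictionary[j]);
      transpose[j] = it->second;
    }
    transposes.push_back(std::move(transpose));
  }

  // Pass 2: collect the distinct unified indices in first-occurrence order.
  std::vector<bool> seen(unified.size(), false);
  std::vector<std::string> out;
  for (std::size_t c = 0; c < chunks.size(); ++c) {
    for (int32_t idx : chunks[c].indices) {
      int32_t u = transposes[c][idx];
      if (!seen[u]) {
        seen[u] = true;
        out.push_back(unified[u]);
      }
    }
  }
  return out;
}

// Strategy 2 (sketch of the generic-hash approach): decode each index to its
// value and insert the value into one hash set; no unified dictionary.
std::vector<std::string> UniqueViaGenericHash(const std::vector<DictChunk>& chunks) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> out;
  for (const auto& chunk : chunks) {
    for (int32_t idx : chunk.indices) {
      assert(idx >= 0 && static_cast<std::size_t>(idx) < chunk.dictionary.size());
      const std::string& value = chunk.dictionary[idx];
      if (seen.insert(value).second) out.push_back(value);  // first occurrence
    }
  }
  return out;
}
```

For example, with chunks `{{"a","b","c"}, {0,2,0}}` and `{{"c","d"}, {1,0,1}}`, both functions should return `{"a","c","d"}`; the unification version additionally builds a four-entry unified dictionary that includes the unused `"b"`. A real kernel would likely avoid re-hashing repeated indices within a chunk (for example by memoizing per dictionary entry), which this sketch omits for brevity.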