[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421634#comment-17421634 ]

Percy Camilo Triveño Aucahuasi commented on ARROW-14035:

Related https://issues.apache.org/jira/browse/ARROW-14158

> [C++][Compute] Implement non-hash count_distinct aggregate kernel
> -----------------------------------------------------------------
>
> Key: ARROW-14035
> URL: https://issues.apache.org/jira/browse/ARROW-14035
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Ian Cook
> Assignee: Percy Camilo Triveño Aucahuasi
> Priority: Critical
> Labels: kernel, pull-request-available
> Fix For: 6.0.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> ARROW-12728 added a {{hash_count_distinct}} hash aggregate kernel, but there
> is no non-hash {{count_distinct}} aggregate kernel.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421478#comment-17421478 ]

Percy Camilo Triveño Aucahuasi commented on ARROW-14035:

Draft PR https://github.com/apache/arrow/pull/11257
[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419368#comment-17419368 ]

Percy Camilo Triveño Aucahuasi commented on ARROW-14035:

Thanks David!
[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419366#comment-17419366 ]

David Li commented on ARROW-14035:

{{value_counts}} gives you a histogram where the x-axis holds the distinct values and the y-axis the number of occurrences of each value. {{count_distinct}} is just {{COUNT(DISTINCT *)}}. Also, {{value_counts}} is a vector kernel, whereas this should be a scalar aggregate kernel.
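To make the distinction concrete, here is a minimal sketch in plain standard C++. It does not use the Arrow API; {{ValueCounts}} and {{CountDistinct}} are hypothetical helper names chosen for illustration, and the input is assumed to be a simple vector of integers with no nulls.

```cpp
#include <cstdint>
#include <map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a value_counts-style result (not Arrow code):
// one entry per distinct value, mapped to how many times it occurs.
std::map<int64_t, int64_t> ValueCounts(const std::vector<int64_t>& values) {
  std::map<int64_t, int64_t> counts;
  for (int64_t v : values) {
    ++counts[v];  // operator[] value-initializes a missing entry to 0
  }
  return counts;
}

// Hypothetical stand-in for a count_distinct-style result (not Arrow code):
// a single number, the size of the set of distinct values.
int64_t CountDistinct(const std::vector<int64_t>& values) {
  std::unordered_set<int64_t> seen(values.begin(), values.end());
  return static_cast<int64_t>(seen.size());
}
```

For the input {1, 2, 2, 3, 3, 3}, the first helper returns three entries (1 once, 2 twice, 3 three times), while the second returns only the number 3. The first result grows with the number of distinct values; the second is always one value, which is why it fits a scalar aggregate.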
[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419364#comment-17419364 ]

Percy Camilo Triveño Aucahuasi commented on ARROW-14035:

Thanks [~icook], another question: What is the difference between _value_counts_ and _count_distinct_?

[https://github.com/apache/arrow/blob/master/docs/source/cpp/compute.rst#associative-transforms]
[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418866#comment-17418866 ]

Ian Cook commented on ARROW-14035:

{quote}1. Do we need to compute the same thing as hash_count_distinct but without using the hash table from the hash group?
{quote}
Yes.
{quote}Are we going to offer a non-hash version for all hash_x functions too? (hash_distinct, hash_count, hash_sum)
{quote}
Yes, I think we should aim for that (or nearly that; there might be a few exceptions where it does not make sense).

Comparing the lists of aggregation functions and hash (grouped) aggregation functions in [compute.rst|https://github.com/apache/arrow/blob/master/docs/source/cpp/compute.rst], they are mostly the same already, with just a few differences. I think this issue and ARROW-13309 are the two most important additions for bringing these two lists closer to parity.
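As a rough illustration of the grouped versus non-grouped shapes discussed above, here is a sketch in plain standard C++. It is not Arrow code: {{GroupedCountDistinct}} and {{ScalarCountDistinct}} are hypothetical names, string keys and integer values are assumed, and nulls are ignored.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grouped ("hash") aggregate shape (not Arrow code):
// rows are (key, value) pairs and the result holds one count per key.
std::map<std::string, int64_t> GroupedCountDistinct(
    const std::vector<std::pair<std::string, int64_t>>& rows) {
  std::map<std::string, std::set<int64_t>> seen_per_key;
  for (const auto& row : rows) {
    seen_per_key[row.first].insert(row.second);
  }
  std::map<std::string, int64_t> result;
  for (const auto& entry : seen_per_key) {
    result[entry.first] = static_cast<int64_t>(entry.second.size());
  }
  return result;
}

// Hypothetical non-grouped (scalar) aggregate shape (not Arrow code):
// there are no keys, so the whole column reduces to a single count.
int64_t ScalarCountDistinct(const std::vector<int64_t>& values) {
  std::set<int64_t> seen(values.begin(), values.end());
  return static_cast<int64_t>(seen.size());
}
```

With rows ("a", 1), ("a", 1), ("a", 2), ("b", 2), the grouped helper yields 2 for "a" and 1 for "b". The scalar helper applied to the value column {1, 1, 2, 2} alone yields 2.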
[jira] [Commented] (ARROW-14035) [C++][Compute] Implement non-hash count_distinct aggregate kernel
[ https://issues.apache.org/jira/browse/ARROW-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418863#comment-17418863 ]

Percy Camilo Triveño Aucahuasi commented on ARROW-14035:

Can you please elaborate on this requirement?
# Do we need to compute the same thing as hash_distinct but without using the hash table from the hash group?
# Are we going to offer a non-hash version for all hash_x functions too? (hash_distinct, hash_count, hash_sum)

cc [~icook] @lidavidm