deniskuzZ commented on code in PR #6114:
URL: https://github.com/apache/hive/pull/6114#discussion_r2405777721
##########
ql/src/test/queries/clientpositive/vector_count_distinct_multiarg.q:
##########
@@ -0,0 +1,35 @@
+drop table if exists test_vector;
+create external table test_vector(id string, pid bigint) PARTITIONED BY (full_date int);
+insert into test_vector (pid, full_date, id) values (1, '20240305', '6150');
+
+--------------------------------------------------------------------------------
+-- 1. Basic COUNT cases (valid in vectorization)
+--------------------------------------------------------------------------------
+SELECT COUNT(pid) AS cnt_col, COUNT(*) AS cnt_star, COUNT(20240305) AS cnt_const, COUNT(DISTINCT pid) as cnt_distinct, COUNT(1) AS CNT
+FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(pid) AS cnt_col, COUNT(*) AS cnt_star, COUNT(20240305) AS cnt_const,COUNT(DISTINCT pid) as cnt_distinct, COUNT(1) AS CNT
+FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 2. COUNT with DISTINCT column + constant (INVALID in vectorization)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, 20240305) AS CNT FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(DISTINCT pid, 20240305) AS CNT FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 3. COUNT(DISTINCT pid, full_date) (multi-col distinct -> FAIL)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, full_date) AS CNT FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(DISTINCT pid, full_date) AS CNT FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 4. COUNT(DISTINCT pid, full_date, id) (multi-col distinct -> FAIL)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, full_date, id) AS CNT FROM test_vector WHERE full_date=20240305;
Review Comment:
Interesting that it works for you. I'm getting an exception unless I wrap
the distinct columns in parentheses:
````
org.apache.hadoop.hive.ql.exec.UDFArgumentException: DISTINCT keyword must be specified
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount.getEvaluator(GenericUDAFCount.java:73)
````