deniskuzZ commented on code in PR #6114:
URL: https://github.com/apache/hive/pull/6114#discussion_r2405777721


##########
ql/src/test/queries/clientpositive/vector_count_distinct_multiarg.q:
##########
@@ -0,0 +1,35 @@
+drop table if exists test_vector;
+create external table test_vector(id string, pid bigint) PARTITIONED BY (full_date int);
+insert into test_vector (pid, full_date, id) values (1, '20240305', '6150');
+
+--------------------------------------------------------------------------------
+-- 1. Basic COUNT cases (valid in vectorization)
+--------------------------------------------------------------------------------
+SELECT COUNT(pid) AS cnt_col, COUNT(*) AS cnt_star, COUNT(20240305) AS cnt_const, COUNT(DISTINCT pid) as cnt_distinct, COUNT(1) AS CNT
+FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(pid) AS cnt_col, COUNT(*) AS cnt_star, COUNT(20240305) AS cnt_const,COUNT(DISTINCT pid) as cnt_distinct, COUNT(1) AS CNT
+FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 2. COUNT with DISTINCT column + constant (INVALID in vectorization)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, 20240305) AS CNT FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(DISTINCT pid, 20240305) AS CNT FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 3. COUNT(DISTINCT pid, full_date) (multi-col distinct → FAIL)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, full_date) AS CNT FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(DISTINCT pid, full_date) AS CNT FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 4. COUNT(DISTINCT pid, full_date, id) (multi-col distinct → FAIL)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, full_date, id) AS CNT FROM test_vector WHERE full_date=20240305;

Review Comment:
   Interesting that it works for you. I’m getting an exception unless I wrap the distinct columns in parentheses.
   ````
    org.apache.hadoop.hive.ql.exec.UDFArgumentException: DISTINCT keyword must be specified
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount.getEvaluator(GenericUDAFCount.java:73)
   ````



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

