Indhumathi27 commented on code in PR #6114:
URL: https://github.com/apache/hive/pull/6114#discussion_r2405921379


##########
ql/src/test/queries/clientpositive/vector_count_distinct_multiarg.q:
##########
@@ -0,0 +1,35 @@
+drop table if exists test_vector;
+create external table test_vector(id string, pid bigint) PARTITIONED BY 
(full_date int);
+insert into test_vector (pid, full_date, id) values (1, '20240305', '6150');
+
+--------------------------------------------------------------------------------
+-- 1. Basic COUNT cases (valid in vectorization)
+--------------------------------------------------------------------------------
+SELECT COUNT(pid) AS cnt_col, COUNT(*) AS cnt_star, COUNT(20240305) AS 
cnt_const, COUNT(DISTINCT pid) as cnt_distinct, COUNT(1) AS CNT
+FROM test_vector WHERE full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(pid) AS cnt_col, COUNT(*) AS cnt_star, COUNT(20240305) AS 
cnt_const,COUNT(DISTINCT pid) as cnt_distinct, COUNT(1) AS CNT
+FROM test_vector WHERE full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 2. COUNT with DISTINCT column + constant (INVALID in vectorization)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, 20240305) AS CNT FROM test_vector WHERE 
full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(DISTINCT pid, 20240305) AS CNT FROM test_vector WHERE 
full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 3. COUNT(DISTINCT pid, full_date) (multi-col distinct → FAIL)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, full_date) AS CNT FROM test_vector WHERE 
full_date=20240305;
+EXPLAIN VECTORIZATION EXPRESSION
+SELECT COUNT(DISTINCT pid, full_date) AS CNT FROM test_vector WHERE 
full_date=20240305;
+
+--------------------------------------------------------------------------------
+-- 4. COUNT(DISTINCT pid, full_date, id) (multi-col distinct → FAIL)
+--------------------------------------------------------------------------------
+SELECT COUNT(DISTINCT pid, full_date, id) AS CNT FROM test_vector WHERE 
full_date=20240305;

Review Comment:
   COUNT UDAF excepts  DISTINCT to be specified, when the parameters are more 
than 1.
   
https://github.com/apache/hive/blob/0e8749ee85bf202e69f5b4d3e8d325a01a5a2033/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java#L73



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to