[
https://issues.apache.org/jira/browse/HIVE-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201521#comment-17201521
]
Jiehong Lian edited comment on HIVE-12778 at 9/24/20, 1:41 PM:
---------------------------------------------------------------
This bug occurs when the HAVING clause contains an aggregate function with the
DISTINCT qualifier while the SELECT clause does not contain that aggregate. For example:
{code:java}
SELECT key
FROM src
GROUP BY key
HAVING COUNT(value) >= 4 and count(distinct value) > 1
;{code}
I have a patch that fixes the bug:
{noformat}
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
index 35d25138fe..4450bc2d47 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
@@ -214,6 +214,14 @@ public void clearDistinctFuncExprsForClause(String clause) {
     }
   }
+  public void addDistinctFuncExprsForClause(String clause, List<ASTNode> dists) {
+    if (destToDistinctFuncExprs.containsKey(clause)) {
+      destToDistinctFuncExprs.get(clause).addAll(dists);
+    } else {
+      destToDistinctFuncExprs.put(clause, dists);
+    }
+  }
+
   public void setDistinctFuncExprsForClause(String clause, List<ASTNode> ast) {
     destToDistinctFuncExprs.put(clause, ast);
   }
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
index e05474c2de..ec2fd8df7a 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
@@ -1408,8 +1408,10 @@ public boolean doPhase1(ASTNode ast, QB qb, Phase1Ctx ctx_1, PlannerContext plan
     case HiveParser.TOK_HAVING:
       qbp.setHavingExprForClause(ctx_1.dest, ast);
-      qbp.addAggregationExprsForClause(ctx_1.dest,
-          doPhase1GetAggregationsFromSelect(ast, qb, ctx_1.dest));
+      LinkedHashMap<String, ASTNode> aggsInHaving =
+          doPhase1GetAggregationsFromSelect(ast, qb, ctx_1.dest);
+      qbp.addAggregationExprsForClause(ctx_1.dest, aggsInHaving);
+      qbp.addDistinctFuncExprsForClause(ctx_1.dest, doPhase1GetDistinctFuncExprs(aggsInHaving));
       break;
     case HiveParser.KW_WINDOW:
{noformat}
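The merge behaviour that the new addDistinctFuncExprsForClause method relies on can be tried out in isolation. The following is only a minimal sketch and not Hive code: the class DistinctExprRegistry, its method names, and the clause name "dest" are hypothetical, and plain strings stand in for ASTNode. Unlike the patch, the sketch copies incoming lists instead of storing the caller's list, so a later addAll cannot modify a list the caller still holds; that is a design variation, not something the patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the destToDistinctFuncExprs bookkeeping in QBParseInfo.
public class DistinctExprRegistry {
    // Clause name -> distinct aggregate expressions (String replaces ASTNode here).
    private final Map<String, List<String>> destToDistinctFuncExprs = new HashMap<>();

    // Like setDistinctFuncExprsForClause: replaces whatever was registered for the clause.
    // Assumption: a defensive copy is stored rather than the caller's list.
    public void set(String clause, List<String> exprs) {
        destToDistinctFuncExprs.put(clause, new ArrayList<>(exprs));
    }

    // Like the patched addDistinctFuncExprsForClause: appends to the existing
    // list, or creates a new list when the clause has none yet.
    public void add(String clause, List<String> exprs) {
        destToDistinctFuncExprs
            .computeIfAbsent(clause, k -> new ArrayList<>())
            .addAll(exprs);
    }

    public List<String> get(String clause) {
        return destToDistinctFuncExprs.get(clause);
    }

    public static void main(String[] args) {
        DistinctExprRegistry registry = new DistinctExprRegistry();
        // SELECT clause with no distinct aggregate: an empty list is registered.
        registry.set("dest", new ArrayList<>());
        // HAVING clause contributes a distinct aggregate afterwards.
        registry.add("dest", Arrays.asList("count(DISTINCT value)"));
        System.out.println(registry.get("dest")); // prints [count(DISTINCT value)]
    }
}
```

With a plain overwrite in place of add, the list registered for the SELECT clause would be discarded, which is why the sketch keeps set and add as separate operations, as the patch does.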
was (Author: dindin5258):
This bug occurs when the HAVING clause contains an aggregate function with the
DISTINCT qualifier while the SELECT clause does not contain that aggregate. For example:
{code:java}
SELECT key
FROM src
GROUP BY key
HAVING COUNT(value) >= 4 and count(distinct value) > 1
;{code}
I have a patch that fixes the bug:
{code:java}
// patch.diff
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
index 35d25138fe..4450bc2d47 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
@@ -214,6 +214,14 @@ public void clearDistinctFuncExprsForClause(String clause) {
     }
   }
+  public void addDistinctFuncExprsForClause(String clause, List<ASTNode> dists) {
+    if (destToDistinctFuncExprs.containsKey(clause)) {
+      destToDistinctFuncExprs.get(clause).addAll(dists);
+    } else {
+      destToDistinctFuncExprs.put(clause, dists);
+    }
+  }
+
   public void setDistinctFuncExprsForClause(String clause, List<ASTNode> ast) {
     destToDistinctFuncExprs.put(clause, ast);
   }
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
index e05474c2de..ec2fd8df7a 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
@@ -1408,8 +1408,10 @@ public boolean doPhase1(ASTNode ast, QB qb, Phase1Ctx ctx_1, PlannerContext plan
     case HiveParser.TOK_HAVING:
       qbp.setHavingExprForClause(ctx_1.dest, ast);
-      qbp.addAggregationExprsForClause(ctx_1.dest,
-          doPhase1GetAggregationsFromSelect(ast, qb, ctx_1.dest));
+      LinkedHashMap<String, ASTNode> aggsInHaving =
+          doPhase1GetAggregationsFromSelect(ast, qb, ctx_1.dest);
+      qbp.addAggregationExprsForClause(ctx_1.dest, aggsInHaving);
+      qbp.addDistinctFuncExprsForClause(ctx_1.dest, doPhase1GetDistinctFuncExprs(aggsInHaving));
       break;
     case HiveParser.KW_WINDOW:
{code}
> Having with count distinct doesn't work for special combination
> ---------------------------------------------------------------
>
> Key: HIVE-12778
> URL: https://issues.apache.org/jira/browse/HIVE-12778
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.0, 1.2.1
> Reporter: Peter Brejcak
> Priority: Major
>
> There is a problem when count(distinct ) is used in the having clause without
> count(distinct ) in the select clause.
> The first case returns the error *FAILED: SemanticException [Error 10002]: Line
> Invalid column reference* (unexpected).
> If I add count(distinct ) to the select clause, the result is OK (expected).
> Please run the code below to see it.
> Steps to reproduce:
> {code}
> create table table_subquery_having_problem (id int, value int);
> insert into table table_subquery_having_problem values (1,1);
> insert into table table_subquery_having_problem values (1,2);
> insert into table table_subquery_having_problem values (1,3);
> insert into table table_subquery_having_problem values (1,4);
> insert into table table_subquery_having_problem values (1,5);
> insert into table table_subquery_having_problem values (1,6);
> insert into table table_subquery_having_problem values (1,7);
> insert into table table_subquery_having_problem values (1,8);
> insert into table table_subquery_having_problem values (1,9);
> select x.id from table_subquery_having_problem x
> group by x.id
> having count(distinct x.value)>1; -- result is ERROR
> select x.id, count(distinct x.value) from table_subquery_having_problem x
> group by x.id
> having count(distinct x.value)>1; --result is OK
> {code}