zabetak commented on code in PR #4751: URL: https://github.com/apache/hive/pull/4751#discussion_r1344022457
########## ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java: ##########

@@ -21,10 +21,12 @@
 import com.google.common.collect.Lists;
 import com.google.common.collect.Multimap;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.HashSet;
+import java.util.*;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.parse.*;

Review Comment:
   The [Hive code style](https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions) does not use wildcard imports.

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression

Review Comment:
   It seems that we can't parse `LIMIT (2+2), (5+5)` but we can parse `LIMIT (5+5) OFFSET (2+2)`. Any reason for this distinction?

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression
+       -> ^(TOK_LIMIT ($offset)? $numExpr)

Review Comment:
   I don't see optional parts in this rule, so why do we need the `?` symbol in the tree rewrite?

########## ql/src/test/queries/clientnegative/limit_expression_negative.q: ##########

@@ -0,0 +1,10 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1.5;
+select key from src limit 1.5 offset 1;
+select key from src limit (1) offset (1.5);
+select key from src limit 1 offset (1.5+2);
+select key from src limit (1+2.5) offset 1;
+select key from src limit (1+2*3.5);
+select key from src limit (1+2*3.5) offset (3*4*5.5);
+select key from src order by key limit (1+2*3.5) offset (3*4*5.5);

Review Comment:
   If I recall correctly, the negative CLI driver stops at the first error, so if you want to verify multiple invalid queries you will have to put one per file. Maybe it would be better/easier to test the negative scenarios using unit tests rather than end-to-end tests (qtests).

########## ql/src/test/queries/clientpositive/limit_expression.q: ##########

@@ -0,0 +1,8 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1;

Review Comment:
   The changes in this PR also affect the code path of the `.. LIMIT 10, 20` syntax. Please add tests accordingly.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java: ##########

@@ -118,6 +119,8 @@ public class QBParseInfo {
   // KEY of SimpleEntry: offset
   // VALUE of SimpleEntry: rowcount
   private final Map<String, SimpleEntry<Integer, Integer>> destToLimit;
+  private final Map<String, SimpleEntry<ASTNode, ASTNode>> destToLimitAST;

Review Comment:
   I don't think we need both maps since one is more or less a superset of the other. Moreover, I really think we should unify them, because having separate getters for the same clause can lead to inconsistencies and bugs in the code. I am pretty sure that unifying these maps will necessitate changes to other places in the code that we possibly missed to update. I am mostly thinking of discrepancies when CBO (`hive.cbo.enabled`) is on and off.
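To make the unification suggestion concrete, here is a rough, self-contained sketch of what a single AST-only map with one getter could look like. `AstNodeStub` and `LimitRegistry` are illustrative stand-ins invented for this sketch, not Hive's real `ASTNode`/`QBParseInfo`; plain integer limits are wrapped as literal nodes so only one storage path and one getter exist.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.hive.ql.parse.ASTNode.
class AstNodeStub {
    final String text;
    AstNodeStub(String text) { this.text = text; }
    // A bare numeric literal; real code would check getChildCount() == 0 instead.
    boolean isIntLiteral() { return text.matches("\\d+"); }
}

// Sketch: one map keyed by destination, always storing AST nodes.
class LimitRegistry {
    private final Map<String, SimpleEntry<AstNodeStub, AstNodeStub>> destToLimit = new HashMap<>();

    // "LIMIT 10, 20" style: wrap the integers as literal nodes rather than
    // keeping a second, parallel Integer map.
    void setDestLimit(String dest, int offset, int limit) {
        setDestLimit(dest, new AstNodeStub(Integer.toString(offset)),
                new AstNodeStub(Integer.toString(limit)));
    }

    void setDestLimit(String dest, AstNodeStub offsetExpr, AstNodeStub limitExpr) {
        destToLimit.put(dest, new SimpleEntry<>(offsetExpr, limitExpr));
    }

    // Single getter: literal nodes fold to an Integer immediately; null signals
    // that the caller must evaluate the expression (constant-fold) itself.
    Integer getDestLimit(String dest) {
        SimpleEntry<AstNodeStub, AstNodeStub> e = destToLimit.get(dest);
        if (e == null) {
            return null;
        }
        AstNodeStub limit = e.getValue();
        return limit.isIntLiteral() ? Integer.valueOf(limit.text) : null;
    }
}
```

With a design along these lines, `doPhase1` would have one call site regardless of whether the parser produced a bare `Number` or a parenthesized expression, which is the consistency concern raised in the comment.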
########## ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java: ##########

@@ -1322,4 +1319,109 @@ public static void replaceTabAlias(Collection<ExprNodeDesc> exprs, String oldAli
 }

+  /**

Review Comment:
   The exact same (or very similar) methods exist inside the `SemanticAnalyzer` class. Please avoid code duplication and try to re-use as much as possible. Moreover, this class does not deal with `AST -> ExprNodeDesc` conversions, so the new code here seems misplaced.

########## ql/src/test/queries/clientpositive/limit_expression.q: ##########

@@ -0,0 +1,8 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1;

Review Comment:
   Using LIMIT and OFFSET without ORDER BY is not deterministic and arguably invalid. Please add an ORDER BY clause to the queries here.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/LimitOffsetExpr.java: ##########

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+
+public class LimitOffsetExpr {

Review Comment:
   All the `ASTToken`s are kept inside the `QBParseInfo`, so I don't clearly see the necessity/purpose of this class.
   Let's re-evaluate its usefulness after we deal with the rest of the review comments.

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression
+       -> ^(TOK_LIMIT ($offset)? $numExpr)
+   | KW_LIMIT (numExpr=limitExpression) (KW_OFFSET (offsetExpr=offsetExpression))?
+       -> ^(TOK_LIMIT ($offsetExpr)? $numExpr)
+   ;
+
+limitExpression
+@init { pushMsg("limit expression", state); }
+@after { popMsg(state); }
+   : LPAREN! expression RPAREN!
+   | Number

Review Comment:
   Strange rule. Isn't `Number` part of the `expression` rule?

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression
+       -> ^(TOK_LIMIT ($offset)? $numExpr)
+   | KW_LIMIT (numExpr=limitExpression) (KW_OFFSET (offsetExpr=offsetExpression))?
+       -> ^(TOK_LIMIT ($offsetExpr)? $numExpr)
+   ;
+
+limitExpression
+@init { pushMsg("limit expression", state); }
+@after { popMsg(state); }
+   : LPAREN! expression RPAREN!
+   | Number
+   ;
+
+offsetExpression
+@init { pushMsg("offset expression", state); }
+@after { popMsg(state); }
+   : LPAREN! expression RPAREN!
+   | Number
 ;

Review Comment:
   These two rules are identical, why do we need both?
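As an illustration of how the duplicated rules (and the comma-form restriction flagged earlier) might be collapsed, here is one possible grammar sketch. The rule name `limitOrOffsetExpression` is hypothetical, and this fragment has not been run through ANTLR against the full Hive grammar; treat it as a discussion aid, not the actual patch.

```antlr
// Hypothetical: one shared rule used in both the LIMIT and OFFSET positions.
limitOrOffsetExpression
@init { pushMsg("limit/offset expression", state); }
@after { popMsg(state); }
    : LPAREN! expression RPAREN!
    | Number
    ;

limitClause
@init { pushMsg("limit clause", state); }
@after { popMsg(state); }
    : KW_LIMIT offset=limitOrOffsetExpression COMMA num=limitOrOffsetExpression
      -> ^(TOK_LIMIT $offset $num)
    | KW_LIMIT num=limitOrOffsetExpression (KW_OFFSET offset=limitOrOffsetExpression)?
      -> ^(TOK_LIMIT ($offset)? $num)
    ;
```

This would remove the `offsetExpression`/`limitExpression` duplication, drop the spurious `?` from the comma-form rewrite (the offset is mandatory there), and let `LIMIT (2+2), (5+5)` parse symmetrically with `LIMIT (5+5) OFFSET (2+2)`.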
########## ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java: ##########

@@ -60,11 +61,6 @@
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.hadoop.util.ReflectionUtils;

-import java.util.ArrayList;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-

Review Comment:
   Please refrain from changing unrelated code. This leads to bigger merge conflicts during backports and pollutes the git history.

########## ql/src/test/queries/clientnegative/limit_expression_negative.q: ##########

@@ -0,0 +1,10 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1.5;
+select key from src limit 1.5 offset 1;
+select key from src limit (1) offset (1.5);
+select key from src limit 1 offset (1.5+2);
+select key from src limit (1+2.5) offset 1;
+select key from src limit (1+2*3.5);
+select key from src limit (1+2*3.5) offset (3*4*5.5);
+select key from src order by key limit (1+2*3.5) offset (3*4*5.5);

Review Comment:
   nit: new line at the end of file (https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline)

########## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ##########

@@ -1925,11 +1925,19 @@ boolean doPhase1(ASTNode ast, QB qb, Phase1Ctx ctx_1, PlannerContext plannerCtx)
       case HiveParser.TOK_LIMIT:
         queryProperties.setHasLimit(true);
+        LimitOffsetExpr limitOffsetExpr = new LimitOffsetExpr(qbp);
         if (ast.getChildCount() == 2) {
-          qbp.setDestLimit(ctx_1.dest,
-              Integer.valueOf(ast.getChild(0).getText()), Integer.valueOf(ast.getChild(1).getText()));
+          if (ast.getChild(1).getChildCount() == 0 && ast.getChild(0).getChildCount() == 0) {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, Integer.valueOf(ast.getChild(0).getText()), Integer.valueOf(ast.getChild(1).getText()));
+          } else {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, (ASTNode) ast.getChild(0), (ASTNode) ast.getChild(1));
+          }
         } else {
-          qbp.setDestLimit(ctx_1.dest, Integer.valueOf(0), Integer.valueOf(ast.getChild(0).getText()));
+          if (ast.getChild(0).getChildCount() == 0) {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, 0, Integer.valueOf(ast.getChild(0).getText()));
+          } else {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, new ASTNode(new CommonToken(HiveParser.Number, "0")), (ASTNode) ast.getChild(0));
+          }

Review Comment:
   Do we need to distinguish between a simple integer and an ASTNode? The ASTNode subsumes the integer literal. This is similar to a comment I left in QBParseInfo.

########## ql/src/test/queries/clientpositive/limit_expression.q: ##########

@@ -0,0 +1,8 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1;
+select key from src limit 1 offset 1;
+select key from src limit (1) offset (1);
+select key from src limit 1 offset (1+2);
+select key from src limit (1+2) offset 1;

Review Comment:
   Add something for `limit (2+2) offset (1+1)`.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: ##########

@@ -3835,33 +3835,25 @@ private RelNode genOBLogicalPlan(QB qb, Pair<RelNode, RowResolver> selPair,
     // 1. OB Expr sanity test
     // in strict mode, in the presence of order by, limit must be
     // specified
-    Integer limit = qb.getParseInfo().getDestLimit(dest);
-    if (limit == null) {
+    OBLogicalPlanGenState obLogicalPlanGenState = beginGenOBLogicalPlan(obAST, selPair, outermostOB);

Review Comment:
   I haven't thoroughly reviewed the changes in this class. Will do it in the next review round.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/LimitOffsetExpr.java: ##########

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+
+public class LimitOffsetExpr {
+  private final QBParseInfo qbp;
+
+  public LimitOffsetExpr(QBParseInfo qbp) {
+    this.qbp = qbp;
+  }
+
+  public void setDestLimitOffset(String dest, Integer offsetValue, Integer limitValue) {
+    this.qbp.setDestLimit(dest, offsetValue, limitValue);
+  }
+  public void setDestLimitOffset(String dest, ASTNode offsetExpr, ASTNode limitExpr) {
+    this.qbp.setDestLimit(dest, offsetExpr, limitExpr);
+  }
+
+  public Integer getDestLimit(RowResolver inputRR, String dest) throws SemanticException {
+    Integer limitValue = this.qbp.getDestLimit(dest);
+    ASTNode limitExpr = this.qbp.getDestLimitAST(dest);
+    if (limitValue == null && limitExpr != null) {
+      try {
+        limitValue = (Integer) ExprNodeDescUtils.genValueFromConstantExpr(inputRR, limitExpr);

Review Comment:
   The conversion from `ASTToken` to `ExprNodeDesc` should be done using existing methods and coding patterns. Check how `QBParseInfo#destToWhereExpr` is transformed from token to expression and get inspiration from there.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org