zabetak commented on code in PR #4751: URL: https://github.com/apache/hive/pull/4751#discussion_r1344022457
########## ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java: ##########

@@ -21,10 +21,12 @@
 import com.google.common.collect.Lists;
 import com.google.common.collect.Multimap;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.HashSet;
+import java.util.*;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.parse.*;

Review Comment:
   The [Hive code style](https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions) does not use wildcard imports.

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression

Review Comment:
   It seems that we can't parse `LIMIT (2+2), (5+5)` but we can parse `LIMIT (5+5) OFFSET (2+2)`. Any reason for this distinction?

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression
+       -> ^(TOK_LIMIT ($offset)? $numExpr)

Review Comment:
   I don't see optional parts in this rule, so why do we need the `?` symbol in the tree rewrite?

########## ql/src/test/queries/clientnegative/limit_expression_negative.q: ##########

@@ -0,0 +1,10 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1.5;
+select key from src limit 1.5 offset 1;
+select key from src limit (1) offset (1.5);
+select key from src limit 1 offset (1.5+2);
+select key from src limit (1+2.5) offset 1;
+select key from src limit (1+2*3.5);
+select key from src limit (1+2*3.5) offset (3*4*5.5);
+select key from src order by key limit (1+2*3.5) offset (3*4*5.5);

Review Comment:
   If I recall correctly, the negative CLI driver stops at the first error, so if you want to verify multiple invalid queries you will have to put one per file. Maybe it would be better/easier to test the negative scenarios using unit tests rather than end-to-end tests (qtests).

########## ql/src/test/queries/clientpositive/limit_expression.q: ##########

@@ -0,0 +1,8 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1;

Review Comment:
   The changes in this PR also affect the code path of the `.. LIMIT 10, 20` syntax. Please add tests accordingly.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java: ##########

@@ -118,6 +119,8 @@ public class QBParseInfo {
   // KEY of SimpleEntry: offset
   // VALUE of SimpleEntry: rowcount
   private final Map<String, SimpleEntry<Integer, Integer>> destToLimit;
+  private final Map<String, SimpleEntry<ASTNode, ASTNode>> destToLimitAST;

Review Comment:
   I don't think we need both maps since one is more or less a superset of the other. Moreover, I really think we should unify them, because having separate getters for the same clause can lead to inconsistencies and bugs in the code. I am pretty sure that unifying these maps will necessitate changes to other places in the code that we possibly missed to update. I am mostly thinking of discrepancies when CBO (`hive.cbo.enabled`) is on and off.
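To make the unification suggestion concrete, here is a rough, self-contained sketch of what a single AST-only map with one getter could look like. `AstNodeStub` and `LimitRegistry` are illustrative stand-ins invented for this sketch, not Hive's real `ASTNode`/`QBParseInfo`; plain integer limits are wrapped as literal nodes so only one storage path and one getter exist.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.hive.ql.parse.ASTNode.
class AstNodeStub {
    final String text;
    AstNodeStub(String text) { this.text = text; }
    // A bare numeric literal; real code would check getChildCount() == 0 instead.
    boolean isIntLiteral() { return text.matches("\\d+"); }
}

// Sketch: one map keyed by destination, always storing AST nodes.
class LimitRegistry {
    private final Map<String, SimpleEntry<AstNodeStub, AstNodeStub>> destToLimit = new HashMap<>();

    // "LIMIT 10, 20" style: wrap the integers as literal nodes rather than
    // keeping a second, parallel Integer map.
    void setDestLimit(String dest, int offset, int limit) {
        setDestLimit(dest, new AstNodeStub(Integer.toString(offset)),
                new AstNodeStub(Integer.toString(limit)));
    }

    void setDestLimit(String dest, AstNodeStub offsetExpr, AstNodeStub limitExpr) {
        destToLimit.put(dest, new SimpleEntry<>(offsetExpr, limitExpr));
    }

    // Single getter: literal nodes fold to an Integer immediately; null signals
    // that the caller must evaluate the expression (constant-fold) itself.
    Integer getDestLimit(String dest) {
        SimpleEntry<AstNodeStub, AstNodeStub> e = destToLimit.get(dest);
        if (e == null) {
            return null;
        }
        AstNodeStub limit = e.getValue();
        return limit.isIntLiteral() ? Integer.valueOf(limit.text) : null;
    }
}
```

With a design along these lines, `doPhase1` would have one call site regardless of whether the parser produced a bare `Number` or a parenthesized expression, which is the consistency concern raised in the comment.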
########## ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java: ##########

@@ -1322,4 +1319,109 @@ public static void replaceTabAlias(Collection<ExprNodeDesc> exprs, String oldAli
 }

+  /**

Review Comment:
   The exact same (or very similar) methods exist inside the `SemanticAnalyzer` class. Please avoid code duplication and try to re-use as much as possible. Moreover, this class does not deal with `AST -> ExprNodeDesc` conversions, so the new code here seems misplaced.

########## ql/src/test/queries/clientpositive/limit_expression.q: ##########

@@ -0,0 +1,8 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1;

Review Comment:
   Using LIMIT and OFFSET without ORDER BY is not deterministic and arguably invalid. Please add an ORDER BY clause to the queries here.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/LimitOffsetExpr.java: ##########

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+
+public class LimitOffsetExpr {

Review Comment:
   All the `ASTToken`s are kept inside the `QBParseInfo`, so I don't clearly see the necessity/purpose of this class.
   Let's re-evaluate its usefulness after we deal with the rest of the review comments.

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression
+       -> ^(TOK_LIMIT ($offset)? $numExpr)
+   | KW_LIMIT (numExpr=limitExpression) (KW_OFFSET (offsetExpr=offsetExpression))?
+       -> ^(TOK_LIMIT ($offsetExpr)? $numExpr)
+   ;
+
+limitExpression
+@init { pushMsg("limit expression", state); }
+@after { popMsg(state); }
+   : LPAREN! expression RPAREN!
+   | Number

Review Comment:
   Strange rule. Isn't `Number` part of the `expression` rule?

########## parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: ##########

@@ -2799,9 +2799,24 @@ destination
 limitClause
 @init { pushMsg("limit clause", state); }
 @after { popMsg(state); }
-   :
-   KW_LIMIT ((offset=Number COMMA)? num=Number) -> ^(TOK_LIMIT ($offset)? $num)
-   | KW_LIMIT num=Number KW_OFFSET offset=Number -> ^(TOK_LIMIT ($offset)? $num)
+   : KW_LIMIT offset=Number COMMA numExpr=limitExpression
+       -> ^(TOK_LIMIT ($offset)? $numExpr)
+   | KW_LIMIT (numExpr=limitExpression) (KW_OFFSET (offsetExpr=offsetExpression))?
+       -> ^(TOK_LIMIT ($offsetExpr)? $numExpr)
+   ;
+
+limitExpression
+@init { pushMsg("limit expression", state); }
+@after { popMsg(state); }
+   : LPAREN! expression RPAREN!
+   | Number
+   ;
+
+offsetExpression
+@init { pushMsg("offset expression", state); }
+@after { popMsg(state); }
+   : LPAREN! expression RPAREN!
+   | Number
 ;

Review Comment:
   These two rules are identical, why do we need both?
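As an illustration of how the duplicated rules (and the comma-form restriction flagged earlier) might be collapsed, here is one possible grammar sketch. The rule name `limitOrOffsetExpression` is hypothetical, and this fragment has not been run through ANTLR against the full Hive grammar; treat it as a discussion aid, not the actual patch.

```antlr
// Hypothetical: one shared rule used in both the LIMIT and OFFSET positions.
limitOrOffsetExpression
@init { pushMsg("limit/offset expression", state); }
@after { popMsg(state); }
    : LPAREN! expression RPAREN!
    | Number
    ;

limitClause
@init { pushMsg("limit clause", state); }
@after { popMsg(state); }
    : KW_LIMIT offset=limitOrOffsetExpression COMMA num=limitOrOffsetExpression
      -> ^(TOK_LIMIT $offset $num)
    | KW_LIMIT num=limitOrOffsetExpression (KW_OFFSET offset=limitOrOffsetExpression)?
      -> ^(TOK_LIMIT ($offset)? $num)
    ;
```

This would remove the `offsetExpression`/`limitExpression` duplication, drop the spurious `?` from the comma-form rewrite (the offset is mandatory there), and let `LIMIT (2+2), (5+5)` parse symmetrically with `LIMIT (5+5) OFFSET (2+2)`.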
########## ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java: ##########

@@ -60,11 +61,6 @@
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.hadoop.util.ReflectionUtils;

-import java.util.ArrayList;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-

Review Comment:
   Please refrain from changing unrelated code. This leads to bigger merge conflicts during backports and pollutes the git history.

########## ql/src/test/queries/clientnegative/limit_expression_negative.q: ##########

@@ -0,0 +1,10 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1.5;
+select key from src limit 1.5 offset 1;
+select key from src limit (1) offset (1.5);
+select key from src limit 1 offset (1.5+2);
+select key from src limit (1+2.5) offset 1;
+select key from src limit (1+2*3.5);
+select key from src limit (1+2*3.5) offset (3*4*5.5);
+select key from src order by key limit (1+2*3.5) offset (3*4*5.5);

Review Comment:
   nit: new line at the end of file (https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline)

########## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ##########

@@ -1925,11 +1925,19 @@ boolean doPhase1(ASTNode ast, QB qb, Phase1Ctx ctx_1, PlannerContext plannerCtx)
       case HiveParser.TOK_LIMIT:
         queryProperties.setHasLimit(true);
+        LimitOffsetExpr limitOffsetExpr = new LimitOffsetExpr(qbp);
         if (ast.getChildCount() == 2) {
-          qbp.setDestLimit(ctx_1.dest,
-              Integer.valueOf(ast.getChild(0).getText()), Integer.valueOf(ast.getChild(1).getText()));
+          if (ast.getChild(1).getChildCount() == 0 && ast.getChild(0).getChildCount() == 0) {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, Integer.valueOf(ast.getChild(0).getText()), Integer.valueOf(ast.getChild(1).getText()));
+          } else {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, (ASTNode) ast.getChild(0), (ASTNode) ast.getChild(1));
+          }
         } else {
-          qbp.setDestLimit(ctx_1.dest, Integer.valueOf(0), Integer.valueOf(ast.getChild(0).getText()));
+          if (ast.getChild(0).getChildCount() == 0) {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, 0, Integer.valueOf(ast.getChild(0).getText()));
+          } else {
+            limitOffsetExpr.setDestLimitOffset(ctx_1.dest, new ASTNode(new CommonToken(HiveParser.Number, "0")), (ASTNode) ast.getChild(0));
+          }

Review Comment:
   Do we need to distinguish between a simple integer and an ASTNode? The ASTNode subsumes the integer literal. This is similar to a comment I left in QBParseInfo.

########## ql/src/test/queries/clientpositive/limit_expression.q: ##########

@@ -0,0 +1,8 @@
+--! qt:dataset:src
+-- Limit and offset queries
+select key from src limit 1;
+select key from src limit 1 offset 1;
+select key from src limit (1) offset (1);
+select key from src limit 1 offset (1+2);
+select key from src limit (1+2) offset 1;

Review Comment:
   Add something for `limit (2+2) offset (1+1)`.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: ##########

@@ -3835,33 +3835,25 @@ private RelNode genOBLogicalPlan(QB qb, Pair<RelNode, RowResolver> selPair,
     // 1. OB Expr sanity test
     // in strict mode, in the presence of order by, limit must be
     // specified
-    Integer limit = qb.getParseInfo().getDestLimit(dest);
-    if (limit == null) {
+    OBLogicalPlanGenState obLogicalPlanGenState = beginGenOBLogicalPlan(obAST, selPair, outermostOB);

Review Comment:
   I haven't thoroughly reviewed the changes in this class. Will do it in the next review round.

########## ql/src/java/org/apache/hadoop/hive/ql/parse/LimitOffsetExpr.java: ##########

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+
+public class LimitOffsetExpr {
+  private final QBParseInfo qbp;
+
+  public LimitOffsetExpr(QBParseInfo qbp) {
+    this.qbp = qbp;
+  }
+
+  public void setDestLimitOffset(String dest, Integer offsetValue, Integer limitValue) {
+    this.qbp.setDestLimit(dest, offsetValue, limitValue);
+  }
+  public void setDestLimitOffset(String dest, ASTNode offsetExpr, ASTNode limitExpr) {
+    this.qbp.setDestLimit(dest, offsetExpr, limitExpr);
+  }
+
+  public Integer getDestLimit(RowResolver inputRR, String dest) throws SemanticException {
+    Integer limitValue = this.qbp.getDestLimit(dest);
+    ASTNode limitExpr = this.qbp.getDestLimitAST(dest);
+    if (limitValue == null && limitExpr != null) {
+      try {
+        limitValue = (Integer) ExprNodeDescUtils.genValueFromConstantExpr(inputRR, limitExpr);

Review Comment:
   The conversion from `ASTToken` to `ExprNodeDesc` should be done using existing methods and coding patterns. Check how `QBParseInfo#destToWhereExpr` is transformed from token to expression and get inspiration from there.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org