scarlin-cloudera commented on code in PR #4783: URL: https://github.com/apache/hive/pull/4783#discussion_r1350249988
########## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ########## @@ -5996,7 +5996,7 @@ private ReduceSinkOperator genCommonGroupByPlanReduceSinkOperator(QB qb, List<St for (Map.Entry<ASTNode, ExprNodeDesc> entry : nodeOutputs.entrySet()) { ASTNode parameter = entry.getKey(); ExprNodeDesc expression = entry.getValue(); - if (!(expression instanceof ExprNodeColumnDesc)) { + if (!(expression instanceof ExprNodeColumnDesc) && !ExprNodeConstantDesc.isFoldedFromCol(expression)) { Review Comment: So this is a good question, and I kinda have to walk you through the algorithm to explain why. Perhaps I should put this in the comment as well. The short answer is that there may still be a latent bug (not 100% positive), but including all constants would probably introduce a different bug. The statement on lines 5991 and lines 5992 generate expressions for all the where clauses. It takes every subtree within the AST for the where clause and spits out an associated expression and places it in nodeOutputs. So if the where expression contained (not the real AST, but just an example) ">= ( dt, 1000)", nodeoutputs would have 3 entries, "=(dt, 1000)", "dt" and "1000" Furthermore, the call in genAllExprNodeDesc takes the inputRowResolver as a parameter. Internally, it checks every subtree and tries to match it up with an column expression in inputRowResolver. If it succeeds, the expression is generated from the column information given in the inputRowResolver. If it fails, it generates its own expression based on the syntax. So going back to the above example, nodeOutputs would contain 3 expressions. ">= (dt,1000)" would map to an ExprNodeGenericFuncDesc. "1000" will map to an ExprNodeConstantDesc. "dt" should theoretically map to an ExprNodeColumnDesc. But it doesn't and that's why there is a bug here. The RowResolver knows that it's a constant that is aliased to "dt", as created by CBO. So the expression does succeed in finding a ColumnInfo, marks that it has been folded from a column, and creates an ExprNodeConstantDesc. One theoretically fix would have been to make sure that an "ExprNodeColumnDesc" is returned from the RowResolver. I'll be honest: I didn't try this. This RowResolver code is used in many places, and my gut feeling is that changing this low level behavior would cause issues in other places. I can go ahead and try this if you think this will work, but yeah, I don't think it will. So going back to the purpose of this "if" statement. We only want to keep expressions that we think are columns. These will get placed into the output RowResolver. So it would be wrong to place "1000" into the outputRowResolver. This is the reason the code "continues" when the expression is an ExprNodeConstantDesc. In my original statement, I mentioned there still might be a latent bug. I'm not quite sure what would happen if the input RowResolver contained a true constant that wasn't folded. That should also theoretically be treated as a column. I wasn't sure how to generate an SQL statement that used common groups and used CBO to get to this point. In fact, it might not be possible, because the where clause wouldn't really need that column to be placed in the output RowResolver. But my gut feeling here is that if we do find this is a bug? This is gonna require a little bit of a rewrite. One aside I figured I should also mention. In the last paragraph, I mentioned that we need to use CBO. I couldn't reproduce this in non-CBO. This happened when the where clause "dt = 1000" was optimized to place the "1000" in the select portion and alias this column name to "dt". Sorry for the very lengthy, wordy explanation, but I hope this helps. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org