[
https://issues.apache.org/jira/browse/HIVE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460490#comment-13460490
]
Yin Huai commented on HIVE-3495:
--------------------------------
Just looked at this problem in detail. Here is the reason...
if we set map-side aggregation to false (set hive.map.aggr=false;) (to make my
point clear, let's also assume hive.groupby.skewindata is false), then in
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan, genGroupByPlan1MR
will be used to generate the ReduceSinkOperator (through
genGroupByPlanReduceSinkOperator) and the GroupByOperator (through
genGroupByPlanGroupByOperator). Since there is no map-side aggregation in this
case, getReduceValuesForReduceSinkNoMapAgg will be called to generate the reduce
values by looking at every child of every aggregation tree. As you can see, inside
getReduceValuesForReduceSinkNoMapAgg, genExprNodeDesc is called to generate the
ExprNodeDesc, and for the case described in the issue description,
ExprNodeConstantDesc will be used for the constant parameters. For every child of
every aggregation tree, an entry mapping the parameter to a ColumnInfo will be
added to the reduceSinkOutputRowResolver.

However, based on the code, when an ASTNode has a ColumnInfo, an ExprNodeColumnDesc
will be used for that node (that is, when a node has a ColumnInfo, the ExprNodeDesc
created for it is an ExprNodeColumnDesc). Thus, in genGroupByPlanGroupByOperator,
when the children of the aggregation trees are converted to aggParameters, an
ExprNodeColumnDesc will be used for every parameter, constant or not, because every
parameter now has its own ColumnInfo. That is why we get the error.
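To make the mismatch concrete, here is a minimal, self-contained sketch in plain Java. It is
not the real Hive code: MiniExpr, MiniConstant, MiniColumn and checkSecondArgIsConstant are
hypothetical stand-ins for ExprNodeDesc, ExprNodeConstantDesc, ExprNodeColumnDesc and the
constant check in GenericUDAFPercentileApprox.getEvaluator. It only shows how a parameter that
starts out as a constant fails such a check once it has been rebuilt as a column expression.
{code:java}
// Minimal, self-contained sketch (plain Java, NOT the real Hive classes).
// MiniExpr / MiniConstant / MiniColumn / checkSecondArgIsConstant are hypothetical
// stand-ins for ExprNodeDesc, ExprNodeConstantDesc, ExprNodeColumnDesc and the
// constant check in GenericUDAFPercentileApprox.getEvaluator.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConstantParamDemo {
  interface MiniExpr { String typeName(); }

  static class MiniConstant implements MiniExpr {            // ~ ExprNodeConstantDesc
    final Object value;
    MiniConstant(Object value) { this.value = value; }
    public String typeName() { return "double"; }            // type of the literal 0.5 here
  }

  static class MiniColumn implements MiniExpr {               // ~ ExprNodeColumnDesc
    final String column;
    final String type;
    MiniColumn(String column, String type) { this.column = column; this.type = type; }
    public String typeName() { return type; }
  }

  // ~ the resolver check: the percentile argument must be a constant expression.
  static void checkSecondArgIsConstant(List<MiniExpr> params) {
    if (!(params.get(1) instanceof MiniConstant)) {
      throw new IllegalArgumentException("The second argument must be a constant, but "
          + params.get(1).typeName() + " was passed instead.");
    }
  }

  public static void main(String[] args) {
    // ReduceSink side (no map-side aggregation): the literal 0.5 is generated as a
    // constant expression, and a ColumnInfo-like entry is registered for every
    // aggregation parameter in the output row resolver.
    List<MiniExpr> reduceSinkParams = new ArrayList<>();
    reduceSinkParams.add(new MiniColumn("_col0", "double"));  // cast(substr(src.value,5) AS double)
    reduceSinkParams.add(new MiniConstant(0.5));              // the percentile literal
    Map<MiniExpr, String> rowResolver = new LinkedHashMap<>();
    rowResolver.put(reduceSinkParams.get(0), "VALUE._col0");
    rowResolver.put(reduceSinkParams.get(1), "VALUE._col1");  // the constant also gets an entry

    checkSecondArgIsConstant(reduceSinkParams);               // passes: still a constant here

    // GroupBy side: because every parameter has a matching entry, every parameter,
    // including the constant, is rebuilt as a column expression.
    List<MiniExpr> groupByParams = new ArrayList<>();
    for (Map.Entry<MiniExpr, String> e : rowResolver.entrySet()) {
      groupByParams.add(new MiniColumn(e.getValue(), e.getKey().typeName()));
    }

    checkSecondArgIsConstant(groupByParams);                  // throws: "... double was passed instead."
  }
}
{code}
The first check passes because the percentile parameter is still a constant; the second one
throws with essentially the message shown in the log in the description.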
There are two ways to solve this problem.
1) Extend ColumnInfo to record the ExprNodeDesc, not just the TypeInfo.
2) For ExprNodeDescs other than ExprNodeColumnDesc, do not create a ColumnInfo; the
RowResolver will then have no match, and we use genExprNodeDesc to generate the
ExprNodeDesc in genGroupByPlanGroupByOperator and genGroupByPlanGroupByOperator1.
The second option seems to be the right way since its meaning is clearer (a rough
sketch of the idea follows below). Will do that first.
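Under the same simplified model as above, option 2 would look roughly like the sketch below.
buildRowResolver and genParameterExpr are hypothetical names used only for illustration, not
methods from the actual patch: the idea is simply that only column expressions get an entry in
the reduce-sink row resolver, so constants find no match and are regenerated as constants.
{code:java}
// Rough, hypothetical sketch of option 2 (NOT the actual patch), reusing the same
// simplified model: only column expressions get a ColumnInfo-like entry in the
// reduce-sink row resolver, so constants find no match and stay constants.
import java.util.HashMap;
import java.util.Map;

public class Option2Sketch {
  interface MiniExpr {}
  static class MiniConstant implements MiniExpr { final Object value; MiniConstant(Object v) { this.value = v; } }
  static class MiniColumn implements MiniExpr { final String name; MiniColumn(String n) { this.name = n; } }

  // Reduce-sink side: register an entry only for real column expressions.
  static Map<MiniExpr, String> buildRowResolver(MiniExpr... exprs) {
    Map<MiniExpr, String> rr = new HashMap<>();
    for (int i = 0; i < exprs.length; i++) {
      if (exprs[i] instanceof MiniColumn) {      // option 2: skip constants here
        rr.put(exprs[i], "VALUE._col" + i);
      }
    }
    return rr;
  }

  // GroupBy side: use the resolver match when there is one; otherwise regenerate the
  // expression (here: simply keep the original constant).
  static MiniExpr genParameterExpr(MiniExpr original, Map<MiniExpr, String> rr) {
    String col = rr.get(original);
    return (col != null) ? new MiniColumn(col) : original;
  }

  public static void main(String[] args) {
    MiniExpr valueCol = new MiniColumn("_col0");       // cast(substr(src.value,5) AS double)
    MiniExpr percentile = new MiniConstant(0.5);       // the percentile literal
    Map<MiniExpr, String> rr = buildRowResolver(valueCol, percentile);

    // The percentile parameter stays a constant, so a constant check like the one in
    // GenericUDAFPercentileApprox would now succeed.
    System.out.println(genParameterExpr(valueCol, rr) instanceof MiniColumn);     // true
    System.out.println(genParameterExpr(percentile, rr) instanceof MiniConstant); // true
  }
}
{code}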
> elements in aggParameters passed to SemanticAnalyzer.getGenericUDAFEvaluator
> are generated in two different ways
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-3495
> URL: https://issues.apache.org/jira/browse/HIVE-3495
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Yin Huai
> Assignee: Yin Huai
> Priority: Minor
>
> When I was working on HIVE-3493, I also found that elements in aggParameters are
> generated in two different ways. One is
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(ASTNode,
> RowResolver). The other is to create an ExprNodeColumnDesc directly. Since a UDAF
> may need to check the types of its parameters, e.g. percentile_approx
> (GenericUDAFPercentileApprox), we may get a UDFArgumentTypeException if the second
> way is used.
> An example that can be used to reproduce the error is
> {code:sql}
> set hive.map.aggr=false;
> SELECT percentile_approx(cast(substr(src.value,5) AS double), 0.5) FROM src;
> {code}
> Here is the log:
> {code}
> 2012-09-20 12:36:06,947 DEBUG exec.FunctionRegistry (FunctionRegistry.java:getGenericUDAFResolver(849)) - Looking up GenericUDAF: percentile_approx
> 2012-09-20 12:36:06,952 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: UDFArgumentTypeException The second argument must be a constant, but double was passed instead.
> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: The second argument must be a constant, but double was passed instead.
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox.getEvaluator(GenericUDAFPercentileApprox.java:149)
>     at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:774)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:2389)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator(SemanticAnalyzer.java:2561)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlan1MR(SemanticAnalyzer.java:3341)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6140)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6903)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7484)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:903)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:713)
>     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_replay(TestCliDriver.java:125)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:168)
>     at junit.framework.TestCase.runBare(TestCase.java:134)
>     at junit.framework.TestResult$1.protect(TestResult.java:110)
>     at junit.framework.TestResult.runProtected(TestResult.java:128)
>     at junit.framework.TestResult.run(TestResult.java:113)
>     at junit.framework.TestCase.run(TestCase.java:124)
>     at junit.framework.TestSuite.runTest(TestSuite.java:232)
>     at junit.framework.TestSuite.run(TestSuite.java:227)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:520)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1060)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:911)
> {code}