[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153741#comment-14153741 ]
Mithun Radhakrishnan commented on HIVE-8313:
--------------------------------------------
This appears to stem from the changes introduced in HIVE-4209, which added
caching for the evaluation of deterministic sub-expressions.
In this particular case, the problem occurs in
{{ExprNodeGenericFuncEvaluator::_evaluate()}}:
{code:title=ExprNodeGenericFuncEvaluator.java|borderStyle=solid}
  @Override
  protected Object _evaluate(Object row, int version) throws HiveException {
    rowObject = row;
    if (ObjectInspectorUtils.isConstantObjectInspector(outputOI) &&
        isDeterministic()) {
      // The output of this UDF is constant, so don't even bother evaluating.
      return ((ConstantObjectInspector)outputOI).getWritableConstantValue();
    }
    for (int i = 0; i < deferredChildren.length; i++) {
      deferredChildren[i].prepare(version);
    }
    return genericUDF.evaluate(deferredChildren);
  }
{code}
In Hive 0.10, {{deferredChildren[i].evaluate()}} was skipped entirely under
"non-eager" evaluation. In Hive 0.12, that condition is instead checked inside
{{prepare()}}, which is invoked for every child on *each record*. With an IN
list of several thousand elements and a table of several million rows, those
per-record calls account for the slowdown reported here.
Much of this cost can be avoided by skipping {{prepare()}} for
{{ExprNodeEvaluator}}s that yield the same value regardless of the row, e.g.
{{ExprNodeConstantEvaluator}} and {{ExprNodeNullEvaluator}}. I'll post a patch
for this shortly.
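To illustrate the idea (this is not the patch itself), here is a minimal, self-contained sketch. All class and method names in it ({{SketchEvaluator}}, {{DeferredChildSketch}}, {{PrepareSkipSketch}}, {{isRowInvariant()}}) are hypothetical stand-ins rather than Hive's actual classes. It assumes the evaluator can decide once, at initialization, which children are row-invariant, and it simply counts how many {{prepare()}} calls are made with and without the skip.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive's classes.
interface SketchEvaluator {
    /** True when the result is the same for every row (e.g. a constant or NULL). */
    boolean isRowInvariant();
}

final class ConstantSketchEvaluator implements SketchEvaluator {
    public boolean isRowInvariant() { return true; }
}

final class ColumnSketchEvaluator implements SketchEvaluator {
    public boolean isRowInvariant() { return false; }
}

final class DeferredChildSketch {
    final SketchEvaluator evaluator;
    long prepareCalls = 0;

    DeferredChildSketch(SketchEvaluator evaluator) {
        this.evaluator = evaluator;
    }

    // Stands in for the per-record work done in prepare(); here it only counts calls.
    void prepare(int version) {
        prepareCalls++;
    }
}

final class PrepareSkipSketch {
    /**
     * Models "id IN (c1, ..., cN)": one row-dependent child plus N constants.
     * Returns the total number of prepare() calls made across all rows.
     */
    static long countPrepareCalls(int constants, int rows, boolean skipInvariant) {
        List<DeferredChildSketch> children = new ArrayList<>();
        children.add(new DeferredChildSketch(new ColumnSketchEvaluator()));
        for (int i = 0; i < constants; i++) {
            children.add(new DeferredChildSketch(new ConstantSketchEvaluator()));
        }

        // Decide once, up front, which children need per-record preparation.
        List<DeferredChildSketch> toPrepare = new ArrayList<>();
        for (DeferredChildSketch child : children) {
            if (!skipInvariant || !child.evaluator.isRowInvariant()) {
                toPrepare.add(child);
            }
        }

        // Per-record loop: only the selected children are prepared.
        for (int row = 0; row < rows; row++) {
            for (DeferredChildSketch child : toPrepare) {
                child.prepare(row);
            }
        }

        long total = 0;
        for (DeferredChildSketch child : children) {
            total += child.prepareCalls;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countPrepareCalls(1000, 100, false));
        System.out.println(countPrepareCalls(1000, 100, true));
    }
}
```

With 1,000 constants and 100 rows, the first call counts 100,100 {{prepare()}} invocations and the second counts 100, since only the single row-dependent child is prepared per record.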
> Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
> ---------------------------------------------------------------------------
>
> Key: HIVE-8313
> URL: https://issues.apache.org/jira/browse/HIVE-8313
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0, 0.13.0, 0.14.0
> Reporter: Mithun Radhakrishnan
> Assignee: Mithun Radhakrishnan
>
> Consider the following query:
> {code}
> SELECT foo, bar, goo, id
> FROM myTable
> WHERE id IN ( 'A', 'B', 'C', 'D', ... , 'ZZZZZZ' );
> {code}
> When the IN clause has several thousand elements (and the table has
> several million rows), the query above takes orders of magnitude longer
> to run on Hive 0.12 than on, say, Hive 0.10.
> I have a possibly incomplete fix.