[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147481#comment-14147481 ]

Gunther Hagleitner commented on HIVE-8188:

On commit, you might want to add some comments saying that annotation lookups are expensive and that isDeterministic and isEstimable cannot change while running through the op pipeline.

ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

Key: HIVE-8188
URL: https://issues.apache.org/jira/browse/HIVE-8188
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
Labels: Performance
Attachments: HIVE-8188.1.patch, HIVE-8188.2.patch, udf-deterministic.png

When running a near-constant UDF, most of the CPU is burnt within the VM trying to read the class annotations for every row.
!udf-deterministic.png!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147477#comment-14147477 ]

Gunther Hagleitner commented on HIVE-8188:

+1
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148225#comment-14148225 ]

Gunther Hagleitner commented on HIVE-8188:

Committed to .14 branch.

Fix For: 0.14.0
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148539#comment-14148539 ]

Navis commented on HIVE-8188:

[~prasanth_j] The original patch didn't use an annotation, but a reviewer wanted it to be one. So I just replaced the instanceof check with isEstimable() without thinking it through. That is how it ended up this way. Still, my bad.
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148558#comment-14148558 ]

Prasanth J commented on HIVE-8188:

[~navis] Looking up an annotation via reflection once is not a big problem. But doing it in an inner loop (as was done in Group By) is :)
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148581#comment-14148581 ]

Gopal V commented on HIVE-8188:

[~navis]: No worries, it took a special benchmark run to catch this. I'm trying to catch anything that looks like a .14 regression before the RC goes out. The UDF that hit this case was the one from the NOW() UDF patch.
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142300#comment-14142300 ]

Hive QA commented on HIVE-8188:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12670244/HIVE-8188.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6298 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/904/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/904/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-904/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12670244
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140154#comment-14140154 ]

Prasanth J commented on HIVE-8188:

I think it's because hash aggregation needs to estimate the size of the hash map. The values of the hash map are UDAFs whose aggregation buffer size can be estimated if the aggregation buffer carries the annotation @AggregationType(estimable = true). GroupByOperator.shouldBeFlushed() is called for every row that is added to the hash map. shouldBeFlushed() calls the isEstimable() helper function, which uses reflection every time to check whether the aggregation function is estimable. Not sure why it is done this way, but yes, this will be slow as hell. This needs to be fixed.
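To illustrate the per-row cost pattern described in this comment, here is a minimal sketch of resolving the annotation once at setup instead of on every row. It is not Hive's code: the `AggregationType` annotation below is a local stand-in redeclared for the example, and `SumBuffer`, `CountDistinctBuffer`, `EstimableLookup`, and `HashAggregatorSketch` are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for the annotation named in the comment (assumption, not Hive's class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AggregationType {
    boolean estimable() default false;
}

// Hypothetical aggregation buffers used only for this sketch.
@AggregationType(estimable = true)
class SumBuffer {
    long sum;
}

class CountDistinctBuffer {
    // No annotation: treated as not estimable.
}

class EstimableLookup {
    // Reflective lookup: acceptable once, costly when repeated for every row.
    static boolean isEstimable(Object buffer) {
        AggregationType type = buffer.getClass().getAnnotation(AggregationType.class);
        return type != null && type.estimable();
    }
}

class HashAggregatorSketch {
    private final boolean[] estimable; // resolved once, at construction

    HashAggregatorSketch(Object... buffers) {
        estimable = new boolean[buffers.length];
        for (int i = 0; i < buffers.length; i++) {
            estimable[i] = EstimableLookup.isEstimable(buffers[i]);
        }
    }

    // Per-row path: reads the cached flags only, no reflection.
    int countEstimable() {
        int count = 0;
        for (boolean flag : estimable) {
            if (flag) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        HashAggregatorSketch agg =
            new HashAggregatorSketch(new SumBuffer(), new CountDistinctBuffer());
        System.out.println("estimable buffers: " + agg.countEstimable());
    }
}
```

The cached flags are safe here only because, as noted elsewhere in this thread, the annotation value cannot change while rows flow through the operator pipeline.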
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140232#comment-14140232 ]

Prasanth J commented on HIVE-8188:

I tried to avoid the repeated reflection calls in the inner loop by computing the total aggregation size once and reusing it there. I ran the following query:

{code}
select ss_quantity, ss_store_sk, ss_promo_sk, count(ss_list_price), count(ss_sales_price), sum(ss_ext_sales_price)
from store_sales_orc
group by ss_quantity, ss_store_sk, ss_promo_sk;
{code}

store_sales had 2880404 rows. The original execution time was 18.5s; with the above change it went down to 15.5s, a gain of ~16% (3.0s out of 18.5s), which is consistent with the reflection cost shown in the attached image.
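The change described in this comment (compute the total aggregation size once, reuse it in the inner loop) could look roughly like the sketch below. This is an assumption about its shape, not the actual patch: `FlushEstimator`, its fields, and the byte sizes are all hypothetical.

```java
// Hypothetical sketch; names and sizes are illustrative, not Hive's GroupByOperator API.
class FlushEstimator {
    private final long bytesPerEntry; // sum of fixed buffer sizes, computed once
    private final long maxBytes;      // memory budget for the hash map
    private long entries;

    FlushEstimator(long[] fixedBufferSizes, long maxBytes) {
        long total = 0;
        for (long size : fixedBufferSizes) {
            total += size;
        }
        this.bytesPerEntry = total;
        this.maxBytes = maxBytes;
    }

    // Called once per new hash-map entry: plain arithmetic, no reflection.
    boolean addEntryAndCheckFlush() {
        entries++;
        return entries * bytesPerEntry > maxBytes;
    }
}

class FlushEstimatorDemo {
    public static void main(String[] args) {
        FlushEstimator estimator = new FlushEstimator(new long[] {16, 16, 24}, 200);
        int added = 0;
        while (!estimator.addEntryAndCheckFlush()) {
            added++;
        }
        System.out.println("entries added before a flush was requested: " + added);
    }
}
```

The design point is only that everything that depends on the aggregation classes is folded into `bytesPerEntry` before the first row arrives, so the per-row check is a multiply and a compare.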
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141095#comment-14141095 ]

Gopal V commented on HIVE-8188:

[~prasanth_j]: That is a pretty neat speedup. But that's not where I found the problem: it was in isDeterministic(), within the constant codepath of the ExprNode evaluator.
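For the isDeterministic() case, one possible way to keep the lookup out of the per-row path is to memoize the answer per class with `java.lang.ClassValue`. The sketch below is not the committed fix: `UdfMeta` is a hypothetical stand-in annotation, and `RandLikeUdf`, `UpperLikeUdf`, and `DeterminismCache` are illustrative names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in annotation (assumption); deterministic unless stated otherwise.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UdfMeta {
    boolean deterministic() default true;
}

@UdfMeta(deterministic = false)
class RandLikeUdf {
}

class UpperLikeUdf {
    // No annotation: treated as deterministic in this sketch.
}

class DeterminismCache {
    // Counts reflective lookups so the caching effect is observable.
    static int reflectiveLookups = 0;

    private static final ClassValue<Boolean> DETERMINISTIC = new ClassValue<Boolean>() {
        @Override
        protected Boolean computeValue(Class<?> type) {
            reflectiveLookups++;
            UdfMeta meta = type.getAnnotation(UdfMeta.class);
            return meta == null || meta.deterministic();
        }
    };

    // Per-row callers hit the cached value after the first call for a given class.
    static boolean isDeterministic(Object udf) {
        return DETERMINISTIC.get(udf.getClass());
    }

    public static void main(String[] args) {
        Object udf = new RandLikeUdf();
        for (int row = 0; row < 1000; row++) {
            isDeterministic(udf);
        }
        System.out.println("reflective lookups: " + reflectiveLookups);
    }
}
```

An alternative with the same effect is to store the flag in a field of the evaluator when it is initialized; `ClassValue` is used here only to show a per-class cache that needs no changes to the evaluator's fields.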
[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141099#comment-14141099 ]

Prasanth J commented on HIVE-8188:

Looking at the attached PNG (GBY + Reflection), I thought it was the UDAF that uses reflection in the inner loop. Looks like many places need improvement, then.