[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-25 Gunther Hagleitner (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147481#comment-14147481 ]

Gunther Hagleitner commented on HIVE-8188:
--

On commit, you might want to add some comments noting that annotation lookups 
are expensive and that isDeterministic and isEstimable cannot change while 
running through the operator pipeline.
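
A minimal sketch of what such a comment and a one-time lookup could look like. `FuncType`, `Evaluator` and the two function classes are hypothetical stand-ins, not Hive's real annotations or the committed patch; treating unannotated classes as deterministic is an assumption of the sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class CachedFlagsDemo {

  // Hypothetical annotation standing in for real UDF/UDAF annotations.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface FuncType {
    boolean deterministic() default true;
    boolean estimable() default false;
  }

  // Hypothetical function classes used only for this demo.
  @FuncType(deterministic = false)
  static class NowLikeFunction {}

  static class PlainFunction {}

  static final class Evaluator {
    // Annotation lookups go through reflection and are expensive, so they are
    // done once here. The flags describe the function class, which cannot
    // change while rows flow through the operator pipeline.
    private final boolean deterministic;
    private final boolean estimable;

    Evaluator(Class<?> functionClass) {
      FuncType type = functionClass.getAnnotation(FuncType.class);
      // Unannotated classes get the same values as the annotation defaults.
      this.deterministic = type == null || type.deterministic();
      this.estimable = type != null && type.estimable();
    }

    // Called per row: a plain field read, no reflection.
    boolean isDeterministic() {
      return deterministic;
    }

    boolean isEstimable() {
      return estimable;
    }
  }

  public static void main(String[] args) {
    Evaluator now = new Evaluator(NowLikeFunction.class);
    Evaluator plain = new Evaluator(PlainFunction.class);
    System.out.println(now.isDeterministic() + " " + plain.isDeterministic());
  }
}
```

Running it prints `false true`: the NOW()-like class is flagged non-deterministic once, at construction, and every later `isDeterministic()` call is a field read.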

 ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
 -

 Key: HIVE-8188
 URL: https://issues.apache.org/jira/browse/HIVE-8188
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
  Labels: Performance
 Attachments: HIVE-8188.1.patch, HIVE-8188.2.patch, udf-deterministic.png


 When running a near-constant UDF, most of the CPU is burnt within the VM 
 trying to read the class annotations for every row.
 !udf-deterministic.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-25 Gunther Hagleitner (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147477#comment-14147477 ]

Gunther Hagleitner commented on HIVE-8188:
--

+1


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-25 Gunther Hagleitner (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148225#comment-14148225 ]

Gunther Hagleitner commented on HIVE-8188:
--

Committed to the 0.14 branch.

 Fix For: 0.14.0


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-25 Navis (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148539#comment-14148539 ]

Navis commented on HIVE-8188:
-

[~prasanth_j] The original patch didn't use an annotation, but a reviewer wanted 
it to be one. So I just replaced the instanceof check with isEstimable() without 
thinking deeper. That's how it ended up like this. Still my bad.
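
The difference between the two designs, sketched with a hypothetical marker interface `EstimableBuffer` and a hypothetical annotation `AggType` (neither is Hive's actual type). Both give the same answer; the `instanceof` check is an ordinary type test that is cheap to repeat per row, while the annotation variant goes through reflection and is better resolved once per class.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class EstimableCheckDemo {

  // Design 1: hypothetical marker interface, checked with instanceof.
  interface EstimableBuffer {}

  // Design 2: hypothetical annotation, checked via reflection.
  @Retention(RetentionPolicy.RUNTIME)
  @interface AggType {
    boolean estimable() default false;
  }

  @AggType(estimable = true)
  static class CountBuffer implements EstimableBuffer {}

  static class OpaqueBuffer {}

  // Plain type test: cheap enough to run for every row.
  static boolean estimableByType(Object buffer) {
    return buffer instanceof EstimableBuffer;
  }

  // Reflective lookup: resolve once per buffer class and cache the result.
  static boolean estimableByAnnotation(Class<?> bufferClass) {
    AggType type = bufferClass.getAnnotation(AggType.class);
    return type != null && type.estimable();
  }

  public static void main(String[] args) {
    Object[] buffers = {new CountBuffer(), new OpaqueBuffer()};
    for (Object buffer : buffers) {
      System.out.println(buffer.getClass().getSimpleName() + ": "
          + estimableByType(buffer) + " / " + estimableByAnnotation(buffer.getClass()));
    }
  }
}
```

It prints `CountBuffer: true / true` and `OpaqueBuffer: false / false`; the annotation is not wrong in itself, it just should not be queried inside the per-row path.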


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-25 Prasanth J (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148558#comment-14148558 ]

Prasanth J commented on HIVE-8188:
--

[~navis] Looking up an annotation via reflection once is not a big problem. 
Doing it in an inner loop (as was done in Group By) is :)
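
A crude way to see the "once vs. inner loop" difference is to time the same loop with the lookup inside it and hoisted out of it. This is a sketch with a hypothetical `Estimable` annotation; single-shot `System.nanoTime()` timings without JIT warm-up are only indicative, and a harness such as JMH is the proper tool for real measurements.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class HoistLookupDemo {

  // Hypothetical annotation used only for this demo.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Estimable {
    boolean value() default true;
  }

  @Estimable
  static class Buffer {}

  // Anti-pattern: the annotation is looked up on every iteration.
  static long countPerRow(Class<?> cls, int rows) {
    long hits = 0;
    for (int i = 0; i < rows; i++) {
      Estimable e = cls.getAnnotation(Estimable.class);
      if (e != null && e.value()) {
        hits++;
      }
    }
    return hits;
  }

  // Fix: the lookup is loop-invariant, so do it once before the loop.
  static long countHoisted(Class<?> cls, int rows) {
    Estimable e = cls.getAnnotation(Estimable.class);
    boolean estimable = e != null && e.value();
    long hits = 0;
    for (int i = 0; i < rows; i++) {
      if (estimable) {
        hits++;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int rows = 2_000_000;
    long t0 = System.nanoTime();
    long perRow = countPerRow(Buffer.class, rows);
    long t1 = System.nanoTime();
    long hoisted = countHoisted(Buffer.class, rows);
    long t2 = System.nanoTime();
    System.out.printf("per-row: %d hits in %d ms%n", perRow, (t1 - t0) / 1_000_000);
    System.out.printf("hoisted: %d hits in %d ms%n", hoisted, (t2 - t1) / 1_000_000);
  }
}
```

Both methods return the same count; only the placement of the loop-invariant lookup differs.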


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-25 Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148581#comment-14148581 ]

Gopal V commented on HIVE-8188:
---

[~navis]: No worries, it took a special benchmark run to catch this.

I'm trying to catch anything that looks like a 0.14 regression before the RC 
goes out; the UDF that hit this case came from the NOW() UDF patch.


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-20 Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142300#comment-14142300 ]

Hive QA commented on HIVE-8188:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12670244/HIVE-8188.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6298 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/904/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/904/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-904/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12670244


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-19 Prasanth J (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140154#comment-14140154 ]

Prasanth J commented on HIVE-8188:
--

I think it's because hash aggregation needs to estimate the size of the hash 
map. The values in the hash map are UDAF aggregation buffers, whose size can 
be estimated if the aggregation buffer has the annotation 
@AggregationType(estimable = true). GroupByOperator.shouldBeFlushed() is 
called for every row that is added to the hash map, and shouldBeFlushed() 
calls the isEstimable() helper function, which uses reflection every time to 
check whether the aggregation function is estimable. Not sure why it was done 
this way, but yes, this will be slow as hell. This needs to be fixed.
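
A simplified sketch of such a flush check, with the estimability flag resolved once when the aggregator is built instead of inside shouldBeFlushed(). `AggType`, `SumBuffer`, `HashAggregator`, the 16-byte estimate and the byte budget are all hypothetical; unlike the real operator, this sketch only reports when a flush is due and only tracks estimable buffers.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

class FlushCheckDemo {

  // Hypothetical stand-in for an "estimable aggregation buffer" annotation.
  @Retention(RetentionPolicy.RUNTIME)
  @interface AggType {
    boolean estimable() default false;
  }

  @AggType(estimable = true)
  static class SumBuffer {
    long sum;

    // Illustrative fixed per-entry size estimate, in bytes.
    int estimateBytes() {
      return 16;
    }
  }

  static final class HashAggregator {
    private final Map<String, SumBuffer> table = new HashMap<>();
    private final boolean estimable; // reflection happens once, in the constructor
    private final long maxBytes;
    private long usedBytes;

    HashAggregator(long maxBytes) {
      AggType type = SumBuffer.class.getAnnotation(AggType.class);
      this.estimable = type != null && type.estimable();
      this.maxBytes = maxBytes;
    }

    // Adds one row and reports whether the table should now be flushed.
    // A real operator would also flush and reset the table; this sketch does not.
    boolean add(String key, long value) {
      SumBuffer buffer = table.get(key);
      if (buffer == null) {
        buffer = new SumBuffer();
        table.put(key, buffer);
        if (estimable) {
          usedBytes += buffer.estimateBytes();
        }
      }
      buffer.sum += value;
      return shouldBeFlushed();
    }

    // Runs for every row, but only reads a cached flag and a running total.
    private boolean shouldBeFlushed() {
      return estimable && usedBytes >= maxBytes;
    }

    int size() {
      return table.size();
    }
  }

  public static void main(String[] args) {
    HashAggregator agg = new HashAggregator(48);
    for (String key : new String[] {"a", "b", "a", "c"}) {
      System.out.println(key + " -> flush=" + agg.add(key, 1));
    }
  }
}
```

With a 48-byte budget, the rows for keys a, b and a report no flush, and the row for key c does, because three distinct 16-byte buffers reach the budget.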


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-19 Prasanth J (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140232#comment-14140232 ]

Prasanth J commented on HIVE-8188:
--

I tried to avoid invoking this reflection repeatedly in the inner loop by 
computing the total aggregation size once and reusing it inside the loop. I 
ran the following query:
{code}
select ss_quantity, ss_store_sk, ss_promo_sk,
       count(ss_list_price), count(ss_sales_price), sum(ss_ext_sales_price)
from store_sales_orc
group by ss_quantity, ss_store_sk, ss_promo_sk;
{code}

store_sales had 2880404 rows. The original execution time was 18.5s; with the 
above change it went down to 15.5s, a ~16% reduction, which is in line with 
the reflection cost visible in the attached image.


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-19 Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141095#comment-14141095 ]

Gopal V commented on HIVE-8188:
---

[~prasanth_j]: that is a pretty neat speedup.

But that's not where I found the fix; it was in isDeterministic() within the 
constant codepath of the ExprNode evaluator.
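
A simplified sketch of why the constant codepath matters: when a call's inputs are all constants and the function is deterministic, its result can be computed once and reused for every row. `ConstantCallEvaluator` is hypothetical and is not Hive's evaluator; the `deterministic` flag stands in for a value that would be read from the function's annotation once, at initialization, never per row.

```java
import java.util.function.Supplier;

class ConstantPathDemo {

  // Hypothetical evaluator for a function call whose arguments are all constants.
  static final class ConstantCallEvaluator<T> {
    private final Supplier<T> function;
    private final boolean deterministic; // decided once, not per evaluate() call
    private T cached;
    private boolean hasCached;
    int invocations; // counts real calls, for illustration only

    ConstantCallEvaluator(Supplier<T> function, boolean deterministic) {
      this.function = function;
      this.deterministic = deterministic;
    }

    T evaluate() {
      if (deterministic && hasCached) {
        return cached; // constant inputs + deterministic function: reuse the result
      }
      invocations++;
      T value = function.get();
      if (deterministic) {
        cached = value;
        hasCached = true;
      }
      return value;
    }
  }

  public static void main(String[] args) {
    ConstantCallEvaluator<Integer> answer = new ConstantCallEvaluator<>(() -> 6 * 7, true);
    ConstantCallEvaluator<Long> clock = new ConstantCallEvaluator<>(System::nanoTime, false);
    for (int row = 0; row < 1000; row++) {
      answer.evaluate();
      clock.evaluate();
    }
    System.out.println(answer.invocations + " " + clock.invocations);
  }
}
```

Over 1000 rows it prints `1 1000`: the deterministic call runs once, while the non-deterministic clock call runs on every row.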


[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

2014-09-19 Prasanth J (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141099#comment-14141099 ]

Prasanth J commented on HIVE-8188:
--

Looking at the attached PNG (GBY + Reflection), I thought it was the UDAF that 
uses reflection in the inner loop. Looks like many places need improvement then.
