[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030410#comment-16030410
 ] 

Hive QA commented on HIVE-16757:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870455/HIVE-16757.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10791 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=228)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] (batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5481/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5481/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5481/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870455 - PreCommit-HIVE-Build

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16757
>                 URL: https://issues.apache.org/jira/browse/HIVE-16757
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we call the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts the {{RelMetadataQuery}}; in most of those places we already have 
> one at hand to pass. For a complex query (49 joins) there are 2995340 calls 
> to {{AbstractRelNode.getRows}}, each one throwing the current memoization 
> cache away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount 
> gets called 6442 times as a top-level call, but the recursion exploded this 
> to 501729 calls. Memoization of the result would stop the recursion early. 
> In my testing this reduced the join reordering time for said query from 11s 
> to <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>       at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>       at GeneratedMetadataHandler_RowCount.getRowCount_$
>       at GeneratedMetadataHandler_RowCount.getRowCount
>       at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>       at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>       at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
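
The effect described in the issue can be illustrated with a small, self-contained sketch. It does not use Calcite or Hive: {{MetadataQuery}} and {{Node}} below are hypothetical toy stand-ins, assumed only for illustration, for an object that owns a memoization cache and for a plan node. The sketch compares passing one shared query object (the {{estimateRowCount(mq)}} style) with creating a fresh one on every call (the deprecated {{getRows()}} style).

```java
import java.util.HashMap;
import java.util.Map;

public class RowCountDemo {
    /** Counts how many times a row count was actually computed (cache misses). */
    static long computations = 0;

    /** Hypothetical stand-in for a metadata query object that owns a memoization cache. */
    static final class MetadataQuery {
        private final Map<Node, Double> cache = new HashMap<>();

        double getRowCount(Node node) {
            Double cached = cache.get(node);
            if (cached != null) {
                return cached; // cache hit: the recursion stops here
            }
            computations++;
            double rows = node.computeRowCount(this);
            cache.put(node, rows);
            return rows;
        }
    }

    /** Toy plan node with at most one input (not a Calcite RelNode). */
    static final class Node {
        private final Node input;

        Node(Node input) {
            this.input = input;
        }

        /** Recursive estimate: a leaf has 1000 rows, every other node halves its input. */
        double computeRowCount(MetadataQuery mq) {
            return input == null ? 1000.0 : mq.getRowCount(input) / 2.0;
        }

        /** New-style call: reuses the caller's query object and its cache. */
        double estimateRowCount(MetadataQuery mq) {
            return mq.getRowCount(this);
        }

        /** Deprecated-style call: a fresh query object, hence an empty cache, every time. */
        double getRows() {
            return estimateRowCount(new MetadataQuery());
        }
    }

    /** Builds a chain of n nodes in which node i reads from node i-1. */
    static Node[] chain(int n) {
        Node[] nodes = new Node[n];
        for (int i = 0; i < n; i++) {
            nodes[i] = new Node(i == 0 ? null : nodes[i - 1]);
        }
        return nodes;
    }

    /** Asks every node for its row count through one shared query object. */
    public static long countWithSharedQuery(int n) {
        computations = 0;
        MetadataQuery mq = new MetadataQuery();
        for (Node node : chain(n)) {
            node.estimateRowCount(mq);
        }
        return computations;
    }

    /** Asks every node for its row count through the deprecated-style call. */
    public static long countWithFreshQueryPerCall(int n) {
        computations = 0;
        for (Node node : chain(n)) {
            node.getRows();
        }
        return computations;
    }

    public static void main(String[] args) {
        System.out.println("shared query: " + countWithSharedQuery(50));
        System.out.println("fresh query per call: " + countWithFreshQueryPerCall(50));
    }
}
```

For a chain of 50 nodes, the shared query object computes each row count once (50 computations), while the fresh-per-call variant recomputes the whole upstream chain every time (1 + 2 + ... + 50 = 1275 computations). The toy model only shows the shape of the problem; the figures quoted in the issue come from real planner runs.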



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
