[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

Ashutosh Chauhan (JIRA) Tue, 30 May 2017 14:25:31 -0700

    [ https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030161#comment-16030161 ]


Ashutosh Chauhan commented on HIVE-16757:
-----------------------------------------

Left a few comments on RB.

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16757
>                 URL: https://issues.apache.org/jira/browse/HIVE-16757
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive, 
> a number of places call the deprecated {{getRows()}} instead of the new 
> API {{estimateRowCount(RelMetadataQuery mq)}}, which accepts the 
> RelMetadataQuery; in most of these places one is already at hand to pass. 
> For a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one discarding the current memoization 
> cache.
> Was: -On complex queries, HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, this 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount is 
> called 6442 times as a top-level call, but the recursion expands this to 
> 501729 calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>       at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>       at GeneratedMetadataHandler_RowCount.getRowCount_$
>       at GeneratedMetadataHandler_RowCount.getRowCount
>       at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>       at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>       at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
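
To illustrate why the description above treats the hidden {{instance()}} call as costly, here is a minimal, self-contained sketch. It does not use Calcite: MetadataQuery, Node, buildJoinChain and the 0.1 selectivity factor are hypothetical stand-ins invented for this example, and the real RelMetadataQuery internals may well differ. It only models the pattern: an old-style getRows() that creates a fresh cache on every call, versus an estimateRowCount(mq) that reuses the caller's cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RowCountSketch {
    /** Counts how often a row-count estimate is actually computed. */
    static int estimateCalls = 0;

    /** Hypothetical stand-in for a metadata query holding a memoization cache. */
    static final class MetadataQuery {
        private final Map<Node, Double> cache = new HashMap<>();

        /** Assumption of this sketch: every call returns a query with an empty cache. */
        static MetadataQuery instance() {
            return new MetadataQuery();
        }

        /** Returns the cached row count if present, otherwise computes and stores it. */
        double getRowCount(Node node) {
            Double cached = cache.get(node);
            if (cached != null) {
                return cached;
            }
            double rows = node.estimateRowCount(this);
            cache.put(node, rows);
            return rows;
        }
    }

    /** Hypothetical plan node: either a leaf scan or a join of two inputs. */
    static final class Node {
        final Node left;
        final Node right;
        final double baseRows;

        Node(double baseRows) {
            this.left = null;
            this.right = null;
            this.baseRows = baseRows;
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.baseRows = 0;
        }

        /** New-style call: the caller supplies the query, so its cache is reused. */
        double estimateRowCount(MetadataQuery mq) {
            estimateCalls++;
            if (left == null) {
                return baseRows;
            }
            // Arbitrary selectivity, chosen only for illustration.
            return 0.1 * mq.getRowCount(left) * mq.getRowCount(right);
        }

        /** Old-style call: creates a fresh query, so earlier results are lost. */
        double getRows() {
            return estimateRowCount(MetadataQuery.instance());
        }
    }

    /** Builds a left-deep chain of n joins and returns the join nodes, bottom-up. */
    static List<Node> buildJoinChain(int n) {
        List<Node> joins = new ArrayList<>();
        Node current = new Node(100.0);
        for (int i = 0; i < n; i++) {
            current = new Node(current, new Node(100.0));
            joins.add(current);
        }
        return joins;
    }

    /** Asks every join for its row count through the old-style getRows(). */
    static int estimateCallsWithFreshQueryPerCall(int n) {
        estimateCalls = 0;
        for (Node join : buildJoinChain(n)) {
            join.getRows();
        }
        return estimateCalls;
    }

    /** Asks every join for its row count through one shared query. */
    static int estimateCallsWithSharedQuery(int n) {
        estimateCalls = 0;
        MetadataQuery mq = MetadataQuery.instance();
        for (Node join : buildJoinChain(n)) {
            mq.getRowCount(join);
        }
        return estimateCalls;
    }

    public static void main(String[] args) {
        System.out.println("fresh query per call: " + estimateCallsWithFreshQueryPerCall(4));
        System.out.println("shared query:         " + estimateCallsWithSharedQuery(4));
    }
}
```

In this toy model, a chain of 4 joins needs 24 estimate computations when each top-level call goes through getRows(), and 9 when one query object is shared, because every fresh query has to walk the whole subtree again. The gap grows quadratically with the chain length here; the figures quoted in the issue come from the real planner and are not reproduced by this sketch.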



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
