[ https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030410#comment-16030410 ]
Hive QA commented on HIVE-16757:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870455/HIVE-16757.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10791 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=228)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] (batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5481/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5481/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5481/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870455 - PreCommit-HIVE-Build

> Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..)
> has serious performance impact
> --------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch,
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch,
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because
> it places a new memoization cache on the stack. Hidden in the deprecated
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we
> have a number of places where we call the deprecated {{getRows()}} instead
> of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which accepts the
> {{RelMetadataQuery}}; in most of those places we already have one handy to
> pass. Looking at a complex query (49 joins), there are 2995340 calls to
> {{AbstractRelNode.getRows}}, each one discarding the current memoization
> cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many
> times. Since it does not memoize its result and the call is recursive, it
> results in an explosion of calls. For example, for a query with 49 joins,
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount
> gets called 6442 times as a top-level call, but the recursion expanded this
> to 501729 calls. Memoization of the result would stop the recursion early.
> In my testing this reduced the join reordering time for said query from
> 11s to <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the
> function is called in stacks similar to this:
> {code}
> at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
> at GeneratedMetadataHandler_RowCount.getRowCount_$
> at GeneratedMetadataHandler_RowCount.getRowCount
> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
> at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
> at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)