[ https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765477#comment-15765477 ]
Chao Sun commented on HIVE-15477:
---------------------------------

[~prasanth_j] wonder if you can take a look? thanks.

> Provide options to adjust filter stats when column stats are not available
> --------------------------------------------------------------------------
>
>                 Key: HIVE-15477
>                 URL: https://issues.apache.org/jira/browse/HIVE-15477
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15477.1.patch
>
>
> Currently when column stats are not available, Hive will assume the "worst"
> case by setting the # of output rows to be 1/2 of the # of input rows, for
> each predicate expression. This could be inaccurate, especially in the
> presence of multiple predicates chained by AND. We have found in some cases
> this could cause map join to have wrong ordering and thus fail with memory
> issue.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}})
> that can be used to control the percentage of rows emitted by a predicate
> expression.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
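As a rough illustration of the estimate described in the issue text above, the sketch below shows how a fixed per-predicate factor compounds across predicates chained by AND. It is not Hive code: the function `estimate_rows` and its parameters are hypothetical, and it assumes that the factors of AND-ed predicates simply multiply. The `factor` argument stands in for the proposed setting, whose name (`hive.stats.filter.factor`) is only a suggestion in the issue.

```python
def estimate_rows(input_rows: int, num_predicates: int, factor: float = 0.5) -> int:
    """Estimate output rows when each AND-ed predicate keeps `factor` of its input.

    Hypothetical helper for illustration only; factor=0.5 mirrors the
    "1/2 of the input rows" default described in the issue.
    """
    if input_rows < 0 or num_predicates < 0:
        raise ValueError("input_rows and num_predicates must be non-negative")
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be in [0, 1]")
    rows = float(input_rows)
    for _ in range(num_predicates):
        # Assumption: each predicate in the AND chain is applied to the
        # already-reduced row count, so the reductions multiply.
        rows *= factor
    return int(rows)


# Three AND-ed predicates on 1,000,000 rows with the default factor of 1/2.
print(estimate_rows(1_000_000, 3))        # 125000
# A factor of 1.0 leaves the row count untouched.
print(estimate_rows(1_000_000, 3, 1.0))   # 1000000
```

With the fixed factor of 1/2, three predicates already shrink the estimate to one eighth of the input, which is the kind of underestimate that could make a large table look small enough for a map join. A configurable factor closer to 1.0 would keep the estimate more conservative.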