Stamatis Zampetakis created HIVE-24252:
------------------------------------------
Summary: Improve decision model for using semijoin reducers
Key: HIVE-24252
URL: https://issues.apache.org/jira/browse/HIVE-24252
Project: Hive
Issue Type: Improvement
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
After a few experiments with the TPC-DS 10TB dataset, we observed that in some
cases semijoin reducers were not effective: they either did not reduce the
number of records at all or reduced the relation only marginally.
In some cases we can make the semijoin reducer more effective by adding more
columns, but this also requires a bigger Bloom filter, so deciding how many
columns to include in the filter becomes more delicate.
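To make the size trade-off concrete, here is a minimal sketch using the
textbook Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2. It is not
Hive's sizing code; the function name, the 5% false-positive rate, and the
distinct-value counts are assumptions chosen only for illustration.

```python
import math


def bloom_filter_bits(expected_entries: int, fpp: float = 0.05) -> int:
    """Return the optimal number of bits for a Bloom filter.

    Uses the standard formula m = -n * ln(p) / (ln 2)^2, where n is the
    expected number of distinct entries and p the false-positive probability.
    """
    if expected_entries <= 0:
        raise ValueError("expected_entries must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return math.ceil(-expected_entries * math.log(fpp) / (math.log(2) ** 2))


# Hypothetical distinct-value counts: a column combination usually has at
# least as many distinct values as any one of its columns.
single_column_ndv = 1_000_000
two_column_ndv = 20_000_000

single_bits = bloom_filter_bits(single_column_ndv)
multi_bits = bloom_filter_bits(two_column_ndv)

print(f"single column: {single_bits / 8 / 1024:.0f} KiB")
print(f"two columns:   {multi_bits / 8 / 1024:.0f} KiB")
```

For a fixed false-positive rate the filter grows linearly with the number of
distinct entries, so a multi-column reducer only pays off if the additional
reduction of the target relation outweighs the larger filter.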
The current decision model always chooses multi-column semijoin reducers if
they are available, but this may not always be beneficial if a single column
can already reduce the target relation significantly.
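One possible shape for an improved decision is sketched below. This is a
hypothetical cost/benefit comparison, not the model implemented in Hive: the
Candidate class, choose_reducer function, column names, selectivities, and
cost weights are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothetical semijoin reducer candidate."""
    columns: Tuple[str, ...]
    selectivity: float    # fraction of target rows that survive the filter
    filter_entries: int   # distinct entries the Bloom filter must hold


def choose_reducer(candidates: Sequence[Candidate],
                   target_rows: int,
                   benefit_per_row: float = 1.0,
                   cost_per_entry: float = 1.0) -> Optional[Candidate]:
    """Pick the candidate with the highest positive net benefit.

    Net benefit = rows removed * benefit_per_row
                  - filter entries * cost_per_entry.
    Returns None when no candidate is worth its filter.
    """
    best, best_net = None, 0.0
    for cand in candidates:
        rows_removed = (1.0 - cand.selectivity) * target_rows
        net = rows_removed * benefit_per_row - cand.filter_entries * cost_per_entry
        if net > best_net:
            best, best_net = cand, net
    return best


# Hypothetical numbers: the second column removes few extra rows but needs a
# much larger filter, so the single-column reducer wins.
single = Candidate(("d_date_sk",), selectivity=0.02, filter_entries=1_000_000)
multi = Candidate(("d_date_sk", "i_item_sk"), selectivity=0.015,
                  filter_entries=50_000_000)
chosen = choose_reducer([multi, single], target_rows=1_000_000_000)

# A reducer that barely filters anything is rejected altogether.
weak = Candidate(("c_customer_sk",), selectivity=0.999, filter_entries=10_000_000)
rejected = choose_reducer([weak], target_rows=1_000_000_000)
```

With these assumed inputs, chosen is the single-column candidate and rejected
is None, which covers both situations described above: a multi-column reducer
that is not worth its filter, and a reducer that is not effective at all.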