-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28500/
-----------------------------------------------------------
Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.
Bugs: HIVE-8943
https://issues.apache.org/jira/browse/HIVE-8943
Repository: hive-git
Description
-------
By default, SparkMapJoinOptimizer combines nested MapJoins into a single work,
because the ReduceSink (RS) for the big table is removed. The MapJoin check
therefore needs to calculate whether all the MapJoins in that work (Spark stage)
fit in memory together; otherwise they could exhaust the memory of the Spark
executor that runs the work.
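A minimal sketch of the idea, assuming each MapJoin's small-table size has already been estimated from statistics and that a single byte budget applies per executor. The class `MapJoinMemoryCheck`, the method `fitsInMemory`, and the example sizes are hypothetical illustrations, not code from this patch. The point is that the sizes of all hash tables loaded into one work are summed, rather than each MapJoin being checked on its own.

```java
import java.util.List;

/**
 * Hypothetical helper illustrating the combined memory check described above.
 * This is NOT the actual SparkMapJoinOptimizer code; names and sizes are made up.
 */
final class MapJoinMemoryCheck {

    private MapJoinMemoryCheck() {}

    /**
     * Returns true if the small (hash) tables of the MapJoins already in a work,
     * plus the candidate MapJoin's small table, fit within the memory budget.
     *
     * @param existingSmallTableBytes  estimated small-table sizes of MapJoins already in the work
     * @param candidateSmallTableBytes estimated small-table size of the MapJoin being considered
     * @param thresholdBytes           memory budget assumed to be available to one executor
     */
    static boolean fitsInMemory(List<Long> existingSmallTableBytes,
                                long candidateSmallTableBytes,
                                long thresholdBytes) {
        long total = candidateSmallTableBytes;
        for (long bytes : existingSmallTableBytes) {
            total += bytes;
        }
        // All hash tables in one work are loaded by the same executor,
        // so their sizes add up instead of being checked one at a time.
        return total <= thresholdBytes;
    }

    public static void main(String[] args) {
        List<Long> alreadyInWork = List.of(4_000_000L, 3_000_000L);
        // Each table alone is under a 10 MB budget, but all three together are not.
        System.out.println(fitsInMemory(alreadyInWork, 5_000_000L, 10_000_000L)); // false
        System.out.println(fitsInMemory(alreadyInWork, 2_000_000L, 10_000_000L)); // true
    }
}
```

Checking each MapJoin individually against the budget would accept all three tables in the first call (4 MB, 3 MB and 5 MB are each under 10 MB), which is exactly the case that can overwhelm an executor once the MapJoins share a work.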
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 819eef1
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 0c339a5
ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION
ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION
ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION
ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/28500/diff/
Testing
-------
Added two unit tests:
1. auto_join_stats, which sets a memory limit and checks that the algorithm does
not put more than one MapJoin in a single BaseWork.
2. auto_join_stats2, which runs the same query without the memory limit and
checks that the algorithm puts all the MapJoins in one BaseWork, since they fit.
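The behaviour the two tests check can be illustrated with a deliberately simplified model, assuming a linear chain of MapJoins whose small-table sizes are known and a greedy rule that starts a new work whenever the next MapJoin would push the current one over budget. `MapJoinGroupingModel`, `groupIntoWorks`, and the sizes below are hypothetical; the actual optimizer operates on Hive operator trees and BaseWork objects, not on lists of numbers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of how a chain of MapJoins could be split
 * into works under a memory budget. Not the actual optimizer logic.
 */
final class MapJoinGroupingModel {

    private MapJoinGroupingModel() {}

    /** Greedily packs consecutive MapJoins into works without exceeding thresholdBytes. */
    static List<List<Long>> groupIntoWorks(List<Long> smallTableBytes, long thresholdBytes) {
        List<List<Long>> works = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;
        for (long bytes : smallTableBytes) {
            // Close the current work if adding this MapJoin would exceed the budget.
            if (!current.isEmpty() && currentTotal + bytes > thresholdBytes) {
                works.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(bytes);
            currentTotal += bytes;
        }
        if (!current.isEmpty()) {
            works.add(current);
        }
        return works;
    }

    public static void main(String[] args) {
        List<Long> chain = List.of(3_000L, 3_000L, 3_000L);
        // Tight limit, analogous to auto_join_stats: one MapJoin per work.
        System.out.println(groupIntoWorks(chain, 5_000L));         // [[3000], [3000], [3000]]
        // No effective limit, analogous to auto_join_stats2: all MapJoins in one work.
        System.out.println(groupIntoWorks(chain, Long.MAX_VALUE)); // [[3000, 3000, 3000]]
    }
}
```

In this model, a MapJoin whose small table alone exceeds the budget is simply placed in its own work; the model ignores whether such a join should be converted to a MapJoin in the first place.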
Thanks,
Szehon Ho