Gunther Hagleitner created HIVE-7826:
----------------------------------------
Summary: Dynamic partition pruning on Tez
Key: HIVE-7826
URL: https://issues.apache.org/jira/browse/HIVE-7826
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
It's natural in a star schema to map one or more dimensions to partition
columns. Time or location are likely candidates.
It can also be useful to compute the partitions one would like to scan via a
subquery (where p in select ... from ...).
The resulting joins in Hive still require a full scan of the large table,
though, because partition pruning takes place at compile time, before the
corresponding values are known.
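To make the problem concrete, here is a minimal Python sketch. It assumes a hypothetical `sales` fact table partitioned by `ds` and a hypothetical `date_dim` dimension; all names and data are illustrative and not taken from Hive. A literal predicate can be resolved against the partition list at compile time. A predicate that arrives through a join has no values until the dimension side has run, so the static pruner has to keep every partition.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fact table partitioned by a date column "ds":
# partition value -> files (splits) stored under that partition.
FACT_PARTITIONS: Dict[str, List[str]] = {
    "2014-08-01": ["sales/ds=2014-08-01/000000_0"],
    "2014-08-02": ["sales/ds=2014-08-02/000000_0"],
    "2014-08-03": ["sales/ds=2014-08-03/000000_0", "sales/ds=2014-08-03/000001_0"],
}

# Hypothetical date dimension: the filter column lives here, not on the fact table.
DATE_DIM = [
    {"ds": "2014-08-01", "is_holiday": False},
    {"ds": "2014-08-02", "is_holiday": True},
    {"ds": "2014-08-03", "is_holiday": False},
]


def compile_time_prune(partitions: Dict[str, List[str]],
                       literal_values: Optional[Set[str]]) -> List[str]:
    """Static pruning: only literal predicates are available at compile time."""
    if literal_values is None:
        # The predicate comes from a join or subquery whose values are not
        # known yet, so every partition has to be scanned.
        return sorted(partitions)
    return sorted(p for p in partitions if p in literal_values)


# WHERE ds = '2014-08-02': the literal is known, so pruning works.
print(compile_time_prune(FACT_PARTITIONS, {"2014-08-02"}))  # ['2014-08-02']

# JOIN date_dim ... WHERE is_holiday: the ds values only exist at run time.
print(compile_time_prune(FACT_PARTITIONS, None))
# ['2014-08-01', '2014-08-02', '2014-08-03']

# What the dimension side eventually produces, too late for static pruning:
needed = {row["ds"] for row in DATE_DIM if row["is_holiday"]}
print(sorted(needed))  # ['2014-08-02']
```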
On Tez it's relatively straightforward to send the values needed for pruning to
the application master (AM), where splits are generated and tasks are
submitted. Using these values we can strip out any unneeded partitions
dynamically, while the query is running.
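The AM-side step can be modeled as follows. This is a hypothetical sketch, not the Tez or Hive API: `PruningSplitGenerator`, `on_key_event` and `generate_splits` are illustrative names. Each task on the dimension side reports the join keys it produced; once every expected source task has reported, only splits from partitions whose value was seen are kept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PruningSplitGenerator:
    """Hypothetical AM-side component: collects key events, then prunes splits.

    A model of the idea only; not the Tez or Hive API.
    """
    # partition value -> splits stored under that partition
    splits_by_partition: Dict[str, List[str]]
    # number of source tasks that must report before pruning is safe
    expected_sources: int
    received: Set[str] = field(default_factory=set)
    finished_sources: int = 0

    def on_key_event(self, keys: Set[str]) -> None:
        """One event per source task, carrying the join keys that task produced."""
        self.received |= keys
        self.finished_sources += 1

    def ready(self) -> bool:
        return self.finished_sources >= self.expected_sources

    def generate_splits(self) -> List[str]:
        """Return only the splits of partitions whose value some source reported."""
        if not self.ready():
            raise RuntimeError("cannot prune before all key events have arrived")
        return [split
                for part in sorted(self.splits_by_partition)
                if part in self.received
                for split in self.splits_by_partition[part]]


gen = PruningSplitGenerator(
    {"2014-08-01": ["a0"], "2014-08-02": ["b0"], "2014-08-03": ["c0", "c1"]},
    expected_sources=2,
)
gen.on_key_event({"2014-08-03"})  # first dimension-side task
gen.on_key_event(set())           # second task produced no matching rows
print(gen.generate_splits())      # ['c0', 'c1']
```

The generator refuses to prune until all events have arrived: pruning against a partial key set could drop a partition that a later event still needs, which would silently lose rows.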
The approach is straightforward:
- For each join, insert a synthetic condition representing "x in (keys of the
  other side of the join)"
- These conditions are pushed down as far as possible
- If a condition reaches a table scan and the column involved is a partition
  column:
  - Set up an operator to send key events to the AM
- Else:
  - Remove the synthetic predicate
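The planning steps above can be sketched as a toy optimizer pass over a single join. The operator classes, `push_down` and `plan_dynamic_pruning` are hypothetical and greatly simplified (only `Filter` and `GroupBy` may sit between a join and its scans); this is not Hive's actual optimizer code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class TableScan:
    table: str
    columns: List[str]
    partition_cols: List[str]


@dataclass
class Filter:
    child: "Op"


@dataclass
class GroupBy:
    child: "Op"
    keys: List[str]  # only the grouping keys survive the aggregation


Op = Union[TableScan, Filter, GroupBy]


@dataclass
class Join:
    left: Op
    right: Op
    left_key: str
    right_key: str


def push_down(op: Op, col: str) -> Optional[TableScan]:
    """Push a synthetic "col IN (keys of other side)" predicate as far down as
    possible. Returns the TableScan it reaches, or None if it gets stuck."""
    while True:
        if isinstance(op, TableScan):
            return op if col in op.columns else None
        if isinstance(op, GroupBy) and col not in op.keys:
            return None  # column does not pass through; predicate stops here
        op = op.child  # Filter, or GroupBy on a grouping key: keep going


def plan_dynamic_pruning(join: Join) -> List[Tuple[str, str, str]]:
    """Decide, for each side of the join, whether to keep the synthetic predicate.

    Returns (pruned table, partition column, key sent by the other side) for
    every predicate that reached a partition column; all others are dropped.
    """
    events = []
    sides = [
        (join.left, join.left_key, join.right_key),
        (join.right, join.right_key, join.left_key),
    ]
    for target, target_key, source_key in sides:
        scan = push_down(target, target_key)
        if scan is not None and target_key in scan.partition_cols:
            # Keep: an operator on the other side sends source_key values to the AM.
            events.append((scan.table, target_key, source_key))
        # Otherwise the synthetic predicate is simply removed from the plan.
    return events


sales = TableScan("sales", columns=["ds", "store_id", "amount"], partition_cols=["ds"])
date_dim = Filter(TableScan("date_dim", columns=["ds", "is_holiday"], partition_cols=[]))
print(plan_dynamic_pruning(Join(sales, date_dim, "ds", "ds")))
# [('sales', 'ds', 'ds')]

store = TableScan("store", columns=["store_id", "region"], partition_cols=[])
print(plan_dynamic_pruning(Join(sales, store, "store_id", "store_id")))
# []
```

In the first join the predicate on `sales.ds` reaches a partition column and is kept as a pruning event; the predicate on the `date_dim` side reaches a scan but not a partition column, so it is dropped. In the second join neither key is a partition column, so both synthetic predicates are removed.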
--
This message was sent by Atlassian JIRA
(v6.2#6252)