[
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gunther Hagleitner updated HIVE-7826:
-------------------------------------
Attachment: HIVE-7826.4.patch
.4 fixes a small issue with stats annotation for event operators.
> Dynamic partition pruning on Tez
> --------------------------------
>
> Key: HIVE-7826
> URL: https://issues.apache.org/jira/browse/HIVE-7826
> Project: Hive
> Issue Type: Bug
> Reporter: Gunther Hagleitner
> Assignee: Gunther Hagleitner
> Labels: TODOC14, tez
> Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch,
> HIVE-7826.4.patch
>
>
> It's natural in a star schema to map one or more dimensions to partition
> columns. Time or location are likely candidates.
> It can also be useful to compute the partitions one would like to scan via
> a subquery (where p in select ... from ...).
> The resulting joins in Hive require a full table scan of the large table
> though, because partition pruning takes place before the corresponding values
> are known.
> On Tez it's relatively straightforward to send the values needed to prune to
> the application master - where splits are generated and tasks are submitted.
> Using these values we can strip out any unneeded partitions dynamically,
> while the query is running.
> The approach is straightforward:
> - Insert synthetic conditions for each join representing "x in (keys of other
> side in join)"
> - These conditions will be pushed as far down as possible
> - If the condition hits a table scan and the column involved is a partition
> column:
>   - Set up an operator to send key events to the AM
> - Otherwise:
>   - Remove the synthetic predicate
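
The steps above can be sketched roughly as follows. This is only an illustration of the decision rule and of pruning by received keys, not the code in the patch: the names SyntheticPredicate, TableScan, resolve and prune, as well as the sample table and partition values, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticPredicate:
    """Hypothetical stand-in for 'column IN (keys of the other join side)'."""
    column: str
    key_source: str  # label for the join side that produces the keys


@dataclass
class TableScan:
    """Hypothetical table scan node; not a Hive class."""
    table: str
    partition_columns: frozenset
    pruning_events: list = field(default_factory=list)


def resolve(scan: TableScan, pred: SyntheticPredicate) -> bool:
    """Apply the final rule once the predicate has reached a table scan.

    Returns True if the predicate became a source of pruning events,
    False if it was dropped.
    """
    if pred.column in scan.partition_columns:
        # Partition column: arrange for the key values to be sent to the AM.
        scan.pruning_events.append(pred)
        return True
    # Not a partition column: the synthetic predicate is simply removed.
    return False


def prune(partitions, column, keys):
    """Keep only partitions whose value for `column` is among the received keys."""
    keys = set(keys)
    return [p for p in partitions if p[column] in keys]


# Assumed example: a fact table partitioned by a date column "ds".
scan = TableScan("sales", frozenset({"ds"}))
kept = resolve(scan, SyntheticPredicate("ds", "dates"))
dropped = resolve(scan, SyntheticPredicate("item_id", "items"))

partitions = [{"ds": "2014-08-01"}, {"ds": "2014-08-02"}, {"ds": "2014-08-03"}]
remaining = prune(partitions, "ds", ["2014-08-02"])
print(kept, dropped, remaining)
```

In this sketch the predicate on the partition column survives and later limits the scan to one partition, while the predicate on the non-partition column is discarded, so it adds no extra filtering work.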