[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111462#comment-14111462 ]

Gunther Hagleitner commented on HIVE-7826:
------------------------------------------

[~damien.carol] thank you for your interest. This feature is Tez-only right
now, but if you are using Tez and have a cluster running Tez 0.5, you can give
it a spin. You need to use the Apache tez branch and apply this patch. The
relevant configs are:

hive.tez.dynamic.partition.pruning=true (turns the feature on or off)
hive.tez.dynamic.partition.pruning.max.event.size=size in bytes (maximum size
of the event a task will send to the AM; if an event is larger, pruning turns
itself off)
hive.tez.dynamic.partition.pruning.max.data.size=size in bytes (maximum total
size of the expected output at planning time; if the expected size is larger,
pruning turns itself off)

Any feedback on functionality or performance is welcome. If you describe your
use case to me, I will make sure it's covered in the unit tests. And if you're
game, code review is also welcome.

> Dynamic partition pruning on Tez
> --------------------------------
>
>                 Key: HIVE-7826
>                 URL: https://issues.apache.org/jira/browse/HIVE-7826
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>              Labels: TODOC14, tez
>         Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch
>
>
> It's natural in a star schema to map one or more dimensions to partition 
> columns. Time or location are likely candidates. 
> It can also be useful to compute the partitions one would like to scan via
> a subquery (where p in select ... from ...).
> The resulting joins in Hive require a full table scan of the large table,
> though, because partition pruning takes place before the corresponding values
> are known.
> On Tez it's relatively straightforward to send the values needed to prune to
> the application master (AM), where splits are generated and tasks are submitted.
> Using these values, we can strip out any unneeded partitions dynamically
> while the query is running.
> The approach is straightforward:
> - Insert synthetic conditions for each join representing "x in (keys of other 
> side in join)"
> - These conditions will be pushed as far down as possible
> - If the condition hits a table scan and the column involved is a partition
> column:
>    - Set up an operator to send key events to the AM
> - Else:
>    - Remove the synthetic predicate
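
The planning steps in the description can be sketched in Python on a toy
operator model. This is a hypothetical illustration, not Hive's optimizer:
Operator, SyntheticPredicate, EventPlan and the passes_through set (the columns
an operator forwards unchanged, which is what lets a predicate move below it)
are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Operator:
    """Toy operator on the large-table side of a join (hypothetical model)."""
    name: str
    child: Optional["Operator"] = None                        # None means a table scan
    passes_through: Set[str] = field(default_factory=set)     # columns forwarded unchanged
    partition_columns: Set[str] = field(default_factory=set)  # only used by table scans


@dataclass
class SyntheticPredicate:
    """Stands for: column IN (keys of the other join input)."""
    column: str
    other_side: str


@dataclass
class EventPlan:
    """Hypothetical marker: send other_side's keys to the AM to prune table."""
    other_side: str
    table: str
    column: str


def place_synthetic_predicate(root: Operator, pred: SyntheticPredicate) -> Optional[EventPlan]:
    op = root
    # Push the predicate down as long as the column is forwarded unchanged.
    while op.child is not None:
        if pred.column not in op.passes_through:
            return None  # stuck above a non-scan operator: remove the predicate
        op = op.child
    # Reached a table scan: keep it only if the column is a partition column.
    if pred.column in op.partition_columns:
        return EventPlan(pred.other_side, op.name, pred.column)
    return None  # ordinary column: remove the predicate


sales = Operator("sales", partition_columns={"ds"})
flt = Operator("filter", child=sales, passes_through={"ds", "amount"})
print(place_synthetic_predicate(flt, SyntheticPredicate("ds", "date_dim")))
# EventPlan(other_side='date_dim', table='sales', column='ds')
```

Returning None in both failure cases mirrors the "remove synthetic predicate"
branch: the synthetic condition never survives into the final plan unless it
can drive pruning.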

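On the AM side, pruning reduces to collecting the key values from every event
and dropping partitions whose value never appeared. A minimal Python sketch,
assuming each event arrives as a set of partition-column values and that a task
which exceeded max.event.size reports None instead of keys; prune_partitions
and that encoding are hypothetical, not the patch's actual event format.

```python
from typing import Dict, Iterable, List, Optional, Set


def prune_partitions(
    partitions: Dict[str, List[str]],
    events: Iterable[Optional[Set[str]]],
) -> Dict[str, List[str]]:
    """Keep only partitions whose value was seen on the other side of the join.

    partitions: partition value -> files in that partition.
    events: one entry per source task; a set of key values, or None if that
            task exceeded its event size limit and turned pruning off.
    """
    wanted: Set[str] = set()
    for keys in events:
        if keys is None:
            # Pruning turned itself off: fall back to scanning every partition.
            return dict(partitions)
        wanted |= keys
    return {value: files for value, files in partitions.items() if value in wanted}


parts = {"2014-08-01": ["f1"], "2014-08-02": ["f2"], "2014-08-03": ["f3"]}
print(prune_partitions(parts, [{"2014-08-02"}, {"2014-08-03", "2014-09-01"}]))
# {'2014-08-02': ['f2'], '2014-08-03': ['f3']}
```

The AM can only apply this once events from all tasks on the other side of the
join have arrived; pruning earlier could drop a partition whose key simply has
not been reported yet.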


--
This message was sent by Atlassian JIRA
(v6.2#6252)
