[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez

Damien Carol (JIRA) Mon, 01 Sep 2014 02:06:11 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117222#comment-14117222
 ]


Damien Carol commented on HIVE-7826:
------------------------------------

[~hagleitn] We used the Apache tez branch and deployed Tez 0.5 to test this patch.
We haven't seen any performance problems. We simply weren't able to activate 
the pruning (we don't see anything about it in the logs).
Maybe our use case doesn't fit well.
We use Tez for OLAP analysis, with queries like this one:
{code:sql}
SELECT d1.label, count(*), sum(agg.amount) 
FROM agg_01 agg,
dim_shops d1
WHERE agg.dim_shops_id = d1.id
and
d1.label in ('foo', 'bar')
GROUP BY d1.label
ORDER BY d1.label
{code}
I was expecting that, if agg_01 is partitioned by dim_shops_id, dynamic pruning 
would be activated.
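
A few checks can help narrow down a case like this. The sketch below assumes a Hive session running on Tez and reuses the table names from the query above. It only prints current settings, partition metadata and the plan, and what the plan shows depends on the Hive build.

{code:sql}
-- SET with a property name and no value prints the current setting.
SET hive.execution.engine;
SET hive.tez.dynamic.partition.pruning;

-- Confirm that agg_01 really has partitions (here: on dim_shops_id).
SHOW PARTITIONS agg_01;

-- Inspect the plan for the query.
EXPLAIN
SELECT d1.label, count(*), sum(agg.amount)
FROM agg_01 agg,
dim_shops d1
WHERE agg.dim_shops_id = d1.id
and
d1.label in ('foo', 'bar')
GROUP BY d1.label
ORDER BY d1.label;
{code}

If pruning is planned, the dim_shops side of the plan would be expected to carry an extra operator that feeds partition keys to the agg_01 scan. The exact label of that operator is not assumed here.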


> Dynamic partition pruning on Tez
> --------------------------------
>
>                 Key: HIVE-7826
>                 URL: https://issues.apache.org/jira/browse/HIVE-7826
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>              Labels: TODOC14, tez
>         Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
> HIVE-7826.4.patch, HIVE-7826.5.patch
>
>
> It's natural in a star schema to map one or more dimensions to partition 
> columns. Time or location are likely candidates. 
> It can also be useful to compute the partitions one would like to scan via 
> a subquery (where p in select ... from ...).
> The resulting joins in Hive require a full table scan of the large table 
> though, because partition pruning takes place before the corresponding values 
> are known.
> On Tez it's relatively straightforward to send the values needed to prune to 
> the application master - where splits are generated and tasks are submitted. 
> Using these values we can strip out any unneeded partitions dynamically, 
> while the query is running.
> The approach is straightforward:
> - Insert synthetic conditions for each join representing "x in (keys of other 
> side in join)"
> - These conditions will be pushed as far down as possible
> - If the condition hits a table scan and the column involved is a partition 
> column:
>    - Set up an operator to send key events to the AM
> - else:
>    - Remove the synthetic predicate
> Add these properties:
> ||Property||Default Value||
> |{{hive.tez.dynamic.partition.pruning}}|true|
> |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
> |{{hive.tez.dynamic.partition.pruning.max.data.size}}|100*1024*1024L|
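
As a rough illustration of the description above, the sketch below sets the three properties at session level and then spells out, as ordinary SQL, what the synthetic "x in (keys of other side in join)" condition would amount to for the query from the comment. It makes several assumptions: the property names use the "partition" spelling, the byte values are the table's defaults multiplied out, the alias d2 is introduced only for illustration, and the rewritten query is a conceptual equivalent rather than what the optimizer literally produces.

{code:sql}
-- Session-level settings, using the defaults from the table.
SET hive.tez.dynamic.partition.pruning=true;
-- 1*1024*1024 bytes
SET hive.tez.dynamic.partition.pruning.max.event.size=1048576;
-- 100*1024*1024 bytes
SET hive.tez.dynamic.partition.pruning.max.data.size=104857600;

-- Conceptual equivalent of the synthetic condition: the scan of agg_01
-- is restricted to the partition keys produced by the other join side.
SELECT d1.label, count(*), sum(agg.amount)
FROM agg_01 agg
JOIN dim_shops d1 ON agg.dim_shops_id = d1.id
WHERE d1.label IN ('foo', 'bar')
  AND agg.dim_shops_id IN (
    SELECT d2.id FROM dim_shops d2 WHERE d2.label IN ('foo', 'bar')
  )
GROUP BY d1.label
ORDER BY d1.label;
{code}

With dynamic pruning the extra IN condition is not evaluated as a second subquery. The keys are sent to the application master as events while the query runs, and only the matching partitions of agg_01 are turned into splits.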



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
