[ 
https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165647#comment-17165647
 ] 

Zhihua Deng edited comment on HIVE-23893 at 7/27/20, 11:39 AM:
---------------------------------------------------------------

[~zhishui] Thanks for your concern, I leave some comments in your pr. In fact, 
I'm wondering that we cannot push the deterministic conditions when there are 
some nondeterministic conditions along, as the test case 
_rand_partitionpruner3.q_ shows that if i push *not(key > 50 or key < 10)* 
down,  the results are seen differently compared to the elder.  the ealier 
evaluation of *not(key > 50 or key < 10)* may affect the evaluation of *RAND(1) 
< 0.1*,  which make the result unpredicatable. But if we enable cbo, the 
deterministic conditions are seen to be pushed down, as the case shows: 
_explain extended select a.*, b.* from a join b on a.k = b.k where a.hs = 11 
and b.hs <= 10 and rand(1) < 0.1_, maybe it's better for the rbo of hive to 
work consistently to the cbo.


was (Author: dengzh):
[~zhishui] Thanks for your concern, I leave some comments in your pr. In fact, 
I'm wondering if we can push the deterministic conditions when there are some 
nondeterministic conditions along, as the test case _rand_partitionpruner3.q_ 
shows that if i push *not(key > 50 or key < 10)* down,  the results are seen 
differently compared to the elder.  the ealier evaluation of *not(key > 50 or 
key < 10)* may affect the evaluation of *RAND(1) < 0.1*,  which make the result 
unpredicatable. But if we enable cbo, the deterministic conditions are seen to 
be pushed down, as the case shows: _explain extended select a.*, b.* from a 
join b on a.k = b.k where a.hs = 11 and b.hs <= 10 and rand(1) < 0.1_, maybe 
it's better for the rbo of hive to work consistently to the cbo.

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-23893
>                 URL: https://issues.apache.org/jira/browse/HIVE-23893
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Zhihua Deng
>            Assignee: zhishui
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Taken the following query for example, assume unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
>  The predicates(a.date = 20200704, b.date = 20200704) are unable to push down 
> to join op, make the optimizer unable to prune partitions, which may result  
> to a full scan on tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to