[ https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165647#comment-17165647 ]
Zhihua Deng edited comment on HIVE-23893 at 7/27/20, 11:39 AM: --------------------------------------------------------------- [~zhishui] Thanks for your concern, I leave some comments in your pr. In fact, I'm wondering that we cannot push the deterministic conditions when there are some nondeterministic conditions along, as the test case _rand_partitionpruner3.q_ shows that if i push *not(key > 50 or key < 10)* down, the results are seen differently compared to the elder. the ealier evaluation of *not(key > 50 or key < 10)* may affect the evaluation of *RAND(1) < 0.1*, which make the result unpredicatable. But if we enable cbo, the deterministic conditions are seen to be pushed down, as the case shows: _explain extended select a.*, b.* from a join b on a.k = b.k where a.hs = 11 and b.hs <= 10 and rand(1) < 0.1_, maybe it's better for the rbo of hive to work consistently to the cbo. was (Author: dengzh): [~zhishui] Thanks for your concern, I leave some comments in your pr. In fact, I'm wondering if we can push the deterministic conditions when there are some nondeterministic conditions along, as the test case _rand_partitionpruner3.q_ shows that if i push *not(key > 50 or key < 10)* down, the results are seen differently compared to the elder. the ealier evaluation of *not(key > 50 or key < 10)* may affect the evaluation of *RAND(1) < 0.1*, which make the result unpredicatable. But if we enable cbo, the deterministic conditions are seen to be pushed down, as the case shows: _explain extended select a.*, b.* from a join b on a.k = b.k where a.hs = 11 and b.hs <= 10 and rand(1) < 0.1_, maybe it's better for the rbo of hive to work consistently to the cbo. > Extract deterministic conditions for pdd when the predicate contains > non-deterministic function > ----------------------------------------------------------------------------------------------- > > Key: HIVE-23893 > URL: https://issues.apache.org/jira/browse/HIVE-23893 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer > Reporter: Zhihua Deng > Assignee: zhishui > Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Taken the following query for example, assume unix_timestamp is > non-deterministic before version 1.3.0: > > {{SELECT}} > {{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}} > {{ b.game_id AS game_id,}} > {{ b.game_name AS game_name,}} > {{ count(DISTINCT a.sha1_imei) uv}} > {{FROM}} > {{ gamesdk_userprofile a}} > {{ JOIN game_info_all b ON a.appid = b.dev_app_id}} > {{WHERE}} > {{ a.date = 20200704}} > {{ AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = > 20200704}} > {{ AND b.date = 20200704}} > {{GROUP BY}} > {{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}} > {{ b.game_id,}} > {{ b.game_name}} > {{ORDER BY}} > {{ uv DESC}} > {{LIMIT 200;}} > > The predicates(a.date = 20200704, b.date = 20200704) are unable to push down > to join op, make the optimizer unable to prune partitions, which may result > to a full scan on tables gamesdk_userprofile and game_info_all. -- This message was sent by Atlassian Jira (v8.3.4#803005)