[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553184#comment-13553184 ]

Ashutosh Chauhan commented on HIVE-3426:

I wonder if HIVE-3276 already covers the simple case.

union with same source should be optimized
--
Key: HIVE-3426
URL: https://issues.apache.org/jira/browse/HIVE-3426
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Zhenxiao Luo

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551621#comment-13551621 ]

Ashutosh Chauhan commented on HIVE-3426:

What Zhenxiao is proposing is a good first step. Let's try to optimize that query first and worry about GBY in the subquery later.
[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551701#comment-13551701 ]

Shreepadma Venugopalan commented on HIVE-3426:

Yup, let's try to optimize the simple case first. Optimizing subqueries with GBY can be the next step.
[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466544#comment-13466544 ]

Namit Jain commented on HIVE-3426:

Yes, but what about the case when there is a group by in the sub-query?
[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466419#comment-13466419 ]

Zhenxiao Luo commented on HIVE-3426:

TS -- FIL -- SEL
                 \
                  \
                   UNION
                  /
                 /
TS -- FIL -- SEL

Needs to be updated into:

TS -- FIL -- SEL

And the FIL condition needs to be updated, e.g.

select key from srcpart where key = 484
union all
select key from srcpart where key = 409

should be updated into:

select key from srcpart where key = 484 OR key = 409
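
The rewrite proposed in this comment can be checked on a toy table. The sketch below is only an illustration and makes several assumptions: it uses Python's built-in sqlite3 module rather than Hive, the sample rows are made up, and the two-column `srcpart` table is a stand-in for the real one. It shows that folding the two branches into a single OR filter returns the same rows when the two predicates cannot both match the same row, and that the rewrite would drop duplicates if the predicates overlapped, which is why a condition of that kind presumably needs to be checked before applying it.

```python
import sqlite3

# In-memory stand-in for srcpart; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE srcpart (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO srcpart VALUES (?, ?)",
    [(484, "val_484"), (409, "val_409"), (409, "val_409"), (238, "val_238")],
)


def keys(sql):
    """Run a query and return the selected keys in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


# Disjoint predicates: the UNION ALL form and the OR form agree.
union_rows = keys(
    "SELECT key FROM srcpart WHERE key = 484 "
    "UNION ALL "
    "SELECT key FROM srcpart WHERE key = 409"
)
or_rows = keys("SELECT key FROM srcpart WHERE key = 484 OR key = 409")

# Overlapping predicates: UNION ALL emits a matching row once per branch,
# while a single OR filter emits it only once, so the forms differ.
overlap_union_rows = keys(
    "SELECT key FROM srcpart WHERE key >= 409 "
    "UNION ALL "
    "SELECT key FROM srcpart WHERE key = 409"
)
overlap_or_rows = keys("SELECT key FROM srcpart WHERE key >= 409 OR key = 409")

print(union_rows == or_rows)
print(len(overlap_union_rows), len(overlap_or_rows))
```

Results are sorted before comparison because neither form guarantees row order.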
[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447496#comment-13447496 ]

Namit Jain commented on HIVE-3426:

Consider a query like:

insert overwrite table tst_output partition (ds='1')
select key, keyVal, agg from
(
  select key, '238' as keyVal, count(1) as agg from srcpart where ds = '2008-04-08' and hr = '11' and key = 238 group by key
  union all
  select key, '165' as keyVal, count(1) as agg from srcpart where ds = '2008-04-08' and hr = '11' and key = 165 group by key
  union all
  select key, '409' as keyVal, count(1) as agg from srcpart where ds = '2008-04-08' and hr = '12' and key = 409 group by key
  union all
  select key, '484' as keyVal, count(1) as agg from srcpart where ds = '2008-04-08' and hr = '12' and key = 484 group by key
) subq;

It requires a separate map-reduce job for each sub-query. Since the same base table is being queried, ideally it should be a single map-reduce job. At least, a single scan should be performed on each of ds=2008-04-08/hr=11 and ds=2008-04-08/hr=12. The query plan should resemble one with multiple branches fed from a single table scan.
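
To make the "single scan with different branches" idea concrete, here is a minimal Python sketch. It is not Hive's operator tree or planner code: the `single_scan` function, the `(ds, hr, key)` row layout, and the sample rows are all hypothetical, and partition pruning is modelled only as an ordinary filter. Each row is read once and offered to every branch, where a branch stands for one sub-query's filter plus its count(1) ... group by key; the branch outputs are then concatenated, which is what union all does.

```python
from collections import Counter

# Hypothetical rows in the form (ds, hr, key).
rows = [
    ("2008-04-08", "11", 238),
    ("2008-04-08", "11", 238),
    ("2008-04-08", "11", 165),
    ("2008-04-08", "12", 409),
    ("2008-04-08", "12", 484),
    ("2008-04-08", "12", 484),
    ("2008-04-08", "12", 238),  # hr = '12' with key 238 matches no branch
    ("2008-04-09", "11", 165),  # different ds, filtered out
]

# One branch per sub-query of the example query: (hr, key).
branches = [("11", 238), ("11", 165), ("12", 409), ("12", 484)]


def single_scan(rows, branches, ds="2008-04-08"):
    """Read the input once and feed each row to every matching branch."""
    counts = [Counter() for _ in branches]  # per-branch "group by key" state
    for row_ds, row_hr, row_key in rows:
        if row_ds != ds:
            continue
        for i, (hr, key) in enumerate(branches):
            if row_hr == hr and row_key == key:
                counts[i][row_key] += 1
    # UNION ALL: concatenate the branch outputs as (key, keyVal, agg).
    output = []
    for (hr, key), branch_counts in zip(branches, counts):
        for group_key, agg in branch_counts.items():
            output.append((group_key, str(key), agg))
    return output


result = single_scan(rows, branches)
for line in result:
    print(line)
```

The point of the sketch is only that the input is traversed once regardless of the number of branches; how Hive would schedule the aggregation across map and reduce phases is outside its scope.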