[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900904#action_12900904 ]

He Yongqiang commented on HIVE-1510:

Even without this patch, the 0.17 test failed on index_compact3.q. Please file a separate JIRA for this issue.

> HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
>
>          Key: HIVE-1510
>          URL: https://issues.apache.org/jira/browse/HIVE-1510
>      Project: Hadoop Hive
>   Issue Type: Bug
>     Reporter: He Yongqiang
>     Assignee: He Yongqiang
>  Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int, value string) partitioned by (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds='2010-08-03', hr='00') select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds='2010-08-03', hr='001') select * from src;
> desc extended combine_3_srcpart_seq_rc partition (ds='2010-08-03', hr='00');
> desc extended combine_3_srcpart_seq_rc partition (ds='2010-08-03', hr='001');
> select * from combine_3_srcpart_seq_rc where ds='2010-08-03' order by key;
> drop table combine_3_srcpart_seq_rc;
>
> will fail.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900940#action_12900940 ]

Ning Zhang commented on HIVE-1510:

It doesn't fail on trunk; the failure is caused by the parallel test. HIVE-1576 was filed for this. Will test again and commit once HIVE-1307 is committed.

Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900417#action_12900417 ]

Ning Zhang commented on HIVE-1510:

+1, will commit if tests pass.

Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900567#action_12900567 ]

Ning Zhang commented on HIVE-1510:

Yongqiang, the 0.17 test failed on index_compact3.q and script_pipe.q (the latter may be a false alarm). Can you take a look?

Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900057#action_12900057 ]

Ning Zhang commented on HIVE-1510:

As discussed offline with Yongqiang, we should clean up pathToPartitionInfo so that it contains only a canonical representation of each partition. This could result in much cleaner code. If we do that, IOPrepareCache is not needed at all and the function getPartitionDescFromPath becomes a simple hash lookup. We can do it in a follow-up JIRA, along with cleaning up the unnecessary info in pathToPartitionInfo.

Here are some comments on the current patch:

- The IOPrepareCache is cleared in Driver, which should only contain generic code irrespective of task types. Can you do it in ExecDriver.execute()? This new cache is only used in ExecDriver anyway.
- Some comments on why you need a new hash map keyed with the paths only would be helpful.

Attachments: hive-1510.1.patch, hive-1510.3.patch
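The "simple hash lookup" idea in the comment above can be sketched roughly as follows. This is only an illustration, not the patch itself: the class name, the `lookup` helper, and the use of format names in place of real PartitionDesc objects are all assumptions, and the map is assumed to hold exactly one canonical directory key per partition.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for pathToPartitionInfo: keys are canonical partition
// directories, values are format names instead of real PartitionDesc objects.
class CanonicalLookupSketch {

    // Tries an exact hash lookup on the path, then on each ancestor directory.
    // Returns null when neither the path nor any ancestor is registered.
    static String lookup(Map<String, String> pathToPartitionInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            String hit = pathToPartitionInfo.get(current);
            if (hit != null) {
                return hit;
            }
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return null;
    }

    // The two partitions from the issue description, with made-up locations.
    static Map<String, String> sampleMap() {
        Map<String, String> m = new HashMap<>();
        m.put("/warehouse/t/ds=2010-08-03/hr=00", "SequenceFile");
        m.put("/warehouse/t/ds=2010-08-03/hr=001", "RCFile");
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = sampleMap();
        System.out.println(lookup(m, "/warehouse/t/ds=2010-08-03/hr=00/000000_0"));
        System.out.println(lookup(m, "/warehouse/t/ds=2010-08-03/hr=001/000000_0"));
    }
}
```

Because every probe is an exact key match, the `hr=00` entry can never be returned for a file under `hr=001`.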
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900063#action_12900063 ]

He Yongqiang commented on HIVE-1510:

> The IOPrepareCache is cleared in Driver, which should only contain generic code irrespective of task types. Can you do it in ExecDriver.execute()? This new cache is only used in ExecDriver anyway.

ExecDriver is per map-reduce task. Driver is per query. We should do this at query granularity. I think pathToPartitionDesc is also a per-query map?

> Some comments on why you need a new hash map keyed with the paths only would be helpful.

Will do it in the next patch.

Attachments: hive-1510.1.patch, hive-1510.3.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900074#action_12900074 ]

He Yongqiang commented on HIVE-1510:

About the additional hash map: it is used to match a path to its partitionDesc by discarding the scheme information from the partitionDesc's path. In the long run, we should normalize all input paths so that they contain the full scheme and authority information. This is a must for Hive to work with multiple HDFS clusters.

Attachments: hive-1510.1.patch, hive-1510.3.patch
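To illustrate why path-only keys stop working once more than one cluster is involved, here is a small sketch using the standard `java.net.URI` class. The host names `nn1` and `nn2` and the `pathOnly` helper are made up for illustration; the actual patch operates on Hadoop paths rather than plain strings.

```java
import java.net.URI;

class SchemeStrippingSketch {

    // Keeps only the path component of a location, dropping scheme and authority.
    static String pathOnly(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        // Same directory layout on two hypothetical HDFS clusters.
        String onClusterA = "hdfs://nn1:8020/warehouse/t/ds=2010-08-03/hr=00";
        String onClusterB = "hdfs://nn2:8020/warehouse/t/ds=2010-08-03/hr=00";

        System.out.println(pathOnly(onClusterA));
        // Both locations collapse to the same key once scheme and authority are gone.
        System.out.println(pathOnly(onClusterA).equals(pathOnly(onClusterB)));
    }
}
```

A map keyed on the stripped path cannot tell the two locations apart, which is the reason for normalizing paths to carry the full scheme and authority in the long run.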
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900113#action_12900113 ]

Ning Zhang commented on HIVE-1510:

Apart from the clean-architecture concerns (Driver should be generic and should not assume that tasks contain MR jobs), it also doesn't seem to work when parallel execution is enabled: IOPrepareCache is thread-local, and parallel MR jobs are launched in different threads.

Attachments: hive-1510.1.patch, hive-1510.3.patch
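The thread-local concern can be reproduced with plain `java.lang.ThreadLocal`. The sketch below is a hypothetical stand-in for IOPrepareCache, not its real code: `CACHE` and `entriesSeenByWorker` are invented names, and the worker thread plays the role of a job launched in parallel.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadLocalCacheSketch {

    // Hypothetical stand-in for IOPrepareCache: each thread gets its own map.
    static final ThreadLocal<Map<String, String>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    // Starts a new thread and reports how many cache entries it can see.
    static int entriesSeenByWorker() {
        int[] seen = new int[1];
        Thread worker = new Thread(() -> seen[0] = CACHE.get().size());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        CACHE.get().put("/warehouse/t/ds=2010-08-03/hr=00", "SequenceFile");
        System.out.println("driver thread sees: " + CACHE.get().size());
        System.out.println("worker thread sees: " + entriesSeenByWorker());
    }
}
```

The same isolation applies in the other direction: clearing the map on the thread that runs Driver leaves whatever a worker thread has cached untouched.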
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899086#action_12899086 ]

He Yongqiang commented on HIVE-1510:

Since HIVE-1515 depends on Hadoop, can we close this JIRA without adding new archive test cases?

Attachments: hive-1510.1.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896752#action_12896752 ]

He Yongqiang commented on HIVE-1510:

Will update the patch once https://issues.apache.org/jira/browse/HIVE-1515 is in.

Attachments: hive-1510.1.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895370#action_12895370 ]

Namit Jain commented on HIVE-1510:

Some minor comments.

TestHiveFileFormatUtils:
1. Use a different PartitionDesc every time instead of partDesc_1 for the partitions.
2. Spelling mistakes: forth group

Otherwise, it looks good to me.

Ning, can you also OK it, since we spent a lot of time debugging this in the past?

Also, before checking it in, can you try the following 4 types of queries (with CombineHiveInputFormat):
1. Hadoop 17 normal query
2. Hadoop 17 sampling query
3. Hadoop 20 normal query
4. Hadoop 20 sampling query

Attachments: hive-1510.1.patch
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895381#action_12895381 ]

Ning Zhang commented on HIVE-1510:

In HiveFileFormatUtils, removing the scheme and authority from the URI and retaining only the path part may cause problems when the URI is har:// rather than hdfs://. This is one of the bugs that Paul fixed in Hadoop so that HAR can work with CHIF.

As a general comment, would it be easier to just modify pathToPartitionInfo to add a Path.SEPARATOR at the end of each key? Then you don't need to introduce the recursive checking and can still use prefix matching.

Attachments: hive-1510.1.patch
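A rough sketch of the two prefix-matching variants discussed here, using the `hr=00` and `hr=001` partitions from the issue description. The class, the `prefixLookup` helper, and the warehouse locations are hypothetical. The literal "/" stands in for Hadoop's `Path.SEPARATOR`, and a `LinkedHashMap` is used only to make the iteration order deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixMatchSketch {

    // Returns the value of the first registered directory that is a string
    // prefix of the given path, or null if none matches. When appendSeparator
    // is true, "/" is appended to each directory before comparing.
    static String prefixLookup(Map<String, String> dirs, String path, boolean appendSeparator) {
        for (Map.Entry<String, String> e : dirs.entrySet()) {
            String key = appendSeparator ? e.getKey() + "/" : e.getKey();
            if (path.startsWith(key)) {
                return e.getValue();
            }
        }
        return null;
    }

    // Partition directories in insertion order: hr=00 first, then hr=001.
    static Map<String, String> sampleDirs() {
        Map<String, String> dirs = new LinkedHashMap<>();
        dirs.put("/warehouse/t/ds=2010-08-03/hr=00", "SequenceFile");
        dirs.put("/warehouse/t/ds=2010-08-03/hr=001", "RCFile");
        return dirs;
    }

    public static void main(String[] args) {
        Map<String, String> dirs = sampleDirs();
        String rcFile = "/warehouse/t/ds=2010-08-03/hr=001/000000_0";
        // Bare prefix matching picks hr=00 because ".../hr=00" is a prefix of ".../hr=001/...".
        System.out.println(prefixLookup(dirs, rcFile, false)); // SequenceFile (wrong partition)
        // With a trailing separator, ".../hr=00/" no longer matches ".../hr=001/...".
        System.out.println(prefixLookup(dirs, rcFile, true));  // RCFile
    }
}
```

The trailing separator removes the sibling-directory ambiguity but keeps the lookup linear in the number of partitions, whereas an exact lookup on the parent directory is a single hash probe.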
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895386#action_12895386 ]

He Yongqiang commented on HIVE-1510:

Will also test against HAR.

The old code also removed the scheme and authority parts when trying to match a partitionDesc. No offense, but the old code is not very clear and not efficient. The new code does the same thing with simplified logic.

Attachments: hive-1510.1.patch