[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906898#action_12906898 ]
Sammy Yu commented on HIVE-1610:
--------------------------------

He, yes, that's what the original 0002 patch does (it adds an additional check to ignore the port, as well as a test case for it). I'm not sure why there's a disparity in the port being there in the first place. I'll regenerate the 0002 patch for svn against tr...@993445. Thanks!
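For context, here is a minimal sketch of the "ignore the port" idea, assuming the fix boils down to treating two HDFS URIs as the same location when they differ only in whether an explicit port is present. This is not the 0002 patch itself: the class, method, and variable names below are made up for illustration, and the real check lives in HiveFileFormatUtils.getPartitionDescFromPathRecursively, which walks up the directory tree rather than doing a flat prefix match.

{code:java}
// Illustrative sketch only -- NOT the actual 0002 patch or Hive's HiveFileFormatUtils code.
// It demonstrates the "ignore the port" idea: a split directory such as
// hdfs://host/dir and a partToPartitionInfo key such as hdfs://host:8020/dir
// should resolve to the same entry.
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

public class PortInsensitiveLookup {

  // True when 'key' names the same directory as 'dir' (or an ancestor of it),
  // comparing scheme, host, and path but deliberately ignoring the port.
  static boolean matchesIgnoringPort(URI key, URI dir) {
    if (!equalsOrBothNull(key.getScheme(), dir.getScheme())
        || !equalsOrBothNull(key.getHost(), dir.getHost())) {
      return false;
    }
    // The real getPartitionDescFromPathRecursively walks up the directory tree;
    // accepting ancestors of 'dir' approximates that behaviour here.
    String keyPath = key.getPath();
    String dirPath = dir.getPath();
    return dirPath.equals(keyPath)
        || dirPath.startsWith(keyPath.endsWith("/") ? keyPath : keyPath + "/");
  }

  private static boolean equalsOrBothNull(Object a, Object b) {
    return a == null ? b == null : a.equals(b);
  }

  // Hypothetical stand-in for the lookup that currently fails with
  // "cannot find dir ... in partToPartitionInfo".
  static <V> V find(Map<String, V> partToPartitionInfo, String dir) throws URISyntaxException {
    URI dirUri = new URI(dir);
    for (Map.Entry<String, V> e : partToPartitionInfo.entrySet()) {
      if (matchesIgnoringPort(new URI(e.getKey()), dirUri)) {
        return e.getValue();
      }
    }
    return null; // the real code throws the IOException seen in the stack trace below
  }

  public static void main(String[] args) throws URISyntaxException {
    // Paths abbreviated from the error below: the map key carries :8020, the split dir does not.
    URI key = new URI("hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/-mr-10002");
    URI dir = new URI("hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/-mr-10002/000000_0");
    System.out.println(matchesIgnoringPort(key, dir)); // prints: true
  }
}
{code}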
> Using CombinedHiveInputFormat causes partToPartitionInfo IOException
> ----------------------------------------------------------------------
>
>                 Key: HIVE-1610
>                 URL: https://issues.apache.org/jira/browse/HIVE-1610
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.20.2
>            Reporter: Sammy Yu
>         Attachments: 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 0003-HIVE-1610.patch
>
>
> I have a relatively complicated Hive query using CombineHiveInputFormat:
>
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week)
> select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank,
>        keywords.universal_rank, keywords.serp_type, keywords.date_indexed,
>        keywords.search_engine_type, keywords.week
> from keyword_serp_results keywords
> JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, min(rank) as best_rank
>       from (select keywords1.domain, keywords1.keyword, keywords1.search_engine_type,
>                    keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed
>             from keyword_serp_results keywords1
>             JOIN (select domain, keyword, search_engine_type, week,
>                          max(date_indexed) as max_date_indexed
>                   from keyword_serp_results
>                   group by domain, keyword, search_engine_type, week) dupkeywords1
>             on keywords1.keyword = dupkeywords1.keyword
>                AND keywords1.domain = dupkeywords1.domain
>                AND keywords1.search_engine_type = dupkeywords1.search_engine_type
>                AND keywords1.week = dupkeywords1.week
>                AND keywords1.date_indexed = dupkeywords1.max_date_indexed) dupkeywords2
>       group by domain, keyword, search_engine_type, week, max_date_indexed) dupkeywords3
> on keywords.keyword = dupkeywords3.keyword
>    AND keywords.domain = dupkeywords3.domain
>    AND keywords.search_engine_type = dupkeywords3.search_engine_type
>    AND keywords.week = dupkeywords3.week
>    AND keywords.date_indexed = dupkeywords3.max_date_indexed
>    AND keywords.rank = dupkeywords3.best_rank;
>
> This query used to work fine until I updated to r991183 on trunk and started getting this error:
>
> java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0 in partToPartitionInfo:
> [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
>  hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
>  hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
>  hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
>  hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
>  hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
> 	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
> 	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> 	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
> 	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
>
> This query works if I don't change hive.input.format, i.e. if I leave out:
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> I've narrowed this issue down to the commit for HIVE-1510. If I take out the changeset from r987746, everything works as before.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.