[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905635#action_12905635 ]
Sammy Yu commented on HIVE-1610:
--------------------------------
Yongqiang, thanks for taking a look at this.
If I take out the URI scheme checks, the original
TestHiveFileFormatUtils.testGetPartitionDescFromPathRecursively test case fails:
[junit] Running org.apache.hadoop.hive.ql.io.TestHiveFileFormatUtils
[junit] junit.framework.TestListener: tests to run: 2
[junit] junit.framework.TestListener: startTest(testGetPartitionDescFromPathRecursively)
[junit] junit.framework.TestListener: addFailure(testGetPartitionDescFromPathRecursively, hdfs:///tbl/par1/part2/part3 should return null expected:<true> but was:<false>)
[junit] junit.framework.TestListener: endTest(testGetPartitionDescFromPathRecursively)
[junit] junit.framework.TestListener: startTest(testGetPartitionDescFromPathWithPort)
[junit] junit.framework.TestListener: endTest(testGetPartitionDescFromPathWithPort)
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.091 sec
[junit] Test org.apache.hadoop.hive.ql.io.TestHiveFileFormatUtils FAILED
The path hdfs:///tbl/par1/part2/part3 should not match any PartitionDesc, since the
corresponding path in the map is file:///tbl/par1/part2/part3. I will attach the svn
version of the patch shortly.
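
To make the expectation concrete, here is a minimal, self-contained Java sketch of
the scheme check (SchemeCheckSketch and schemesMatch are names invented for this
example; this is not the HiveFileFormatUtils code):

import java.net.URI;

public class SchemeCheckSketch {

    // Hypothetical helper: treat two paths as comparable only when their
    // schemes agree; for this sketch, a path with no scheme at all is
    // assumed to be allowed to match either side.
    static boolean schemesMatch(URI a, URI b) {
        if (a.getScheme() == null || b.getScheme() == null) {
            return true;
        }
        return a.getScheme().equalsIgnoreCase(b.getScheme());
    }

    public static void main(String[] args) {
        URI lookupPath = URI.create("hdfs:///tbl/par1/part2/part3");
        URI mapPath = URI.create("file:///tbl/par1/part2/part3");
        // Prints false: the path components are identical, but hdfs:// must
        // not match the file:// entry, so the lookup should return null.
        System.out.println(schemesMatch(lookupPath, mapPath));
    }
}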
> Using CombinedHiveInputFormat causes partToPartitionInfo IOException
> ----------------------------------------------------------------------
>
> Key: HIVE-1610
> URL: https://issues.apache.org/jira/browse/HIVE-1610
> Project: Hadoop Hive
> Issue Type: Bug
> Environment: Hadoop 0.20.2
> Reporter: Sammy Yu
> Attachments:
> 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch
>
>
> I have a relatively complicated Hive query using CombineHiveInputFormat:
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week)
> select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank,
>   keywords.universal_rank, keywords.serp_type, keywords.date_indexed,
>   keywords.search_engine_type, keywords.week
> from keyword_serp_results keywords
> JOIN (
>   select domain, keyword, search_engine_type, week, max_date_indexed,
>     min(rank) as best_rank
>   from (
>     select keywords1.domain, keywords1.keyword, keywords1.search_engine_type,
>       keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed
>     from keyword_serp_results keywords1
>     JOIN (
>       select domain, keyword, search_engine_type, week,
>         max(date_indexed) as max_date_indexed
>       from keyword_serp_results
>       group by domain, keyword, search_engine_type, week
>     ) dupkeywords1
>     on keywords1.keyword = dupkeywords1.keyword
>       AND keywords1.domain = dupkeywords1.domain
>       AND keywords1.search_engine_type = dupkeywords1.search_engine_type
>       AND keywords1.week = dupkeywords1.week
>       AND keywords1.date_indexed = dupkeywords1.max_date_indexed
>   ) dupkeywords2
>   group by domain, keyword, search_engine_type, week, max_date_indexed
> ) dupkeywords3
> on keywords.keyword = dupkeywords3.keyword
>   AND keywords.domain = dupkeywords3.domain
>   AND keywords.search_engine_type = dupkeywords3.search_engine_type
>   AND keywords.week = dupkeywords3.week
>   AND keywords.date_indexed = dupkeywords3.max_date_indexed
>   AND keywords.rank = dupkeywords3.best_rank;
>
> This query used to work fine until I updated to r991183 on trunk and started
> getting this error:
> java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0 in partToPartitionInfo:
> [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
> at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> This query works if I don't change hive.input.format, i.e. if I omit this line:
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> I've narrowed this issue down to the commit for HIVE-1510. If I take out the
> changeset from r987746, everything works as before.
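> For illustration, here is a minimal Java sketch of the mismatch behind this error
> (the class name and host are placeholders invented for this example; this is not
> the Hive lookup code). The first partToPartitionInfo key carries an explicit :8020
> port while the dir being looked up does not, so comparing the two URIs, or any of
> the dir's parents, never matches:
>
> import java.net.URI;
>
> public class PortMismatchSketch {
>     public static void main(String[] args) {
>         // Placeholder host; the real paths are the EC2 ones shown above.
>         URI mapKey = URI.create("hdfs://namenode.example.com:8020/tmp/hive-root/-mr-10002");
>         URI dir = URI.create("hdfs://namenode.example.com/tmp/hive-root/-mr-10002/000000_0");
>         System.out.println(mapKey.getAuthority()); // namenode.example.com:8020
>         System.out.println(dir.getAuthority());    // namenode.example.com
>         // false: the authorities differ, so walking dir's parents never
>         // reaches mapKey and the lookup fails with the IOException above.
>         System.out.println(mapKey.getAuthority().equals(dir.getAuthority()));
>     }
> }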