Marc Limotte created HADOOP-9328: ------------------------------------ Summary: INSERT INTO a S3 external table with no reduce phase results in FileNotFoundException Key: HADOOP-9328 URL: https://issues.apache.org/jira/browse/HADOOP-9328 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.9.0 Environment: YARN, Hadoop 2.0.2-alpha Ubuntu Reporter: Marc Limotte
With Yarn and Hadoop 2.0.2-alpha, hive 0.9.0. The destination is an S3 table, the source for the query is a small hive managed table. CREATE EXTERNAL TABLE payout_state_product ( state STRING, product_id STRING, element_id INT, element_value DOUBLE, number_of_fields INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 's3://com.weatherbill.foo/bar/payout_state_product/'; A simple query to copy the results from the hive managed table into a S3. hive> INSERT OVERWRITE TABLE payout_state_product SELECT * FROM payout_state_product_cached; Total MapReduce jobs = 2 Launching Job 1 out of 2 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1360884012490_0014, Tracking URL = http://i-9ff9e9ef.us-east-1.production.climatedna.net:8088/proxy/application_1360884012490_0014/ Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=i-9ff9e9ef.us-east-1.production.climatedna.net:8032 -kill job_1360884012490_0014 Hadoop job information for Stage-1: number of mappers: 100; number of reducers: 0 2013-02-22 19:15:46,709 Stage-1 map = 0%, reduce = 0% ...snip... 2013-02-22 19:17:02,374 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 427.13 sec MapReduce Total cumulative CPU time: 7 minutes 7 seconds 130 msec Ended Job = job_1360884012490_0014 Ended Job = -1776780875, job is filtered out (removed at runtime). Launching Job 2 out of 2 Number of reduce tasks is set to 0 since there's no reduce operator java.io.FileNotFoundException: File does not exist: /tmp/hive-marc/hive_2013-02-22_19-15-31_691_7365912335285010827/-ext-10002/000000_0 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782) at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:493) at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:284) at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:244) at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:386) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.processPaths(CombineHiveInputFormat.java:419) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:390) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:479) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/hive-marc/hive_2013-02-22_19-15-31_691_7365912335285010827/-ext-10002/000000_0)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira