[ https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367545#comment-14367545 ]
Albert Chu commented on MAPREDUCE-5528:
---------------------------------------

Sure, I'll update the patch and retest just to make sure everything is fine and dandy.

> TeraSort fails with "can't read paritions file" - does not read partition file from distributed cache
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5528
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: examples
>    Affects Versions: 0.20.2, 3.0.0, 2.5.0, 2.4.1, 2.6.0
>            Reporter: Albert Chu
>            Assignee: Albert Chu
>            Priority: Minor
>         Attachments: MAPREDUCE-5528.patch
>
>
> I was trying to run TeraSort against a parallel networked file system, setting things up via the 'file://' scheme. I always got the following error when running terasort:
> {noformat}
> 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : attempt_1379960046506_0001_m_000080_1, Status : FAILED
> Error: java.lang.IllegalArgumentException: can't read paritions file
> 	at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
> Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
> 	at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
> 	at org.apache.hadoop.util.Shell.run(Shell.java:417)
> 	at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
> 	at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
> 	at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
> 	... 10 more
> {noformat}
> After digging into TeraSort, I noticed that the partitions file was created in the output directory, then added into the distributed cache:
> {noformat}
> Path outputDir = new Path(args[1]);
> ...
> Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
> ...
> job.addCacheFile(partitionUri);
> {noformat}
> but the partitions file doesn't seem to be read back from the output directory or the distributed cache:
> {noformat}
> FileSystem fs = FileSystem.getLocal(conf);
> ...
> Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
> splitPoints = readPartitions(fs, partFile, conf);
> {noformat}
> It seems the file is being read from whatever the working directory is for the filesystem returned from FileSystem.getLocal(conf).
> Under HDFS this code works; the working directory seems to be the distributed cache (I guess by default??).
> But when I set things up with the networked file system and the 'file://' scheme, the working directory was the directory I was running my Hadoop binaries out of.
> The attached patch fixed things for me. It grabs the partition file from the distributed cache all of the time, instead of trusting things underneath to work out. It seems to be the right thing to do???
> Apologies, I was unable to get this to reproduce under the TeraSort example tests, such as TestTeraSort.java, so no test was added. Not sure what the subtle difference is in the setup. I tested under both the HDFS and 'file' schemes, and the patch worked under both.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
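The working-directory behavior at the heart of this report can be seen without Hadoop at all: a bare relative name like `_partition.lst` resolves against whatever directory the JVM was launched from, while a path resolved against an explicit base directory does not depend on it. The following is a minimal, Hadoop-free Java sketch of that distinction; the class name and the `/tmp/job-cache` base directory are hypothetical illustrations, not part of the patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PartitionPathDemo {
    public static void main(String[] args) {
        // Like "new Path(TeraInputFormat.PARTITION_FILENAME)": a bare relative name.
        Path relative = Paths.get("_partition.lst");

        // Implicit resolution uses the JVM's current working directory,
        // i.e. wherever the process happened to be started from.
        Path viaCwd = relative.toAbsolutePath();
        System.out.println("resolved against CWD: " + viaCwd);

        // Anchoring the name to a known base directory (analogous to always
        // reading the file out of the localized cache directory, as the
        // patch does) removes the dependence on the working directory.
        Path cacheDir = Paths.get("/tmp/job-cache");  // hypothetical base
        Path viaCache = cacheDir.resolve(relative);
        System.out.println("resolved explicitly:  " + viaCache);
    }
}
```

This mirrors why the original code worked under HDFS (the task's working directory happened to contain the localized cache files) but broke under 'file://', where the working directory was simply where the Hadoop binaries ran.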