[ https://issues.apache.org/jira/browse/MAPREDUCE-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910504#action_12910504 ]
Amar Kamat commented on MAPREDUCE-2078:
---------------------------------------

{{FileSystem}} provides a {{FileSystem.globStatus(Path)}} API that enumerates all the paths matched by a globbed path. The current {{TraceBuilder}} code does not use it. It calls {{getFileStatus()}} on each input argument exactly as given, so a glob is treated as a literal path:
{code}
for (int i = 2 + switchTop; i < args.length; ++i) {
  Path thisPath = new Path(args[i]);
  FileSystem fs = thisPath.getFileSystem(conf);
  if (fs.getFileStatus(thisPath).isDirectory()) {
    FileStatus[] statuses = fs.listStatus(thisPath);
    for (FileStatus s : statuses) {
      // process the file ..
    }
  }
}
{code}
This needs to be changed to first flatten the globbed paths passed as input. The suggested fix is:
{code}
for (int i = 2 + switchTop; i < args.length; ++i) {
  // iterate over the input
  Path thisPath = new Path(args[i]);
  // get the filesystem specific to the input passed
  FileSystem fs = thisPath.getFileSystem(conf);
  // flatten the globbed file path
  FileStatus[] realStatuses = fs.globStatus(thisPath);
  // iterate over all the files under the globbed input path
  for (FileStatus status : realStatuses) {
    // extract the actual (flat) path from the file status
    Path realPath = status.getPath();
    // now do what is done in the trunk
    if (fs.getFileStatus(realPath).isDirectory()) {
      FileStatus[] statuses = fs.listStatus(realPath);
      for (FileStatus s : statuses) {
        // process the file ..
      }
    }
  }
}
{code}
I ran {{TraceBuilder}} with this fix, and it now works with globbed input paths.

> TraceBuilder unable to generate the traces while giving the job history path by globing.
> ----------------------------------------------------------------------------------------
>
>          Key: MAPREDUCE-2078
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-2078
>      Project: Hadoop Map/Reduce
>   Issue Type: Bug
>   Components: tools/rumen
>     Reporter: Vinay Kumar Thota
>     Assignee: Amar Kamat
>
> I was trying to generate traces for MR job histories using TraceBuilder. However, it is unable to generate the traces when the job history path is given as a glob.
> It throws a FileNotFoundException even though the job history path exists.
> I provided the job history path in the following form:
> hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*/
> Exception:
> java.io.FileNotFoundException: File does not exist: hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
>     at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.<init>(TraceBuilder.java:88)
>     at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:183)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:121)
> It's truncating the last slash in the path.
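
The flatten-then-list pattern from the suggested fix can be tried without a Hadoop cluster. What follows is only a rough sketch: it swaps in {{java.nio.file}} on the local filesystem in place of Hadoop's {{FileSystem}}, and the class {{GlobFlattenSketch}} with its helpers {{expandGlob}} and {{collectFiles}} is hypothetical, not part of Rumen or Hadoop. It also assumes that a glob match which is a plain file should be taken as an input itself, which the snippets in the comment do not show.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem stand-in for the globStatus-based fix.
public class GlobFlattenSketch {

  // Step 1: flatten the glob into the concrete paths it matches.
  // The glob is interpreted relative to root; '*' does not cross '/'.
  static List<Path> expandGlob(Path root, String glob) throws IOException {
    PathMatcher matcher =
        FileSystems.getDefault().getPathMatcher("glob:" + glob);
    try (Stream<Path> walk = Files.walk(root)) {
      return walk
          .filter(p -> !p.equals(root))
          .filter(p -> matcher.matches(root.relativize(p)))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // Step 2: for every concrete path, do what the pre-fix code did:
  // list the directory. Plain files are kept as-is (an assumption).
  static List<Path> collectFiles(Path root, String glob) throws IOException {
    List<Path> files = new ArrayList<>();
    for (Path realPath : expandGlob(root, glob)) {
      if (Files.isDirectory(realPath)) {
        try (Stream<Path> children = Files.list(realPath)) {
          children.filter(Files::isRegularFile).sorted().forEach(files::add);
        }
      } else {
        files.add(realPath);
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny made-up history tree: <tmp>/2010/09/job_1, <tmp>/2010/10/job_2
    Path root = Files.createTempDirectory("history");
    Files.createDirectories(root.resolve("2010/09"));
    Files.createDirectories(root.resolve("2010/10"));
    Files.createFile(root.resolve("2010/09/job_1"));
    Files.createFile(root.resolve("2010/10/job_2"));

    // The glob "*/*" matches the two month directories, which are then listed.
    for (Path file : collectFiles(root, "*/*")) {
      System.out.println(file);
    }
  }
}
```

One difference worth keeping in mind: the sketch always returns a list, whereas Hadoop's {{globStatus}} is documented to return null when a non-glob path does not exist, so a real patch would probably want a null check before iterating over {{realStatuses}}.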