[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910504#action_12910504
 ] 

Amar Kamat commented on MAPREDUCE-2078:
---------------------------------------

The {{FileSystem.globStatus(Path)}} API enumerates all the paths that a 
globbed path matches. 

The current {{TraceBuilder}} code does the following:
{code}
  for (int i = 2 + switchTop; i < args.length; ++i) {
    Path thisPath = new Path(args[i]);
    FileSystem fs = thisPath.getFileSystem(conf);
    if (fs.getFileStatus(thisPath).isDirectory()) {
      FileStatus[] statuses = fs.listStatus(thisPath);
      for (FileStatus s : statuses) {
        // process the file 
        ..
      }
    }
  }
{code}
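
The failure mode can be reproduced outside Hadoop. Below is a minimal sketch 
using java.nio on a local POSIX filesystem as a stand-in for 
{{FileSystem.getFileStatus(Path)}}; it is not the Hadoop API, and the class 
and method names ({{LiteralGlobLookup}}, {{lookupFails}}) are hypothetical. It 
illustrates that a status lookup treats {{*}} as a literal file name rather 
than expanding it, which matches the FileNotFoundException in the report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration class; assumes a POSIX filesystem where '*'
// is a legal (but here nonexistent) file name.
class LiteralGlobLookup {

    /** Returns true if looking up 'pattern' as a literal path under 'root' fails. */
    static boolean lookupFails(Path root, String pattern) throws IOException {
        try {
            // Analogous to a status lookup: no wildcard expansion happens here.
            Files.readAttributes(root.resolve(pattern), BasicFileAttributes.class);
            return false;
        } catch (NoSuchFileException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("history");
        Files.createDirectories(root.resolve("2010"));
        // A concrete directory name resolves; an unexpanded wildcard does not.
        System.out.println("lookup of 2010 fails: " + lookupFails(root, "2010"));
        System.out.println("lookup of * fails: " + lookupFails(root, "*"));
    }
}
```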

This needs to be changed to first flatten the globbed paths passed as input. 
So the suggested fix is:
{code}
  for (int i = 2 + switchTop; i < args.length; ++i) { // iterate over the input
    Path thisPath = new Path(args[i]);
    // get the filesystem specific to the input passed
    FileSystem fs = thisPath.getFileSystem(conf);

    // flatten the globbed file path
    FileStatus[] realStatuses = fs.globStatus(thisPath);

    // iterate over all the files under the globbed input path
    for (FileStatus status : realStatuses) {
      // extract the actual (flat) path from the file status
      Path realPath = status.getPath();

      // now do what is done in the trunk 
      if (fs.getFileStatus(realPath).isDirectory()) {
        FileStatus[] statuses = fs.listStatus(realPath);
        for (FileStatus s : statuses) {
          // process the file
          ..
        }
      }
    }
  }
{code}

I ran {{TraceBuilder}} with this fix and now it works with globbed input paths.
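
For readers without a cluster at hand, the flatten-then-list idea can be 
sketched against the local filesystem. This is only an analog: it uses 
java.nio's {{PathMatcher}} in place of {{FileSystem.globStatus(Path)}}, and the 
names {{GlobFlattenSketch}}, {{expandGlob}} and {{collectInputFiles}} are 
hypothetical, not part of {{TraceBuilder}}. Like the snippet above, it only 
handles matches that are directories.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical local-filesystem analog of the suggested fix.
class GlobFlattenSketch {

    /** Step 1: expand a glob (relative to 'root') into the concrete paths it matches. */
    static List<Path> expandGlob(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<Path> matches = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(p -> !p.equals(root))
                .filter(p -> matcher.matches(root.relativize(p)))
                .forEach(matches::add);
        }
        Collections.sort(matches);
        return matches;
    }

    /** Step 2: for each concrete path that is a directory, list the files under it. */
    static List<Path> collectInputFiles(Path root, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path realPath : expandGlob(root, glob)) {
            if (Files.isDirectory(realPath)) {
                try (DirectoryStream<Path> children = Files.newDirectoryStream(realPath)) {
                    for (Path child : children) {
                        files.add(child); // "process the file" would go here
                    }
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("history");
        Files.createDirectories(root.resolve("2010/09"));
        Files.createDirectories(root.resolve("2010/10"));
        Files.createFile(root.resolve("2010/09/job_1"));
        Files.createFile(root.resolve("2010/10/job_2"));
        // The glob names directories two levels down; their contents are listed.
        for (Path p : collectInputFiles(root, "*/*")) {
            System.out.println(root.relativize(p));
        }
    }
}
```

One difference worth keeping in mind when porting this back to Hadoop: as far 
as I recall from the javadoc, {{globStatus}} can return null when a non-glob 
path does not exist, so a null check before the loop may be prudent.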

> TraceBuilder unable to generate the traces while giving the job history path 
> by globing.
> ----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2078
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2078
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Vinay Kumar Thota
>            Assignee: Amar Kamat
>
> I was trying to generate traces for MR job histories using 
> TraceBuilder. However, it fails to generate the traces when the job history 
> path is given as a glob. It throws a file-not-found exception even though 
> the job history path exists.
> I provided the job history path in the following way:
> hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*/
> Exception:
> java.io.FileNotFoundException: File does not exist: hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
>         at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.<init>(TraceBuilder.java:88)
>         at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:183)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:121)
> It's truncating the last slash in the path.
