[jira] Commented: (HADOOP-3041) Within a task, the value of JobConf.getOutputPath() method is modified

Devaraj Das (JIRA) Tue, 18 Mar 2008 06:47:21 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579846#action_12579846
 ]


Devaraj Das commented on HADOOP-3041:
-------------------------------------

Alejandro, the framework modifies the job's output dir so that user apps can 
transparently deal with things like side files created in the task's output 
directory and speculative tasks that create the same output files. Another 
reason is that getOutputPath can be used (and usually is used) in the 
OutputFormat implementation. All user code can call getOutputPath and create 
task-specific files there, and the framework automatically promotes or 
discards these files when the task succeeds or fails. The JavaDoc for 
JobConf.getOutputPath() explains this more clearly (by the way, that doc needs 
to be fixed to mention _temporary).

You are hitting the problem because you create a directory at the _same level_ 
as the _actual_ output directory of the job. One way to address it is to 
provide an additional API, such as JobConf.getConfiguredOutputPath, that would 
internally do things like getOutputPath.getParent() and return the directory 
that was actually configured. This would ensure that your apps don't break 
when the framework changes the directory structure of the output path. It is 
not the best solution, but we have to reach a compromise between your 
requirement and what we already document and provide. Thoughts?
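To make the idea concrete, here is a rough sketch of what such a helper might do. It is only an illustration: the class name OutputPathSketch and the method configuredOutputDir are hypothetical, it works on plain path strings rather than Hadoop's Path class, and it assumes the temporary directory is a path component named _temporary, as in the example from the issue description. It is not the proposed JobConf.getConfiguredOutputPath itself, which does not exist yet.

```java
public class OutputPathSketch {

    // Assumption: the framework's temporary directory is a path component
    // with this exact name, as in the example from the issue description.
    static final String TEMP_DIR_NAME = "_temporary";

    /**
     * Hypothetical helper: given the path seen inside a task, return the
     * directory that precedes the last "_temporary" component. If no such
     * component exists, the path is returned unchanged.
     */
    static String configuredOutputDir(String taskOutputPath) {
        String marker = "/" + TEMP_DIR_NAME;
        int idx = taskOutputPath.lastIndexOf(marker);
        while (idx >= 0) {
            int end = idx + marker.length();
            // Only accept a whole path component, not e.g. "_temporary_x".
            boolean wholeComponent = end == taskOutputPath.length()
                    || taskOutputPath.charAt(end) == '/';
            if (wholeComponent) {
                return idx == 0 ? "/" : taskOutputPath.substring(0, idx);
            }
            idx = taskOutputPath.lastIndexOf(marker, idx - 1);
        }
        return taskOutputPath;
    }

    public static void main(String[] args) {
        // Path from the issue description, as seen within a 0.16.1 task.
        System.out.println(
            configuredOutputDir("/user/foo/myoutput/_temporary/part_00000"));
    }
}
```

An app that calls a helper like this, instead of walking up with getParent() itself, would only need the helper updated if the framework's directory layout changes again.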

> Within a task, the value of JobConf.getOutputPath() method is modified
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-3041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3041
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.1
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 0.16.2
>
>
> Until 0.16.0 the value of the getOutputPath() method, if queried within a 
> task, pointed to the part file assigned to the task. 
> For example: /user/foo/myoutput/part_00000
> In 0.16.1 it returns an internal Hadoop path for the task's temporary output 
> location.
> For the above example: /user/foo/myoutput/_temporary/part_00000
> This change breaks applications that use getOutputPath() to compute other 
> directories.
> IMO, this has always been broken: Hadoop should not change the values of 
> properties injected by the client; instead it should use private properties 
> or internal helper methods. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
