You can use FileInputFormat.setInputPaths(configuration, job1-output)
instead of addInputPath(). This will overwrite the old input path(s)
rather than append to them.
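
To make the difference concrete, here is a rough plain-Java model of the
append-versus-overwrite behavior described in the quoted message below. It
is not Hadoop code: the InputPathModel class and the Map standing in for
the Configuration are made up for illustration, and it assumes
mapred.input.dir is just a comma-separated string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the Hadoop API: models mapred.input.dir
// as a plain comma-separated string stored in a Map.
public class InputPathModel {
    static final String INPUT_DIR = "mapred.input.dir";

    // Models addInputPath(): append to whatever value is already set.
    static void addInputPath(Map<String, String> conf, String path) {
        String dirs = conf.get(INPUT_DIR);
        conf.put(INPUT_DIR, dirs == null ? path : dirs + "," + path);
    }

    // Models setInputPaths(): replace the whole list.
    static void setInputPaths(Map<String, String> conf, String... paths) {
        conf.put(INPUT_DIR, String.join(",", paths));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Job 1's input is still set when Job 2 is configured.
        conf.put(INPUT_DIR, "job1-input");
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // job1-input,job1-output

        // Blanking the property first leaves an empty first element.
        conf.put(INPUT_DIR, "");
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // ,job1-output

        // Overwriting avoids both problems.
        setInputPaths(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // job1-output
    }
}
```

In the real job you would make the one setInputPaths() call when
configuring Job 2 and never touch the property by hand.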

-Joey

On Mon, Jan 16, 2012 at 7:16 PM, W.P. McNeill <bill...@gmail.com> wrote:
>
> Is it possible to unset a configuration value? I think the answer is no,
> but I want to be sure.
>
> I know that you can set a configuration value to the empty string, but I
> have a scenario in which that is not an option. I have a top level Hadoop
> Tool that launches a series of other Hadoop jobs in its run() method. The
> output of the first sub-job becomes the input of the second one and so on.
> The top-level Tool takes a configuration file which specifies parameters
> used by all the sub-jobs. It also specifies a mapred.input.dir value which
> serves as the input directory to the first sub-job.
>
> TopLevelJob() {
>  job1 = createJob1(configuration);
>  // Run job 1
>  job2 = createJob2();
>  FileInputFormat.addInputPath(configuration, job1-output)
>  // Run job 2
> }
>
> The problem is that addInputPath() appends a value to the end of
> mapred.input.dir, erroneously leaving the input directory for Job 1 on the
> list for Job 2. If I try to delete Job 1's input dir by setting
> mapred.input.dir to the empty string like so:
>
> configuration.set("mapred.input.dir", "")
>
> the addInputPath() method appends the input path, giving the value
> ",job1-output". The first element of this list is the empty string, which
> causes an Exception.
>
> I can work around this by calling
> configuration.set("mapred.input.dir", job1-output) directly when creating
> Job 2, but this feels like a hack. It seems like the proper way to set
> input paths is via a FileInputFormat method instead of by setting the
> property directly.




--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
