Parameters for the record reader

2008-05-08 Thread Derek Shaw
Is it possible for the record reader to get a copy of the job configuration or 
is it otherwise possible to send configuration values to the record reader?

-Derek


Fwd: Collecting output not to file

2008-05-07 Thread Derek Shaw
To clarify:
 
 static class TestOutputFormat
     implements OutputFormat<Text, Text>
 {
     // Writer that forwards every reduce output value to the enclosing format
     static class TestRecordWriter
         implements RecordWriter<Text, Text>
     {
         TestOutputFormat output;

         public TestRecordWriter (TestOutputFormat output,
             org.apache.hadoop.fs.FileSystem ignored, JobConf job, String name,
             Progressable progress)
         {
             this.output = output;
         }

         public void close (Reporter reporter)
         {}

         public void write (Text key, Text value)
         {
             output.addResults (value.toString ());
         }
     }

     // Comma-separated values collected from the reduce output
     protected String results = "";

     public void checkOutputSpecs (org.apache.hadoop.fs.FileSystem ignored,
         JobConf job)
         throws IOException
     {}

     public RecordWriter<Text, Text> getRecordWriter
         (org.apache.hadoop.fs.FileSystem ignored, JobConf job, String name,
         Progressable progress)
     {
         return new TestRecordWriter (this, ignored, job, name, progress);
     }

     public void addResults (String r)
     {
         results += r + ",";
     }

     public String getResults ()
     {
         return results;
     }
 }

 And then running the task:
 public int run(String[] args)
     throws Exception
 {

     JobClient.runJob(job);

     // getOutputFormat creates a new instance of the output format. I want to
     // get the instance of the output format that the reduce function wrote to.
     // The RecordWriter that reduce wrote to would be just as good.
     TestOutputFormat results = (TestOutputFormat) job.getOutputFormat ();

     // Always prints the empty string, not the populated results
     System.out.println ("results: " + results.getResults ());

     return 0;
 }
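
The behaviour described in the comments of run() can be reproduced without Hadoop. The following is only a minimal sketch under stated assumptions: the class names are hypothetical stand-ins, not Hadoop's OutputFormat API, and the newInstance helper merely mimics a framework that builds a fresh object from a class each time it is asked. It shows why per-instance state comes back empty, and that static state survives only because both instances live in the same JVM. On a real cluster the reduce tasks typically run in separate JVMs, so a static field would most likely not carry results back to the driver either.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an output format that keeps results per instance.
class InstanceStateFormat {
    private final StringBuilder results = new StringBuilder();

    public InstanceStateFormat() {}

    void addResults(String r) { results.append(r).append(','); }

    String getResults() { return results.toString(); }
}

// Hypothetical stand-in that keeps results in a static, JVM-wide list.
class SharedStateFormat {
    private static final List<String> RESULTS =
        Collections.synchronizedList(new ArrayList<String>());

    public SharedStateFormat() {}

    void addResults(String r) { RESULTS.add(r); }

    static List<String> getResults() {
        synchronized (RESULTS) {
            return new ArrayList<String>(RESULTS);
        }
    }

    static void clear() { RESULTS.clear(); }
}

class ReflectionInstanceDemo {
    // Mimics a framework that creates a new object on every request.
    static <T> T newInstance(Class<T> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Write through one instance, read through another: the write is lost.
    static String instanceStateAfterWrite() {
        newInstance(InstanceStateFormat.class).addResults("a");
        return newInstance(InstanceStateFormat.class).getResults();
    }

    // Write through one instance, read the static list: the write is visible.
    static List<String> sharedStateAfterWrite() {
        SharedStateFormat.clear();
        newInstance(SharedStateFormat.class).addResults("a");
        return SharedStateFormat.getResults();
    }

    public static void main(String[] args) {
        System.out.println("instance state: '" + instanceStateAfterWrite() + "'");
        System.out.println("shared state: " + sharedStateAfterWrite());
    }
}
```

The second instance in instanceStateAfterWrite never sees the write made through the first one, which matches the empty string printed by run().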

Derek Shaw <[EMAIL PROTECTED]> wrote:
Date: Tue, 6 May 2008 23:26:30 -0400 (EDT)
From: Derek Shaw <[EMAIL PROTECTED]>
Subject: Collecting output not to file
To: core-user@hadoop.apache.org

 Hey,

From the examples that I have seen thus far, all of the results from the 
reduce function are being written to a file. Instead of writing results to a 
file, I want to store them and inspect them after the job is completed. (I 
think that I need to implement my own OutputCollector, but I don't know how to 
tell hadoop to use it.) How can I do this?

-Derek



Re: Collecting output not to file

2008-05-07 Thread Derek Shaw
Good point.

I want to put the results of the reduce function in a multimap instead of 
writing them to a file.

-Derek

Amar Kamat <[EMAIL PROTECTED]> wrote:
Derek Shaw wrote:
> Hey,
>
> From the examples that I have seen thus far, all of the results from the 
> reduce function are being written to a file. Instead of writing results to a 
> file, I want to store them
What do you mean by "store and inspect"?
>  and inspect them after the job is completed. (I think that I need to 
> implement my own OutputCollector, but I don't know how to tell hadoop to use 
> it.) How can I do this?
>
> -Derek
>
>   




Collecting output not to file

2008-05-06 Thread Derek Shaw
Hey,

From the examples that I have seen thus far, all of the results from the 
reduce function are being written to a file. Instead of writing results to a 
file, I want to store them and inspect them after the job is completed. (I 
think that I need to implement my own OutputCollector, but I don't know how to 
tell hadoop to use it.) How can I do this?

-Derek


Dynamically generating jobs without an input file

2008-04-30 Thread Derek Shaw
I am a new hadoop user, so please forgive my naivety.

From the examples I have seen thus far, all of the input data for each task is 
a file that exists in hdfs. I am writing a program that will dynamically 
generate input parameters for each task.

For instance, the generator may generate the input parameters 1, 3, 5, 7, 9, ...
Once the jobs start returning, the generator may change the pattern and 
start generating 2, 4, 6, 8, ... until a solution to the problem is found. I do 
not want to have to create a file that contains the parameters for the map 
function. How can I do this programmatically?

Thanks
-Derek
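
The generator described in this message can be sketched in plain Java. This is only an illustration under stated assumptions: ParameterGenerator, nextBatch and switchToEven are hypothetical names, and the sketch does not show how the parameters would reach the map function. One possible way to wire it in, not confirmed anywhere in this thread, would be a custom InputFormat whose splits carry the generated parameters rather than pointing at a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator: hands out odd parameters first, then even ones
// once the caller decides to change the pattern.
class ParameterGenerator {
    private long next = 1; // next parameter to hand out

    // Returns the next `count` parameters of the current pattern.
    List<Long> nextBatch(int count) {
        List<Long> batch = new ArrayList<Long>();
        for (int i = 0; i < count; i++) {
            batch.add(next);
            next += 2; // stay within the current (odd or even) pattern
        }
        return batch;
    }

    // Switches the pattern to even numbers, starting again at 2.
    void switchToEven() {
        next = 2;
    }
}

class ParameterGeneratorDemo {
    public static void main(String[] args) {
        ParameterGenerator generator = new ParameterGenerator();
        System.out.println(generator.nextBatch(5)); // [1, 3, 5, 7, 9]
        generator.switchToEven();
        System.out.println(generator.nextBatch(4)); // [2, 4, 6, 8]
    }
}
```

Each batch could then be submitted as one round of tasks, with the decision to call switchToEven made after inspecting the results of the previous round.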