Hi, I'm new to Hadoop and have been having difficulty adapting map and reduce to my needs. This is what I want to do:

1. Run multiple jobs chained to each other, so that the output of one map/reduce job is the input to the one after it.

2. Get multiple outputs from each map/reduce job (I've mostly figured this one out; I'll be using MultipleOutputFormat). For reduce outputs with a certain key, I want them appended to a single file, so that the successive map/reduce jobs in the chain always append to the output of the previous ones.
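To make goal 1 concrete, this is roughly the shape I'm trying to get to (just a sketch on my part: the ChainDriver class, the job names, and the paths are placeholders, the mapper/reducer setup is omitted, and I'm assuming the 0.20 org.apache.hadoop.mapreduce API, where each chained job gets its own Job object and its own output directory):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path in = new Path("/user/h/in"); // placeholder input directory

        for (int i = 0; i < 2; i++) {
            // Each iteration builds and runs a *fresh* Job with its own output dir
            Job job = new Job(conf, "chain-" + i);
            // job.setMapperClass(...), job.setReducerClass(...), etc. omitted

            FileInputFormat.setInputPaths(job, in);
            Path out = new Path("/user/h/out" + i);
            FileOutputFormat.setOutputPath(job, out);

            // Actually run this job before constructing the next one
            if (!job.waitForCompletion(true)) {
                System.exit(1); // stop the chain if a job fails
            }

            in = out; // this job's output becomes the next job's input
        }
    }
}
```

The key points I'm aiming for are that every job in the loop is both configured and executed (waitForCompletion is called on the job created in that iteration), and that the output path of iteration i is fed back in as the input path of iteration i+1.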
For number 1, I did the following (in this case running two jobs, one after the other):

---
FileOutputFormat.setOutputPath(job, new Path("/user/h/out"));
for (int i = 0; i <= 1; i++) {
    System.out.println("Job number " + i);
    System.out.println((job.getConfiguration()).get("cus"));
    job.waitForCompletion(true);
    j = new Job(c);
    FileOutputFormat.setOutputPath(j, new Path("/user/h/out1"));
}
---

The job ran without displaying any errors, but when I check the output files for the jobs, I can see only the output directory for the first job and not the second one:

$ hadoop dfs -ls /user/h/out
Found 2 items
drwxr-xr-x   - h supergroup   0 2009-09-02 13:40 /user/h/out/_logs
-rw-r--r--   1 h supergroup  18 2009-09-02 13:41 /user/h/out/part-r-00000
$ hadoop dfs -ls /user/h/out1
ls: Cannot access /user/h/out1: No such file or directory.

That is my first problem. As for problem 2, I simply don't have any idea how to get the successive map and reduce jobs to append to a file. I've been breaking my head over this for the last two days. It would be really great if someone could help. Thanks!

H

--
View this message in context: http://www.nabble.com/Map-and-reduce-%3ARunning-multiple-jobs-and-multiple-outputs-tp25263907p25263907.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.