Hadoop streaming assumes that inputs are files. If kjv is a directory, you may use the option "-input kjv/*".

Hairong

-----Original Message-----
From: Andrew McNabb [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 6:22 PM
To: [email protected]
Subject: Re: HadoopStreaming

On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:
> Try changing your command to read
>
> hadoop-streaming \
>     -mapper "/usr/bin/python mapper.py" \
>     -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
>     -reducer "/usr/bin/python reducer.py" \
>     -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
>     -input kjv \
>     -output kjvout

I'll try this first thing in the morning.

> I assume kjv is a file and kjvout is a directory - they should be.

Actually, I was doing it the same way as other Hadoop stuff I've done: kjv is a directory in DFS. Does HadoopStreaming handle this differently from most other Hadoop tools? In any case, how do I make it take a directory as input if that's what I need?

> I also assume /usr/bin/python is the path to python *on the cluster
> machines*. Otherwise, you can do -mapper "python mapper.py" -file
> /usr/bin/python -file /home/amcnabb/svn/mrpso/python/mapper.py
>
> I recommend adding -jobconf mapred.job.name="kjv", to make the
> jobtracker history more readable.

I didn't know about that option. I'll do that. Thanks for all of the tips.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
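For readers following along, a minimal sketch of the kind of mapper/reducer pair the command above ships to the cluster. Hadoop Streaming runs each script as a plain process that reads lines on stdin and writes tab-separated key/value pairs on stdout; the word-count logic here is illustrative and not taken from the thread (mapper.py and reducer.py themselves are not shown in the messages).

```python
import sys
from itertools import groupby

def map_lines(lines):
    """Mapper side: emit (word, 1) for every word on every input line.
    A real streaming mapper.py would print these as "word\t1"."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_pairs(pairs):
    """Reducer side: sum the counts for each key. Hadoop delivers the
    mapper output sorted by key, so grouping consecutive pairs is
    enough; we sort here only because we run both halves locally."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    # In an actual job each half would be its own script reading
    # sys.stdin; a small literal sample is used here for illustration.
    sample = ["in the beginning", "the word"]
    for word, count in reduce_pairs(map_lines(sample)):
        print("%s\t%d" % (word, count))
```

Run standalone this prints each distinct word with its count; under streaming, the same logic split into mapper.py and reducer.py (reading sys.stdin instead of the sample list) is what the -mapper and -reducer options invoke.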
