Hadoop streaming assumes that inputs are files. If kjv is a directory, you may use the option "-input kjv/*".

Hairong

-----Original Message-----
From: Andrew McNabb [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 6:22 PM
To: [email protected]
Subject: Re: HadoopStreaming

On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:
> Try changing your command to read
>
> hadoop-streaming \
>     -mapper "/usr/bin/python mapper.py" \
>     -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
>     -reducer "/usr/bin/python reducer.py" \
>     -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
>     -input kjv \
>     -output kjvout

I'll try this first thing in the morning.

> I assume kjv is a file and kjvout is a directory - they should be.

Actually, I was doing it the same way as other Hadoop stuff I've done: kjv is a directory in DFS. Does HadoopStreaming handle this differently from most other Hadoop tools? In any case, how do I make it take a directory as input if that's what I need?

> I also assume /usr/bin/python is the path to python *on the cluster
> machines*. Otherwise, you can do -mapper "python mapper.py" -file
> /usr/bin/python -file /home/amcnabb/svn/mrpso/python/mapper.py
>
> I recommend adding -jobconf mapred.job.name="kjv", to make the
> jobtracker history more readable.

I didn't know about that option. I'll do that. Thanks for all of the tips.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
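For readers following along, a minimal sketch of the kind of mapper/reducer pair the command above ships to the cluster. Hadoop Streaming runs each script as a plain process that reads lines on stdin and writes tab-separated key/value pairs on stdout; the word-count logic here is illustrative and not taken from the thread (mapper.py and reducer.py themselves are not shown in the messages).

```python
import sys
from itertools import groupby

def map_lines(lines):
    """Mapper side: emit (word, 1) for every word on every input line.
    A real streaming mapper.py would print these as "word\t1"."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_pairs(pairs):
    """Reducer side: sum the counts for each key. Hadoop delivers the
    mapper output sorted by key, so grouping consecutive pairs is
    enough; we sort here only because we run both halves locally."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    # In an actual job each half would be its own script reading
    # sys.stdin; a small literal sample is used here for illustration.
    sample = ["in the beginning", "the word"]
    for word, count in reduce_pairs(map_lines(sample)):
        print("%s\t%d" % (word, count))
```

Run standalone this prints each distinct word with its count; under streaming, the same logic split into mapper.py and reducer.py (reading sys.stdin instead of the sample list) is what the -mapper and -reducer options invoke.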
