So I've read up on -cacheFile and -file and I still can't quite get my
script to work.  I'm running it as follows:

hstream -input basedir/finegrain/validation.txt.head \
  -output basedir/output \
  -mapper "Evaluate_linux.pl segs.xml config.txt" \
  -numReduceTasks 0 \
  -jobconf mapred.job.name="Evaluate" \
  -file Evaluate_linux.pl \
  -cacheFile hdfs://servername:9008/user/tvan/basedir/custom/final_segs.20080305.xml#segs.xml \
  -cacheFile hdfs://servername:9008/user/tvan/basedir/config.txt#config.txt
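
My understanding from the streaming docs (an assumption on my part, so
please correct me if I'm wrong) is that the #segs.xml and #config.txt
fragments make the cached files show up under those names in each
task's working directory, so the mapper just opens them by the bare
name.  Roughly, the shape of the mapper is (a simplified sketch, not
the actual Evaluate_linux.pl):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Config file names arrive as plain arguments ("segs.xml", "config.txt"),
    # which I expect to be symlinks in the task's working directory.
    my ($segs_file, $config_file) = @ARGV;

    open my $segs_fh,   '<', $segs_file   or die "cannot open $segs_file: $!";
    open my $config_fh, '<', $config_file or die "cannot open $config_file: $!";
    # ... load the configuration, then close the handles ...
    close $segs_fh;
    close $config_fh;

    # The data records arrive on STDIN from Hadoop.
    while (my $line = <STDIN>) {
        chomp $line;
        # ... evaluate $line against the loaded configuration ...
        print "$line\n";
    }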

The job starts, but all map tasks fail with the same error:

java.io.IOException: log:null
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=tvanrooy
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Fri Mar 07 15:47:37 EST 2008
java.io.IOException: Broken pipe


Is this an indication that my script isn't finding the files I pass it?
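
One thing I plan to try, in case the files really aren't there: have
the script dump its working directory to STDERR before reading
anything, since STDERR ends up in the task logs.  A quick sketch,
assuming the cache symlinks are supposed to land in the task's current
working directory:

    use strict;
    use warnings;
    use Cwd;

    # Diagnostic: show where the task runs and which files it can see.
    print STDERR "cwd: ", getcwd(), "\n";
    print STDERR "dir contents: ", join(", ", glob("*")), "\n";

    for my $name (@ARGV) {
        if (-e $name) {
            print STDERR "found $name\n";
        } else {
            print STDERR "MISSING $name\n";
        }
    }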


On Thu, Mar 6, 2008 at 5:17 PM, Lohit <[EMAIL PROTECTED]> wrote:

> You could use the -cacheFile or -file option for this. Check the
> streaming docs for examples.
>
>
> On Mar 6, 2008, at 2:32 PM, "Theodore Van Rooy" <[EMAIL PROTECTED]>
> wrote:
>
> > I would like to convert a Perl script that currently takes argument
> > variables to run with Hadoop Streaming.
> >
> > Normally I would run the script like
> >
> > 'cat datafile.txt | myscript.pl folder/myfile1.txt folder/myfile2.txt'
> >
> > where the two arguments are actually the names of configuration
> > files for myscript.pl.
> >
> > My question is: how do I get the Perl script to look in the local
> > directory for the config files, or how would I go about getting it
> > to find them on the DFS? Once the configuration is passed in, there
> > is no problem using STDIN to process the datafile Hadoop feeds it.
>
>


-- 
Theodore Van Rooy
Green living isn't just for hippies...
http://greentheo.scroggles.com
