So I've read up on -cacheFile and -file and I still can't quite get my script
to work. I'm running it as follows:

hstream -input basedir/finegrain/validation.txt.head -output basedir/output
  -mapper "Evaluate_linux.pl segs.xml config.txt" -numReduceTasks 0
  -jobconf mapred.job.name="Evaluate" -file Evaluate_linux.pl
  -cacheFile hdfs://servername:9008/user/tvan/basedir/custom/final_segs.20080305.xml#segs.xml
  -cacheFile hdfs://servername:9008/user/tvan/basedir/config.txt#config.txt

The job starts, but every map task fails with the same error:

java.io.IOException: log:null
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=tvanrooy
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Fri Mar 07 15:47:37 EST 2008
java.io.IOException: Broken pipe

Is this an indication that my script isn't finding the files I pass it?

On Thu, Mar 6, 2008 at 5:17 PM, Lohit <[EMAIL PROTECTED]> wrote:
> You could use the -cacheFile or -file option for this. Check the streaming
> docs for examples.
>
> On Mar 6, 2008, at 2:32 PM, "Theodore Van Rooy" <[EMAIL PROTECTED]> wrote:
>
> > I would like to convert a Perl script that currently uses argument
> > variables to run with Hadoop Streaming.
> >
> > Normally I would use the script like
> >
> > 'cat datafile.txt | myscript.pl folder/myfile1.txt folder/myfile2.txt'
> >
> > where the two argument variables are actually the names of configuration
> > files for myscript.pl.
> >
> > The question I have is: how do I get the Perl script either to look in the
> > local directory for the config files, or to look on the DFS for them? Once
> > the configuration is passed in, there is no problem using STDIN to process
> > the datafile that Hadoop passes to it.

--
Theodore Van Rooy
Green living isn't just for hippies...
http://greentheo.scroggles.com
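
For reference, a "Broken pipe" from the streaming PipeMapRed usually just means
the mapper process exited before reading all of its input, so a script that dies
while opening its config files would produce exactly this. Below is a minimal
sketch (not the actual Evaluate_linux.pl; the filenames, parsing, and output
format are placeholders) of how a streaming mapper can pick up files shipped
with -cacheFile: the part after the '#' is the name of a symlink created in the
task's working directory, so the script can open it as a bare relative path, and
dying with a clear message makes the failure show up in the task's stderr log
instead of only as a broken pipe.

    #!/usr/bin/env perl
    # Sketch of a streaming mapper that takes its config files as
    # relative filenames on the command line (placeholder logic).
    use strict;
    use warnings;
    use Cwd qw(getcwd);

    my ($segs_file, $config_file) = @ARGV;
    die "usage: mapper.pl segs.xml config.txt\n" unless defined $config_file;

    # With -cacheFile hdfs://...#segs.xml, the file is symlinked into the
    # task's current working directory under the name after the '#'.
    for my $f ($segs_file, $config_file) {
        -e $f or die "config file '$f' not found in " . getcwd() . "\n";
    }

    open my $segs, '<', $segs_file   or die "cannot open $segs_file: $!\n";
    open my $conf, '<', $config_file or die "cannot open $config_file: $!\n";
    # ... parse the configuration here ...
    close $segs;
    close $conf;

    # Standard streaming loop: one record per line on STDIN,
    # tab-separated key/value written to STDOUT.
    while (my $line = <STDIN>) {
        chomp $line;
        # ... evaluate $line against the loaded segments/config ...
        print "$line\t1\n";
    }

You can simulate what the task does by copying the two config files into an
empty directory and running 'cat datafile.txt | ./Evaluate_linux.pl segs.xml
config.txt' from there. Also worth checking: if the script isn't executable or
its shebang path is wrong on the task nodes, the mapper fails to start and the
same Broken pipe message appears.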
hstream -input basedir/finegrain/validation.txt.head -output basedir/output -mapper "Evaluate_linux.pl segs.xml config.txt" -numReduceTasks 0 -jobconf mapred.job.name="Evaluate" -file Evaluate_linux.pl -cacheFile hdfs://servername:9008/user/tvan/basedir/custom/final_segs.20080305.xml#segs.xml -cacheFile hdfs://servername:9008/user/tvan/basedir/config.txt#config.txt the job starts but all map jobs fail with the same code: java.io.IOException: log:null R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s] minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null HOST=null USER=tvanrooy HADOOP_USER=null last Hadoop input: |null| last tool output: |null| Date: Fri Mar 07 15:47:37 EST 2008 java.io.IOException: Broken pipe Is this an indication that my script isn't finding the files I pass it? On Thu, Mar 6, 2008 at 5:17 PM, Lohit <[EMAIL PROTECTED]> wrote: > you could use -cacheFile or -file option for this. Check streaming doc > for examples. > > > > > > On Mar 6, 2008, at 2:32 PM, "Theodore Van Rooy" <[EMAIL PROTECTED]> > wrote: > > > I would like to convert a perl script that currently uses argument > > variables > > to run with Hadoop Streaming. > > > > Normally I would use the script like > > > > 'cat datafile.txt | myscript.pl folder/myfile1.txt folder/ > > myfile2.txt' > > > > where the two argument variables are actually the names of > > configuration > > files for the myscript.pl. > > > > The question I have is, how do I get the perl script to either look > > in the > > local directory for the config files, or how would I go about > > getting them > > to look on the DFS for the config files? Once the configurations are > > passed > > in there is no problem using the STDIN to process the datafile > > passed into > > it by hadoop. > > -- Theodore Van Rooy Green living isn't just for hippies... http://greentheo.scroggles.com