Re: streaming problem

Amareshwari Sriramadasu Tue, 18 Mar 2008 21:56:59 -0700

Hi Andreas,

Looks like your mapper is not available to the streaming jar. Where is your mapper script? Did you use the distributed cache to distribute the mapper? You can use -file <mapper-script-path on local fs> to make it part of the jar, or use -cacheFile /dist/wordloadmf#workloadmf to distribute the script. Distributing it this way will add your script to the PATH.


So now your command will be:

time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar \
    -mapper workloadmf -reducer NONE -input testlogs/* \
    -output testlogs-output -cacheFile /dist/wordloadmf#workloadmf

or

time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar \
    -mapper workloadmf -reducer NONE -input testlogs/* \
    -output testlogs-output -file <path-on-local-fs>
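
Before resubmitting, it may also help to rule out problems with the script itself on the local machine. The following is only a minimal sketch and not part of Hadoop: check_mapper and smoke_test_mapper are hypothetical helper names. It assumes the mapper follows the usual streaming contract of reading lines on stdin and writing results to stdout.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for checking a streaming
# mapper script locally before submitting the job.

# Report whether the script exists and has its execute bit set.
check_mapper() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi
    echo "ok: $script"
}

# Feed one sample line to the mapper on stdin and print whatever the
# mapper writes to stdout. Pass the script as a path containing a
# slash, otherwise the shell looks it up in PATH.
smoke_test_mapper() {
    script="$1"
    sample="$2"
    printf '%s\n' "$sample" | "$script"
}
```

For example, check_mapper ~/dist/workloadmf followed by smoke_test_mapper ~/dist/workloadmf '<one sample log line>' should print the mapper's output for that line. If either step fails locally, the script will most likely fail inside the task as well.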

Thanks,
Amareshwari

Andreas Kostyrka wrote:

Some additional details, in case they help: the HDFS is hosted on AWS S3,
and the input file set consists of 152 gzipped Apache log files.

Thanks,

Andreas

On Tuesday, 18.03.2008, at 22:17 +0100, Andreas Kostyrka wrote:

Hi!

I'm trying to run a streaming job on Hadoop 0.16.0. I've distributed the
scripts to be used to all nodes:

time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar \
    -mapper ~/dist/workloadmf -reducer NONE -input testlogs/* \
    -output testlogs-output

Now, this gives me:

java.io.IOException: log:null
R/W/S=1/0/0 in:0=1/2 [rec/s] out:0=0/2 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=hadoop
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Tue Mar 18 21:06:13 GMT 2008
java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)


        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)

Any ideas what my problem could be?

TIA,

Andreas
