The /home/hadoop/dist/workloadmf script is available on all nodes, but it was missing one package it needs to run correctly ;(
Anyway, I still have the problem that when running with -reducer NONE, my output seems to get lost. Some of the output files contain a small number of output lines, but not many :( (And the expected size of each output file was around 25MB or so :( )

Ah, the joys,

Andreas

On Wednesday, 19.03.2008, at 10:13 +0530, Amareshwari Sriramadasu wrote:
> Hi Andreas,
> Looks like your mapper is not available to the streaming jar. Where is
> your mapper script? Did you use the distributed cache to distribute the
> mapper? You can use -file <mapper-script-path on local fs> to make it
> part of the jar, or use -cacheFile /dist/workloadmf#workloadmf to
> distribute the script. Distributing this way will add your script to
> the PATH.
>
> So, your command will now be:
>
> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper
> workloadmf -reducer NONE -input testlogs/* -output testlogs-output
> -cacheFile /dist/workloadmf#workloadmf
>
> or
>
> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper
> workloadmf -reducer NONE -input testlogs/* -output testlogs-output -file
> <path-on-local-fs>
>
> Thanks,
> Amareshwari
>
> Andreas Kostyrka wrote:
> > Some additional details, in case it helps: the HDFS is hosted on AWS S3,
> > and the input file set consists of 152 gzipped Apache log files.
> >
> > Thanks,
> >
> > Andreas
> >
> > On Tuesday, 18.03.2008, at 22:17 +0100, Andreas Kostyrka wrote:
> >
> >> Hi!
> >>
> >> I'm trying to run a streaming job on Hadoop 0.16.0. I've distributed
> >> the scripts to be used to all nodes:
> >>
> >> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper
> >> ~/dist/workloadmf -reducer NONE -input testlogs/* -output testlogs-output
> >>
> >> Now, this gives me:
> >>
> >> java.io.IOException: log:null
> >> R/W/S=1/0/0 in:0=1/2 [rec/s] out:0=0/2 [rec/s]
> >> minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
> >> HOST=null
> >> USER=hadoop
> >> HADOOP_USER=null
> >> last Hadoop input: |null|
> >> last tool output: |null|
> >> Date: Tue Mar 18 21:06:13 GMT 2008
> >> java.io.IOException: Broken pipe
> >>     at java.io.FileOutputStream.writeBytes(Native Method)
> >>     at java.io.FileOutputStream.write(FileOutputStream.java:260)
> >>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
> >>     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> >>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
> >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> >>
> >>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
> >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> >>
> >> Any ideas what my problems could be?
> >>
> >> TIA,
> >>
> >> Andreas
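For what it's worth, the "Broken pipe" in PipeMapper typically means the mapper process exited (or never started) before consuming its stdin, which is exactly what happens when the script isn't present or isn't executable on the task nodes. A streaming mapper can be smoke-tested locally by piping a few records through it, since that mirrors what the TaskTracker child does. The script below is a hypothetical stand-in for workloadmf (its real contents aren't in this thread), just to illustrate the stdin/stdout contract:

```shell
#!/bin/sh
# Hypothetical stand-in for the workloadmf mapper. Hadoop streaming feeds
# input records to the mapper on stdin, one line each, and treats every
# line the mapper writes to stdout as an output record. With -reducer
# NONE those records go straight into the part-* files, so a mapper that
# prints nothing, or dies early (producing the "Broken pipe"
# IOException), yields empty or tiny output files.
map() {
  while IFS= read -r line; do
    # Illustrative transform only: emit the first whitespace-separated
    # field of each line (e.g. the client IP of an Apache log entry).
    printf '%s\n' "${line%% *}"
  done
}

# Local smoke test, mirroring the pipe the TaskTracker sets up:
printf '1.2.3.4 GET /a\n5.6.7.8 GET /b\n' | map
```

If the real script consumes its whole input and exits 0 under a test like this but still fails on the cluster, the usual remaining suspects are an interpreter line pointing at a path that doesn't exist on the task nodes, or missing execute permission on the distributed copy.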