The /home/hadoop/dist/workloadmf script is available on all nodes, but it was missing one package it needs to run correctly ;(
Anyway, I still have the problem that when running with -reducer NONE, my output seems to get lost. Some of the output files contain a small number of output lines, but not many :( (And the expected size of each output file was around 25MB or so :( )

Ah, the joys,

Andreas

On Wednesday, 19.03.2008, at 10:13 +0530, Amareshwari Sriramadasu wrote:
> Hi Andreas,
> Looks like your mapper is not available to the streaming jar. Where is
> your mapper script? Did you use the distributed cache to distribute the
> mapper? You can use -file <mapper-script-path on local fs> to make it
> part of the jar, or use -cacheFile /dist/workloadmf#workloadmf to
> distribute the script. Distributing this way will add your script to
> the PATH.
>
> So, your command will now be:
>
> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper
> workloadmf -reducer NONE -input testlogs/* -output testlogs-output
> -cacheFile /dist/workloadmf#workloadmf
>
> or
>
> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper
> workloadmf -reducer NONE -input testlogs/* -output testlogs-output -file
> <path-on-local-fs>
>
> Thanks,
> Amareshwari
>
> Andreas Kostyrka wrote:
> > Some additional details, in case it helps: the HDFS is hosted on AWS S3,
> > and the input file set consists of 152 gzipped Apache log files.
> >
> > Thanks,
> >
> > Andreas
> >
> > On Tuesday, 18.03.2008, at 22:17 +0100, Andreas Kostyrka wrote:
> >
> >> Hi!
> >>
> >> I'm trying to run a streaming job on Hadoop 0.16.0. I've distributed
> >> the scripts to be used to all nodes:
> >>
> >> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper
> >> ~/dist/workloadmf -reducer NONE -input testlogs/* -output testlogs-output
> >>
> >> Now, this gives me:
> >>
> >> java.io.IOException: log:null
> >> R/W/S=1/0/0 in:0=1/2 [rec/s] out:0=0/2 [rec/s]
> >> minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
> >> HOST=null
> >> USER=hadoop
> >> HADOOP_USER=null
> >> last Hadoop input: |null|
> >> last tool output: |null|
> >> Date: Tue Mar 18 21:06:13 GMT 2008
> >> java.io.IOException: Broken pipe
> >>     at java.io.FileOutputStream.writeBytes(Native Method)
> >>     at java.io.FileOutputStream.write(FileOutputStream.java:260)
> >>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
> >>     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> >>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
> >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> >>
> >>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
> >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> >>
> >> Any ideas what my problems could be?
> >>
> >> TIA,
> >>
> >> Andreas
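For what it's worth, the "Broken pipe" in PipeMapper typically means the mapper process exited (or never started) before consuming its stdin, which is exactly what happens when the script isn't present or isn't executable on the task nodes. A streaming mapper can be smoke-tested locally by piping a few records through it, since that mirrors what the TaskTracker child does. The script below is a hypothetical stand-in for workloadmf (its real contents aren't in this thread), just to illustrate the stdin/stdout contract:

```shell
#!/bin/sh
# Hypothetical stand-in for the workloadmf mapper. Hadoop streaming feeds
# input records to the mapper on stdin, one line each, and treats every
# line the mapper writes to stdout as an output record. With -reducer
# NONE those records go straight into the part-* files, so a mapper that
# prints nothing, or dies early (producing the "Broken pipe"
# IOException), yields empty or tiny output files.
map() {
  while IFS= read -r line; do
    # Illustrative transform only: emit the first whitespace-separated
    # field of each line (e.g. the client IP of an Apache log entry).
    printf '%s\n' "${line%% *}"
  done
}

# Local smoke test, mirroring the pipe the TaskTracker sets up:
printf '1.2.3.4 GET /a\n5.6.7.8 GET /b\n' | map
```

If the real script consumes its whole input and exits 0 under a test like this but still fails on the cluster, the usual remaining suspects are an interpreter line pointing at a path that doesn't exist on the task nodes, or missing execute permission on the distributed copy.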