I've used wget with Hadoop Streaming without any problems. Based on the
error code you're getting, I suggest you make sure that you have the proper
write permissions for the directory in which Hadoop processes the data
(e.g., download, convert, ...) on each of the task tracker machines. The
location where data is processed on each machine is controlled by the
"hadoop.tmp.dir" variable. The default value set in
$HADOOP_HOME/conf/hadoop-default.xml is "/tmp/hadoop-${user.name}". Make
sure that the user running Hadoop has permission to write to whatever
directory you're using.
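
As a rough way to test this, something like the sketch below could be run
on each task tracker machine as the user that runs Hadoop. The helper name
check_writable is made up for illustration, and the path assumes the default
"hadoop.tmp.dir" value mentioned above; substitute your own directory if you
have overridden it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): report whether the current
# user can write to the given directory.
check_writable() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Create and remove a scratch file; this also catches read-only mounts.
    probe="$dir/.write_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "not writable: $dir"
    return 1
}

# Assumption: hadoop.tmp.dir is left at its default, /tmp/hadoop-${user.name}.
check_writable "/tmp/hadoop-$(id -un)" || true
```

A "missing" or "not writable" result on any machine would point to the
permissions problem described above; a "writable" result on all of them
suggests looking elsewhere.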

John

On Thu, Mar 12, 2009 at 10:02 PM, Nick Cen <cenyo...@gmail.com> wrote:

> Hi All,
>
> I am trying to use Hadoop Streaming with "wget" to simulate a
> distributed downloader.
> The command line I use is
>
> ./bin/hadoop jar -D mapred.reduce.tasks=0
> contrib/streaming/hadoop-0.19.0-streaming.jar -input urli -output urlo
> -mapper /usr/bin/wget -outputformat
> org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
>
> But it throws an exception
>
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
>        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:295)
>        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:519)
>        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>        at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> Can somebody point me to why this happened? Thanks.
>
>
>
> --
> http://daily.appspot.com/food/
>
