HTTP addressable files from HDFS?

2009-03-13 Thread David Michael

Hello

I realize that using HTTP, you can have a file in HDFS streamed - that is, the servlet responds to the following request with Content-Disposition: attachment, and a download is forced (at least from a browser's perspective) like so:


http://localhost:50075/streamFile?filename=/somewhere/image.jpg

Is there another way to get at this file more directly from HTTP 'out of the box'?


I'm imagining something like:

http://localhost:50075/somewhere/image.jpg

Is this sort of exposure of the HDFS namespace something I need to write into a server myself?


Thanks in advance
David

On Mar 13, 2009, at 10:12 PM, S D wrote:

I've used wget with Hadoop Streaming without any problems. Based on the error code you're getting, I suggest you make sure that you have the proper write permissions for the directory in which Hadoop will process (e.g., download, convert, ...) on each of the task tracker machines. The location where data is processed on each machine is controlled by the hadoop.tmp.dir variable. The default value set in $HADOOP_HOME/conf/hadoop-default.xml is /tmp/hadoop-${user.name}. Make sure that the user running hadoop has permission to write to whatever directory you're using.
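If you want to move it off /tmp, the setting can be overridden in $HADOOP_HOME/conf/hadoop-site.xml. A minimal sketch (the path /data/hadoop-tmp is just an example, not anything from this thread):

```xml
<!-- hadoop-site.xml: override the per-node scratch directory.
     Pick a path the user running hadoop can write to on every task tracker. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp/hadoop-${user.name}</value>
</property>
```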

John

On Thu, Mar 12, 2009 at 10:02 PM, Nick Cen cenyo...@gmail.com wrote:


Hi All,

I am trying to use Hadoop Streaming with wget to simulate a distributed downloader. The command line I use is

./bin/hadoop jar -D mapred.reduce.tasks=0 \
  contrib/streaming/hadoop-0.19.0-streaming.jar \
  -input urli -output urlo \
  -mapper /usr/bin/wget \
  -outputformat org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

But it throws an exception:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
  at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:295)
  at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:519)
  at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
  at org.apache.hadoop.mapred.Child.main(Child.java:155)

Can somebody point me to why this happened? Thanks.
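One likely cause (an assumption on my part, not confirmed in this thread): Hadoop Streaming writes each input record to the mapper's stdin, but /usr/bin/wget ignores stdin and exits with status 1 when given no URL arguments, which matches "subprocess failed with code 1". A small wrapper script that reads URLs from stdin avoids that; a sketch:

```shell
#!/bin/sh
# mapper.sh -- hypothetical wrapper for the streaming job, not from this thread.
# Streaming feeds input records on stdin, one per line; read each line as a URL.
mapper() {
  while read -r url; do
    # On a real cluster, replace the echo with something like:
    #   wget -q "$url" -O "$(basename "$url")"
    echo "fetch $url"
  done
}
printf 'http://example.com/a.jpg\n' | mapper
```

You would then pass -mapper mapper.sh (shipped with -file mapper.sh) instead of -mapper /usr/bin/wget.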



--
http://daily.appspot.com/food/





Re: HTTP addressable files from HDFS?

2009-03-13 Thread jason hadoop
wget http://namenode:port/data/filename
will return the file.

The namenode will redirect the HTTP request to a datanode that has at least some of the blocks in local storage to serve the actual request. The key piece, of course, is the /data prefix on the file name. port is the port that the web GUI is running on, NOT the HDFS port; commonly the port is 50070.
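Putting that together, the URL is just the namenode's web-UI host and port, the /data prefix, and the HDFS path. A quick sketch (localhost:50070 assumes a default single-node setup):

```shell
# Build the HTTP URL for an HDFS file served via the namenode web UI.
NAMENODE_HTTP="localhost:50070"     # web GUI port, not the HDFS port
FILE_PATH="/somewhere/image.jpg"    # path within the HDFS namespace
URL="http://${NAMENODE_HTTP}/data${FILE_PATH}"
echo "$URL"
# prints http://localhost:50070/data/somewhere/image.jpg
# On a live cluster: wget "$URL"
```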

On Fri, Mar 13, 2009 at 7:54 PM, David Michael david.mich...@gmail.comwrote:

 Hello

 I realize that using HTTP, you can have a file in HDFS streamed - that is,
 the servlet responds to the following request with Content-Disposition:
 attachment, and a download is forced (at least from a browsers perspective)
 like so:

 http://localhost:50075/streamFile?filename=/somewhere/image.jpg

 Is there another way to get at this file more directly from HTTP 'out of
 the box'?

 I'm imagining something like:

 http://localhost:50075/somewhere/image.jpg

 Is this sort of exposure of the HDFS namespace something I need to write
 into a server myself?

 Thanks in advance
 David

 On Mar 13, 2009, at 10:12 PM, S D wrote:

  I've used wget with Hadoop Streaming without any problems. Based on the
 error code you're getting, I suggest you make sure that you have the
 proper
 write permissions for the directory in which Hadoop will process (e.g.,
 download, convert, ...) on each of the task tracker machines. The location
 where is processed on each machine is controlled by the hadoop.tmp.dir
 variable. The default value set in $HADOOP_HOME/conf/hadoop-default.xml is
 /tmp/hadoop-${user.name}. Make sure that the user running hadoop has
 permission to write to whatever directory you're using.

 John

 On Thu, Mar 12, 2009 at 10:02 PM, Nick Cen cenyo...@gmail.com wrote:

  Hi All,

 I am trying to use the hadoop straeming with wget to simulate a
 distributed downloader.
 The command line i use is

 ./bin/hadoop jar -D mapred.reduce.tasks=0
 contrib/streaming/hadoop-0.19.0-streaming.jar -input urli -output urlo
 -mapper /usr/bin/wget -outputformat
 org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

 But it thrown an exception

 java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
 failed with code 1
  at

 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:295)
  at

 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:519)
  at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
  at org.apache.hadoop.mapred.Child.main(Child.java:155)

 can somebody point me a way of why this happend. thanks.



 --
 http://daily.appspot.com/food/





-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422