I am using Nutch for crawling and would like to configure Hadoop to use
S3. I made the appropriate changes to the Hadoop configuration, and that
part appears to be OK. However, I *think* the problem I am hitting is
that Hadoop now expects ALL paths to be locations in S3. Below is a
typical error I am seeing; it suggests Hadoop expects a /tmp folder to
exist in the S3 bucket. Likewise, any Nutch parameters that name
directories seem to be expected to exist in S3. This makes me think
there are things I need to do to "prepare" the S3 bucket I've specified
in the Hadoop configuration so that Hadoop has everything it needs to
function. For example, I somehow have to copy my seed URLs file to the
S3 bucket in a way that Hadoop can find it; my untested guess at the
commands is below the error. Can anyone point me in the right direction
on how to do this?

2008-09-30 13:31:49,926 WARN  httpclient.RestS3Service - Response '/%2Ftmp%2Fhadoop-Kevin%2Fmapred%2Fsystem%2Fjob_local_1' - Unexpected response code 404, expected 200
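
For what it's worth, here is my guess at how the seed URLs file would
get copied up, using the hadoop fs shell against the S3 filesystem. The
bucket name and local path are placeholders, and I haven't been able to
confirm this is the right approach:

  # assuming fs.default.name is set to s3://my-nutch-bucket (placeholder)
  bin/hadoop fs -mkdir urls
  # copy the local seed list into the bucket where Nutch can find it
  bin/hadoop fs -put /local/path/to/seed.txt urls/seed.txt
  # then point the Nutch inject step at the uploaded directory
  bin/nutch inject crawl/crawldb urls

Even if that is roughly right, I'd still like to know whether /tmp and
the mapred system directories have to be created in the bucket by hand.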

Thanks

Kevin

