I am using Nutch for crawling and would like to configure Hadoop to use S3. I have made what I believe are the appropriate changes to the Hadoop configuration, and that part appears to be O.K.
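For reference, the S3-related settings I added to hadoop-site.xml look roughly like this (the bucket name and credentials are placeholders):

    <property>
      <name>fs.default.name</name>
      <value>s3://MY-BUCKET</value>
    </property>
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>MY-ACCESS-KEY-ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>MY-SECRET-KEY</value>
    </property>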
However, I *think* the problem I am now hitting is that Hadoop expects ALL paths to be locations in S3. Below is a typical error I am seeing; I read it as Hadoop expecting a /tmp folder to exist in the S3 bucket:

2008-09-30 13:31:49,926 WARN httpclient.RestS3Service - Response '/%2Ftmp%2Fhadoop-Kevin%2Fmapred%2Fsystem%2Fjob_local_1' - Unexpected response code 404, expected 200

Likewise, any parameters to Nutch that are directories are expected to be available in S3. This makes me think there are things I need to do to "prepare" the S3 bucket I've specified in the Hadoop configuration so that Hadoop has everything it needs to function. For example, I somehow have to copy my seed urls file to the S3 bucket in a way that Hadoop can find it. Can anyone point me in the right direction on how to do this?

Thanks,
Kevin
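P.S. To make the seed urls part concrete: I was imagining something along these lines, assuming the hadoop fs commands accept fully qualified s3:// URIs (the bucket name is a placeholder), but I don't know whether this is the intended approach or whether Nutch will accept S3 paths directly:

    # create a directory for the seed list in the bucket and copy the local file up
    bin/hadoop fs -mkdir s3://MY-BUCKET/urls
    bin/hadoop fs -put urls/seed.txt s3://MY-BUCKET/urls/seed.txt

    # then pass the S3 paths to Nutch instead of local ones
    bin/nutch crawl s3://MY-BUCKET/urls -dir s3://MY-BUCKET/crawl -depth 3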