Hi all, I just inherited a Hadoop/EC2/Nutch project that has been running for a few weeks, and lo and behold, the task has magically entered a state that is neither running, nor complete, nor failed.
I'd love to figure out why it is in that state, and how to resume it without losing the work it has already done, but that's a topic for later. Right now I'm just trying to back up what's already done to S3. I've already dealt with passing in the key/secret (s3://u:[EMAIL PROTECTED] didn't do it; I had to specify the properties instead) and with using bucket names that don't contain underscores. I'm running 0.17.0, but I don't think that's the issue, and furthermore I'm not sure 0.18.x is ready for EC2 + Nutch yet. Here's what fsck says:

Status: HEALTHY
 Total size:    13980327325 B
 Total dirs:    111
 Total files:   196
 Total blocks:  384 (avg. block size 36407102 B)
 Minimally replicated blocks:   384 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1

And the distcp:

[EMAIL PROTECTED] ~]# hadoop distcp -D fs.s3.awsAccessKey=... -D fs.s3.awsSecretAccessKey=... hdfs://ip-....ec2.internal:50001/ s3://com....hadoopbackup/2008-09-24/

fails with:

...
08/09/24 15:33:42 INFO mapred.JobClient: Task Id : task_200808301100_0041_m_000002_0, Status : FAILED
java.lang.NullPointerException
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:196)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.blockExists(Jets3tFileSystemStore.java:178)
...
08/09/24 15:34:34 INFO mapred.JobClient: Task Id : task_200808301100_0041_m_000002_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 1 Failed: 3
        at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
...
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
        at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)

Does anyone have any ideas what's going on?
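For reference, the properties I ended up setting in hadoop-site.xml look roughly like this (key values redacted; I believe these are the property names Hadoop 0.17 expects, but someone please correct me if the names differ):

```xml
<!-- S3 credentials for the s3:// block filesystem (values are placeholders) -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```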
I tried the distcp a second time and got very similar failures. I haven't counted exactly how much was copied to S3, but it seems to be on the order of a few hundred megabytes, not 13+ GB. I'm brand new to this stuff, so I feel pretty clueless.

Thanks,
Gary