Hi,

I'm trying to crawl NSF ACADIS with nutch-selenium, and I'm running into a problem: the LinkDb step fails because linkdb/current/part-00000/data does not exist. I checked my crawl directory while the crawl was running, and this file sometimes exists and sometimes disappears, which is quite strange.

Another problem is that when we crawl NSIDC ADE, we get a 403 Forbidden error. Does this mean NSIDC ADE is blocking us?

The log for the first error is at the bottom of this email. Any help would be appreciated.

Regards,
Shuo Li

LinkDb: merging with existing linkdb: nsfacadis3Crawl/linkdb
LinkDb: java.io.FileNotFoundException: File file:/vagrant/nutch/runtime/local/nsfacadis3Crawl/linkdb/current/part-00000/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:47)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:208)
    at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:276)