I was using ./bin/crawl and not doing incremental crawling at the time. The problem appears after I start crawling *.gif, *.jpg, *.mov, etc. I will provide more information if I can reproduce the error.
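For context, whether those suffixes get crawled is controlled by the URL filter configuration. The fragment below is an illustrative rule in the usual conf/regex-urlfilter.txt style (not my exact config): with the rule present, media suffixes are skipped; commenting it out lets them be fetched.

```
# conf/regex-urlfilter.txt (illustrative suffix list, not my exact config)
# skip URLs ending in image/video/archive suffixes; comment this rule out
# to allow crawling *.gif, *.jpg, *.mov, etc.
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
```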
Thanks =)

On Sun, Feb 22, 2015 at 4:47 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> What command are you using to crawl? Are you using bin/crawl, and/or
> doing incremental crawling?
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Shuo Li <sli...@usc.edu>
> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Date: Friday, February 20, 2015 at 3:26 PM
> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: linkdb/current/part-00000/data does not exist
>
> >Hi,
> >
> >I'm trying to crawl NSF ACADIS with nutch-selenium. I'm running into a
> >problem where linkdb/current/part-00000/data does not exist. I checked my
> >directory and files during crawling, and the file sometimes exists and
> >sometimes disappears, which is quite weird and strange.
> >
> >Another problem is that when we crawl NSIDC ADE, it gives us a 403
> >Forbidden error. Does this mean NSIDC ADE is blocking us?
> >
> >The log of the first error is at the bottom of this email. Any help
> >would be appreciated.
> >
> >Regards,
> >Shuo Li
> >
> >LinkDb: merging with existing linkdb: nsfacadis3Crawl/linkdb
> >LinkDb: java.io.FileNotFoundException: File
> >file:/vagrant/nutch/runtime/local/nsfacadis3Crawl/linkdb/current/part-00000/data does not exist.
> >	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
> >	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
> >	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:47)
> >	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
> >	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
> >	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
> >	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
> >	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
> >	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
> >	at java.security.AccessController.doPrivileged(Native Method)
> >	at javax.security.auth.Subject.doAs(Subject.java:415)
> >	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> >	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> >	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
> >	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
> >	at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:208)
> >	at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:316)
> >	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >	at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:276)
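The trace shows LinkDb.invert failing while merging because the old linkdb's data file has vanished from under it. A minimal sketch of the kind of existence check a crawl script could do before asking LinkDb to merge (this is illustrative shell, not the Nutch script itself; paths are stand-ins):

```shell
# Sketch: decide whether a LinkDb run should merge with an existing linkdb
# or create a fresh one. Uses a temp dir as a stand-in crawl directory.
crawl_dir=$(mktemp -d)                                # stand-in for nsfacadis3Crawl
linkdb_data="$crawl_dir/linkdb/current/part-00000/data"

if [ -f "$linkdb_data" ]; then
  mode="merge"    # safe to pass the existing linkdb to the invert step
else
  mode="create"   # data file missing: build the linkdb from scratch instead
fi
echo "$mode"
```

In the temp-dir case above the data file does not exist, so the sketch takes the "create" branch; the real fix is making sure the crawl script never hands LinkDb a linkdb path whose data file has been removed mid-crawl.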