I was using ./bin/crawl and not doing incremental crawling at the time. The problem appears after I start crawling *.gif, *.jpg, *.mov, etc. I will provide more information if I can reproduce the error.
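For context, whether those suffixes get crawled is controlled by the URL filter configuration. The fragment below is an illustrative rule in the usual conf/regex-urlfilter.txt style (not my exact config): with the rule present, media suffixes are skipped; commenting it out lets them be fetched.

```
# conf/regex-urlfilter.txt (illustrative suffix list, not my exact config)
# skip URLs ending in image/video/archive suffixes; comment this rule out
# to allow crawling *.gif, *.jpg, *.mov, etc.
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
```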
Thanks =)

On Sun, Feb 22, 2015 at 4:47 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> What command are you using to crawl? Are you using bin/crawl, and/or
> doing incremental crawling?
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Shuo Li <sli...@usc.edu>
> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Date: Friday, February 20, 2015 at 3:26 PM
> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: linkdb/current/part-00000/data does not exist
>
> >Hi,
> >
> >I'm trying to crawl NSF ACADIS with nutch-selenium. I'm running into a
> >problem where linkdb/current/part-00000/data does not exist. I checked my
> >directory and files during crawling, and the file sometimes exists and
> >sometimes disappears, which is quite weird and strange.
> >
> >Another problem is that when we crawl NSIDC ADE, it gives us a 403
> >Forbidden error. Does this mean NSIDC ADE is blocking us?
> >
> >The log of the first error is at the bottom of this email. Any help
> >would be appreciated.
> >
> >Regards,
> >Shuo Li
> >
> >LinkDb: merging with existing linkdb: nsfacadis3Crawl/linkdb
> >LinkDb: java.io.FileNotFoundException: File
> >file:/vagrant/nutch/runtime/local/nsfacadis3Crawl/linkdb/current/part-00000/data does not exist.
> >	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
> >	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
> >	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:47)
> >	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
> >	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
> >	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
> >	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
> >	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
> >	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
> >	at java.security.AccessController.doPrivileged(Native Method)
> >	at javax.security.auth.Subject.doAs(Subject.java:415)
> >	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> >	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> >	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
> >	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
> >	at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:208)
> >	at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:316)
> >	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >	at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:276)
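The trace shows LinkDb.invert failing while merging because the old linkdb's data file has vanished from under it. A minimal sketch of the kind of existence check a crawl script could do before asking LinkDb to merge (this is illustrative shell, not the Nutch script itself; paths are stand-ins):

```shell
# Sketch: decide whether a LinkDb run should merge with an existing linkdb
# or create a fresh one. Uses a temp dir as a stand-in crawl directory.
crawl_dir=$(mktemp -d)                                # stand-in for nsfacadis3Crawl
linkdb_data="$crawl_dir/linkdb/current/part-00000/data"

if [ -f "$linkdb_data" ]; then
  mode="merge"    # safe to pass the existing linkdb to the invert step
else
  mode="create"   # data file missing: build the linkdb from scratch instead
fi
echo "$mode"
```

In the temp-dir case above the data file does not exist, so the sketch takes the "create" branch; the real fix is making sure the crawl script never hands LinkDb a linkdb path whose data file has been removed mid-crawl.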