You must have run the crawl several times, and some of those runs failed
before the parse phase, so the parse data for those segments was never
generated. It's best to delete the whole directory
file:/Users/ptomblin/nutch-1.0/crawl.blog and recrawl; the output of the
fresh run should show you exactly why the parse phase failed.
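Before deleting everything, you could also check which segments are actually incomplete. A minimal sketch (assuming the crawl.blog layout from your log; the `CRAWL_DIR` variable is just for illustration):

```shell
#!/bin/sh
# List segments whose parse step never completed, i.e. the ones
# missing a parse_data directory. These are what make LinkDb fail.
CRAWL_DIR="${CRAWL_DIR:-crawl.blog}"
for seg in "$CRAWL_DIR"/segments/*/; do
  if [ ! -d "${seg}parse_data" ]; then
    echo "incomplete segment: $seg"
  fi
done
```

Any segment this prints failed before parsing; if there are several, recrawling from scratch is the simplest fix.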

Xiao

On Fri, Jul 24, 2009 at 10:53 PM, Paul Tomblin<[email protected]> wrote:
> I installed nutch 1.0 on my laptop last night and set it running to crawl my
> blog with the command:  bin/nutch crawl urls -dir crawl.blog -depth 10
> it was still running strong when I went to bed several hours later, and this
> morning I woke up to this:
>
> activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawl.blog/crawldb
> CrawlDb update: segments: [crawl.blog/segments/20090724010303]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: crawl.blog/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723154530
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155106
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155122
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155303
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155812
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723161808
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723171215
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723193543
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723224936
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090724004250
> LinkDb: adding segment:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090724010303
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723154530/parse_data
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
> at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
> at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
> at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
>
>
> --
> http://www.linkedin.com/in/paultomblin
>
