Re: [Nutch-general] NullPointerException during Fetch

Meryl Silverburgh Mon, 09 Apr 2007 20:24:51 -0700

Thanks. I attached my nutch-site.xml file.

But for some reason, I now get:


$ bin/nutch fetch $s1
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: java.io.IOException: Segment already fetched!
        at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:45)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)



Here is a complete log of what I did:
$ rm -Rf crawl/
$ bin/nutch crawl urls -dir crawl -depth 2 -topN 5
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 2
topN = 5
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070409222252
Generator: filtering: false
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222252
Fetcher: threads: 10
fetching http://www.yahoo.com/
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20070409222252]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070409222306
Generator: filtering: false
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: threads: 10
fetching http://srd.yahoo.com/hp5-v
fetching http://www.yahoo.com/2.0.0
fetching http://www.yahoo.com/s/553079
fetching http://www.yahoo.com/+document.cookie+
fetching http://www.yahoo.com/1.0
fetch of http://www.yahoo.com/s/553079 failed with:
java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20070409222306]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: crawl/segments/20070409222306
LinkDb: adding segment: crawl/segments/20070409222252
LinkDb: done
Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20070409222306
Indexer: adding segment: crawl/segments/20070409222252
 Indexing [http://srd.yahoo.com/hp5-v] with analyzer
[EMAIL PROTECTED] (null)
Optimizing index.
merging segments _ram_0 (1 docs) into _0 (1 docs)
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawl/indexes
Dedup: done
merging indexes to: crawl/index
Adding crawl/indexes/part-00000
done merging
crawl finished: crawl
$ s1=`ls -d crawl/segments/2* | tail -1`
$ echo $s1
crawl/segments/20070409222306/
$ bin/nutch fetch $s1
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: java.io.IOException: Segment already fetched!
        at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:45)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
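
The check behind "Segment already fetched!" can be approximated from the
shell. This is only a minimal sketch, assuming the fetcher refuses a segment
once it contains a crawl_fetch subdirectory (an assumption based on the
FetcherOutputFormat.checkOutputSpecs frame in the trace above, not something
confirmed in this thread). segment_fetched is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the segment directory already holds
# fetch output. Assumption: fetch output lives in <segment>/crawl_fetch.
segment_fetched() {
    # ${1%/} drops a trailing slash, as in "crawl/segments/20070409222306/"
    [ -d "${1%/}/crawl_fetch" ]
}

# Example use (commented out so the sketch has no side effects):
# s1=`ls -d crawl/segments/2* | tail -1`
# if segment_fetched "$s1"; then
#     echo "$s1 already has fetch output"
# else
#     bin/nutch fetch "$s1"
# fi
```

In the log above, bin/nutch crawl already ran the Fetcher on both segments,
so a separate bin/nutch fetch on the newest one would be expected to be
refused in this way.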






On 4/8/07, Ratnesh,V2Solutions India
<[EMAIL PROTECTED]> wrote:
>
> Open nutch-site.xml and nutch-default.xml,
>
> and in the plugin.includes property set the value like
> <value>index-basic|index-more|.............................................................</value>
>
> Keep the other values and only add these plugins as extras.
>
> Ratnesh,V2Solutions India
>
>
> Meryl Silverburgh wrote:
> >
> > Thanks, but how do I include the index-basic and index-more plugins?
> > I can't find that in the documentation.
> >
> > Thank you.
> >
> > On 4/7/07, Ratnesh,V2Solutions India
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Check whether you have included the index-basic and index-more plugins
> >> in your nutch-site.xml file.
> >>
> >> The same problem was solved by including these plugins.
> >>
> >>
> >> Hope this solves the issue...
> >>
> >> Ratnesh V2Solutions,India
> >>
> >> Meryl Silverburgh wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am following http://lucene.apache.org/nutch/tutorial8.html to set up
> >> > Nutch.
> >> >
> >> > But I get a NullPointerException during fetch. Can you please tell me
> >> > what I am missing?
> >> >
> >> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> >> >
> >> > $ s1=`ls -d crawl/segments/2* | tail -1`
> >> >
> >> > $ echo $s1
> >> > crawl/segments/20070406202200/
> >> >
> >> > $ bin/nutch fetch $s1
> >> > Fetcher: starting
> >> > Fetcher: segment: crawl/segments/20070406202200
> >> > Fetcher: threads: 10
> >> > fetching http://www.yahoo.com/s/550957
> >> > fetching http://www.yahoo.com/r\/1m
> >> > fetching http://www.yahoo.com/2.0.0
> >> > fetching http://srd.yahoo.com/hp5-v
> >> > fetching http://www.yahoo.com/r/hq
> >> > fetching http://www.yahoo.com/+document.cookie+
> >> > fetching http://www.yahoo.com/s/550839
> >> > fetching http://www.yahoo.com/1.0
> >> > fetching http://www.yahoo.com/r/hf
> >> > fetch of http://www.yahoo.com/s/550957 failed with:
> >> > java.lang.NullPointerException
> >> > fetch of http://www.yahoo.com/s/550839 failed with:
> >> > java.lang.NullPointerException
> >> > fetch of http://www.yahoo.com/r/hq failed with:
> >> > java.lang.NullPointerException
> >> > Fetcher: done
> >> >
> >> >
> >>
> >>
> >>
> >
> >
>
>
>

