Re: NullPointerException during Fetch

Meryl Silverburgh Mon, 09 Apr 2007 20:24:44 -0700

Thanks. I attached my nutch-site.xml file.

But for some reason, I now get:


$ bin/nutch fetch $s1
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: java.io.IOException: Segment already fetched!
       at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:45)
       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
       at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
       at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
       at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
       at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)



Here is a complete log of what I did:
$ rm -Rf crawl/
$ bin/nutch crawl urls -dir crawl -depth 2 -topN 5
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 2
topN = 5
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070409222252
Generator: filtering: false
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222252
Fetcher: threads: 10
fetching http://www.yahoo.com/
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20070409222252]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070409222306
Generator: filtering: false
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: threads: 10
fetching http://srd.yahoo.com/hp5-v
fetching http://www.yahoo.com/2.0.0
fetching http://www.yahoo.com/s/553079
fetching http://www.yahoo.com/+document.cookie+
fetching http://www.yahoo.com/1.0
fetch of http://www.yahoo.com/s/553079 failed with:
java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20070409222306]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: crawl/segments/20070409222306
LinkDb: adding segment: crawl/segments/20070409222252
LinkDb: done
Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20070409222306
Indexer: adding segment: crawl/segments/20070409222252
Indexing [http://srd.yahoo.com/hp5-v] with analyzer
[EMAIL PROTECTED] (null)
Optimizing index.
merging segments _ram_0 (1 docs) into _0 (1 docs)
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawl/indexes
Dedup: done
merging indexes to: crawl/index
Adding crawl/indexes/part-00000
done merging
crawl finished: crawl
$ s1=`ls -d crawl/segments/2* | tail -1`
$ echo $s1
crawl/segments/20070409222306/
$ bin/nutch fetch $s1
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: java.io.IOException: Segment already fetched!
       at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:45)
       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
       at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
       at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
       at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
       at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
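
A note on the error above: the crawl run in the log already ran the fetcher on segment 20070409222306 ("Fetcher: segment: crawl/segments/20070409222306" through "Fetcher: done"), which would explain why a second, manual fetch of the same segment reports "Segment already fetched!". Below is a minimal sketch of a guard against that. It assumes the fetcher writes its output to a crawl_fetch subdirectory of the segment. The helper names segment_is_fetched and fetch_if_needed are hypothetical and not part of Nutch.

```shell
# Hypothetical helper: succeeds if the segment directory already holds
# fetcher output. Assumption: that output lives in <segment>/crawl_fetch.
segment_is_fetched() {
  # ${1%/} drops one trailing slash, as in "crawl/segments/20070409222306/"
  [ -d "${1%/}/crawl_fetch" ]
}

# Hypothetical wrapper: run the fetcher only on a segment that has not
# been fetched yet, and say so when it skips one.
fetch_if_needed() {
  if segment_is_fetched "$1"; then
    echo "skip: $1 already fetched"
  else
    bin/nutch fetch "$1"
  fi
}
```

With these defined, fetch_if_needed $s1 would print the skip message for the segment above instead of failing. To fetch new pages by hand, the tutorial's step-by-step route is to create a fresh segment first with bin/nutch generate crawl/crawldb crawl/segments and then fetch that one.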






On 4/8/07, Ratnesh,V2Solutions India
<[EMAIL PROTECTED]> wrote:


Open nutch-site.xml and nutch-default.xml.

In the plugin.includes property, set the value like this:
<value>index-basic|index-more|.............................................................</value>

Keep the other values as they are and add these two plugins as extras.

Ratnesh,V2Solutions India
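
One way to confirm the setting took effect is to check the config file from the shell. This is only a sketch: the function names are hypothetical, and it assumes the <value> element of plugin.includes fits on a single line at or after its <name> line.

```shell
# Hypothetical helper: print the first <value> line found at or after
# the plugin.includes <name> line of a Nutch config file.
# Assumption: the whole <value> element sits on one line.
plugin_includes_value() {
  awk '/<name>plugin\.includes<\/name>/ { found = 1 }
       found && /<value>/ { print; exit }' "$1"
}

# Succeeds only if both indexing plugins are named in that value.
has_index_plugins() {
  value=$(plugin_includes_value "$1")
  case "$value" in *index-basic*) ;; *) return 1 ;; esac
  case "$value" in *index-more*) ;; *) return 1 ;; esac
}
```

For example, has_index_plugins conf/nutch-site.xml && echo "index plugins enabled" (the conf/ path is assumed here, adjust it to where your nutch-site.xml lives).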


Meryl Silverburgh wrote:
>
> Thanks, but how do I include the index-basic and index-more plugins?
> I can't find that in the documentation.
>
> Thank you.
>
> On 4/7/07, Ratnesh,V2Solutions India
> <[EMAIL PROTECTED]> wrote:
>>
>> Check whether you have included the index-basic and index-more plugins
>> in your nutch-site.xml file.
>>
>> The same problem was solved by including them.
>>
>>
>> Hope this solves the issue.
>>
>> Ratnesh V2Solutions,India
>>
>> Meryl Silverburgh wrote:
>> >
>> > Hi,
>> >
>> > I am following http://lucene.apache.org/nutch/tutorial8.html to set up
>> > Nutch.
>> >
>> > But I get a NullPointerException during fetch. Can you please tell me
>> > what I am missing?
>> >
>> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
>> >
>> > $ s1=`ls -d crawl/segments/2* | tail -1`
>> >
>> > $ echo $s1
>> > crawl/segments/20070406202200/
>> >
>> > $ bin/nutch fetch $s1
>> > Fetcher: starting
>> > Fetcher: segment: crawl/segments/20070406202200
>> > Fetcher: threads: 10
>> > fetching http://www.yahoo.com/s/550957
>> > fetching http://www.yahoo.com/r\/1m
>> > fetching http://www.yahoo.com/2.0.0
>> > fetching http://srd.yahoo.com/hp5-v
>> > fetching http://www.yahoo.com/r/hq
>> > fetching http://www.yahoo.com/+document.cookie+
>> > fetching http://www.yahoo.com/s/550839
>> > fetching http://www.yahoo.com/1.0
>> > fetching http://www.yahoo.com/r/hf
>> > fetch of http://www.yahoo.com/s/550957 failed with:
>> > java.lang.NullPointerException
>> > fetch of http://www.yahoo.com/s/550839 failed with:
>> > java.lang.NullPointerException
>> > fetch of http://www.yahoo.com/r/hq failed with:
>> > java.lang.NullPointerException
>> > Fetcher: done
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/NullPointerException-during-Fetch-tf3539577.html#a9882786
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
>
>

--
View this message in context: 
http://www.nabble.com/NullPointerException-during-Fetch-tf3539577.html#a9898432
Sent from the Nutch - User mailing list archive at Nabble.com.
