Both db.ignore.internal.links and db.ignore.external.links are true in my case.
Since I crawl only one domain, I suppose setting
db.ignore.external.links to true is a good idea.

So db.ignore.internal.links should be false?

From what I understand, db.ignore.external.links is a setting for the
crawldb, while db.ignore.internal.links is a setting for the linkdb?
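
For reference, a minimal sketch of the override being asked about, assuming
it goes in conf/nutch-site.xml and that a single-domain crawl wants external
links dropped but internal links kept. Whether this alone repopulates the
linkdb is an assumption, not something confirmed in this thread, and the
linkdb would presumably need to be regenerated from the segments afterwards.

```xml
<!-- conf/nutch-site.xml: hypothetical overrides for a single-domain crawl -->
<configuration>
  <!-- Drop outlinks that point to other hosts (already true in my case) -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <!-- Keep links between pages of the same host; with this set to true,
       a one-domain crawl has no links left to put in the linkdb
       (assumption based on the property's description) -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>
```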



On Wed, Jun 6, 2012 at 10:02 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
>
> -----Original message-----
>> From:Matthias Paul <magethle.nu...@gmail.com>
>> Sent: Wed 06-Jun-2012 09:47
>> To: user@nutch.apache.org
>> Subject: Linkdb empty
>>
>> Hi all,
>
> hi
>
>>
>> I noticed that my linkdb is always empty although I use the generated
>> segments from the last crawl for the generation of the linkdb.
>
> Check the db.ignore.* settings in your config.
>
>> Do I have to keep more segments?
>> As I use Solr for indexing, I only keep the segments from the last crawl.
>
> You can discard the segments if you never do a full reindex.
>
>>
>> Thanks
>> Matthias
>>
