Wow, changing that setting immediately fixed my problem. It looks
like I totally misdiagnosed things.

May I pose two questions:
1) How did you view all the outlinks?
2) How severe is NUTCH-119? Does it occur on a lot of sites?

----- Original Message ----
From: Doğacan Güney <[EMAIL PROTECTED]>
Sent: Tuesday, June 26, 2007 10:56:32 PM
Subject: Re: NUTCH-119 :: how hard to fix

On 6/27/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> I am evaluating nutch+lucene as a crawl and search solution.
> However, I am finding major bugs in nutch right off the bat.
> In particular, NUTCH-119: nutch is not crawling relative URLs.  I have some 
> discussion of it here:
>[EMAIL PROTECTED]/msg08644.html
> Most of the links off, one of my main test sites, have 
> relative URLs.  It seems incredible that nutch, which is capable of 
> mapreduce, cannot fetch these URLs.
> If I decide to go with nutch+lucene for other reasons, I might fix this bug
> myself. Has anyone tried fixing this problem? Is it intractable? Or are the
> developers, who are just volunteers anyway, more interested in fixing other
> problems?
> Could someone outline the issue for me a bit more clearly so I would know how 
> to evaluate it?

Both this one and the other site you mentioned (sf911truth) have
more than 100 outlinks. By default, Nutch only stores 100 outlinks
per page. The link about.html happens to be the 105th link or so, so
Nutch doesn't store it. All you have to do is either increase that limit
or set it to -1 (which means: store all outlinks).
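To see why a link around position 105 silently disappears, here is a minimal Python model of a per-page outlink cap. It is a sketch, not Nutch's code: the helper name `cap_outlinks` is hypothetical, and it assumes links are kept in the order they appear on the page, with a negative limit meaning "keep everything", as described in the reply.

```python
# Hypothetical model of a per-page outlink cap (not Nutch source code).
# Assumption: outlinks are kept in page order; a negative limit keeps all.

def cap_outlinks(outlinks, max_outlinks=100):
    """Return the outlinks that survive the cap (hypothetical helper)."""
    links = list(outlinks)
    if max_outlinks < 0:
        return links  # e.g. -1: store every outlink
    return links[:max_outlinks]  # only the first max_outlinks are stored


if __name__ == "__main__":
    # A page with 120 links, where "about.html" is the 105th one.
    links = [f"page{i}.html" for i in range(1, 121)]
    links[104] = "about.html"  # index 104 is the 105th link

    print("about.html" in cap_outlinks(links))       # default cap of 100
    print("about.html" in cap_outlinks(links, 200))  # raised cap
    print("about.html" in cap_outlinks(links, -1))   # unlimited
```

Under these assumptions the first check prints `False` and the other two print `True`, which matches the symptom: the relative link was never the problem, only its position past the cap.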


Doğacan Güney

