The linkanalysis tool takes a long time to run.
Doug wrote some comments about it:
Set fetchlist.score.by.link.count and indexer.boost.by.link.count to true, and forget about using the linkanalysis tool.
I have used this method since June 2005 without problems.
With the linkanalysis tool the scoring is better, but the setup described above gives nearly the same scoring without heavy resource usage.
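
For reference, a minimal sketch of how these two settings might look as overrides in conf/nutch-site.xml. The two property names are the ones mentioned above; the <nutch-conf> root element is an assumption based on the configuration layout of Nutch releases from that period, so check it against your own nutch-default.xml before copying.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the root element name is an assumption; match it to your nutch-default.xml -->
<nutch-conf>
  <!-- Score fetchlist entries by the number of inlinks -->
  <property>
    <name>fetchlist.score.by.link.count</name>
    <value>true</value>
  </property>
  <!-- Boost documents at index time by the number of inlinks -->
  <property>
    <name>indexer.boost.by.link.count</name>
    <value>true</value>
  </property>
</nutch-conf>
```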

Michael Ji wrote:

Hi,

As I understand it, link analysis needs to be run whenever a new
fetch is updated into the webdb.
Because the link graph has changed (new links may have been added
and old links deleted), the score for each node changes, so a
recalculation is necessary.
Link analysis updates the score for each node (i.e. each page) in
the webdb; then updatesegmentfromdb needs to run to copy the
recalculated scores to the segments.

I can't see how we can skip link analysis. Am
I missing something important? Let me know.

Thanks,

Michael Ji


--- AJ Chen <[EMAIL PROTECTED]> wrote:

I assume you mean UpdateSegmentFromDB, and that there is
no need to run the link analysis tool if I want to use the
number of inlinks for the Nutch score. Right? I tried to
find your patch but couldn't find it. Where can I find it?
-AJ

Piotr Kosiorowski wrote:

UpdateDB copies link information and score from the WebDB to
segments, so it is important to have the score calculated before
updatedb is run.
One can use the current standard Nutch score (based on the number
of inlinks) or try to use analyze. I committed a patch for it some
time ago that might help a bit with its disk space requirements,
so the best approach would be to test it (it worked OK for me)
and, if it works for you, report it so others can also try it out.
Regards,
Piotr

AJ Chen wrote:

In a whole-web or vertical crawling setting, is it right that
link analysis and update segment from DB should be performed,
in that order, before indexing the segments?

There's not much talk about update segment from DB. I think it
should be an important step. Could someone point out when it
should be run and what the benefits are?

I remember it was mentioned some time ago that the link analysis
tool does not work yet and that the number of in-links should be
used instead. Any update? If it's still not working, how do I set
it to use link counts?

Thanks,
AJ





                