Boost is calculated during the indexing phase by the scoring plugins:

scoring.link
scoring.opic
scoring.tld

Boost is also calculated at query time by the corresponding query plugins and
may apply to certain fields. As I said, boost is an essential part of the
scoring algorithm.
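
To illustrate the idea only, here is a toy sketch that assumes a boost acts as
a simple multiplicative factor on a base relevance score. This is a
simplification, not Lucene's actual formula, and the names boosted_score and
docs are hypothetical:

```python
def boosted_score(base_score: float, boost: float = 1.0) -> float:
    """Toy model (assumption): the boost simply multiplies the base score."""
    return base_score * boost


# Hypothetical documents: name -> (base relevance score, boost)
docs = {
    "page-a": (0.5, 1.0),
    "page-b": (0.4, 2.0),
}

# Rank documents by boosted score, highest first.
ranked = sorted(docs, key=lambda name: boosted_score(*docs[name]), reverse=True)
```

In this toy model page-b outranks page-a even though its base score is lower,
because its boost is higher.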

Digest is calculated during the crawl phase by one of:

MD5Signature
FeedSignature
TextProfileSignature

Digest is *usually*, but not always, the MD5 hash of the page content.
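
As a rough sketch of the MD5 case, assuming the digest is taken over the raw
page bytes only (the real signature classes may use other inputs), duplicate
detection could look like this. The function names and example URLs are
hypothetical and are not Nutch APIs:

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return the hex MD5 digest of the raw page content."""
    return hashlib.md5(content).hexdigest()


def dedup(pages):
    """Map each distinct digest to the first URL seen with that content."""
    seen = {}
    for url, content in pages:
        seen.setdefault(content_digest(content), url)
    return seen


# Hypothetical crawl results: (url, raw content)
pages = [
    ("http://example.com/a", b"<html>same body</html>"),
    ("http://example.com/b", b"<html>same body</html>"),
    ("http://example.com/c", b"<html>other body</html>"),
]
unique = dedup(pages)
```

Pages a and b have identical content, so they share one digest and only the
first of them is kept.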

Look at the implementation of each plugin to see exactly what is used to
calculate the digest and the boosts. Anything might be taken into account.

Best Regards
Alexander Aristov


2009/6/16 Fabrice Estiévenart <[email protected]>

> Thank you,
>
> What information are they computed from? My guesses:
>
> Boost : inlinks, ... ?
> Digest : content, url, title, ... ?
>
> Fabrice
>
> Alexander Aristov wrote:
>
>> Hi
>>
>> Boost is used to calculate the document (field) score, which Lucene uses in
>> queries to find the best results. It's part of the scoring algorithm.
>>
>> Digest is used to identify pages (like unique ID) and is used to remove
>> duplicates during the dedup procedure.
>>
>> Best Regards
>> Alexander Aristov
>>
>>
>> 2009/6/16 Fabrice Estiévenart <[email protected]>
>>
>>
>>
>>> Hello,
>>>
>>> How are the "boost" and "digest" fields in a Nutch index computed?
>>> What exactly are they used for?
>>>
>>> I can't find this information, thanks.
>>>
>>> --
>>> Fabrice Estiévenart, Ingénieur R&D, CETIC
>>> Tél : +32 (0)71/49.07.28
>>> Web : http://www.cetic.be
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Fabrice Estiévenart, Ingénieur R&D, CETIC
> Tél : +32 (0)71/49.07.28
> Web : http://www.cetic.be
>
>
