[ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kim Whitehall updated NUTCH-2125:
---------------------------------
    Summary: Metrics tool for relevancy  (was: Metrics)

> Metrics tool for relevancy
> --------------------------
>
>                 Key: NUTCH-2125
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2125
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 1.10
>            Reporter: Kim Whitehall
>              Labels: memex
>
> Purpose: a metric for determining the “relevancy” of a crawl after each
> round and the “relevancy” of a page. NB: this is not a scoring plugin. By
> default, the first 25 terms will be stored.
> - Return the topN terms per page
> - Return the topN terms per segment, based on tf-idf
> - Leverage Apache Lucene libraries
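The per-page and per-segment tf-idf ranking listed above can be sketched in plain Java. This is a minimal, hypothetical illustration, not the proposed tool's code. It does not use Lucene, although the issue proposes leveraging the Lucene libraries. It also rests on several assumptions: a naive lowercase tokenizer, tf as the raw term count, and idf = ln(N / df) computed over the pages of a single segment. Segment-level terms are ranked by summing each term's per-page tf-idf, which is only one possible aggregation. The class and method names (`TfIdfTopTerms`, `topNForPage`, `topNForSegment`) are invented for this sketch; `DEFAULT_TOP_N = 25` mirrors the default of 25 stored terms stated in the issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch: rank terms per page and per segment by tf-idf (no Lucene). */
public class TfIdfTopTerms {

    // Mirrors the default of 25 stored terms mentioned in the issue.
    static final int DEFAULT_TOP_N = 25;

    // Naive tokenizer (assumption): lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static List<List<String>> tokenizeAll(List<String> pageTexts) {
        return pageTexts.stream().map(TfIdfTopTerms::tokenize).collect(Collectors.toList());
    }

    // df(term) = number of pages in the segment that contain the term at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> pages) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> page : pages) {
            for (String term : new HashSet<>(page)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // tf-idf for one page: tf = raw count in the page, idf = ln(numPages / df).
    static Map<String, Double> tfIdf(List<String> page, Map<String, Integer> df, int numPages) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : page) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) numPages / df.get(e.getKey()));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return scores;
    }

    // The n highest-scoring terms; ties broken alphabetically so output is deterministic.
    static List<String> topN(Map<String, Double> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // topN terms for one page, with idf computed over all pages of its segment.
    static List<String> topNForPage(List<String> pageTexts, int pageIndex, int n) {
        List<List<String>> pages = tokenizeAll(pageTexts);
        Map<String, Integer> df = documentFrequencies(pages);
        return topN(tfIdf(pages.get(pageIndex), df, pages.size()), n);
    }

    // topN terms for a whole segment (assumption): sum each term's per-page tf-idf.
    static List<String> topNForSegment(List<String> pageTexts, int n) {
        List<List<String>> pages = tokenizeAll(pageTexts);
        Map<String, Integer> df = documentFrequencies(pages);
        Map<String, Double> total = new HashMap<>();
        for (List<String> page : pages) {
            tfIdf(page, df, pages.size()).forEach((term, s) -> total.merge(term, s, Double::sum));
        }
        return topN(total, n);
    }

    public static void main(String[] args) {
        List<String> segment = List.of(
                "nutch crawls the web",
                "lucene indexes the web",
                "nutch uses lucene");
        System.out.println(topNForPage(segment, 0, 2));              // [crawls, nutch]
        System.out.println(topNForSegment(segment, 4));              // [crawls, indexes, uses, lucene]
        System.out.println(topNForPage(segment, 2, DEFAULT_TOP_N));  // [uses, lucene, nutch]
    }
}
```

A term that appears on every page of the segment gets idf = 0 and therefore drops to the bottom of the ranking. In practice you would probably also filter stopwords such as "the", which this sketch keeps.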



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
