[ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kim Whitehall updated NUTCH-2125:
---------------------------------
    Summary: Metrics tool for relevancy  (was: Metrics)

> Metrics tool for relevancy
> --------------------------
>
>                 Key: NUTCH-2125
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2125
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 1.10
>            Reporter: Kim Whitehall
>              Labels: memex
>
> Purpose: a metric for determining the “relevancy” of a crawl after each
> round and the “relevancy” of a page. NB: this is not a scoring plugin. By
> default, the first 25 terms will be stored.
> - Return the topN terms per page
> - Return the topN terms per segment based on tf-idf
> - Leverage Apache Lucene libs

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
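
The two outputs listed in the issue (topN terms per page, and topN terms per segment ranked by tf-idf) can be illustrated with a minimal sketch. This is not the Nutch tool and does not use Apache Lucene: it is plain Python with whitespace tokenization, and the function names, the treatment of a segment as a list of page texts, and the smoothed idf formula are all assumptions made for illustration. The default of 25 follows the default stated in the issue.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def top_terms_per_page(text, top_n=25):
    """Return up to top_n (term, count) pairs for one page, most frequent first."""
    counts = Counter(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def top_terms_per_segment(pages, top_n=25):
    """Return up to top_n (term, score) pairs for a segment, ranked by tf-idf.

    Assumption: a segment is modelled as a list of page texts. tf is the
    term's total count across the segment, and idf is the smoothed variant
    log((1 + N) / (1 + df)) + 1, where N is the number of pages.
    """
    tokenized = [tokenize(page) for page in pages]
    n_docs = len(tokenized)
    tf = Counter()  # total occurrences of each term in the segment
    df = Counter()  # number of pages containing each term
    for tokens in tokenized:
        tf.update(tokens)
        df.update(set(tokens))
    scores = {
        term: tf[term] * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term in tf
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Calling `top_terms_per_segment(["nutch crawl crawl", "nutch index"], top_n=3)` ranks `crawl` above `nutch`: both occur twice in the segment, but `nutch` appears on every page and therefore gets a lower idf weight.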