[ 
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-635:
-------------------------------

    Attachment: NUTCH-635-2-20080613.patch

Updated patch.  It adds a score updater for the crawl db and a scoring filter 
that works with the link analysis tool.  The LinkAnalysis tool now handles 
reciprocal links, links from the same domain/subdomains, rank sinks, and link 
loops.  A display tool is also included to view the inlinks, outlinks, and 
scores for a given url.  The patch should be ready for large scale testing; 
it was tested on a dataset of 25K pages and the results were promising.
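
To illustrate what handling reciprocal and same-domain links could look like, 
here is a minimal Python sketch.  It is not taken from the patch: it assumes 
such links are simply dropped before scoring, and the names naive_domain and 
filter_links are hypothetical.  The domain check is a crude approximation 
(last two host labels) used only for illustration.

```python
from urllib.parse import urlsplit


def naive_domain(url):
    """Hypothetical helper: last two labels of the host name.

    Assumption for illustration only; it does not understand public
    suffixes such as "co.uk".
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def filter_links(edges):
    """Return the (source, target) pairs that survive filtering.

    Assumed policy (not confirmed by the patch): drop links within the
    same domain, subdomains included, and drop both directions of a
    reciprocal pair.
    """
    edge_set = set(edges)
    kept = []
    for src, dst in edges:
        if naive_domain(src) == naive_domain(dst):
            continue  # internal link, including links between subdomains
        if (dst, src) in edge_set:
            continue  # reciprocal pair: neither direction is counted
        kept.append((src, dst))
    return kept
```

With this policy, a link from a.com to www.a.com is ignored, and two sites 
that link to each other contribute nothing to each other's score.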

> LinkAnalysis Tool for Nutch
> ---------------------------
>
>                 Key: NUTCH-635
>                 URL: https://issues.apache.org/jira/browse/NUTCH-635
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch
>
>
> This is a basic PageRank-type link analysis tool for Nutch which simulates a 
> sparse matrix using inlinks and outlinks and converges after a given number 
> of iterations.  This tool is meant to replace the current scoring system in 
> Nutch with a system that converges instead of exponentially increasing 
> scores.  It also includes a tool to create an outlinkdb.
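
The description above can be sketched in Python as a fixed number of passes 
over inlink and outlink lists, which stand in for the sparse matrix.  This is 
only an illustration of the general technique, not the code in the patch: the 
function name link_rank, the damping factor of 0.85, and the choice to spread 
the score of rank sinks evenly over all pages are assumptions.

```python
def link_rank(outlinks, iterations=10, damping=0.85):
    """PageRank-style scores from a dict mapping url -> list of target urls.

    Assumptions (not from the patch): damping factor, uniform initial
    scores, and even redistribution of the score held by rank sinks.
    """
    # Collect every page that appears as a source or as a target.
    nodes = set(outlinks)
    for targets in outlinks.values():
        nodes.update(targets)

    # Invert the outlinks into inlinks; duplicate links count once.
    inlinks = {node: [] for node in nodes}
    for src, targets in outlinks.items():
        for dst in set(targets):
            inlinks[dst].append(src)
    out_count = {node: len(set(outlinks.get(node, ()))) for node in nodes}

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages without outlinks (rank sinks) would otherwise leak score.
        sink_mass = sum(scores[x] for x in nodes if out_count[x] == 0)
        new_scores = {}
        for node in nodes:
            incoming = sum(scores[s] / out_count[s] for s in inlinks[node])
            new_scores[node] = (1 - damping) / n + damping * (incoming + sink_mass / n)
        scores = new_scores
    return scores
```

Because each pass redistributes a fixed total score of 1.0, the scores stay 
bounded no matter how many iterations are run, which is the property the 
description contrasts with exponentially increasing scores.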

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
