[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-635:
-------------------------------

    Attachment: NUTCH-635-2-20080613.patch

Updated patch. It contains a score updater for the crawl db and a scoring filter that works with the link analysis tool. The LinkAnalysis tool has been updated to handle reciprocal links, links from the same domain/subdomains, rank sinks, and link loops. Also included is a display tool to view inlinks/outlinks and scores for a given url. Should be ready for large scale testing. Tested on a dataset of 25K pages and the results were promising.

> LinkAnalysis Tool for Nutch
> ---------------------------
>
>                 Key: NUTCH-635
>                 URL: https://issues.apache.org/jira/browse/NUTCH-635
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch
>
>
> This is a basic pagerank type link analysis tool for nutch which simulates a
> sparse matrix using inlinks and outlinks and converges after a given number
> of iterations.  This tool is meant to replace the current scoring system in
> nutch with a system that converges instead of exponentially increasing
> scores.  Also includes a tool to create an outlinkdb.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
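
For readers unfamiliar with the approach named in the issue description, here is a minimal sketch of a pagerank-type iteration over inlinks and outlinks. It is not taken from the patch. The function names (invert_outlinks, link_rank), the damping value of 0.85, and the choice to spread rank-sink mass evenly over all pages are assumptions made for illustration only. Because each pass redistributes a fixed total of 1.0, scores settle instead of growing with every update.

```python
from collections import defaultdict


def invert_outlinks(outlinks):
    """Build an inlink map (target -> list of sources) from an outlink map."""
    inlinks = defaultdict(list)
    for src, targets in outlinks.items():
        for dst in targets:
            inlinks[dst].append(src)
    return inlinks


def link_rank(outlinks, iterations=10, damping=0.85):
    """Run a fixed number of pagerank-style passes and return url -> score.

    The link graph is kept as adjacency lists (a sparse representation),
    never as a dense matrix. The damping value is an illustrative assumption.
    """
    # Every url that appears as a source or a target is a page.
    pages = set(outlinks)
    for targets in outlinks.values():
        pages.update(targets)
    n = len(pages)
    inlinks = invert_outlinks(outlinks)

    # Start from a uniform distribution that sums to 1.0.
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages without outlinks (rank sinks) would otherwise leak score;
        # this sketch spreads their mass evenly (an assumption).
        sink_mass = sum(scores[p] for p in pages if not outlinks.get(p))
        new_scores = {}
        for p in pages:
            # Each inlinking page contributes its score split over its outlinks.
            incoming = sum(scores[q] / len(outlinks[q]) for q in inlinks.get(p, ()))
            new_scores[p] = (1.0 - damping) / n + damping * (incoming + sink_mass / n)
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Hypothetical four-page graph, used only to show the call shape.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for url, score in sorted(link_rank(graph).items()):
        print(url, round(score, 4))
```

The sketch leaves out the handling of reciprocal links, same-domain links, and link loops mentioned in the update above. Those would be filtering steps applied to the outlink map before iterating.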