[ 
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605142#action_12605142
 ] 

Dennis Kubes commented on NUTCH-635:
------------------------------------

Andrzej Bialecki wrote:

> One more question: you said the algorithm converges, but do you have a 
> reference set of values from this dataset, calculated using some other 
> pagerank impl? It would be worthwhile to make sure that the values are 
> indeed the PageRank, as described, and not yet another subtle variation such 
> as our OPIC

I was doing it low tech.  Turn on debug logging (warning: the output is 
large) and grep the log; you can see the scores converge after a few 
iterations ;)

> There are a few Java packages for computing PageRank, we could adapt one of 
> those to serve as a baseline:
> 
> http://law.dsi.unimi.it/
> http://webla.sourceforge.net/javadocs/pt/tumba/links/PageRank.html

I agree it would be a good comparison.  Strictly speaking, though, it is not 
just PageRank: there are optimizations for multiple links from a given domain, 
penalties for pages with very few inlinks, and a minimum score value, all of 
which can be changed through the configuration.  Apart from that, it follows 
the original PageRank algorithm closely.
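
For a quick baseline on a small graph, something like the following could 
be compared against the tool's output.  This is only a sketch of textbook 
PageRank by power iteration, not the code in the patch: the function name, 
the 0.85 damping factor, the dangling-node handling and the toy graph are 
all assumptions, and it applies none of the adjustments above (no 
per-domain link limiting, no inlink penalty, no minimum score).

```python
def pagerank(outlinks, damping=0.85, iterations=50, tol=1e-10):
    """Textbook PageRank by power iteration.

    outlinks maps each page to the list of pages it links to.
    The damping value and stopping rule are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(outlinks)
    for targets in outlinks.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Score held by pages without outlinks is spread over all pages.
        dangling = sum(scores[v] for v in nodes if not outlinks.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        # Each page passes an equal share of its score along every outlink.
        for src, targets in outlinks.items():
            if targets:
                share = damping * scores[src] / len(targets)
                for dst in targets:
                    new[dst] += share

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical four-page graph, used only to exercise the function.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

With the domain, inlink and minimum-score options set so that they have no 
effect, the ordering of pages from the tool should presumably match a 
baseline like this, even if the absolute values are scaled differently.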

> LinkAnalysis Tool for Nutch
> ---------------------------
>
>                 Key: NUTCH-635
>                 URL: https://issues.apache.org/jira/browse/NUTCH-635
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch, 
> NUTCH-635-3-20080614.patch, NUTCH-635-4-20080615.patch
>
>
> This is a basic pagerank type link analysis tool for nutch which simulates a 
> sparse matrix using inlinks and outlinks and converges after a given number 
> of iterations.  This tool is meant to replace the current scoring system in 
> nutch with a system that converges instead of exponentially increasing 
> scores.  Also includes a tool to create an outlinkdb.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
