[ https://issues.apache.org/jira/browse/NUTCH-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066649#comment-13066649 ]
Markus Jelsma commented on NUTCH-1044:
--------------------------------------

Can you provide a patch?

> Redirected URLs and possibly all of their outlinked URLs have invalid scores.
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-1044
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1044
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 1.3
>            Reporter: Nutch User - 1
>
> 1.: 
> http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html
> 2.: 
> http://lucene.472066.n3.nabble.com/A-possible-solution-to-my-URL-redirection-and-zero-scores-problem-td3162164.html
>
> Please note that URLs redirected by meta refresh redirection also have
> invalid scores. For such URLs a CrawlDatum is created on lines 157-177 of
> ParseOutputFormat.java
> (http://svn.apache.org/viewvc/nutch/branches/branch-1.3/src/java/org/apache/nutch/parse/ParseOutputFormat.java?view=markup).
> The new CrawlDatum's score isn't set anywhere after creation, so it stays at
> 1.0f, as can be seen on line 122 of CrawlDatum.java
> (http://svn.apache.org/viewvc/nutch/branches/branch-1.3/src/java/org/apache/nutch/crawl/CrawlDatum.java?view=markup).
>
> It's a separate question whether the redirected URL's score should simply be
> passed to the new URL, or whether the redirection should be considered a link,
> in which case the new URL's score would be
> 'originalScore' / ('numberOfOutlinks' + 1).
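
To make the two scoring options from the issue description concrete, here is a minimal sketch in plain Java. It does not use the Nutch API: the class and method names (RedirectScoreSketch, passThrough, asExtraLink) are hypothetical, and only the 1.0f default and the formula 'originalScore' / ('numberOfOutlinks' + 1) come from the report. It is an illustration of the arithmetic, not a proposed patch.

```java
// Hypothetical illustration only; none of these names exist in Nutch.
class RedirectScoreSketch {

    // Score a freshly created CrawlDatum carries according to the report.
    static final float DEFAULT_SCORE = 1.0f;

    // Option A: hand the redirecting URL's score to the target unchanged.
    static float passThrough(float originalScore) {
        return originalScore;
    }

    // Option B: treat the redirect as one more outlink, so the score is
    // split across the page's outlinks plus the redirect target.
    static float asExtraLink(float originalScore, int numberOfOutlinks) {
        if (numberOfOutlinks < 0) {
            throw new IllegalArgumentException("numberOfOutlinks must be >= 0");
        }
        return originalScore / (numberOfOutlinks + 1);
    }

    public static void main(String[] args) {
        float originalScore = 0.75f;  // assumed example value
        int numberOfOutlinks = 2;     // assumed example value

        System.out.println("unset (current behaviour): " + DEFAULT_SCORE);
        System.out.println("option A (pass through):   " + passThrough(originalScore));
        System.out.println("option B (extra link):     " + asExtraLink(originalScore, numberOfOutlinks));
    }
}
```

With the assumed values above, option A keeps 0.75 for the redirect target, while option B gives it 0.75 / 3 = 0.25. In both cases the target no longer keeps the unrelated default of 1.0f.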