doubling score caused by page internal anchors.
-----------------------------------------------

                 Key: NUTCH-332
                 URL: http://issues.apache.org/jira/browse/NUTCH-332
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 0.8-dev
            Reporter: Stefan Groschupf
            Priority: Blocker
             Fix For: 0.8-dev


When a page has no outlinks to other pages but several links to itself, e.g. 
a set of page-internal anchors, the page's score is distributed to its 
outlinks. But all of these outlinks point back to the page itself, so the 
page's score is doubled. 
I'm not sure, but this may also cause a never-ending fetch loop for this 
page, since outlinks with the status CrawlDatum.STATUS_LINKED are set to 
CrawlDatum.STATUS_DB_UNFETCHED in CrawlDBReducer, line 107. 
So the fetched status may be overwritten with unfetched. 
In that case we would fetch the page again every time and also double its 
score every time, which leads to very high scores for no reason.
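
The doubling described above can be reproduced with a small standalone 
sketch. This is not Nutch code: the class SelfLinkScoreDemo and the methods 
stripFragment and distribute are hypothetical names, the example URL is made 
up, and the sketch assumes that a page's score is split evenly over its 
outlinks and that contributions arriving at a URL are added to its existing 
score. It also shows one possible mitigation, skipping outlinks that resolve 
to the page itself once the "#fragment" is removed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SelfLinkScoreDemo {

    // Drop the "#fragment" part so page-internal anchors map to the page URL.
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Assumption: the score is split evenly over all outlinks.
    // Returns the summed contribution per (fragment-free) target URL.
    static Map<String, Double> distribute(String pageUrl, double score,
                                          List<String> outlinks,
                                          boolean skipSelfLinks) {
        Map<String, Double> contributions = new HashMap<>();
        if (outlinks.isEmpty()) {
            return contributions;
        }
        double share = score / outlinks.size();
        String self = stripFragment(pageUrl);
        for (String link : outlinks) {
            String target = stripFragment(link);
            if (skipSelfLinks && target.equals(self)) {
                continue; // hypothetical mitigation: ignore links to the page itself
            }
            contributions.merge(target, share, Double::sum);
        }
        return contributions;
    }

    public static void main(String[] args) {
        String page = "http://example.com/faq.html";
        List<String> anchors = Arrays.asList(
                page + "#q1", page + "#q2", page + "#q3", page + "#q4");
        double score = 1.0;

        // Assumption: incoming contributions are added to the existing score.
        double naive = score
                + distribute(page, score, anchors, false).getOrDefault(page, 0.0);
        double filtered = score
                + distribute(page, score, anchors, true).getOrDefault(page, 0.0);

        System.out.println("without filter: " + naive);    // prints 2.0
        System.out.println("with filter:    " + filtered); // prints 1.0
    }
}
```

Under these assumptions, every update round in which the page is processed 
again would double the score once more, which matches the runaway scores 
described above. Whether filtering self-links is the right fix for Nutch 
itself is left open here.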


        
