[ http://issues.apache.org/jira/browse/NUTCH-332?page=all ]

Sami Siren updated NUTCH-332:
-----------------------------

    Fix Version/s: 0.9
                       (was: 0.8)

> doubling score caused by page internal anchors.
> -----------------------------------------------
>
>                 Key: NUTCH-332
>                 URL: http://issues.apache.org/jira/browse/NUTCH-332
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Stefan Groschupf
>            Priority: Blocker
>             Fix For: 0.9
>
>         Attachments: scoreDoubling.patch
>
>
> When a page has no outlinks to other pages but several links to itself (for
> example, a set of internal anchors), its score is distributed among its
> outlinks. But all of these outlinks point back to the same page, so the
> page's score is doubled.
> I'm not sure, but this may also cause a never-ending fetch loop for this
> page, since outlinks with the status CrawlDatum.STATUS_LINKED are set to
> CrawlDatum.STATUS_DB_UNFETCHED in CrawlDBReducer, line 107.
> So the fetched status may be overwritten with unfetched.
> In that case we would fetch the page again on every cycle and also double
> its score every time, which produces very high scores for no reason.
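
The score doubling described above can be illustrated with a small toy model.
This is only a sketch, not Nutch code: the function update_scores, the flag
skip_self_links, and the example URL are hypothetical, and the model assumes
that a page splits its score evenly across its outlinks and that each target
adds the received shares to its current score. Anchor links such as page#a
and page#b are assumed to have been normalized to the page URL already.

```python
from collections import defaultdict


def update_scores(scores, outlinks, skip_self_links=False):
    """One toy update round (hypothetical model, not the Nutch implementation).

    Each page splits its score evenly across its outlinks, and every target
    adds the shares it receives to its current score.
    """
    contributions = defaultdict(float)
    for page, targets in outlinks.items():
        if skip_self_links:
            # Assumed fix: ignore links that point back to the same page.
            targets = [t for t in targets if t != page]
        if not targets:
            continue
        share = scores[page] / len(targets)
        for target in targets:
            contributions[target] += share
    return {page: score + contributions[page] for page, score in scores.items()}


# A page whose only outlinks are four internal anchors (all the same URL).
page = "http://example.com/a"
scores = {page: 1.0}
outlinks = {page: [page] * 4}

for _ in range(3):
    scores = update_scores(scores, outlinks)
print(scores[page])  # 8.0: the score doubles on every round
```

With skip_self_links=True the same page keeps its score of 1.0 across rounds,
which is the behaviour one would expect from a page that no other page links
to.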


        

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
