[ http://issues.apache.org/jira/browse/NUTCH-235?page=all ]

Andrzej Bialecki  updated NUTCH-235:
------------------------------------

    Attachment: set-patch.txt

Same functionality, but using a HashSet.

> Duplicate Inlink values
> -----------------------
>
>          Key: NUTCH-235
>          URL: http://issues.apache.org/jira/browse/NUTCH-235
>      Project: Nutch
>         Type: Bug
>     Versions: 0.8-dev
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: patch.txt, set-patch.txt
>
> Reading the code for LinkDb.reduce(): if we have page duplicates in input
> segments, or if we have two copies of the same input segment, we will create
> the same Inlink values (satisfying Inlink.equals()) multiple times. Since
> Inlinks is a facade for a List, and not a Set, we will get duplicate Inlink-s
> in Inlinks (if you know what I mean ;)).
> The problem is easy to reproduce: create a new linkdb from 2 identical
> segments. It also makes it more difficult to properly implement a LinkDb
> update mechanism (i.e. incremental invertlinks).
> I propose to change Inlinks to use Set semantics, either explicitly by
> using a HashSet or implicitly by checking whether a value to be added already
> exists. If there are no objections I'll commit this change shortly.
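
The two proposed fixes can be compared in a small self-contained sketch. It does not use the real Nutch classes: `Inlink` below is a hypothetical, simplified stand-in holding only a source URL and anchor text, and the `collectAs...` helpers are hypothetical names for illustration, not methods from the attached patches. The sketch assumes that two identical segments feed the same inlink into reduce() twice, and shows that a List keeps both copies while a HashSet (explicit) or a contains() check (implicit) keeps one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class InlinkDedupSketch {

  // Hypothetical, simplified stand-in for Nutch's Inlink: source URL plus anchor text.
  // equals() and hashCode() must agree, otherwise a HashSet cannot detect duplicates.
  static final class Inlink {
    final String fromUrl;
    final String anchor;

    Inlink(String fromUrl, String anchor) {
      this.fromUrl = fromUrl;
      this.anchor = anchor;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Inlink)) return false;
      Inlink other = (Inlink) o;
      return fromUrl.equals(other.fromUrl) && anchor.equals(other.anchor);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fromUrl, anchor);
    }
  }

  // List-backed container (current behaviour): every value is kept, duplicates included.
  static List<Inlink> collectAsList(List<Inlink> values) {
    return new ArrayList<>(values);
  }

  // Explicit Set semantics: HashSet.add() ignores a value equal to one already present.
  static Set<Inlink> collectAsSet(List<Inlink> values) {
    return new HashSet<>(values);
  }

  // Implicit Set semantics: keep a List, but skip values that are already present.
  // contains() is a linear scan, so this is O(n^2) in the number of inlinks.
  static List<Inlink> collectAsListNoDups(List<Inlink> values) {
    List<Inlink> out = new ArrayList<>();
    for (Inlink v : values) {
      if (!out.contains(v)) out.add(v);
    }
    return out;
  }

  public static void main(String[] args) {
    // Simulated reduce() input: two identical segments each contribute the same inlink.
    List<Inlink> values = new ArrayList<>();
    for (int segment = 0; segment < 2; segment++) {
      values.add(new Inlink("http://a.example/", "home"));
    }
    System.out.println("list:  " + collectAsList(values).size());       // 2
    System.out.println("set:   " + collectAsSet(values).size());        // 1
    System.out.println("check: " + collectAsListNoDups(values).size()); // 1
  }
}
```

The HashSet variant does constant-time duplicate checks but does not preserve insertion order; a LinkedHashSet would keep both properties if order matters. The contains() variant keeps the List structure unchanged at the cost of a quadratic scan.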

-- 
This message is automatically generated by JIRA.



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
