[Nutch-general] Need help with deleteduplicates

sdeck Tue, 19 Dec 2006 21:45:30 -0800

Hello,
  I am running nutch .8 against hadoop .4, just for reference
I want to add a delete duplicate based on a similarity algorithm, as opposed
to the hash method that is currently in there.
I would have to say I am pretty lost as to how the delete duplicates class
is working.
I would guess that I need to implement a compareTo method, but I am not
really sure what to return. Also, when I do return something, where do I
implement the functionality to say "yes, these are dupes, so remove the
first one)


Can anyone help out?
Thanks,
S
-- 
View this message in context: 
http://www.nabble.com/Need-help-with-deleteduplicates-tf2858127.html#a7985094
Sent from the Nutch - User mailing list archive at Nabble.com.


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general

[Nutch-general] Need help with deleteduplicates

Reply via email to