On Sep 23, 2009, at 2:55 PM, Jason Rutherglen wrote:
I don't think this handles near duplicates, which would require some of
the methods mentioned recently on the Mahout list.
It's pluggable and I believe the TextProfileSignature is a fuzzy
implementation in Solr that was brought over from
Hi,
When we have news content crawled we face the problem of the same content being
repeated in many documents. We want to add a near duplicate document filter
to detect such documents. Is there a way to do that in SOLR?
Regards,
Ninad Raut.
On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:
Hi,
When we have news content crawled we face the problem of the same content being
repeated in many documents. We want to add a near duplicate document filter
to detect such documents. Is there a way to do that in SOLR?
Is this feature included in SOLR 1.4??
On Wed, Sep 23, 2009 at 3:29 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com
wrote:
Hi,
When we have news content crawled we face the problem of the same content being
On Wed, Sep 23, 2009 at 3:50 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:
Is this feature included in SOLR 1.4??
Yep.
--
Regards,
Shalin Shekhar Mangar.
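
To illustrate the kind of fuzzy signature mentioned earlier in the thread, here is a minimal Python sketch. It is not Solr's TextProfileSignature code. The function name fuzzy_signature, its parameters (min_token_len, quant_rate), the quantisation rule and the choice of SHA-256 are all assumptions made for illustration. The idea it shows: build a token frequency profile, drop infrequent tokens, and hash what remains, so that two documents differing only in a few rare words get the same signature.

```python
import hashlib
import re
from collections import Counter


def fuzzy_signature(text, min_token_len=2, quant_rate=0.01):
    """Hypothetical helper: hash a quantised token-frequency profile of text."""
    # Lowercase the text and keep runs of letters/digits as tokens.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if len(t) >= min_token_len]
    counts = Counter(tokens)
    if not counts:
        return hashlib.sha256(b"").hexdigest()

    # Quantisation step (assumed rule): proportional to the top frequency,
    # but at least 2 whenever any token repeats, so one-off words drop out.
    max_freq = max(counts.values())
    quant = round(max_freq * quant_rate)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1

    # Round each count down to a multiple of quant; discard tokens that
    # fall below the threshold.
    profile = []
    for token, count in counts.items():
        rounded = (count // quant) * quant
        if rounded >= quant:
            profile.append((token, rounded))

    # Sort for a deterministic order: most frequent first, then by token.
    profile.sort(key=lambda item: (-item[1], item[0]))
    payload = "\n".join(f"{token} {freq}" for token, freq in profile)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "Solr adds dedupe support. Solr adds dedupe support. Read more here."
    b = "Solr adds dedupe support! Solr adds dedupe support. Click for details"
    print(fuzzy_signature(a) == fuzzy_signature(b))
```

Documents whose signatures match would be treated as near duplicates. An exact hash of the full text would instead differ as soon as a single word changed, which is why a fuzzy signature is the relevant choice for crawled news content.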