Re: Finding near duplicates which searching Documents

2009-09-24 Thread Grant Ingersoll
On Sep 23, 2009, at 2:55 PM, Jason Rutherglen wrote: I don't think this handles near duplicates, which would require some of the methods mentioned recently on the Mahout list. It's pluggable, and I believe the TextProfileSignature is a fuzzy implementation in Solr that was brought over from
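
To illustrate what a "fuzzy" signature means in this context, here is a minimal Python sketch. It is not Solr's TextProfileSignature; it only shows the general idea of hashing a coarse token-frequency profile so that small edits do not change the hash. The function name `fuzzy_signature` and the parameters `min_token_len` and `quant` are hypothetical and chosen for illustration only.

```python
import hashlib
import re
from collections import Counter


def fuzzy_signature(text, min_token_len=2, quant=2):
    """Return an MD5 hex digest of a coarse token-frequency profile.

    Illustrative sketch only (an assumption, not Solr's implementation):
    rare tokens are dropped and counts are rounded down, so documents that
    differ in a few words can still produce the same digest.
    """
    # Lowercase the text and keep alphanumeric tokens of a minimum length.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if len(t) >= min_token_len]
    counts = Counter(tokens)

    # Round each count down to a multiple of `quant`; tokens whose rounded
    # count is zero are removed from the profile.
    profile = {tok: (n // quant) * quant for tok, n in counts.items()}
    profile = {tok: n for tok, n in profile.items() if n > 0}

    # Sort by descending count, then by token, for a deterministic payload.
    ordered = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    payload = "\n".join(f"{tok} {n}" for tok, n in ordered)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Two documents whose frequent terms match get the same digest, so an indexer could treat equal digests as near duplicates. The trade-off is that a coarse profile can also collide for unrelated texts that happen to share their frequent terms.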

Finding near duplicates which searching Documents

2009-09-23 Thread Ninad Raut
Hi, When we crawl news content, we face the problem of the same content being repeated in many documents. We want to add a near-duplicate document filter to detect such documents. Is there a way to do that in SOLR? Regards, Ninad Raut.

Re: Finding near duplicates which searching Documents

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, When we crawl news content, we face the problem of the same content being repeated in many documents. We want to add a near-duplicate document filter to detect such documents. Is there a way to do that in SOLR?

Re: Finding near duplicates which searching Documents

2009-09-23 Thread Ninad Raut
Is this feature included in SOLR 1.4? On Wed, Sep 23, 2009 at 3:29 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, When we crawl news content, we face the problem of the same content being

Re: Finding near duplicates which searching Documents

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 3:50 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Is this feature included in SOLR 1.4? Yep. -- Regards, Shalin Shekhar Mangar.