On 22/10/2011 11:11, Grant Ingersoll wrote:
> Hi All,
>
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
> (http://na11.apachecon.com/talks/18396).  It's based on my observation, that over the
> years, a number of us in the community have done some pretty cool things using Lucene
> that don't fit under the core premise of full text search.  I've got a fair number of
> ideas for the talk (easily enough for 1 hour), but I wanted to reach out to hear your
> stories of ways you've (ab)used Lucene and Solr to see if we couldn't extend the
> conversation to a bit more than the conference and also see if I can't inject more ideas
> beyond the ones I have.  I don't need deep technical details, but just high level use
> case and the basic insight that led you to believe Lucene could solve the problem.

Better late than never ... :) I briefly mentioned this use case to you at Eurocon, but here it is for the record.

I used Lucene in a duplicate-detection scenario where individual sentences, rather than whole documents, were indexed (with some fuzz). A similarity-preserving hash was calculated for each sentence and added as a field. The property of this hash is that similar documents (sentences) produce similar hashes, differing only by some bit-level perturbation. The challenge was to find a ranked list of possible duplicates whose hashes were similar to, but not exactly the same as, the query hash; in other words, a ranked list of documents with the smallest bit-level distance between their hashes and the query hash.

The solution is described in SOLR-1918 - Bit-wise scoring field type.
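
To make the idea concrete, here is a minimal sketch in plain Python. It is not the SOLR-1918 code and not the hash function used in the original project: the simhash-style fingerprint, the 64-bit width, the MD5 token hashing, and the names simhash, hamming and rank_candidates are all assumptions chosen for illustration. It only shows the general pattern of ranking candidates by bit-level (Hamming) distance from a query hash.

```python
import hashlib
from typing import List, Tuple

HASH_BITS = 64  # assumed fingerprint width; must be a multiple of 8 and <= 128 here


def simhash(sentence: str, bits: int = HASH_BITS) -> int:
    """Simhash-style fingerprint (an assumed stand-in for the real hash).

    Sentences sharing most of their tokens tend to get fingerprints
    that differ in only a few bit positions.
    """
    counts = [0] * bits
    for token in sentence.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def rank_candidates(
    query: str, sentences: List[str], bits: int = HASH_BITS
) -> List[Tuple[float, str]]:
    """Return (score, sentence) pairs, best match first.

    The score is 1.0 for identical hashes and decreases linearly
    with the Hamming distance from the query hash.
    """
    q = simhash(query, bits)
    scored = []
    for s in sentences:
        distance = hamming(q, simhash(s, bits))
        scored.append((1.0 - distance / bits, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumped over the lazy dog",
        "lucene is a full text search library",
    ]
    for score, sentence in rank_candidates(corpus[0], corpus):
        print(f"{score:.3f}  {sentence}")
```

This sketch scans every candidate linearly, which is fine for a demonstration; inside an index the distance would have to be folded into scoring instead.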

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

