On 22/10/2011 11:11, Grant Ingersoll wrote:
Hi All,
I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
(http://na11.apachecon.com/talks/18396). It's based on my observation that over the
years, a number of us in the community have done some pretty cool things using Lucene
that don't fit under the core premise of full text search. I've got a fair number of
ideas for the talk (easily enough for 1 hour), but I wanted to reach out to hear your
stories of ways you've (ab)used Lucene and Solr to see if we couldn't extend the
conversation to a bit more than the conference and also see if I can't inject more ideas
beyond the ones I have. I don't need deep technical details, just the high-level use
case and the basic insight that led you to believe Lucene could solve the problem.
Better late than never ... :) I briefly mentioned this use case to you
at Eurocon, but here it is for the record.
I used Lucene in a duplicate-detection scenario where individual
sentences, rather than whole documents, were indexed (with a fuzz). A
similarity-preserving hash function was calculated on each sentence, and
the hash was added as a field. The property of the hash was that similar
sentences would produce similar hashes, differing only by some
bit-level perturbation. The challenge was to find a ranked list of
possible duplicates with similar (not exactly the same) hashes, which in
this case meant finding the documents whose hashes have the smallest
bit-level distance from the query hash, ranked by that distance.
The solution is described in SOLR-1918 - Bit-wise scoring field type.
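
To make the idea concrete, here is a rough sketch of the ranking step in plain Python. It is not the SOLR-1918 code and not the hash I actually used: the simhash-style function below is just one well-known similarity-preserving hash, swapped in for illustration, and the names (token_hash, simhash, hamming_distance, rank_candidates) and the 64-bit width are all assumptions made up for this example.

```python
import hashlib
import re

HASH_BITS = 64  # assumed hash width, chosen only for this illustration


def token_hash(token: str) -> int:
    """Map a token to a stable 64-bit integer (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(sentence: str) -> int:
    """Simhash-style fingerprint: each token votes +1/-1 on every bit position.

    Sentences that share most of their tokens tend to agree on most bits.
    This is a stand-in for whatever similarity-preserving hash is really used.
    """
    votes = [0] * HASH_BITS
    for token in re.findall(r"\w+", sentence.lower()):
        h = token_hash(token)
        for bit in range(HASH_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(a ^ b).count("1")


def rank_candidates(query_hash: int, indexed: dict) -> list:
    """Return (doc_id, distance) pairs, smallest bit-level distance first.

    `indexed` maps a document id to its stored hash; ties are broken by id.
    """
    scored = ((doc_id, hamming_distance(query_hash, h)) for doc_id, h in indexed.items())
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))
```

In use, one would compute simhash() for every sentence at index time, store it next to the sentence, and at query time call rank_candidates(simhash(query_sentence), stored_hashes). This sketch scans every stored hash, which only works for small collections; the point of doing it inside Lucene/Solr is to get the same ranking from the index instead.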
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com