We maintain a large database of tagged images and needed to run "fuzzy
searches" against it. The existing search tool accepts exact queries only,
so we hacked up a little tool that sits between the query source and the
engine and transforms each query into a "fuzzy query". You can think of it
like this: the input query is something like x AND y, and the output is
something like (x AND (y OR y2 OR y3)) OR (y AND (x OR x2 OR x3)), where y2
and y3 are in some sense "near" y, and x2 and x3 are "near" x. The query
transformation is simple enough, but it was really handy to be able to test
it at a REPL, and even to hot-swap modifications and do live testing and
tweaking of, e.g., the neighborhood graphs used to "fuzzify" query terms.
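
To make the transformation concrete, here is a minimal sketch of the idea,
not our actual code. The neighbors map, or-group, and fuzzify are names made
up for illustration, and the neighborhood graph is assumed to be a plain map
from a term to its nearby terms.

```clojure
(require '[clojure.string :as str])

;; Hypothetical neighborhood graph: term -> terms considered "near" it.
(def neighbors
  {"x" ["x2" "x3"]
   "y" ["y2" "y3"]})

(defn or-group
  "Returns \"(t OR n1 OR n2)\" for a term and its neighbors, or the bare
  term when it has no neighbors."
  [term]
  (let [alts (cons term (get neighbors term))]
    (if (next alts)
      (str "(" (str/join " OR " alts) ")")
      term)))

(defn fuzzify
  "Builds one clause per term, widening only that term, and ORs the
  clauses together."
  [terms]
  (str/join " OR "
            (for [i (range (count terms))]
              (str "("
                   (str/join " AND "
                             (map-indexed
                              (fn [j t] (if (= i j) (or-group t) t))
                              terms))
                   ")"))))

(fuzzify ["x" "y"])
;; => "((x OR x2 OR x3) AND y) OR (x AND (y OR y2 OR y3))"
```

The clauses come out in a different order than in the example above, but the
result is logically the same query. Because neighbors is an ordinary var, it
can be redefined at the REPL and the next call to fuzzify picks up the new
graph.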

(Right now it only allows "slop" in one query term at a time; sort of a
Hamming-distance-1 matcher. Extending it further would invite a
combinatorial explosion and lead to noisier, less useful results.
Distance 1 seems to be the "sweet spot" for our app.)

The code takes the input query, splits it into terms, and generates seqs of
alternatives, with some (butlast (interleave foo (repeat "OR"))) type stuff
here and there, and cobbles them together again using seq functions, str,
and java.lang.String methods. (Some query terms need to be parsed first,
e.g. a hyphenated entity whose first part should be fuzzy-matchable while
the second part stays constant.)
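
For the curious, a rough sketch of the interleave idiom and the hyphenated
case might look like the following. The names or-join and fuzzify-hyphenated
are invented for this example, and I'm assuming the fuzzy part is everything
before the first hyphen; the real parsing rules are more involved.

```clojure
(defn or-join
  "Joins alternatives with \" OR \" using the butlast/interleave idiom."
  [alts]
  (apply str (butlast (interleave alts (repeat " OR ")))))

(defn fuzzify-hyphenated
  "Widens only the part of term before the first hyphen. nearby is a
  hypothetical map from a term to its near terms."
  [term nearby]
  (let [idx (.indexOf ^String term "-")]
    (if (neg? idx)
      ;; No hyphen: the whole term is fuzzy-matchable.
      (or-join (cons term (get nearby term)))
      (let [head (subs term 0 idx)  ; fuzzy-matchable part
            tail (subs term idx)]   ; constant part, hyphen included
        (or-join (map #(str % tail)
                      (cons head (get nearby head))))))))

(fuzzify-hyphenated "cat-42" {"cat" ["kitten" "feline"]})
;; => "cat-42 OR kitten-42 OR feline-42"
```

The butlast is there to drop the trailing " OR " that interleave leaves after
the last alternative.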

Rather boring? Little things like this incrementally improve services. Two
guys hacking like this in their garage went on to start Google. :) We don't
have such lofty aspirations, but it's still something to keep in mind.
