On Apr 15, 2005, at 4:15 PM, Doug Cutting wrote:

Wolfgang Hoschek wrote:
The classic fuzzy fulltext search and similarity matching that Lucene is good for :-)

So you need a score that can be compared to other matches? This will be based on nothing but term frequency, which a regex can compute. With a single document there'll be no IDFs, so you could simply sum sqrt() of term regex match counts, and divide by the sqrt of the length of the string.
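As a rough illustration of the scoring idea just described (a sketch only, not code from this thread or from Lucene): the function name `regex_score` and the choice to measure "length of the string" in whitespace-separated tokens are assumptions made for the example. Character count could be substituted if that reading is preferred.

```python
import math
import re


def regex_score(text, term_patterns):
    """Sum sqrt(match count) per term regex, divided by sqrt(text length).

    Assumption: "length" is the number of whitespace-separated tokens.
    No IDF factor is used, since there is only a single document.
    """
    tokens = text.split()
    if not tokens:
        return 0.0  # empty text cannot match anything

    total = 0.0
    for pattern in term_patterns:
        # Count non-overlapping matches of this term's regex.
        count = sum(1 for _ in re.finditer(pattern, text, flags=re.IGNORECASE))
        total += math.sqrt(count)

    return total / math.sqrt(len(tokens))


if __name__ == "__main__":
    doc = "Lucene is fast and Lucene is fun Lucene rocks"
    print(regex_score(doc, [r"\blucene\b", r"\bfast\b"]))
```

Scores produced this way are only comparable across strings when the same set of term patterns is used for each, and they do not reproduce analyzer behaviour such as stemming.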

Is there a function f that can translate any Lucene query (with all its syntax and fuzzy features) into a regex? For example, how would you translate StandardAnalyzer or stemming into a regex? If such a function exists, then yes, but that seems unlikely, no?


My particular interest is to use XQuery for *precisely* locating information subsets in networked XML messages, and then to use Lucene's fulltext functionality for *fuzzy* searches within such a precise subset. Messages are classified and routed/forwarded accordingly. See http://dsd.lbl.gov/nux/ for background. [BTW, XQuery already has regexes built in.]


Yes, I'm playing devil's advocate...

Always a good thing to check assumptions :-)

