Re: [Performance] Streaming main memory indexing of single strings

Wolfgang Hoschek Fri, 15 Apr 2005 16:33:06 -0700

On Apr 15, 2005, at 4:15 PM, Doug Cutting wrote:

> Wolfgang Hoschek wrote:
> > The classic fuzzy fulltext search and similarity matching that Lucene is good for :-)
>
> So you need a score that can be compared to other matches? This will be based on nothing but term frequency, which a regex can compute. With a single document there'll be no IDFs, so you could simply sum sqrt() of term regex match counts, and divide by the sqrt of the length of the string.
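A minimal sketch of the regex-only score described above: the sum of sqrt(match count) over the query terms, divided by sqrt of the string length. The function name `regex_score` is hypothetical, and two details are assumptions not stated in the thread: each term is matched as a case-insensitive whole word, and "length" means character count (Lucene's own length norm counts terms instead, so swapping in a token count is a one-line change).

```python
import math
import re


def regex_score(text, terms):
    """Score a single string against query terms without building an index.

    Hypothetical helper. Assumes whole-word, case-insensitive matching and
    character-length normalization. With one document there are no IDF
    weights, so every term contributes on equal footing.
    """
    if not text:
        return 0.0
    total = 0.0
    for term in terms:
        # \b anchors the term at word boundaries; re.escape neutralizes
        # any regex metacharacters in the query term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        count = len(pattern.findall(text))
        # Dampened term frequency: sqrt of the raw match count.
        total += math.sqrt(count)
    # Length normalization: the same matches score lower in a longer string.
    return total / math.sqrt(len(text))


if __name__ == "__main__":
    msg = "Lucene indexes text; Lucene searches text fast."
    print(regex_score(msg, ["lucene", "text"]))
```

Such a score only covers exact (or regex-expressible) term matches; it does nothing for analysis steps like stemming, which is the gap raised in the reply that follows.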

Is there a function f that can translate any Lucene query (with all its syntax and fuzzy features) to a regex? E.g. how would you translate StandardAnalyzer or stemming into a regex? If so, yes, but that seems unlikely, no?

My particular interest is to use XQuery for *precisely* locating information subsets in networked XML messages, and then to use Lucene's fulltext functionality for *fuzzy* searches within such a precise subset. Messages are classified and routed/forwarded accordingly. See http://dsd.lbl.gov/nux/ for background. [BTW, XQuery already has regexes built-in].

> Yes, I'm playing devil's advocate...


Always a good thing to check assumptions :-)

