Re: [Performance] Streaming main memory indexing of single strings

Doug Cutting Fri, 15 Apr 2005 16:15:49 -0700

Wolfgang Hoschek wrote:

> The classic fuzzy fulltext search and similarity matching that Lucene is good for :-)

So you need a score that can be compared to other matches? This will be based on nothing but term frequency, which a regex can compute. With a single document there are no IDFs, so you could simply sum the square roots of each term's regex match count and divide by the square root of the length of the string.
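
A rough sketch of that scoring idea, in Python rather than Java for brevity. The function name `single_string_score` and the word-token definition of "length" are assumptions made for illustration; they are not Lucene API, and the length could just as well be counted in characters.

```python
import math
import re


def single_string_score(text, term_patterns):
    """Sum sqrt(match count) over the term regexes, then divide by sqrt(length).

    Assumption: "length of the string" is taken as the number of word tokens.
    Use len(text) instead if character length is what you want.
    """
    tokens = re.findall(r"\w+", text)
    if not tokens:
        return 0.0  # empty string: nothing to match, avoid dividing by zero

    total = 0.0
    for pattern in term_patterns:
        # Term frequency is simply the number of regex matches for this term.
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        total += math.sqrt(count)

    return total / math.sqrt(len(tokens))


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    print(single_string_score(text, [r"\bthe\b", r"\bfox\b"]))
```

Because there is no IDF term, scores from this sketch are only comparable across strings that were scored with the same set of term patterns.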

Yes, I'm playing devil's advocate...

Doug

