Re: [jira] Commented: (LUCENE-1513) fastss fuzzyquery

robert engels Tue, 06 Jan 2009 13:42:18 -0800

I don't think that is the case. You will have a single deletion neighborhood. The set of unique terms in the field is going to be the union of the deletion dictionaries of each source term.

For example, take document A, which has field 'X' with value best, and document B, which has value jest (and k == 1).


A will generate est, bst, bet, bes; B will generate est, jet, jst, jes

so field FieldXFuzzy contains (est:AB, bst:A, bet:A, bes:A, jet:B, jst:B, jes:B)
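
A minimal sketch of how such a field could be built, assuming each document carries a single term and that only variants with 1..k deletions are indexed, as in the example above. The names deletion_variants and build_fuzzy_field are made up for illustration and are not Lucene APIs.

```python
from collections import defaultdict
from itertools import combinations


def deletion_variants(word, k):
    """Return every string obtained by deleting 1..k characters from word."""
    variants = set()
    for n in range(1, min(k, len(word)) + 1):
        for positions in combinations(range(len(word)), n):
            variants.add(
                "".join(c for i, c in enumerate(word) if i not in positions)
            )
    return variants


def build_fuzzy_field(docs, k):
    """Map each deletion variant to the sorted ids of the documents producing it."""
    postings = defaultdict(set)
    for doc_id, value in docs.items():
        for variant in deletion_variants(value, k):
            postings[variant].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


field_x_fuzzy = build_fuzzy_field({"A": "best", "B": "jest"}, k=1)
print(field_x_fuzzy["est"])  # ['A', 'B'] (the one variant shared by both documents)
print(len(field_x_fuzzy))    # 7 unique terms in the field
```

The shared variant est is stored once with two postings, which is why the term count is the size of the union rather than the sum of the two neighborhoods.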


I don't think the storage requirement is any greater doing it this way.


3.2.1 Indexing

For all words in a dictionary, and a given number of edit operations k, FastSS generates all variant spellings recursively and saves them as tuples of type v′ ∈ Ud(v, k) → (v, x), where v is a dictionary word and x a list of deletion positions.

Theorem 5. Index uses O(nmk+1) space, as it stores all the variants for n dictionary words of length m with k mismatches.
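
The indexing step quoted above might look roughly like the following sketch. It assumes deletion positions are 0-based indices into the original word and that the word itself (zero deletions) is part of the neighborhood; neighborhood and build_index are hypothetical names, not taken from the paper.

```python
from collections import defaultdict


def neighborhood(word, k):
    """Return {variant: set of deletion-position tuples} for 0..k deletions.

    Positions are indices into the original word, in increasing order.
    """
    result = defaultdict(set)

    def recurse(start, deleted):
        variant = "".join(c for i, c in enumerate(word) if i not in deleted)
        result[variant].add(tuple(deleted))
        if len(deleted) == k:
            return
        # Only delete positions to the right of the last one, so each
        # combination of positions is visited exactly once.
        for i in range(start, len(word)):
            recurse(i + 1, deleted + [i])

    recurse(0, [])
    return result


def build_index(dictionary, k):
    """Map each variant v' to a list of (dictionary word, deletion positions)."""
    index = defaultdict(list)
    for word in dictionary:
        for variant, position_sets in neighborhood(word, k).items():
            for positions in sorted(position_sets):
                index[variant].append((word, positions))
    return index


index = build_index(["best", "jest"], k=1)
print(index["est"])  # [('best', (0,)), ('jest', (0,))]
print(len(index))    # 9
```

A word with repeated letters, such as tree, yields the same variant from different positions, so the sketch keeps every position list for a variant rather than only the first one found.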


3.2.2 Retrieval

For a query p and edit distance k, first generate the neighborhood Ud(p, k).

Then compare the words in the neighborhood with the index, and find
matching candidates. Compare deletion positions for each candidate with
the deletion positions in Ud(p, k), using Theorem 4.
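
A rough sketch of the retrieval step follows. Theorem 4 is not reproduced in this message, so the sketch does not compare deletion positions; as a stand-in it verifies each candidate with a plain Levenshtein computation. All function names here are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def neighborhood(word, k):
    """All strings reachable from word by deleting 0..k characters."""
    out = set()
    for n in range(min(k, len(word)) + 1):
        for pos in combinations(range(len(word)), n):
            out.add("".join(c for i, c in enumerate(word) if i not in pos))
    return out


def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(dictionary, k):
    """Map each deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in neighborhood(word, k):
            index[variant].add(word)
    return index


def search(index, query, k):
    """Collect candidates sharing a variant with the query, then verify them."""
    candidates = set()
    for variant in neighborhood(query, k):
        candidates |= index.get(variant, set())
    # Stand-in for the position check of Theorem 4: full distance computation.
    return sorted(w for w in candidates if edit_distance(query, w) <= k)


index = build_index(["best", "jest", "bet", "beast"], k=1)
print(search(index, "best", 1))  # ['beast', 'best', 'bet', 'jest']
print(search(index, "bets", 1))  # ['bet']
```

The second query shows why a verification step is needed: bets and best share the variants bes and bet, so best is a candidate, but its edit distance from bets is 2 and it is filtered out.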
