Marjan Celikik wrote:
Mathieu Lecarme wrote:
However, I don't fully understand what you mean by "iterate over
your query". I would like a conceptual answer on how this is done with
Lucene, not a technical one.
Your query is a tree, with BooleanQuery nodes as branches and other
queries as leaves. If you want to transform a query into a "tolerant
query", you have to replace each term query (a leaf) with an "OR"
branch whose leaves are the variant terms.
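As a rough illustration of that rewrite, here is a minimal sketch in
plain Python rather than Lucene code. The names Term, Bool and
make_tolerant are hypothetical stand-ins for a term query, a
BooleanQuery and the tree walk; the variants table is assumed to come
from somewhere else (for example a lexicon lookup).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical leaf node: one term in one field."""
    field: str
    text: str


@dataclass(frozen=True)
class Bool:
    """Hypothetical branch node: 'AND' or 'OR' over child queries."""
    op: str
    children: tuple


def make_tolerant(query, variants):
    """Return a copy of the tree where each leaf with known variants
    is replaced by an OR branch over the original term and its variants."""
    if isinstance(query, Term):
        alternatives = variants.get(query.text, [])
        if not alternatives:
            return query  # nothing to expand, keep the leaf as it is
        leaves = (query,) + tuple(Term(query.field, v) for v in alternatives)
        return Bool("OR", leaves)
    # Branch: rebuild it with every child rewritten recursively.
    return Bool(query.op, tuple(make_tolerant(c, variants) for c in query.children))
```

For example, rewriting the query "lucene AND index" with the variant
table {"lucene": ["lucine"]} gives "(lucene OR lucine) AND index".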
To find the variants of a term, you have to take the list of your terms
and apply a filter to it to group them. Common filters for that are
stemming, n-gram + Levenshtein distance, phonetic encoding ...
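A small sketch of the grouping idea, in plain Python. The function
crude_key is a toy stand-in invented for this example (it is not a
real stemmer or phonetic algorithm); any filter that maps variants of
a word to the same key could be plugged in instead.

```python
from collections import defaultdict


def crude_key(term):
    """Toy filter (an assumption, not a real phonetic code): lowercase,
    keep the first letter, drop later vowels, collapse repeated letters."""
    t = term.lower()
    if not t:
        return ""
    out = [t[0]]
    for ch in t[1:]:
        if ch in "aeiouy":
            continue
        if ch != out[-1]:
            out.append(ch)
    return "".join(out)


def group_variants(terms, key=crude_key):
    """Group terms that share the same filter key."""
    groups = defaultdict(list)
    for term in terms:
        groups[key(term)].append(term)
    return dict(groups)
```

Terms that land in the same group are candidates to be variants of each
other. A key this crude also groups unrelated words, which is why a
second ranking step is usually needed.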
M.
OK, now it's more clear.. my final question is when is this filter
information incorporated.. at index time or at search time?
Both. You have two indexes, one for your data and one for your terms.
The second (dictionary, lexicon ...) uses one Document per term, with n
Fields holding information such as n-grams or phonetic codes. When you
search for a near word, you derive the same data from that word, build
a query from it, and sort the results by Levenshtein distance. You get
an ordered list of suggestions.
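To make the two steps concrete, here is a minimal sketch in plain
Python, not Lucene code. An in-memory dict stands in for the term
index, and the names ngrams, build_lexicon and suggest, the trigram
size and the "^"/"$" padding are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Set of character n-grams of the word, padded with start/end markers."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_lexicon(terms, n=3):
    """Index time: map each n-gram to the terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def suggest(word, index, n=3, limit=5):
    """Search time: collect terms sharing an n-gram with the word,
    then rank only those candidates by edit distance."""
    candidates = set()
    for gram in ngrams(word, n):
        candidates |= index.get(gram, set())
    return sorted(candidates, key=lambda t: (levenshtein(word, t), t))[:limit]
```

The n-grams are stored when the lexicon is built, while the edit
distance is computed at search time and only on the candidates that
the n-gram lookup returned.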
I.e., I want to know whether the Levenshtein distance is computed at
query time or whether this information is precomputed in the index.
First Lucene selects the candidates, then you pick the best from this
list. Levenshtein distance is computed at query time, and only on those
few candidate words.
M.