minFuzzyLength is measured in bytes, which I think is wrong, because users
expect it to be measured in letters. In English the word "table" is 5 bytes,
but in Russian the word "книга" is 10 bytes in UTF-8, though it has only 5
letters. If I have English and Russian words in one field, I have to multiply
minFuzzyLength by 2 whenever the current query contains Russian letters.
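A minimal sketch of the mismatch, in plain Java rather than Lucene code. The class and the helpers `longEnoughBytes` and `longEnoughLetters` are hypothetical. The sketch only assumes that a minimum-length threshold is compared either against the UTF-8 byte length of the query or against its letter (Unicode code point) count.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Lucene's implementation: compares a
// byte-based length gate with a letter-based one.
class ByteVsLetterLength {

    // Number of bytes the word occupies when encoded as UTF-8.
    static int utf8Length(String word) {
        return word.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of letters, counted as Unicode code points.
    static int letterCount(String word) {
        return word.codePointCount(0, word.length());
    }

    // Hypothetical gate: threshold interpreted as a byte count.
    static boolean longEnoughBytes(String word, int minFuzzyLength) {
        return utf8Length(word) >= minFuzzyLength;
    }

    // Hypothetical gate: threshold interpreted as a letter count.
    static boolean longEnoughLetters(String word, int minFuzzyLength) {
        return letterCount(word) >= minFuzzyLength;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source encoding-independent:
        // "\u043a\u043d\u0438\u0433\u0430" is "книга", "\u043a\u043e\u0442" is "кот".
        String[] words = {"table", "\u043a\u043d\u0438\u0433\u0430", "cat", "\u043a\u043e\u0442"};
        int minFuzzyLength = 4;
        for (String w : words) {
            System.out.println(w + ": " + utf8Length(w) + " bytes, "
                    + letterCount(w) + " letters, byte gate=" + longEnoughBytes(w, minFuzzyLength)
                    + ", letter gate=" + longEnoughLetters(w, minFuzzyLength));
        }
    }
}
```

With a threshold of 4, the three-letter English word "cat" fails both gates. The three-letter Russian word "кот" passes the byte gate (6 bytes) but fails the letter gate, which is the inconsistency described above.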

Although this hack works, it is still wrong, because swapping or substituting
the individual bytes of a Russian letter does not model a real typo. Every
arc in the FST should be a letter, not a byte.
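To make the typo argument concrete, here is a small sketch with hypothetical class and helper names. It is not Lucene's implementation. It counts edits with the optimal string alignment variant of Damerau-Levenshtein distance, in which an adjacent transposition costs 1. Swapping two neighbouring Russian letters is one edit when the symbols are code points. When the symbols are UTF-8 bytes, the same swap costs two edits, because the swapped bytes are separated by the shared lead byte 0xD0.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the same typo measured over letters vs. bytes.
class ByteVsLetterTypos {

    // Optimal string alignment distance: insertion, deletion, substitution
    // and transposition of two adjacent symbols each cost 1.
    static int osaDistance(int[] a, int[] b) {
        int[][] d = new int[a.length + 1][b.length + 1];
        for (int i = 0; i <= a.length; i++) d[i][0] = i;
        for (int j = 0; j <= b.length; j++) d[0][j] = j;
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" -> "ba".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length][b.length];
    }

    // Symbols are UTF-8 bytes (unsigned), as in a byte-level automaton.
    static int[] bytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = raw[i] & 0xFF;
        return out;
    }

    // Symbols are Unicode code points, i.e. letters.
    static int[] letters(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String intended = "\u043a\u043d\u0438\u0433\u0430"; // "книга"
        String typed = "\u043a\u0438\u043d\u0433\u0430";    // "кинга": letters 2 and 3 swapped
        System.out.println("letter distance: " + osaDistance(letters(intended), letters(typed)));
        System.out.println("byte distance:   " + osaDistance(bytes(intended), bytes(typed)));
    }
}
```

This prints a letter distance of 1 and a byte distance of 2. If edits are counted over bytes, as described above, a limit of one edit would therefore reject a single transposed Russian letter, even though it accepts the equivalent typo in an English word.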



--
View this message in context: 
http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-tp4067018.html
Sent from the Lucene - General mailing list archive at Nabble.com.
