Happy New Year, Devs! Please excuse the newbie question. I haven't been able to get deep into the FST internals. I ran a trivial benchmark and was not happy with the results.
I'm looking for ultra-fast spelling correction. Right now I use the 3.x SpellChecker, which is backed by a separate Lucene n-gram index. FWIW, it's persistent, not in a RAMDirectory. The bottleneck now is I/O: reading that n-gram index takes too much time. I guess that could be solved by loading the n-gram index into a RAMDirectory, but I want to exploit the FST-based spell checking from 4.0.

Here is what I see, and what makes me wonder: every DirectSpellChecker.suggestSimilar() call creates a new FuzzyTermsEnum, and every time it scans the termsEnum via FilteredTermsEnum.next(). Here I hit the same slow I/O problem. One possibly relevant detail: I read a 3.x index with 4.0 code. I don't think that changes anything.

I don't know anything about FST, but I thought it was a compact graph of syllables that is traversed to find strings similar to the given one, i.e. I expected that it would not scan the termsEnum on every lookup. Please tell me what's wrong with my expectations.

Thanks!

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<[email protected]>
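
P.S. To make the expectation concrete, here is a minimal toy sketch of the kind of lookup I had in mind. It is not Lucene code and makes no claim about how the FST or DirectSpellChecker actually work: the class TrieFuzzySketch and its methods add() and suggest() are hypothetical names invented for this illustration. It keeps the terms in an in-memory trie and walks it while maintaining one Levenshtein row per prefix, so whole subtrees are skipped as soon as no completion can stay within the edit budget, instead of scanning every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a plain trie, not Lucene's FST.
public class TrieFuzzySketch {

    // One trie node; children are sorted so results come out in term order.
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isTerm;
    }

    private final Node root = new Node();

    // Adds a non-empty term to the trie (the empty string is ignored by suggest()).
    public void add(String term) {
        Node node = root;
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isTerm = true;
    }

    // Returns all stored terms whose Levenshtein distance to query is <= maxEdits.
    public List<String> suggest(String query, int maxEdits) {
        List<String> out = new ArrayList<>();
        // Row for the empty prefix: distance to query[0..i) is i insertions.
        int[] firstRow = new int[query.length() + 1];
        for (int i = 0; i < firstRow.length; i++) {
            firstRow[i] = i;
        }
        StringBuilder prefix = new StringBuilder();
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), prefix, firstRow, query, maxEdits, out);
        }
        return out;
    }

    private static void walk(Node node, char c, StringBuilder prefix, int[] prev,
                             String query, int maxEdits, List<String> out) {
        prefix.append(c);
        int n = query.length();
        // cur[i] = edit distance between the current prefix and query[0..i).
        int[] cur = new int[n + 1];
        cur[0] = prev[0] + 1;
        int rowMin = cur[0];
        for (int i = 1; i <= n; i++) {
            int substitute = prev[i - 1] + (query.charAt(i - 1) == c ? 0 : 1);
            int extraTermChar = prev[i] + 1;
            int extraQueryChar = cur[i - 1] + 1;
            cur[i] = Math.min(substitute, Math.min(extraTermChar, extraQueryChar));
            rowMin = Math.min(rowMin, cur[i]);
        }
        if (node.isTerm && cur[n] <= maxEdits) {
            out.add(prefix.toString());
        }
        // Prune: if every cell already exceeds the budget, no longer term can match.
        if (rowMin <= maxEdits) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), prefix, cur, query, maxEdits, out);
            }
        }
        prefix.setLength(prefix.length() - 1);
    }
}
```

With a structure like this, a lookup only touches the branches that can still lead to a match, which is roughly the behavior I expected from the FST-based spell checker.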
