Re: Merge multiple FSTs to build suggesters

Michael McCandless Thu, 09 Jul 2020 12:40:06 -0700

Hi Karthic,

There are known algorithms to take the union of FSTs, but unfortunately
they are not yet implemented in Lucene -- patches welcome!

The fun OpenFst library does implement union:
http://www.openfst.org/twiki/bin/view/FST/UnionDoc  Maybe that could be
used for inspiration/poaching?  It is also Apache licensed.

Unfortunately, building the FST is memory-intensive.  There are a few
expert parameters to the FST.Builder that you could tweak to use less
memory (at the cost of producing a somewhat larger FST in the end).

Elasticsearch works around this limitation by writing an FST per segment,
and then at suggest time, it pulls the best suggestions from each segment
and then does a partial/merge sort at the end to get the overall best.  This
lets the suggester remain near-real-time...
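
To make the merge step concrete, here is a minimal sketch in plain Java.
It is not Elasticsearch's or Lucene's actual code: the class
SegmentSuggestMerger, the Suggestion type, and mergeTopN are hypothetical
names made up for illustration.  It assumes each segment has already
returned its own best suggestions sorted by descending weight, and it
does not de-duplicate a suggestion that shows up in more than one segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: merges per-segment suggestion lists into a global top N. */
public class SegmentSuggestMerger {

    /** Hypothetical value type: a suggested string and its weight (higher is better). */
    public static final class Suggestion {
        public final String text;
        public final long weight;

        public Suggestion(String text, long weight) {
            this.text = text;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return text + "/" + weight;
        }
    }

    /**
     * Assumes every inner list is already sorted by descending weight.
     * Returns at most n suggestions, best first, without sorting everything.
     */
    public static List<Suggestion> mergeTopN(List<List<Suggestion>> perSegment, int n) {
        // Each heap entry is {segmentIndex, positionInThatSegment}.
        // The entry pointing at the highest weight is polled first.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparingLong(
                (int[] e) -> perSegment.get(e[0]).get(e[1]).weight).reversed());

        // Seed the heap with the best suggestion of every non-empty segment.
        for (int s = 0; s < perSegment.size(); s++) {
            if (!perSegment.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }

        List<Suggestion> out = new ArrayList<>();
        while (out.size() < n && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<Suggestion> segment = perSegment.get(top[0]);
            out.add(segment.get(top[1]));
            // Advance within the segment that just contributed a result.
            if (top[1] + 1 < segment.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }
}
```

Because the loop stops after n results, only the head of each segment's
list is ever touched, which is the "partial" part of the sort.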

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 6, 2020 at 8:17 AM Karthik zorfy <[email protected]>
wrote:

> Hi,
>
> I'm working on an application that uses the fuzzy suggester to provide an
> autocomplete feature with fuzzy matching. I need to rebuild the suggesters
> periodically so that the latest data is reflected in the suggest results.
> As the index grows, I frequently run into OutOfMemory errors when building
> the suggesters, and manual intervention is needed to increase the JVM heap
> size.
>
> I'm thinking about the following approach to overcome this issue.
>
> Split the search index (the search documents) into multiple segments,
> build a suggester for each segment, and finally merge the per-segment
> suggesters (FSTs).
>
> Has anyone solved a similar use case, or does anyone have any suggestions?
>
>
> Best,
> Karthic
