You could create an alternate field in which you index the reversed tokens, using a reverse token filter appended to the analysis chain of the main field.
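As a minimal, language-agnostic sketch of what such a reversed-token sibling field buys you (plain Python standing in for the analyzer and index — the names and data structures here are illustrative only, not Lucene APIs): reversing each token at index time turns suffix lookups into cheap prefix scans over the reversed field.

```python
# Illustrative sketch only (plain Python, no Lucene): an extra "field"
# holding each token reversed, as a reverse token filter would emit it.

def reverse_tokens(tokens):
    """What a reverse token filter does: emit each token backwards."""
    return [t[::-1] for t in tokens]

tokens = ["indexing", "searching", "ranking"]
forward_index = sorted(tokens)
reversed_index = sorted(reverse_tokens(tokens))  # ["gnihcraes", "gniknar", "gnixedni"]

def ends_with(suffix):
    """A suffix query becomes a prefix scan over the reversed field."""
    key = suffix[::-1]
    return sorted(t[::-1] for t in reversed_index if t.startswith(key))

print(ends_with("ing"))  # all three tokens end in "ing"
```

A real sorted term dictionary (or FST) makes the prefix scan logarithmic rather than linear, but the shape of the trick is the same.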
That leaves the question of how you would manage that in a service context, and what kind of query makes use of the two fields. Otherwise you have to make a lower-level change of some sort, and I'm not sure where you would do that cleanly.

On Wed, Jul 10, 2019, 7:40 AM Juan Caicedo <juan.caic...@cavorite.com> wrote:

> I didn’t integrate it directly with a Lucene index. I used the FST class
> (and related utilities) to build a stand-alone spellchecker.
>
> However, the FST for the term dictionary is modified at index time only.
> At query time, we need to create a variation of the Levenshtein automata
> and match it with the FST accordingly.
>
> I need to take a closer look at the Lucene code to see how we can
> integrate it with the existing modules. I’ll do that this week.
>
> On Tue, Jul 9, 2019 at 03:18 Michael McCandless <luc...@mikemccandless.com>
> wrote:
>
>> This sounds interesting!
>>
>> Did your patch build the FST at index time, so at search time all that
>> was needed was to load the FST from the index directory?
>>
>> Is the FST per-segment?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Jul 9, 2019 at 1:43 AM Juan Caicedo <juan.caic...@cavorite.com>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> I guess that I should have added more details :-)
>>>
>>> The main benefit of the technique is approximate search in large
>>> dictionaries: that is, finding all the entries that are within 1 or 2
>>> edit steps of the query. It's most useful for longer queries (starting
>>> from ~7 characters, IIRC), but in general the technique can be applied
>>> to queries of any length. The main requirement is that we build the
>>> dictionary as an FST, using the backwards (reversed) keys of the
>>> original dictionary.
>>>
>>> I initially used the technique to implement a stand-alone
>>> spellchecker, but I think that it can also be used to optimize the
>>> Lucene fuzzy queries (e.g. for the spelling/suggest module). However,
>>> I'll need to look at how it can be integrated with the part that
>>> creates the dictionary.
>>>
>>> I'll take a look at the code this week and I'll try to publish it in a
>>> public repository so that we can discuss it with more concrete
>>> details.
>>>
>>> On Sat, Jul 6, 2019 at 2:01 PM Michael Sokolov <msoko...@gmail.com>
>>> wrote:
>>> >
>>> > Juan, that sounds intriguing.
>>> >
>>> > I skimmed the paper trying to understand possible applications of the
>>> > technique. It sounds like efficient approximate (i.e. with some edits)
>>> > substring search is the main idea? I don't believe such a query exists
>>> > today in Lucene (nor any Suggester, as far as I know). It sounds as if
>>> > this would be useful for searching within large strings, e.g. DNA
>>> > sequences or something like that, and maybe less applicable to typical
>>> > "full text" (i.e. tokenized) search where the strings being searched
>>> > are relatively shorter - does that sound right?
>>> >
>>> > On Sat, Jul 6, 2019 at 12:35 PM Juan Caicedo <
>>> juan.caic...@cavorite.com> wrote:
>>> > >
>>> > > Hello,
>>> > >
>>> > > I've been working on a project for extending LevenshteinAutomata and
>>> > > I'd like to know if it would be useful to add it to Lucene.
>>> > >
>>> > > I've implemented the 'backwards dictionary' technique (see [1],
>>> > > section 6) for speeding up approximate search. This technique allows
>>> > > us to narrow down the search and, therefore, reduce the running time
>>> > > (at the expense of using more memory).
>>> > >
>>> > > I implemented it quite some time ago using an older version of
>>> > > Lucene, so I need to revisit the code. However, the implementation
>>> > > was relatively simple and it didn't require major changes to the
>>> > > core classes. I can share the code in a public repository and
>>> > > iterate on it while I make it compatible with the new Lucene APIs,
>>> > > add benchmarks, and more unit tests.
>>> > >
>>> > > Ideally, I'd like to contribute it to Lucene, either as part of
>>> > > core, suggest, or a different module.
>>> > >
>>> > > What do you think?
>>> > >
>>> > > [1] https://www.cis.uni-muenchen.de/download/publikationen/fastapproxsearch.pdf
>>> > >
>>> > > ---------------------------------------------------------------------
>>> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > > For additional commands, e-mail: dev-h...@lucene.apache.org
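For the archive, the "backwards dictionary" idea discussed above can be illustrated with a toy sketch. This is not the Lucene or paper implementation (the paper intersects split Levenshtein automata with forward and backward FSTs); it is a deliberately simplified version for edit distance <= 1, where all names are made up for the example and sorted lists stand in for the two FSTs. The key observation: a word within one edit of the query must match either the first half of the query exactly (the edit falls in the back half) or the second half exactly (the edit falls in the front half). Exact-prefix candidates come from the forward dictionary; exact-suffix candidates become prefix lookups in the reversed-keys dictionary.

```python
# Toy sketch of backwards-dictionary pruning for edit distance <= 1.
# Sorted lists stand in for the forward and backward FSTs; names are
# illustrative, not Lucene APIs.

def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

words = ["approximate", "approximte", "pproximate", "approxximate", "banana"]
forward = sorted(words)
backward = sorted(w[::-1] for w in words)  # the "backwards dictionary"

def fuzzy1(query):
    half = (len(query) + 1) // 2
    q1, q2r = query[:half], query[half:][::-1]
    # Candidate generation: two cheap prefix scans instead of a full
    # traversal of the dictionary.
    cands = {w for w in forward if w.startswith(q1)}
    cands |= {w[::-1] for w in backward if w.startswith(q2r)}
    # Verification with the real edit distance.
    return sorted(w for w in cands if levenshtein(w, query) <= 1)

print(fuzzy1("approximate"))
```

Note how "pproximate" (leading deletion) is reachable only through the backwards dictionary: no forward prefix scan for "approx" would find it. That is the memory-for-time trade-off Juan describes: a second reversed structure in exchange for a much narrower search.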