[ https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784221#comment-13784221 ]
ASF subversion and git services commented on LUCENE-5214:
---------------------------------------------------------

Commit 1528579 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1528579 ]

LUCENE-5214: remove java-7 only @SafeVarargs

> Add new FreeTextSuggester, to handle "long tail" suggestions
> ------------------------------------------------------------
>
>                 Key: LUCENE-5214
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5214
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.6
>
>         Attachments: LUCENE-5214.patch, LUCENE-5214.patch
>
>
> The current suggesters are all based on a finite space of possible
> suggestions, i.e. the ones they were built on, so they can only
> suggest a full suggestion from that space.
> This means if the current query goes outside of that space then no
> suggestions will be found.
> The goal of FreeTextSuggester is to address this, by giving
> predictions based on an ngram language model, i.e. using the last few
> tokens from the user's query to predict likely following token.
> I got the idea from this blog post about Google's suggest:
> http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html
> This is very much still a work in progress, but it seems to be
> working. I've tested it on the AOL query logs, using an interactive
> tool from luceneutil to show the suggestions, and it seems to work well.
> It's fun to use that tool to explore the word associations...
> I don't think this suggester would be used standalone; rather, I think
> it'd be a fallback for times when the primary suggester fails to find
> anything.
> You can see this behavior on google.com, if you type "the
> fast and the ", you see entire queries being suggested, but then if
> the next word you type is "burning" then suddenly you see the
> suggestions are only based on the last word, not the entire query.
> It uses ShingleFilter under-the-hood to generate the token ngrams;
> once LUCENE-5180 is in it will be able to properly handle a user query
> that ends with stop-words (e.g. "wizard of "), and then stores the
> ngrams in an FST.

--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
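
For readers who want a concrete picture of the ngram idea in the quoted description (use the last few tokens of the query to predict the next one, and rely on shorter contexts when the longer one was never seen), here is a minimal sketch in plain Java. It is not the code from the attached patch. The class name NgramSketch and its add and predict methods are hypothetical, tokenization is a plain whitespace split standing in for ShingleFilter, and a HashMap stands in for the FST.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy ngram next-token predictor. Hypothetical illustration only, not Lucene's FreeTextSuggester. */
class NgramSketch {
    // Maps a context (0 to maxOrder-1 preceding tokens joined by spaces)
    // to the counts of the tokens observed right after that context.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final int maxOrder;

    NgramSketch(int maxOrder) {
        this.maxOrder = maxOrder;
    }

    /** Records every ngram of order 1..maxOrder found in one query. */
    void add(String query) {
        String trimmed = query.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            for (int ctxLen = 0; ctxLen < maxOrder && ctxLen <= i; ctxLen++) {
                String context = String.join(" ", Arrays.copyOfRange(tokens, i - ctxLen, i));
                counts.computeIfAbsent(context, k -> new HashMap<>())
                      .merge(tokens[i], 1, Integer::sum);
            }
        }
    }

    /** Suggests up to limit next tokens, trying the longest known context first. */
    List<String> predict(String prefix, int limit) {
        String trimmed = prefix.trim().toLowerCase(Locale.ROOT);
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        for (int ctxLen = Math.min(maxOrder - 1, tokens.length); ctxLen >= 0; ctxLen--) {
            String context = String.join(" ",
                    Arrays.copyOfRange(tokens, tokens.length - ctxLen, tokens.length));
            Map<String, Integer> next = counts.get(context);
            if (next != null) {
                List<String> ranked = new ArrayList<>(next.keySet());
                // Most frequent first; ties broken alphabetically for stable output.
                ranked.sort((a, b) -> {
                    int byCount = Integer.compare(next.get(b), next.get(a));
                    return byCount != 0 ? byCount : a.compareTo(b);
                });
                return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
            }
        }
        return new ArrayList<>();
    }
}
```

With maxOrder set to 3 and a handful of queries added, predict("the fast", 2) ranks the tokens seen after "the fast", while a prefix ending in an unseen pair such as "the burning" falls through to the tokens seen after "burning" alone. This mirrors the google.com behavior described in the quoted text, where suggestions suddenly depend only on the last word.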