[ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-3143:
------------------------------

    Attachment: SOLR-3143.patch

Wow, phrase suggestions are ridiculously complicated to get working. I think we need to add some configuration to the example (maybe commented out), because in my opinion this is really the default use case... but it's a lot of configuration, and the biggest traps imo are:

# You need to write a custom QueryConverter in Java code (I provide one in this patch), configure it as a plugin, and set it as queryConverter (is this global or is there a way to set this per-suggester?!)
# You need to make *sure* onlyMorePopular is true. Even though it says it doesn't affect file-based spellcheckers, that's a lie: it controls whether results are alpha-sorted or ordered by relevance!
# (Assuming your QueryConverter is well-behaved and respects the analyzer) you need to define a custom fieldType in schema.xml, even though it's likely not used by any actual Solr fields, that uses KeywordTokenizer + lowercase or whatever you want, and set this via queryAnalyzerFieldType. If you don't do this, it will default to WhitespaceTokenizer.

Anyway, attached is my patch. Basically it's a QueryConverter that just passes the whole string as-is to the query analyzer. In my test analyzer config, I added a horrible regexp that tries to emulate what Google's autocomplete seems to do: lowercase, collapse runs of whitespace, remove query syntax, etc. But maybe for a lot of people even that is overkill and they could just use Keyword+Lowercase or whatever.
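
For readers without the patch at hand, the normalization described above (lowercase, collapse runs of whitespace, remove query syntax) can be sketched in plain Java. This is only an illustration under assumptions: the class name PhraseNormalizer, the method normalize, and the set of characters treated as query syntax are all hypothetical, and none of it is taken from SOLR-3143.patch or from the regexp in the test analyzer config.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of SOLR-3143.patch): treats the whole
 * user input as a single phrase instead of splitting it into words.
 */
final class PhraseNormalizer {
    // Assumed set of query-syntax characters, chosen for illustration only.
    private static final Pattern QUERY_SYNTAX =
            Pattern.compile("[+!(){}\\[\\]^\"~*?:\\\\]");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    private PhraseNormalizer() {}

    /** Returns one normalized string suitable as a suggester lookup prefix. */
    static String normalize(String rawQuery) {
        // 1. Lowercase, locale-independently.
        String lowered = rawQuery.toLowerCase(Locale.ROOT);
        // 2. Replace query-syntax characters with spaces so words do not fuse.
        String withoutSyntax = QUERY_SYNTAX.matcher(lowered).replaceAll(" ");
        // 3. Collapse whitespace runs to a single space and trim the ends.
        return WHITESPACE_RUN.matcher(withoutSyntax).replaceAll(" ").trim();
    }
}
```

With these assumptions, normalize("title:(Foo  Bar)") yields "title foo bar" as a single string, which is the shape of input a phrase-oriented suggester wants, rather than three separately analyzed words.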
> Supply a phrase-oriented QueryConverter for Suggesters
> ------------------------------------------------------
>
>                 Key: SOLR-3143
>                 URL: https://issues.apache.org/jira/browse/SOLR-3143
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-3143.patch
>
>
> The supplied QueryConverter makes sense for Spellcheckers:
> it tries to parse out the 'meat' of the query (using e.g. identifier rules),
> and analyzes each parsed 'word' with the configured analyzer (a separate
> tokenstream per word).
> {code}
> // pseudocode: one analysis pass per parsed word
> words[] = splitByIdentifierRules(query);
> for (each word in words) {
>   tokenstream ts = analyzer.tokenStream(word);
>   for (each analyzedWord from ts) {
>     tokens.add(analyzedWord);
>   }
> }
> {code}
> However, for Suggesters this is not really optimal, because in the general
> case they do not work one word at a time: they aren't really suggesting
> individual words but instead an entire 'query' that matches a prefix.
> So instead, I think we just want a QueryConverter that creates a
> single string containing all the 'meat', and we pass the whole thing to
> the analyzer, then to the suggester.
> The current workaround on the wiki for this problem is to ask the user to
> write custom code (http://wiki.apache.org/solr/Suggester#Tips_and_tricks).
> I think that's not great, since this phrase-based suggesting is really the
> primary use case for suggesters.