[jira] [Updated] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters

Robert Muir (Updated) (JIRA) Mon, 20 Feb 2012 06:05:09 -0800

     [ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated SOLR-3143:
------------------------------

    Attachment: SOLR-3143.patch

Wow, phrase suggestions are ridiculously complicated to get working.

I think we need to add some configuration to the example (maybe commented out),
because in my opinion this is really the default use case... but it's a lot of
configuration, and the biggest traps IMO are:

# You need to write a custom QueryConverter in Java code (I provide one in this
patch), configured as a plugin and set as queryConverter (is this global, or is
there a way to set this per-suggester?!)
# You need to make *sure* onlyMorePopular is true. Even though it says it
doesn't affect file-based spellcheckers, that's a lie: this controls whether
results are alpha-sorted or ordered by relevance!
# (Assuming your queryConverter is well-behaved and respects the analyzer) you
need to define a custom fieldType in schema.xml that uses KeywordTokenizer +
lowercase (or whatever you want), even though it's likely not used by any
actual Solr fields, and set it via queryAnalyzerFieldType. If you don't do
this, it will default to WhitespaceTokenizer.
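
To make the third trap concrete, here is a toy, dependency-free model of the two analysis chains. It does not use Lucene at all: `whitespaceAnalyze` and `keywordLowercaseAnalyze` are hypothetical stand-ins for WhitespaceTokenizer and KeywordTokenizer + lowercase, assumed only for illustration. The point is what reaches the suggester: separate words in the first case, one phrase prefix in the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AnalyzerChoiceDemo {
    // Hypothetical stand-in for WhitespaceTokenizer: splits on runs of
    // whitespace, so a multi-word prefix reaches the suggester as separate tokens.
    static List<String> whitespaceAnalyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) { // "".split(...) yields one empty string; skip it
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for KeywordTokenizer + lowercase: the whole input
    // becomes a single lowercased token, i.e. one phrase prefix.
    static List<String> keywordLowercaseAnalyze(String input) {
        return Arrays.asList(input.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String prefix = "New York Ci";
        System.out.println(whitespaceAnalyze(prefix));       // [New, York, Ci]
        System.out.println(keywordLowercaseAnalyze(prefix)); // [new york ci]
    }
}
```

With the whitespace chain, a prefix suggester would only ever be asked to complete "Ci" on its own, which is why the default silently breaks phrase suggestions.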

Anyway, attached is my patch: basically it's a QueryConverter that just passes
the whole string as-is to the query analyzer.

In my test analyzer config, I added a horrible regexp that tries to emulate
what Google's autocomplete seems to do: lowercase, collapse runs of whitespace,
remove query syntax, etc.

But maybe for a lot of people that's even overkill, and they could just use
Keyword+Lowercase or whatever.
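
The regexp from the test config isn't shown in this message, so the following is only a rough, hypothetical approximation in plain Java of the three normalization steps listed above. The set of "query syntax" characters is an assumption (loosely based on Lucene query-parser special characters), and in Solr this kind of rewrite would normally live in the analysis chain of the queryAnalyzerFieldType rather than in Java code.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class SuggestNormalizer {
    // Assumed set of query-syntax characters to strip; illustrative only,
    // not the regexp used in the patch.
    private static final Pattern QUERY_SYNTAX =
            Pattern.compile("[+\\-!():^\\[\\]\"{}~*?\\\\/&|]");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String normalize(String query) {
        // 1. lowercase
        String s = query.toLowerCase(Locale.ROOT);
        // 2. remove query syntax, replacing it with a space so that
        //    "solr:suggester" does not collapse into "solrsuggester"
        s = QUERY_SYNTAX.matcher(s).replaceAll(" ");
        // 3. collapse runs of whitespace into a single space, then trim the ends
        s = WHITESPACE_RUN.matcher(s).replaceAll(" ");
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Apache   SOLR:suggester~ ")); // apache solr suggester
        System.out.println(normalize("\"New  York\" -city"));         // new york city
    }
}
```

Trimming the trailing space is a simplification; a real autocomplete setup might want to keep it, since a trailing space can signal that the last word is already complete.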

                
> Supply a phrase-oriented QueryConverter for Suggesters
> ------------------------------------------------------
>
>                 Key: SOLR-3143
>                 URL: https://issues.apache.org/jira/browse/SOLR-3143
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-3143.patch
>
>
> The supplied QueryConverter makes sense for Spellcheckers:
> it tries to parse out the 'meat' of the query (using e.g. identifier rules), 
> and analyzes each parsed 'word' with the configured analyzer (separate 
> tokenstream).
> {code}
> words[] = splitByIdentifierRules();
> for (each word) {
>  tokenstream ts = analyzer.tokenStream(word)
>  for (each analyzedWord from tokenstream) {
>    tokens.add(analyzedWord)
>  }
> }
> {code}
> However, for Suggesters this is not really optimal, because in the general
> case they do not work one word at a time: they aren't really suggesting 
> individual words but instead an entire 'query' that matches a prefix.
> So instead, I think here we just want a QueryConverter that creates a
> single string containing all the 'meat', and we pass the whole thing to
> the analyzer, then the suggester.
> The current workaround on the wiki for this problem is to ask the user to
> write custom code (http://wiki.apache.org/solr/Suggester#Tips_and_tricks).
> I think that's not great, since phrase-based suggesting is really the
> primary use case for suggesters.
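
The difference between the per-word pseudocode in the description and the proposed phrase-oriented behaviour can be sketched as standalone Java. This is not Solr's QueryConverter API: the analyzer is modelled as a hypothetical `Function<String, List<String>>`, and the "identifier rules" are simplified to `\w+`, both assumptions made only to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConverterComparison {
    // Simplified stand-in for the identifier rules (assumption: real
    // spellchecking converters use a more elaborate pattern).
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Spellchecker-style: each extracted word gets its own pass through the
    // analyzer, and all resulting tokens are collected.
    static List<String> perWordConvert(String query, Function<String, List<String>> analyzer) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(query);
        while (m.find()) {
            tokens.addAll(analyzer.apply(m.group()));
        }
        return tokens;
    }

    // Suggester-style: the whole string goes to the analyzer in one piece,
    // so a keyword-like analyzer yields a single phrase token.
    static List<String> wholeStringConvert(String query, Function<String, List<String>> analyzer) {
        return analyzer.apply(query);
    }

    public static void main(String[] args) {
        // Hypothetical keyword + lowercase analyzer: exactly one token per input.
        Function<String, List<String>> keywordLower =
                s -> Collections.singletonList(s.toLowerCase(Locale.ROOT));
        String query = "Apache Sol";
        System.out.println(perWordConvert(query, keywordLower));     // [apache, sol]
        System.out.println(wholeStringConvert(query, keywordLower)); // [apache sol]
    }
}
```

With the same analyzer, only the whole-string variant hands the suggester the prefix "apache sol" that it can actually complete as a phrase.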
