[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508304#comment-13508304 ]
Nolan Lawson commented on LUCENE-4499:
--------------------------------------

@Robert: Thanks for the clarification. I've corrected my blog post.

@Roman: Yes, I think it's a very common use case. Especially considering that your query expander seems to be doing the same thing as ours! My idea with the custom QueryParserPlugin was just to have a self-contained solution that didn't mess with the core Lucene/Solr logic too much. And I think it's still configurable enough that it can handle your case-insensitivity tweaks (which I totally understand - "MIT" is not the same thing as "mit"). You'd just have to have some pretty fancy XML in the "synonymAnalyzers" section. :)

> Multi-word synonym filter (synonym expansion)
> ---------------------------------------------
>
> Key: LUCENE-4499
> URL: https://issues.apache.org/jira/browse/LUCENE-4499
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Affects Versions: 4.1, 5.0
> Reporter: Roman Chyla
> Priority: Minor
> Labels: analysis, multi-word, synonyms
> Fix For: 5.0
>
> Attachments: LUCENE-4499.patch
>
>
> I apologize for bringing the multi-token synonym expansion up again. There is
> an old, unresolved issue at LUCENE-1622 [1]
> While solving the problem for our needs [2], I discovered that the current
> SolrSynonym parser (and the wonderful FTS) have almost everything to
> satisfactorily handle both the query and index time synonym expansion. It
> seems that people often need to use the synonym filter *slightly* differently
> at indexing and query time.
> In our case, we must do different things during indexing and querying.
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5
> This is what we need (comma marks position bump):
> indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5
> querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroobject#5)
> This translates to the following needs:
>   indexing time:
>     single-token synonyms => return only synonyms
>     multi-token synonyms => return original tokens *AND* the synonyms
>   query time:
>     single-token: return only synonyms (but preserve case)
>     multi-token: return only synonyms
>
> We need the original tokens for the proximity queries: if we indexed 'hubble
> space telescope' as one token, we could not search for 'hubble NEAR telescope'.
> You may (not) be surprised, but Lucene already supports ALL of these
> requirements. The patch is an attempt to state the problem differently. I am
> not sure if it is the best option; however, it works perfectly for our needs
> and it seems it could work for the general public too, especially if the
> SynonymFilterFactory had preconfigured sets of SynonymMapBuilders and
> people could just choose the one that fits their situation. Please look at
> the unit test.
> links:
> [1] https://issues.apache.org/jira/browse/LUCENE-1622
> [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
> [3] seems to have similar request:
> http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html
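
For readers following along, the index-time and query-time rules in the quoted description can be sketched roughly as below. This is only an illustration in plain Java, not the code from LUCENE-4499.patch and not the Lucene SynonymFilter API. The class name, the synonym table, and the `expand`/`render` helpers are all hypothetical. It assumes the input has already been lowercased and stop-word filtered, and it does not model the "preserve case" requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the expansion rules described in the issue;
// it does not use or mirror any Lucene class.
final class SynonymExpansionSketch {

    // Assumed synonym table: space-joined source phrase -> synonym tokens.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("hubble space telescope",
                Arrays.asList("hubble space telescope", "hst"));
        SYNONYMS.put("xa5", Arrays.asList("xa5", "astroobject#5"));
    }

    // Longest source phrase (in tokens) that is looked up in the table.
    static final int MAX_PHRASE_TOKENS = 3;

    /** Returns one list of alternative tokens per output position. */
    static List<List<String>> expand(List<String> tokens, boolean indexTime) {
        List<List<String>> positions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchLen = 0;
            List<String> synonyms = null;
            // Try the longest candidate phrase first.
            int maxLen = Math.min(MAX_PHRASE_TOKENS, tokens.size() - i);
            for (int len = maxLen; len >= 1; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len));
                List<String> hit = SYNONYMS.get(phrase);
                if (hit != null) {
                    matchLen = len;
                    synonyms = hit;
                    break;
                }
            }
            if (matchLen == 0) {
                // No synonym entry: keep the token unchanged.
                positions.add(Collections.singletonList(tokens.get(i)));
                i++;
            } else if (matchLen == 1 || !indexTime) {
                // Single-token match, or any match at query time:
                // return only the synonyms, at a single position.
                positions.add(synonyms);
                i += matchLen;
            } else {
                // Multi-token match at index time: stack the synonyms on the
                // first original token, then keep the remaining originals at
                // their own positions so proximity queries still work.
                List<String> first = new ArrayList<>();
                first.add(tokens.get(i));
                first.addAll(synonyms);
                positions.add(first);
                for (int k = 1; k < matchLen; k++) {
                    positions.add(Collections.singletonList(tokens.get(i + k)));
                }
                i += matchLen;
            }
        }
        return positions;
    }

    /** Renders positions in the issue's notation: ',' bumps position, '|' stacks. */
    static String render(List<List<String>> positions) {
        List<String> parts = new ArrayList<>();
        for (List<String> alternatives : positions) {
            parts.add(String.join("|", alternatives));
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "mirrors", "hubble", "space", "telescope", "pointed", "xa5");
        System.out.println("indexing: " + render(expand(tokens, true)));
        System.out.println("querying: " + render(expand(tokens, false)));
    }
}
```

With the sample tokens, the index-time rendering is the same string the issue gives for indexing. The query-time rendering collapses the three original tokens into the single position `hubble space telescope|hst`, which corresponds to the `+(hubble space telescope | hst)` clause in the quoted query.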