[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507440#comment-13507440 ]
Roman Chyla commented on LUCENE-4499:
-------------------------------------
Hi Nolan, your case seems to confirm the need for some solution. You decided
to write a separate query parser; I have also put the expansion logic into a
query parser.
See this for a working example:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/analysis/TestAdsabsTypeFulltextParsing.java
And its configuration:
https://github.com/romanchyla/montysolr/blob/master/contrib/examples/adsabs/solr/collection1/conf/schema.xml#L325
I see two added benefits (besides not needing a query parser plugin; in our
case, it must be plugged into our qparser):
1. you can use the filter at index and query time inside a standard query parser
2. special configuration for synonym expansion (for example, we have found it
very useful to be able to search for multi-token phrases case-insensitively
but recognize single tokens only case-sensitively; or to expand with multi-token
synonyms only for multi-word originals and also output the original words,
otherwise eat (replace) them)
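The case rule in item 2 (multi-token phrases matched case-insensitively, single tokens only case-sensitively) can be illustrated with a small, self-contained Java sketch. It does not use Lucene and is not the code from the patch or the montysolr repository; the class name, the two lookup tables and the HST entries are hypothetical, chosen only to show the rule.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseAwareSynonymLookup {

    // Hypothetical entries. Single-token keys are stored exactly as they must appear;
    // multi-token keys are stored lowercased.
    static final Map<String, List<String>> SINGLE_TOKEN = Map.of(
            "HST", List.of("HST", "hubble space telescope"));
    static final Map<String, List<String>> MULTI_TOKEN = Map.of(
            "hubble space telescope", List.of("hubble space telescope", "HST"));

    /** Returns the synonym group for a candidate phrase, or null if there is none. */
    static List<String> lookup(List<String> phrase) {
        if (phrase.size() == 1) {
            // Single tokens: exact, case-sensitive match.
            return SINGLE_TOKEN.get(phrase.get(0));
        }
        // Multi-token phrases: lowercase before the lookup, i.e. match case-insensitively.
        return MULTI_TOKEN.get(String.join(" ", phrase).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(lookup(List.of("HST")));                          // [HST, hubble space telescope]
        System.out.println(lookup(List.of("hst")));                          // null
        System.out.println(lookup(List.of("Hubble", "Space", "Telescope"))); // [hubble space telescope, HST]
    }
}
```

Keeping two tables avoids lowercasing single-token keys at all, so a lowercase "hst" never matches the acronym entry while any casing of the three-word phrase does.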
Nice blog post; I wish I could write as instructively :)
> Multi-word synonym filter (synonym expansion)
> ---------------------------------------------
>
> Key: LUCENE-4499
> URL: https://issues.apache.org/jira/browse/LUCENE-4499
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Affects Versions: 4.1, 5.0
> Reporter: Roman Chyla
> Priority: Minor
> Labels: analysis, multi-word, synonyms
> Fix For: 5.0
>
> Attachments: LUCENE-4499.patch
>
>
> I apologize for bringing up multi-token synonym expansion again. There is
> an old, unresolved issue: LUCENE-1622 [1].
> While solving the problem for our needs [2], I discovered that the current
> SolrSynonymParser (and the wonderful FST) already has almost everything needed
> to handle both query-time and index-time synonym expansion satisfactorily. It
> seems that people often need to use the synonym filter *slightly* differently
> at indexing and query time.
> In our case, we must do different things during indexing and querying.
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5
> This is what we need (a comma marks a position bump):
> indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5
> querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroobject#5)
> This translates to the following needs:
> indexing time:
> single-token synonyms => return only synonyms
> multi-token synonyms => return original tokens *AND* the synonyms
> query time:
> single-token: return only synonyms (but preserve case)
> multi-token: return only synonyms
>
> We need the original tokens for proximity queries: if we indexed 'hubble
> space telescope' as one token, we could not search for 'hubble NEAR telescope'.
> You may (or may not) be surprised, but Lucene already supports ALL of these
> requirements. The patch is an attempt to state the problem differently. I am
> not sure it is the best option; however, it works perfectly for our needs,
> and it seems it could work for the general public too, especially if the
> SynonymFilterFactory had a preconfigured set of SynonymMapBuilders and
> people could just choose the one that fits their situation. Please look at
> the unit test.
> Links:
> [1] https://issues.apache.org/jira/browse/LUCENE-1622
> [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
> [3] A similar request:
> http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html
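The indexing and querying rules from the quoted description, together with the proximity argument for keeping the original tokens, can be illustrated with a small, self-contained Java sketch. It does not use Lucene and is not the code from the attached patch; the class and method names, the in-memory synonym table, the greedy longest-match lookup and the toy positional index are all assumptions made for illustration. Input tokens are assumed to be already lowercased with the stopwords "of", "the" and "at" removed, and case handling is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class SynonymExpansionSketch {

    // Hypothetical synonym groups: each key maps to all equivalent forms, including itself.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "hubble space telescope", List.of("hubble space telescope", "hst"),
            "xa5", List.of("xa5", "astroobject#5"));
    static final int MAX_LEN = 3; // longest key, in tokens

    /** Returns the stacked terms at each position, using index-time or query-time rules. */
    static List<List<String>> expand(List<String> tokens, boolean indexTime) {
        List<List<String>> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            List<String> group = null;
            int matched = 0;
            // Greedy longest match starting at token i.
            for (int len = Math.min(MAX_LEN, tokens.size() - i); len >= 1 && group == null; len--) {
                group = SYNONYMS.get(String.join(" ", tokens.subList(i, i + len)));
                if (group != null) {
                    matched = len;
                }
            }
            if (group == null) {
                // No synonym: keep the token.
                out.add(List.of(tokens.get(i)));
            } else if (matched == 1 || !indexTime) {
                // Single token (both phases) or multi-token at query time: synonyms only.
                out.add(group);
            } else {
                // Multi-token at index time: stack the group on the first original
                // token, then keep the remaining originals at their own positions.
                List<String> first = new ArrayList<>(List.of(tokens.get(i)));
                first.addAll(group);
                out.add(first);
                for (int k = 1; k < matched; k++) {
                    out.add(List.of(tokens.get(i + k)));
                }
            }
            i += Math.max(matched, 1);
        }
        return out;
    }

    /** Renders positions in the notation above: '|' stacks terms, ',' bumps the position. */
    static String render(List<List<String>> positions) {
        StringJoiner joiner = new StringJoiner(",");
        for (List<String> terms : positions) {
            joiner.add(String.join("|", terms));
        }
        return joiner.toString();
    }

    /** Toy positional index: term -> positions at which it occurs. */
    static Map<String, List<Integer>> index(List<List<String>> positions) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < positions.size(); pos++) {
            for (String term : positions.get(pos)) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(pos);
            }
        }
        return postings;
    }

    /** Simplified NEAR: true if a and b occur within maxDistance positions of each other. */
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        for (int pa : postings.getOrDefault(a, List.of())) {
            for (int pb : postings.getOrDefault(b, List.of())) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed analyzer output for "Mirrors of the Hubble space telescope pointed at XA5".
        List<String> tokens = List.of("mirrors", "hubble", "space", "telescope", "pointed", "xa5");
        System.out.println(render(expand(tokens, true)));
        System.out.println(render(expand(tokens, false)));
        // Proximity works only if the original tokens were indexed.
        System.out.println(near(index(expand(tokens, true)), "hubble", "telescope", 2));  // true
        System.out.println(near(index(expand(tokens, false)), "hubble", "telescope", 2)); // false
    }
}
```

Run as is, the first line printed is the indexing line from the example (`mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5`) and the second is `mirrors,hubble space telescope|hst,pointed,xa5|astroobject#5`. The last two lines show why the originals are kept at index time: with them "hubble" and "telescope" sit two positions apart, and without them "hubble" is never indexed at all.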