[
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055050#comment-13055050
]
Robert Muir commented on LUCENE-3130:
-------------------------------------
{quote}
Currently I use a separate field for phonetic normalization and include it with
a lower weight in DisMax. If phonetic variant instead was stored alongside the
original with posIncr=0 and tokenType=phonetic, I could instead specify a
deboost factor for phonetic terms and even highlighting would work ootb!
{quote}
This doesn't make any sense to me: how is this "better" shoved into one field
than two fields? I don't see any advantage at all. field A with original terms
and field B with phonetic terms is no less efficient in the index than having
field AB with both mixed up, but keeping them separate keeps code and
configurations simple.
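To make the two-field setup concrete, here is a toy sketch in plain Python (not the Lucene or Solr API). It models a DisMax-style score, the best per-field score plus a tie-breaker times the rest, over one field of original terms and one field of phonetic terms. The field names, the weights, the `crude_phonetic` normalizer and the term-count scoring are all assumptions made up for illustration.

```python
def crude_phonetic(term):
    """Hypothetical stand-in for a phonetic filter: lowercase the term,
    drop vowels (and 'y') after the first letter, collapse repeated letters."""
    term = term.lower()
    if not term:
        return term
    out = term[0]
    for ch in term[1:]:
        if ch in "aeiouy":
            continue
        if ch != out[-1]:
            out += ch
    return out


def index_doc(text):
    """Index one document into two separate fields (names are assumptions)."""
    tokens = text.lower().split()
    return {
        "text": tokens,                                      # original terms
        "text_phonetic": [crude_phonetic(t) for t in tokens],  # phonetic terms
    }


def dismax_score(doc, query_term, weights, tie=0.0):
    """Toy DisMax: max of the weighted per-field scores, plus tie * the others.
    The per-field score is just weight * term count, not real Lucene scoring."""
    per_field = []
    for field, weight in weights.items():
        if field == "text_phonetic":
            q = crude_phonetic(query_term)
        else:
            q = query_term.lower()
        per_field.append(weight * doc[field].count(q))
    best = max(per_field)
    return best + tie * (sum(per_field) - best)


WEIGHTS = {"text": 1.0, "text_phonetic": 0.3}  # phonetic field gets a lower weight

exact = index_doc("Meyer wrote the patch")
variant = index_doc("Meier wrote the patch")

print(dismax_score(exact, "Meyer", WEIGHTS))    # match on the original term
print(dismax_score(variant, "Meyer", WEIGHTS))  # phonetic-only match
```

Both documents match, but the one containing the term as typed scores higher, and the per-field weights are the only knob involved.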
As for the highlighting, that sounds like a highlighting problem, not an
analysis problem. If it's often the case that users use things like copyField
and do this boosting, then highlighting in Solr needs to be fixed to correlate
the offsets back to the original stored field: but we need not make analysis
more complicated because of this limitation.
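A rough sketch of what correlating offsets back to the original stored field could look like, again in plain Python and not a description of how Solr's highlighter works: each phonetic token keeps the start and end offsets of the word it came from, so a match in the phonetic field can be marked up in the original text. The `crude_phonetic` normalizer, the token shape and the `<em>` markers are assumptions for illustration.

```python
import re


def crude_phonetic(term):
    """Hypothetical stand-in for a phonetic filter: lowercase the term,
    drop vowels (and 'y') after the first letter, collapse repeated letters."""
    term = term.lower()
    if not term:
        return term
    out = term[0]
    for ch in term[1:]:
        if ch in "aeiouy":
            continue
        if ch != out[-1]:
            out += ch
    return out


def phonetic_tokens(text):
    """Return (phonetic_term, start, end) tuples; the offsets point into the
    original text, not into the normalized term."""
    return [
        (crude_phonetic(m.group()), m.start(), m.end())
        for m in re.finditer(r"\w+", text)
    ]


def highlight(stored_text, query_term, pre="<em>", post="</em>"):
    """Wrap every word of the stored text whose phonetic form matches the query."""
    target = crude_phonetic(query_term)
    pieces, last = [], 0
    for term, start, end in phonetic_tokens(stored_text):
        if term == target:
            pieces.append(stored_text[last:start])
            pieces.append(pre + stored_text[start:end] + post)
            last = end
    pieces.append(stored_text[last:])
    return "".join(pieces)


print(highlight("Meier wrote the patch", "Meyer"))
# <em>Meier</em> wrote the patch
```

The analysis chain stays untouched here; only the highlighting step needs to know that the offsets of the phonetic field refer to the original text.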
{quote}
If the LowerCaseFilter would keep the original token and add a lowercased token
on same posIncr with tokenType=lowercase, we could support case insensitive
match with preference for correct case.
{quote}
I don't think we should complicate our tokenfilters with such things: in this
case I think it would just make the code more complicated and make relevance
worse: often case is totally meaningless and boosting terms for some arbitrary
reason will skew scores.
This is for the same reason as above. If you want to do this, I think you
should use two fields, one with no case, and one with case, and boost one of
them.
> Use BoostAttribute in TokenFilters to denote Terms that QueryParser should
> give lower boosts
> -----------------------------------------------------------------------------------------------
>
> Key: LUCENE-3130
> URL: https://issues.apache.org/jira/browse/LUCENE-3130
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Hoss Man
>
> A recent thread asked if there was any way to use QueryTime synonyms such that
> matches on the original term specified by the user would score higher than
> matches on the synonym. It occurred to me later that a float Attribute could
> be set by the SynonymFilter in such situations, and QueryParser could use
> that float as a boost in the resulting Query. This would be fairly
> straightforward for the simple "synonyms => BooleanQuery" case, but we'd have
> to decide how to handle the case of synonyms with multiple terms that produce
> MTPQ (possibly just punt for now).
> Likewise, there may be other TokenFilters that "inject" artificial tokens at
> query time where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for
> back compat could default to "1.0" (or null to not set a boost at all).
> Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied
> the boost attribute into the payload attribute, these same filters (when used
> at index time) could give "penalizing" payloads to terms.
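For reference, the proposal in the issue description can be sketched as a toy model in plain Python (not Lucene code; no such BoostAttribute handling is implied to exist in QueryParser). A synonym filter emits injected tokens at position increment 0 with a reduced boost, and a query builder groups tokens by position and renders the boost with the `term^boost` query syntax. The synonym map, the token tuple shape and the 0.5 default are assumptions for illustration. Only the simple single-term synonym case is covered; multi-term synonyms are not handled.

```python
SYNONYMS = {"tv": ["television"]}  # hypothetical synonym map


def expand(tokens, synonym_boost=0.5):
    """Toy synonym filter: emit (term, pos_incr, boost) tuples.
    Original terms advance the position and keep boost 1.0; injected
    synonyms share the position (pos_incr=0) and carry a lower boost."""
    stream = []
    for term in tokens:
        stream.append((term, 1, 1.0))
        for syn in SYNONYMS.get(term, []):
            stream.append((syn, 0, synonym_boost))
    return stream


def to_query(stream):
    """Toy query builder: tokens at the same position become one
    parenthesized group of alternatives; non-default boosts are rendered
    as term^boost."""
    groups = []
    for term, pos_incr, boost in stream:
        if pos_incr > 0 or not groups:
            groups.append([])
        groups[-1].append((term, boost))

    def render(term, boost):
        return term if boost == 1.0 else f"{term}^{boost}"

    parts = []
    for group in groups:
        rendered = " ".join(render(t, b) for t, b in group)
        parts.append(f"({rendered})" if len(group) > 1 else rendered)
    return " ".join(parts)


print(to_query(expand(["tv", "guide"])))
# (tv television^0.5) guide
```

With the boost left at 1.0 the output is the same query as today, which corresponds to the back-compat default mentioned in the description.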