[
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067437#comment-13067437
]
Hoss Man commented on SOLR-2477:
--------------------------------
Having just looked at this code in SOLR-2663, I'm realizing that as we add more
types of analyzers, we should really clean up the semantics of how analyzers
w/o "type" attributes are treated, and how each of the analyzers defaults if
it isn't specified.
Consider the following (contrived) example...
{code}
<fieldType name="hoss" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
Right now (on trunk and with this patch) that config will result in all of the
analyzers (index/query[/phrase]) using KeywordTokenizerFactory, because the
type-less analyzer is ignored if there is an analyzer with type="index". I
don't think that makes much sense, and as we add more types of analyzers it
makes even less sense -- an analyzer w/o a type attribute should really be the
"default" for each other type.
I think we should change the overall flow to be (pseudo-code) ...
{code}
// exactly what is in the config
Analyzer defaultA = readAnalyzer(xpath("./analyzer[not(@type)]"));
Analyzer indexA = readAnalyzer(xpath("./analyzer[@type='index']"));
Analyzer queryA = readAnalyzer(xpath("./analyzer[@type='query']"));
Analyzer phraseA = readAnalyzer(xpath("./analyzer[@type='phrase']"));
if (null != defaultA) {
  // we have an explicit default
  if (null == indexA) indexA = defaultA;
  if (null == queryA) queryA = defaultA;
  if (null == phraseA) phraseA = defaultA;
} else {
  // implicit defaults, either historical or common sense
  if (null == queryA) queryA = indexA;
  if (null == phraseA) phraseA = queryA;
}
{code}
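To make the effect on the contrived "hoss" example concrete, here is a minimal standalone sketch of the same fallback rules. It is not the actual Solr code: the class name AnalyzerDefaults and the method resolve are hypothetical, and plain strings stand in for real Analyzer instances (null means "not declared in the config").

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Solr or of the attached patch.
public class AnalyzerDefaults {

    /**
     * Applies the proposed fallback rules. Each argument is the analyzer
     * declared in the config for that role, or null if none was declared.
     */
    static <A> Map<String, A> resolve(A defaultA, A indexA, A queryA, A phraseA) {
        if (defaultA != null) {
            // Explicit default: a type-less analyzer fills every missing role.
            if (indexA == null) indexA = defaultA;
            if (queryA == null) queryA = defaultA;
            if (phraseA == null) phraseA = defaultA;
        } else {
            // Implicit defaults: query falls back to index, phrase to query.
            if (queryA == null) queryA = indexA;
            if (phraseA == null) phraseA = queryA;
        }
        Map<String, A> resolved = new LinkedHashMap<>();
        resolved.put("index", indexA);
        resolved.put("query", queryA);
        resolved.put("phrase", phraseA);
        return resolved;
    }

    public static void main(String[] args) {
        // The "hoss" field type: a type-less Whitespace analyzer plus a
        // type="index" Keyword analyzer, with no query or phrase analyzer.
        System.out.println(resolve("Whitespace", "Keyword", null, null));
        // prints {index=Keyword, query=Whitespace, phrase=Whitespace}
    }
}
```

Under these rules only the index analyzer would use the Keyword tokenizer in that example; query and phrase would pick up the type-less Whitespace analyzer instead of silently inheriting the index one.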
> add analyzer type="phrase"
> --------------------------
>
> Key: SOLR-2477
> URL: https://issues.apache.org/jira/browse/SOLR-2477
> Project: Solr
> Issue Type: Improvement
> Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: SOLR-2477.patch
>
>
> This is just exposing LUCENE-2892, so you can easily configure things
> so that if users put things in double quotes they get a more precise search.