[ https://issues.apache.org/jira/browse/SOLR-16930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753370#comment-17753370 ]
Chris M. Hostetter commented on SOLR-16930:
-------------------------------------------

As of today, here are all the SPI names I could find in lucene {{branch_9x}} that are used by more than one class...

{noformat}
lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKWidthCharFilterFactory.java: public static final String NAME = "cjkWidth";
lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKWidthFilterFactory.java: public static final String NAME = "cjkWidth";
lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicFilterFactory.java: public static final String NAME = "classic";
lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerFactory.java: public static final String NAME = "classic";
lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.java: public static final String NAME = "edgeNGram";
lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java: public static final String NAME = "edgeNGram";
lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java: public static final String NAME = "nGram";
lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramFilterFactory.java: public static final String NAME = "nGram";
lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUNormalizer2FilterFactory.java: public static final String NAME = "icuNormalizer2";
lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUNormalizer2CharFilterFactory.java: public static final String NAME = "icuNormalizer2";
lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.java: public static final String NAME = "patternReplace";
lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.java: public static final String NAME = "patternReplace";
{noformat}

> schema short class name support can use factories w/different names then
> specified name
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-16930
>                 URL: https://issues.apache.org/jira/browse/SOLR-16930
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> I recently encountered a schema "in the wild" that had a fieldType that
> looked roughly like this...
> {noformat}
> <fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="edgengram" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramTokenizerFactory" maxGramSize="25" minGramSize="4"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> {noformat}
>
> I was going to explain to the user that this wouldn't work, because they were
> trying to configure {{solr.EdgeNGramTokenizerFactory}} as a token
> _filter_, but it's a _tokenizer_ – and that they needed to use
> {{solr.EdgeNGramTokenFilterFactory}}.
> But then I realized their schema loaded just fine, and did exactly what they
> expected. Which made no sense to me.
> Experimentation using the {{/analysis/field}} request handler confirmed that
> – somehow – they were getting an
> {{org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter}} instance.
> ----
> I have not dug into this code, but I _suspect_ what's happening is that the
> logic for resolving {{solr.FooClassName}} "short" classnames is finding the
> class with the name {{FooClassName}} and then checking what its SPI name is
> _without checking if it implements the expected API_ and then using that SPI
> name to actually create an instance of the factory.
> So the resolution of {{solr.EdgeNGramTokenizerFactory}} finds
> {{org.apache.lucene.analysis.ngram.EdgeNGramTokenizerFactory}} which has an
> SPI name of {{edgeNGram}} which when resolved _in the context of looking
> for a TokenFilterFactory_ returns
> {{org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory}} because both
> classes have the *SAME* SPI name (but for different APIs)
> ----
> I know we've moved away from suggesting the {{solr.FooClassName}} short
> classname syntax (and will probably remove it completely at some point) in
> favor of using the SPI registration names -- so maybe this isn't worth
> worrying about, but it sure confused the hell out of me, and will likely
> confuse the hell out of someone else at some point as well (hence I'm
> creating a jira in case it helps anyone else confused about this)
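
For anyone trying to picture the suspected mechanism, here is a minimal sketch in plain Java. It is not Solr's or Lucene's actual resolution code: the two maps are hypothetical stand-ins for the per-API SPI registries, and {{ShortNameResolutionSketch}}, {{resolveUnchecked}} and {{resolveChecked}} are names invented for illustration. Only the {{edgeNGram}} SPI name and the two factory class names come from the report above.

```java
import java.util.Map;

public class ShortNameResolutionSketch {
    // Hypothetical stand-ins for the per-API SPI registries (SPI name -> factory class).
    static final Map<String, String> TOKENIZERS = Map.of(
        "edgeNGram", "org.apache.lucene.analysis.ngram.EdgeNGramTokenizerFactory");
    static final Map<String, String> TOKEN_FILTERS = Map.of(
        "edgeNGram", "org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory");

    // Hypothetical index of simple class name -> SPI name, built across ALL APIs.
    static final Map<String, String> SPI_NAME_BY_SIMPLE_NAME = Map.of(
        "EdgeNGramTokenizerFactory", "edgeNGram",
        "EdgeNGramFilterFactory", "edgeNGram");

    static String simpleName(String shortName) {
        if (!shortName.startsWith("solr.")) {
            throw new IllegalArgumentException("not a solr.* short name: " + shortName);
        }
        return shortName.substring("solr.".length());
    }

    // Suspected behavior: translate the short class name to an SPI name, then look
    // that name up in the registry for the expected API, with no type check at all.
    static String resolveUnchecked(String shortName, Map<String, String> expectedRegistry) {
        String spiName = SPI_NAME_BY_SIMPLE_NAME.get(simpleName(shortName));
        return spiName == null ? null : expectedRegistry.get(spiName);
    }

    // Stricter variant: reject the lookup unless the class found in the expected
    // registry is the same class the short name actually referred to.
    static String resolveChecked(String shortName, Map<String, String> expectedRegistry) {
        String resolved = resolveUnchecked(shortName, expectedRegistry);
        if (resolved == null || !resolved.endsWith("." + simpleName(shortName))) {
            throw new IllegalArgumentException(
                shortName + " is not registered for the expected API");
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A tokenizer factory name used in a <filter/> element silently yields the filter factory.
        System.out.println(resolveUnchecked("solr.EdgeNGramTokenizerFactory", TOKEN_FILTERS));
        // prints org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory
    }
}
```

In this sketch the checked variant throws for {{solr.EdgeNGramTokenizerFactory}} when a token filter is expected, which is the failure the schema author would presumably have wanted to see.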