Chris M. Hostetter created SOLR-16930:
-----------------------------------------

             Summary: schema short class name support can use factories w/different names than specified name
                 Key: SOLR-16930
                 URL: https://issues.apache.org/jira/browse/SOLR-16930
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Chris M. Hostetter


I recently encountered a schema "in the wild" that had a fieldType that looked 
roughly like this...
{noformat}
  <fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="edgengram" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramTokenizerFactory" maxGramSize="25" minGramSize="4"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
{noformat}
 

I was going to explain to the user that this wouldn't work, because they were 
trying to configure {{solr.EdgeNGramTokenizerFactory}} as a token _filter_, 
but it's a _tokenizer_ – and that they needed to use 
{{solr.EdgeNGramFilterFactory}}.
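
For reference, the index analyzer they presumably meant to write would look roughly like this. This is an untested sketch: it reuses the gram sizes from the schema above and assumes the short name {{solr.EdgeNGramFilterFactory}}, matching Lucene's {{org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory}}.
{noformat}
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25" minGramSize="4"/>
    </analyzer>
{noformat}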

But then I realized their schema loaded just fine, and did exactly what they 
expected. Which made no sense to me.

Experimentation using the {{/analysis/field}} request handler confirmed that – 
somehow – they were getting an 
{{org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter}} instance.

----

I have not dug into this code, but I _suspect_ what's happening is that the 
logic for resolving {{solr.FooClassName}} "short" classnames is finding the 
class with the name {{FooClassName}} and then checking what its SPI name is 
_without checking if it implements the expected API_ and then using that SPI 
name to actually create an instance of the factory.

So the resolution of {{solr.EdgeNGramTokenizerFactory}} finds 
{{org.apache.lucene.analysis.ngram.EdgeNGramTokenizerFactory}}, which has an SPI 
name of {{edgeNGram}}, which when resolved _in the context of looking for a 
TokenFilterFactory_ returns 
{{org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory}}, because both classes 
have the *SAME* SPI name (but for different APIs).

----

I know we've moved away from suggesting the {{solr.FooClassName}} short 
classname syntax (and will probably remove it completely at some point) in 
favor of using the SPI registration names -- so maybe this isn't worth worrying 
about, but it sure confused the hell out of me, and will likely confuse the 
hell out of someone else at some point as well (hence I'm creating a Jira in 
case it helps anyone else confused about this).
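
For comparison, the same index analyzer written with SPI registration names would look roughly like this. It is an untested sketch: it assumes a Solr version that accepts the {{name}} attribute on analysis factories, and it assumes {{whitespace}} and {{lowercase}} are the SPI names of the other two factories. As far as I understand it, the element type ({{<tokenizer>}} vs {{<filter>}}) decides which API the name is looked up under, so there is no class name left to mismatch.
{noformat}
    <analyzer type="index">
      <tokenizer name="whitespace"/>
      <filter name="lowercase"/>
      <filter name="edgeNGram" maxGramSize="25" minGramSize="4"/>
    </analyzer>
{noformat}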


