[jira] [Commented] (SOLR-16930) schema short class name support can use factories w/different names then specified name

Chris M. Hostetter (Jira) Fri, 11 Aug 2023 15:29:04 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-16930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753370#comment-17753370
 ]


Chris M. Hostetter commented on SOLR-16930:
-------------------------------------------

As of today, here are all the SPI names I could find in Lucene {{branch_9x}} 
that are used by more than one class...

 
{noformat}
lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKWidthCharFilterFactory.java:
  public static final String NAME = "cjkWidth";
lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKWidthFilterFactory.java:
  public static final String NAME = "cjkWidth";

lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicFilterFactory.java:
  public static final String NAME = "classic";
lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerFactory.java:
  public static final String NAME = "classic";

lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.java:
  public static final String NAME = "edgeNGram";
lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java:
  public static final String NAME = "edgeNGram";

lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java:
  public static final String NAME = "nGram";
lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramFilterFactory.java:
  public static final String NAME = "nGram";

lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUNormalizer2FilterFactory.java:
  public static final String NAME = "icuNormalizer2";
lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUNormalizer2CharFilterFactory.java:
  public static final String NAME = "icuNormalizer2";

lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.java:
  public static final String NAME = "patternReplace";
lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.java:
  public static final String NAME = "patternReplace";
{noformat}
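
For anyone who wants to repeat this kind of check, here is a minimal sketch of one way to spot shared names: group factory classes by their {{NAME}} value and keep only the names claimed by more than one class. It is not how the listing above was produced. The class {{SharedSpiNamesSketch}} and the entry {{ExampleOnlyFactory}} are hypothetical, and the map is filled in by hand from part of the listing rather than read from Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class SharedSpiNamesSketch {

  // Factory class -> NAME constant. The first four entries are copied from the
  // listing above; "ExampleOnlyFactory" is a made-up entry with a unique name.
  static final Map<String, String> NAME_BY_FACTORY = Map.of(
      "CJKWidthCharFilterFactory", "cjkWidth",
      "CJKWidthFilterFactory", "cjkWidth",
      "EdgeNGramFilterFactory", "edgeNGram",
      "EdgeNGramTokenizerFactory", "edgeNGram",
      "ExampleOnlyFactory", "exampleOnly");

  // Returns SPI name -> sorted list of factory classes, for names used by 2+ classes.
  static Map<String, List<String>> sharedNames(Map<String, String> nameByFactory) {
    Map<String, List<String>> byName = new TreeMap<>();
    nameByFactory.forEach(
        (factory, name) -> byName.computeIfAbsent(name, k -> new ArrayList<>()).add(factory));
    // Drop names that belong to a single class, then sort for stable output.
    byName.values().removeIf(factories -> factories.size() < 2);
    byName.values().forEach(Collections::sort);
    return byName;
  }

  public static void main(String[] args) {
    sharedNames(NAME_BY_FACTORY)
        .forEach((name, factories) -> System.out.println(name + " -> " + factories));
  }
}
```

Against a real checkout, the map would have to be populated from the source tree or from the analysis SPI registries instead of by hand.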
 

> schema short class name support can use factories w/different names then 
> specified name
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-16930
>                 URL: https://issues.apache.org/jira/browse/SOLR-16930
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> I recently encountered a schema "in the wild" that had a fieldType that 
> looked roughly like this...
> {noformat}
>   <fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="edgengram" positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EdgeNGramTokenizerFactory" maxGramSize="25" minGramSize="4"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
> {noformat}
>  
> I was going to explain to the user that this wouldn't work, because they were 
> trying to configure {{solr.EdgeNGramTokenizerFactory}} as a token 
> _filter_, but it's a _tokenizer_ – and that they needed to use 
> {{solr.EdgeNGramFilterFactory}}.
> But then I realized their schema loaded just fine, and did exactly what they 
> expected. Which made no sense to me.
> Experimentation using the {{/analysis/field}} request handler confirmed that 
> – somehow – they were getting an 
> {{org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter}} instance.
> ----
> I have not dug into this code, but I _suspect_ what's happening is that the 
> logic for resolving {{solr.FooClassName}} "short" classnames is finding the 
> class with the name {{FooClassName}} and then checking what its SPI name is 
> _without checking if it implements the expected API_ and then using that SPI 
> name to actually create an instance of the factory.
> So the resolution of {{solr.EdgeNGramTokenizerFactory}} finds 
> {{org.apache.lucene.analysis.ngram.EdgeNGramTokenizerFactory}}, which has an 
> SPI name of {{edgeNGram}}, which when resolved _in the context of looking 
> for a TokenFilterFactory_ returns 
> {{org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory}} because both 
> classes have the *SAME* SPI name (but for different APIs).
>  ----
> I know we've moved away from suggesting the {{solr.FooClassName}} short 
> classname syntax (and will probably remove it completely at some point) in 
> favor of using the SPI registration names -- so maybe this isn't worth 
> worrying about, but it sure confused the hell out of me, and will likely 
> confuse the hell out of someone else at some point as well (hence I'm 
> creating a jira in case it helps anyone else confused about this).
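
The lookup path suspected in the issue description can be modelled with a small, self-contained sketch. This is a toy model under stated assumptions, not Solr's actual resource-loading code: the class {{ShortNameResolutionSketch}}, its maps, and both {{resolve...}} methods are hypothetical, and factories are represented by their simple class names only. {{resolveUnchecked}} follows the suspected steps (class name, then SPI name, then a lookup in the registry for the requested API), while {{resolveChecked}} shows one possible guard that rejects a result that is not the class the user named.

```java
import java.util.Map;

// Toy model for illustration only; none of these names are Solr or Lucene APIs.
final class ShortNameResolutionSketch {

  // SPI name declared by each factory class (values taken from this thread).
  static final Map<String, String> SPI_NAME_BY_CLASS = Map.of(
      "EdgeNGramTokenizerFactory", "edgeNGram",
      "EdgeNGramFilterFactory", "edgeNGram");

  // One registry per API, each keyed by SPI name.
  static final Map<String, String> TOKENIZERS = Map.of("edgeNGram", "EdgeNGramTokenizerFactory");
  static final Map<String, String> TOKEN_FILTERS = Map.of("edgeNGram", "EdgeNGramFilterFactory");

  private static String simpleName(String shortClassName) {
    if (!shortClassName.startsWith("solr.")) {
      throw new IllegalArgumentException("not a solr.* short class name: " + shortClassName);
    }
    return shortClassName.substring("solr.".length());
  }

  // Suspected behaviour: class name -> SPI name -> registry lookup, with no type check.
  static String resolveUnchecked(String shortClassName, Map<String, String> registry) {
    String spiName = SPI_NAME_BY_CLASS.get(simpleName(shortClassName));
    return spiName == null ? null : registry.get(spiName);
  }

  // Stricter variant: fail unless the registry maps back to the class that was named.
  static String resolveChecked(String shortClassName, Map<String, String> registry) {
    String resolved = resolveUnchecked(shortClassName, registry);
    if (!simpleName(shortClassName).equals(resolved)) {
      throw new IllegalArgumentException(
          shortClassName + " is not registered for this API (lookup found " + resolved + ")");
    }
    return resolved;
  }

  public static void main(String[] args) {
    // A tokenizer factory named where a filter is expected silently yields the filter factory.
    System.out.println(resolveUnchecked("solr.EdgeNGramTokenizerFactory", TOKEN_FILTERS));
    try {
      resolveChecked("solr.EdgeNGramTokenizerFactory", TOKEN_FILTERS);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In this model the unchecked lookup returns {{EdgeNGramFilterFactory}} for the schema shown in the description, which matches the behaviour observed through {{/analysis/field}}. Whether a guard like {{resolveChecked}} fits the real code is an open question.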



