[ 
https://issues.apache.org/jira/browse/LUCENE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847486#comment-16847486
 ] 

Tomoko Uchida commented on LUCENE-8778:
---------------------------------------

Hi [~thetaphi],

I ran a regression test and fixed the incorrect SPI names (they had been mistakenly 
copy-pasted in previous commits).
 # List the SPI names and class names of all analysis components on the master 
branch. ([^ListAnalysisComponents.java])
 # Make sure that all components can be looked up by their (old) SPI names on my 
branch (pull request). ([^TestSPINames.java])

I also modified {{AnalysisSPILoader}} to preserve the letter casing of service 
names. The documented SPI names are now camel-cased, so it would be better to 
preserve the original names as is. Instead of lowercasing the names when 
registering them, we can perform a case-insensitive lookup. Because the service 
map is small, I guess the performance degradation will not matter much in this 
case (I'm not quite sure; there might be better ways?). 
([diff|https://github.com/apache/lucene-solr/pull/654/commits/fc903379b0a53b690adf1c1ca5843b92444895ec])
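
To make the idea concrete, here is a minimal sketch of a registry that keeps names in their registered casing and matches them case-insensitively on lookup. This is not the code from the pull request: the class {{CaseInsensitiveRegistry}} and its methods are hypothetical, and the linear scan is only reasonable on the assumption that the map stays small.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry for illustration; not the actual AnalysisSPILoader. */
final class CaseInsensitiveRegistry<T> {
  // Keys keep the casing they were registered with; insertion order is preserved.
  private final Map<String, T> services = new LinkedHashMap<>();

  /** Registers a service under its original name; the first registration of a name wins. */
  void register(String name, T service) {
    if (service == null) {
      throw new IllegalArgumentException("service must not be null");
    }
    if (lookup(name) == null) {
      services.put(name, service);
    }
  }

  /** Linear scan using equalsIgnoreCase; returns null if no name matches. */
  T lookup(String name) {
    for (Map.Entry<String, T> entry : services.entrySet()) {
      if (entry.getKey().equalsIgnoreCase(name)) {
        return entry.getValue();
      }
    }
    return null;
  }

  /** Returns the registered names exactly as they were registered. */
  Set<String> availableServices() {
    return Collections.unmodifiableSet(services.keySet());
  }
}
```

If the scan ever became a concern, a {{TreeMap}} built with {{String.CASE_INSENSITIVE_ORDER}} would be one alternative: it also keeps the first-registered key's casing and gives logarithmic lookups.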

This branch passed ant test & precommit.

> Define analyzer SPI names as static final fields and document the names in 
> Javadocs
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8778
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8778
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Priority: Minor
>         Attachments: ListAnalysisComponents.java, SPINamesGenerator.java, 
> Screenshot from 2019-04-26 02-17-48.png, TestSPINames.java
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Each built-in analysis component (factory of tokenizer / char filter / token 
> filter) has an SPI name, but currently this is not documented anywhere.
> The goals of this issue:
>  * Define the SPI name as a static final field for each analysis component so that 
> users can get the component by name (via the {{NAME}} static field). This also 
> provides compile-time safety.
>  * Officially document the SPI names in Javadocs.
>  * Add proper source validation rules to the ant {{validate-source-patterns}} 
> target so that we can make sure that all analysis components have correct 
> field definitions and documentation.
> and,
>  * Look up SPI names from the new {{NAME}} fields instead of deriving them from 
> class names.
> (Just for quick reference) we now have:
>  * *19* Tokenizers ({{TokenizerFactory.availableTokenizers()}})
>  * *6* CharFilters ({{CharFilterFactory.availableCharFilters()}})
>  * *118* TokenFilters ({{TokenFilterFactory.availableTokenFilters()}})
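
As an illustration of the last goal in the quoted description (reading the SPI name from a {{NAME}} field rather than deriving it from the class name), the lookup could be sketched with plain reflection as below. This is an assumption about the general approach, not the code in the patch: {{SpiNames}}, {{lookupSpiName}}, and {{ExampleTokenizerFactory}} are hypothetical names, and the stand-in factory is not a real Lucene class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical helper for illustration; not part of Lucene. */
final class SpiNames {
  private SpiNames() {}

  /** Hypothetical stand-in for an analysis factory that declares its SPI name. */
  static final class ExampleTokenizerFactory {
    public static final String NAME = "example";
  }

  /**
   * Reads the public static final String field called NAME from the given class.
   * Throws IllegalStateException if the field is missing or declared differently.
   */
  static String lookupSpiName(Class<?> clazz) {
    try {
      Field field = clazz.getField("NAME");
      int modifiers = field.getModifiers();
      if (Modifier.isStatic(modifiers)
          && Modifier.isFinal(modifiers)
          && field.getType() == String.class) {
        return (String) field.get(null); // static field, so no instance is needed
      }
    } catch (NoSuchFieldException | IllegalAccessException e) {
      // Fall through to the exception below.
    }
    throw new IllegalStateException(
        "No public static final String NAME field on " + clazz.getName());
  }
}
```

Calling {{SpiNames.lookupSpiName(SpiNames.ExampleTokenizerFactory.class)}} returns the declared name, while a class without such a field fails fast, which is the kind of mistake the source validation rule is meant to catch earlier.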



