[ 
https://issues.apache.org/jira/browse/LUCENE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848225#comment-16848225
 ] 

Tomoko Uchida edited comment on LUCENE-8778 at 5/25/19 10:03 PM:
-----------------------------------------------------------------

I updated the pull request.
 * Service lookup is performed on the case-insensitive map keys (as before). 
The original names are preserved in an auxiliary Set for reference. I also added 
a check to make sure that the sizes of the lookup map and the original name set 
are equal.
 * Restrict the characters that can be used in SPI names: only letters, 
digits, and underscores are allowed. (The last one is added for possible future 
uses.)
 * Document the case-insensitive lookup in each Javadoc tag (I took a 
screenshot). It's a bit redundant, but at least it is not likely to be 
overlooked.
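
For readers who want a concrete picture of the first two points, here is a minimal sketch. It is not the code in the pull request: the class name {{SpiNameLookupSketch}}, its methods, and the ASCII-only name pattern are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the actual Lucene service loader class.
final class SpiNameLookupSketch {
  // Assumption: "letters" means ASCII letters here.
  private static final Pattern VALID_NAME = Pattern.compile("^[a-zA-Z0-9_]+$");

  // Lookup map keyed by the lowercased name (case-insensitive lookup).
  private final Map<String, Class<?>> services = new LinkedHashMap<>();
  // Auxiliary set that preserves the names exactly as they were declared.
  private final Set<String> originalNames = new LinkedHashSet<>();

  void register(String name, Class<?> clazz) {
    if (!VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Illegal SPI name: " + name);
    }
    services.put(name.toLowerCase(Locale.ROOT), clazz);
    originalNames.add(name);
    // Two names that differ only in case collapse to one map key,
    // so a size mismatch reveals the collision.
    if (services.size() != originalNames.size()) {
      throw new IllegalStateException(
          "SPI names collide when compared case-insensitively: " + originalNames);
    }
  }

  Class<?> lookup(String name) {
    Class<?> clazz = services.get(name.toLowerCase(Locale.ROOT));
    if (clazz == null) {
      throw new IllegalArgumentException("Unknown SPI name: " + name);
    }
    return clazz;
  }

  Set<String> availableServices() {
    return Collections.unmodifiableSet(originalNames);
  }
}
```

In this sketch, registering {{lowerCase}} and then looking up {{LOWERCASE}} returns the same class, while registering both {{lowerCase}} and {{LowerCase}} trips the size check.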

!Screenshot from 2019-05-25 23-25-24.png!

I would like to delay allowing "multiple names" or "aliases", because I don't 
want to implement a feature that might never be used. If the Elasticsearch team 
or someone else is interested in using the analysis service loader, I think the 
modification is easy and we can work together then.

Could you please review the latest changes in the service loader class? Here 
are the diffs: 
-[bf6fc2b|https://github.com/apache/lucene-solr/pull/654/commits/bf6fc2b4cc3db2848e2f79cfbb1fa917a834cf06],
 
[dab1f5a|https://github.com/apache/lucene-solr/pull/654/commits/dab1f5a9a8cd36ead1272ee99ef51200600a3b3b]-



> Define analyzer SPI names as static final fields and document the names in 
> Javadocs
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8778
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8778
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Priority: Minor
>         Attachments: ListAnalysisComponents.java, SPINamesGenerator.java, 
> Screenshot from 2019-04-26 02-17-48.png, Screenshot from 2019-05-25 
> 23-25-24.png, TestSPINames.java
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Each built-in analysis component (factory of tokenizer / char filter / token 
> filter) has an SPI name, but currently this is not documented anywhere.
> The goals of this issue:
>  * Define SPI names as a static final field for each analysis component so 
> that users can get the component by name (via the {{NAME}} static field). 
> This also provides compile-time safety.
>  * Officially document the SPI names in Javadocs.
>  * Add proper source validation rules to ant {{validate-source-patterns}} 
> target so that we can make sure that all analysis components have correct 
> field definitions and documentation.
> and,
>  * Look up SPI names from the new {{NAME}} fields instead of deriving them 
> from class names.
> (Just for quick reference) we now have:
>  * *19* Tokenizers ({{TokenizerFactory.availableTokenizers()}})
>  * *6* CharFilters ({{CharFilterFactory.availableCharFilters()}})
>  * *118* TokenFilters ({{TokenFilterFactory.availableTokenFilters()}})
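
To illustrate the goals quoted above, the following is a rough sketch of reading an SPI name from a {{NAME}} field by reflection rather than deriving it from the class name. The stub factory, its {{NAME}} value, and the helper {{lookupSpiName}} are hypothetical and only stand in for real analysis factories and the real loader.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch; names below are illustrative, not Lucene's API.
final class NameFieldSketch {

  // Stand-in for an analysis factory that declares its SPI name.
  static final class WhitespaceTokenizerFactoryStub {
    public static final String NAME = "whitespace";
  }

  // Reads the SPI name from the public static final String NAME field.
  static String lookupSpiName(Class<?> clazz) {
    try {
      Field field = clazz.getField("NAME");
      int modifiers = field.getModifiers();
      if (!Modifier.isStatic(modifiers)
          || !Modifier.isFinal(modifiers)
          || field.getType() != String.class) {
        throw new IllegalStateException(
            "NAME must be a public static final String in " + clazz.getName());
      }
      return (String) field.get(null); // static field, so no instance is needed
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
          "No accessible NAME field in " + clazz.getName(), e);
    }
  }
}
```

Callers that know the component at compile time can reference the constant directly (here {{WhitespaceTokenizerFactoryStub.NAME}}), which is where the compile-time safety comes from. A loader that only has the class can use the reflective helper.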


