[ https://issues.apache.org/jira/browse/LUCENE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848225#comment-16848225 ]
Tomoko Uchida edited comment on LUCENE-8778 at 5/25/19 10:03 PM:
-----------------------------------------------------------------

I updated the pull request.
 * Service lookup is still performed on case-insensitive map keys (as before). The original names are preserved in an auxiliary Set for reference. I also added a check to make sure that the size of the lookup map and the size of the original name set match.
 * Restrict the characters that can be used in SPI names: only letters, digits, and underscores are allowed. (Underscores are allowed for possible future use.)
 * Document the case-insensitive lookup in each Javadoc tag (I took a screenshot). It's a bit redundant, but at least it is not likely to be overlooked.

!Screenshot from 2019-05-25 23-25-24.png!

I would like to delay allowing "multiple names" or "aliases", because I don't want to implement a feature that might never be used. If the Elasticsearch team or someone else is interested in using the analysis service loader, I think the modification is easy and we can work together then.

Can you please review the latest changes in the service loader class? Here are the diffs: -[bf6fc2b|https://github.com/apache/lucene-solr/pull/654/commits/bf6fc2b4cc3db2848e2f79cfbb1fa917a834cf06], [dab1f5a|https://github.com/apache/lucene-solr/pull/654/commits/dab1f5a9a8cd36ead1272ee99ef51200600a3b3b]-


> Define analyzer SPI names as static final fields and document the names in
> Javadocs
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8778
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8778
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Priority: Minor
>         Attachments: ListAnalysisComponents.java, SPINamesGenerator.java,
> Screenshot from 2019-04-26 02-17-48.png, Screenshot from 2019-05-25
> 23-25-24.png, TestSPINames.java
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Each built-in analysis component (factory of tokenizer / char filter / token
> filter) has an SPI name, but currently this is not documented anywhere.
> The goals of this issue:
>  * Define SPI names as a static final field for each analysis component so
> that users can get the component by name (via the {{NAME}} static field).
> This also provides compile-time safety.
>  * Officially document the SPI names in Javadocs.
>  * Add proper source validation rules to the ant {{validate-source-patterns}}
> target so that we can make sure that all analysis components have correct
> field definitions and documentation.
> and,
>  * Look up SPI names via the new {{NAME}} fields instead of deriving them
> from class names.
> (Just for quick reference) we now have:
>  * *19* Tokenizers ({{TokenizerFactory.availableTokenizers()}})
>  * *6* CharFilters ({{CharFilterFactory.availableCharFilters()}})
>  * *118* TokenFilters ({{TokenFilterFactory.availableTokenFilters()}})
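
As a rough illustration of the lookup rules described in the comment (case-insensitive map keys, original names kept in a separate set, restricted name characters) combined with reading the name from a static {{NAME}} field, here is a minimal Java sketch. It is not the code from the pull request: SpiNameRegistry, DemoTokenizerFactory, BadNameFactory, and the regular expression are hypothetical, and "letters" is assumed to mean ASCII letters.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical registry for illustration only; not Lucene's service loader. */
final class SpiNameRegistry {
  // Assumption: "letters, digits, and underscores" means ASCII characters only.
  private static final Pattern VALID_NAME = Pattern.compile("^[a-zA-Z0-9_]+$");

  // Lookup map keyed by lower-cased name; the set keeps names as declared.
  private final Map<String, Class<?>> services = new LinkedHashMap<>();
  private final Set<String> originalNames = new LinkedHashSet<>();

  void register(Class<?> clazz) {
    String name = readName(clazz);
    if (!VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Illegal SPI name: '" + name + "'");
    }
    String key = name.toLowerCase(Locale.ROOT);
    if (services.containsKey(key)) {
      throw new IllegalArgumentException("Duplicate SPI name (ignoring case): '" + name + "'");
    }
    services.put(key, clazz);
    originalNames.add(name);
    // Sanity check: both collections must describe the same set of services.
    if (services.size() != originalNames.size()) {
      throw new IllegalStateException("Lookup map and original name set differ in size");
    }
  }

  Class<?> lookup(String name) {
    Class<?> clazz = services.get(name.toLowerCase(Locale.ROOT));
    if (clazz == null) {
      throw new IllegalArgumentException(
          "Unknown SPI name '" + name + "'; available: " + originalNames);
    }
    return clazz;
  }

  Set<String> availableServices() {
    return Collections.unmodifiableSet(originalNames);
  }

  // Reads a "public static final String NAME" field instead of deriving the
  // name from the class name.
  private static String readName(Class<?> clazz) {
    try {
      Field field = clazz.getField("NAME");
      int mod = field.getModifiers();
      if (Modifier.isStatic(mod) && Modifier.isFinal(mod) && field.getType() == String.class) {
        return (String) field.get(null);
      }
    } catch (ReflectiveOperationException e) {
      // Fall through to the error below.
    }
    throw new IllegalArgumentException(
        clazz.getName() + " has no public static final String NAME field");
  }

  public static void main(String[] args) {
    SpiNameRegistry registry = new SpiNameRegistry();
    registry.register(DemoTokenizerFactory.class);
    // Lookup ignores case; the listing keeps the declared spelling.
    System.out.println(registry.lookup("DEMOTOKENIZER").getSimpleName());
    System.out.println(registry.availableServices());
  }
}

/** Hypothetical component with a valid name. */
final class DemoTokenizerFactory {
  public static final String NAME = "demoTokenizer";
}

/** Hypothetical component whose name contains a disallowed character. */
final class BadNameFactory {
  public static final String NAME = "bad-name";
}
```

Keeping the original names in their own set means error messages and listings can show names exactly as declared, while lookups remain insensitive to case.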