[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151457#comment-15151457
 ] 

Robert Muir commented on LUCENE-6993:
-------------------------------------

Basically, the old versions of the Tokenizer and its Impl are just "saved" to a 
subdirectory, and the Analyzer and TokenizerFactory conditionally use them if 
you request that compatibility version.

Have a look at branch_5x which still has {{std40}} containing 
StandardTokenizer40, StandardTokenizerImpl40, UAX29URLEmailTokenizer40, and so 
on. TestStandardAnalyzer and TestUAX29URLEmailAnalyzer also have a 
testBackcompat40 which calls {{setVersion}} and ensures it works. Finally, see 
StandardAnalyzer/TokenizerFactory.java and 
UAX29URLEmailAnalyzer/TokenizerFactory.java, which conditionally use 
StandardTokenizer40 (or UAX29URLEmailTokenizer40) depending on version.

So we should do the same with the current code in master before modifying the 
files: save it as {{std55}}. Initially we can just test that it works at all 
(e.g. foo bar -> foo,bar), and maybe later add a test ensuring the 
"old behavior" stays the same.

Then you can bump the Unicode version and TLD lists, and it won't change any 
behavior if someone asks for version < 6.0, because they will get the exact 
same tokenizer as before.
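
As a rough illustration of the pattern, here is a minimal sketch in plain Java. 
It does not use Lucene's real classes: CompatVersion, SimpleTokenizer, 
Tokenizer55, CurrentTokenizer and createTokenizer are all hypothetical names 
made up for this example, and both tokenizers just split on whitespace. The 
only point is the version check that hands out the frozen copy for anything 
below 6.0.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of version-conditional tokenizer selection (not Lucene code). */
final class BackcompatSketch {

    /** Stand-in for a compatibility version such as 5.5 or 6.0. */
    static final class CompatVersion {
        final int major;
        final int minor;

        CompatVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** True if this version is the same as or newer than {@code other}. */
        boolean onOrAfter(CompatVersion other) {
            if (major != other.major) {
                return major > other.major;
            }
            return minor >= other.minor;
        }
    }

    /** Stand-in for a tokenizer; ruleSet() only reports which copy was chosen. */
    interface SimpleTokenizer {
        String ruleSet();
        List<String> tokenize(String text);
    }

    /** Plays the role of the tokenizer in master after the Unicode/TLD bump. */
    static final class CurrentTokenizer implements SimpleTokenizer {
        public String ruleSet() { return "current"; }
        public List<String> tokenize(String text) { return splitOnWhitespace(text); }
    }

    /** Plays the role of the frozen copy saved under std55; never modified again. */
    static final class Tokenizer55 implements SimpleTokenizer {
        public String ruleSet() { return "std55"; }
        public List<String> tokenize(String text) { return splitOnWhitespace(text); }
    }

    /** Toy tokenization shared by both stand-ins: split on runs of whitespace. */
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Assumed cutoff: anything below 6.0 gets the saved tokenizer. */
    static final CompatVersion V6_0 = new CompatVersion(6, 0);

    /** What the Analyzer and TokenizerFactory would do with the requested version. */
    static SimpleTokenizer createTokenizer(CompatVersion requested) {
        return requested.onOrAfter(V6_0) ? new CurrentTokenizer() : new Tokenizer55();
    }

    public static void main(String[] args) {
        // A pre-6.0 request gets the frozen copy.
        System.out.println(createTokenizer(new CompatVersion(5, 5)).ruleSet());
        // Smoke test in the spirit of "foo bar -> foo,bar".
        System.out.println(createTokenizer(new CompatVersion(6, 0)).tokenize("foo bar"));
    }
}
```

Run as written, this prints std55 and then [foo, bar]. In the real change the 
two classes would differ in their generated grammar (Unicode version, TLD 
list), which is why the old one has to be copied before the files are touched.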

> Update TLDs to latest list
> --------------------------
>
>                 Key: LUCENE-6993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6993
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mike Drob
>            Assignee: Robert Muir
>             Fix For: 6.0
>
>         Attachments: LUCENE-6993.patch, LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the 
> list of TLDs again. Comparing our old list with a new list indicates 800+ new 
> domains, so it would be nice to include them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

