[
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151457#comment-15151457
]
Robert Muir commented on LUCENE-6993:
-------------------------------------
Basically, the old versions of the Tokenizer and its Impl are just "saved" to a
subdirectory, and the Analyzer and TokenizerFactory conditionally use them if
you request that compatibility version.
Have a look at branch_5x, which still has {{std40}} containing
StandardTokenizer40, StandardTokenizerImpl40, UAX29URLEmailTokenizer40, and so
on. TestStandardAnalyzer and TestUAX29URLEmailAnalyzer also have a
testBackcompat40 which calls {{setVersion}} and ensures it works. Finally, see
StandardAnalyzer/TokenizerFactory.java and
UAX29URLEmailAnalyzer/TokenizerFactory.java, which conditionally use the saved
*40 tokenizers (e.g. StandardTokenizer40) depending on the version.
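A rough, self-contained sketch of that "frozen copy + version switch" pattern, for illustration only: every class below (Version, SimpleTokenizer, FrozenTokenizer40, CurrentTokenizer, VersionedAnalyzer) is a hypothetical stand-in, not Lucene's real API, and the 4.7 cutoff and the hyphen rule are made-up examples of a behavior change.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the pattern; none of these names are Lucene classes. */
public class VersionedAnalyzerSketch {

  /** Hypothetical major.minor compatibility version. */
  static final class Version {
    final int major;
    final int minor;

    Version(int major, int minor) {
      this.major = major;
      this.minor = minor;
    }

    boolean onOrAfter(Version other) {
      return major != other.major ? major > other.major : minor >= other.minor;
    }
  }

  /** Hypothetical minimal tokenizer contract. */
  interface SimpleTokenizer {
    List<String> tokenize(String text);
  }

  /** Splits text on the given regex and drops empty pieces. */
  static List<String> split(String text, String regex) {
    return Arrays.stream(text.split(regex))
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  /** Frozen copy of the old implementation, like the classes saved under std40. */
  static final class FrozenTokenizer40 implements SimpleTokenizer {
    public List<String> tokenize(String text) {
      return split(text, "\\s+"); // old rule: whitespace only
    }
  }

  /** Current implementation, free to change; here it also splits on hyphens. */
  static final class CurrentTokenizer implements SimpleTokenizer {
    public List<String> tokenize(String text) {
      return split(text, "[\\s\\-]+"); // new rule: whitespace and hyphens
    }
  }

  /** Picks the tokenizer based on the requested compatibility version. */
  static final class VersionedAnalyzer {
    static final Version CUTOFF = new Version(4, 7); // hypothetical cutover version
    final Version matchVersion;

    VersionedAnalyzer(Version matchVersion) {
      this.matchVersion = matchVersion;
    }

    SimpleTokenizer createTokenizer() {
      return matchVersion.onOrAfter(CUTOFF) ? new CurrentTokenizer() : new FrozenTokenizer40();
    }
  }

  public static void main(String[] args) {
    VersionedAnalyzer oldAnalyzer = new VersionedAnalyzer(new Version(4, 0));
    VersionedAnalyzer newAnalyzer = new VersionedAnalyzer(new Version(5, 5));
    // Smoke test: both paths still tokenize "foo bar" into [foo, bar].
    System.out.println(oldAnalyzer.createTokenizer().tokenize("foo bar")); // [foo, bar]
    System.out.println(newAnalyzer.createTokenizer().tokenize("foo bar")); // [foo, bar]
    // Output differs only where the rules changed.
    System.out.println(oldAnalyzer.createTokenizer().tokenize("wi-fi")); // [wi-fi]
    System.out.println(newAnalyzer.createTokenizer().tokenize("wi-fi")); // [wi, fi]
  }
}
```

The point of keeping a whole frozen class, rather than sprinkling version checks through one implementation, is that the old code path never has to be touched again when the current one changes.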
So we should do the same with the current code in master before modifying the
files, saving it as {{std55}}. Initially we can just test that it works at all
(e.g. foo bar -> foo,bar), and later maybe add a test ensuring the "old
behavior" stays the same.
Then you can bump the Unicode version and TLD lists without changing any
behavior for someone who asks for version < 6.0, because they will get exactly
the same tokenizer as before.
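A toy sketch of why the bump is safe, under loud assumptions: the TLD sets are tiny illustrative subsets rather than the real lists, FrozenTldSketch and analyze are hypothetical names, and the split-on-dot fallback for unknown TLDs is a simplification, not how UAX29URLEmailTokenizer actually tokenizes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: versions before 6.0 keep a frozen TLD list. */
public class FrozenTldSketch {

  // Illustrative subsets only, not the real TLD lists.
  static final Set<String> TLDS_FROZEN = new HashSet<>(Arrays.asList("com", "org"));
  static final Set<String> TLDS_CURRENT = new HashSet<>(Arrays.asList("com", "org", "app"));

  /** Keeps "label.tld" as one token when the TLD is known; otherwise splits on dots. */
  static List<String> tokenize(String text, Set<String> tlds) {
    List<String> out = new ArrayList<>();
    for (String word : text.trim().split("\\s+")) {
      if (word.isEmpty()) {
        continue;
      }
      int dot = word.lastIndexOf('.');
      String tld = dot >= 0 ? word.substring(dot + 1).toLowerCase() : "";
      if (dot > 0 && tlds.contains(tld)) {
        out.add(word); // recognized as a domain
      } else {
        for (String part : word.split("\\.")) {
          if (!part.isEmpty()) {
            out.add(part);
          }
        }
      }
    }
    return out;
  }

  /** Anything requesting a major version below 6 gets the frozen list. */
  static List<String> analyze(int major, String text) {
    Set<String> tlds = major >= 6 ? TLDS_CURRENT : TLDS_FROZEN;
    return tokenize(text, tlds);
  }

  public static void main(String[] args) {
    System.out.println(analyze(5, "see example.app")); // [see, example, app]
    System.out.println(analyze(6, "see example.app")); // [see, example.app]
  }
}
```

Adding a TLD to TLDS_CURRENT changes output only for major >= 6; a 5.x request still sees exactly what it saw before.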
> Update TLDs to latest list
> --------------------------
>
> Key: LUCENE-6993
> URL: https://issues.apache.org/jira/browse/LUCENE-6993
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Mike Drob
> Assignee: Robert Muir
> Fix For: 6.0
>
> Attachments: LUCENE-6993.patch, LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the
> list of TLDs again. Comparing our old list with a new list indicates 800+ new
> domains, so it would be nice to include them.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)