[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151457#comment-15151457 ]
Robert Muir commented on LUCENE-6993:
-------------------------------------

Basically the old versions of the Tokenizer and Impl are just "saved" to a subdirectory, and the Analyzer and TokenizerFactory conditionally use them if you request that compatibility version.

Have a look at branch_5x, which still has {{std40}} containing StandardTokenizer40, StandardTokenizerImpl40, UAX29URLEmailTokenizer40, and so on. TestStandardAnalyzer and TestUAX29URLEmailAnalyzer also have a testBackcompat40, which calls {{setVersion}} and ensures it works. Finally, see StandardAnalyzer/TokenizerFactory.java and UAX29URLEmailAnalyzer/TokenizerFactory.java, which conditionally use StandardTokenizer40 depending on version.

So we should do a similar thing with the current stuff in master before modifying the files, and make them {{std55}}. Initially we can just test that it works at all (e.g. foo bar -> foo,bar), and later maybe add a test ensuring the "old behavior" stays the same.

Then you can bump the Unicode version and TLD lists, and it won't change any behavior if someone asks for version < 6.0, because they will get the exact same tokenizer as before.

> Update TLDs to latest list
> --------------------------
>
>                 Key: LUCENE-6993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6993
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mike Drob
>            Assignee: Robert Muir
>             Fix For: 6.0
>
>         Attachments: LUCENE-6993.patch, LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the
> list of TLDs again. Comparing our old list with a new list indicates 800+ new
> domains, so it would be nice to include them.
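
For illustration only, the version-conditional selection described in the comment above might look roughly like the sketch below. It does not use the Lucene API: Version, SimpleTokenizer, CurrentTokenizer, Tokenizer55 and create are hypothetical stand-ins, and both tokenizers simply split on whitespace so that the initial smoke test (foo bar -> foo,bar) can be shown.

```java
import java.util.Arrays;
import java.util.List;

public class BackcompatSketch {

    /** Hypothetical stand-in for a compatibility version (major.minor only). */
    static final class Version {
        final int major;
        final int minor;

        Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        boolean onOrAfter(Version other) {
            return major > other.major || (major == other.major && minor >= other.minor);
        }
    }

    static final Version V_6_0 = new Version(6, 0);

    /** Hypothetical minimal tokenizer interface, not Lucene's Tokenizer. */
    interface SimpleTokenizer {
        String name();

        List<String> tokenize(String text);
    }

    /** Stands in for the tokenizer built against the updated grammar and TLD list. */
    static final class CurrentTokenizer implements SimpleTokenizer {
        public String name() {
            return "current";
        }

        public List<String> tokenize(String text) {
            return Arrays.asList(text.trim().split("\\s+"));
        }
    }

    /** Stands in for the frozen copy that would be saved under std55. */
    static final class Tokenizer55 implements SimpleTokenizer {
        public String name() {
            return "std55";
        }

        public List<String> tokenize(String text) {
            return Arrays.asList(text.trim().split("\\s+"));
        }
    }

    /** Callers asking for a version before 6.0 get the frozen tokenizer. */
    static SimpleTokenizer create(Version matchVersion) {
        return matchVersion.onOrAfter(V_6_0) ? new CurrentTokenizer() : new Tokenizer55();
    }

    public static void main(String[] args) {
        // Smoke test in the spirit of "foo bar -> foo,bar"; prints [foo, bar]
        System.out.println(create(new Version(5, 5)).tokenize("foo bar"));
    }
}
```

The point of the sketch is only that the frozen copy is chosen by the requested version, so later changes to the current tokenizer cannot affect callers that ask for an older version.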