[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153131#comment-15153131 ]

Steve Rowe commented on LUCENE-6993:

bq. JFlex 1.6.1 currently only supports Unicode 7.0, not 8.0 - Steve Rowe, do you know what the jflex timeline for upgrading looks like?

Unicode 8.0 support is committed on JFlex master, but no release includes it yet. (So if you want to test I think you could build JFlex locally, change the JFlex dependency in Lucene to use the snapshot, then run the Lucene build.)

No timeline for release has been set. I'll ping JFlex founder Gerwin Klein, who has done all the releases, and get back to you here.

> Update TLDs to latest list
> --------------------------
>
>                 Key: LUCENE-6993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6993
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mike Drob
>            Assignee: Robert Muir
>             Fix For: 6.0
>
>         Attachments: LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the
> list of TLDs again. Comparing our old list with a new list indicates 800+ new
> domains, so it would be nice to include them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153017#comment-15153017 ]

Robert Muir commented on LUCENE-6993:

I can't even imagine us releasing a 5.6 after a 6.0, and I really do not think we should drag that idea into this issue. It's a bad one.

Let's target 6.0 here for all this stuff: these are major changes that impact backwards compatibility. The logic should be:

{code}
if (version.onOrAfter(LUCENE_6_0_0)) {
  // new tokenizer
} else {
  // old tokenizer
}
{code}
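For readers who want to see the gate above in runnable form, here is a minimal, self-contained sketch. It does not use Lucene's own classes: {{ReleaseVersion}}, {{TokenizerGate}}, and {{implementationFor}} are hypothetical stand-ins made up for illustration, and the returned strings simply echo the two branches of the snippet above.

```java
// Hypothetical stand-in for a release-version type; this is NOT Lucene's Version class.
final class ReleaseVersion {
    final int major;
    final int minor;
    final int bugfix;

    ReleaseVersion(int major, int minor, int bugfix) {
        this.major = major;
        this.minor = minor;
        this.bugfix = bugfix;
    }

    // True when this version is the same as, or newer than, the other one.
    boolean onOrAfter(ReleaseVersion other) {
        if (major != other.major) {
            return major > other.major;
        }
        if (minor != other.minor) {
            return minor > other.minor;
        }
        return bugfix >= other.bugfix;
    }
}

// Hypothetical gate: picks a tokenizer flavor from the requested compatibility version.
final class TokenizerGate {
    static final ReleaseVersion V_6_0_0 = new ReleaseVersion(6, 0, 0);

    static String implementationFor(ReleaseVersion requested) {
        if (requested.onOrAfter(V_6_0_0)) {
            return "new tokenizer";
        } else {
            return "old tokenizer";
        }
    }

    public static void main(String[] args) {
        System.out.println(implementationFor(new ReleaseVersion(5, 5, 0)));
        System.out.println(implementationFor(new ReleaseVersion(6, 0, 0)));
    }
}
```

The point of a single cutoff is that every compatibility version below 6.0 maps to the same saved tokenizer, so there is no need to reason about intermediate 5.x releases.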
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151606#comment-15151606 ]

Robert Muir commented on LUCENE-6993:

I took care of the ICU parts here: LUCENE-7035

Please ping me here if you have trouble setting up the back compat. I can always do that part if it gets too frustrating, but it is better if more people can do it.
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151461#comment-15151461 ]

Robert Muir commented on LUCENE-6993:

And I guess we should really call it {{std50}} to keep things simple. If someone asks for 5.4 compatibility, they should get this one, and then the logic in the Analyzer will make it clear that this is the case, even going forward.
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151457#comment-15151457 ]

Robert Muir commented on LUCENE-6993:

Basically the old versions of the Tokenizer and Impl are just "saved" to a subdirectory, and in the Analyzer and TokenizerFactory we conditionally use them if you request that compatibility version.

Have a look at branch_5x, which still has {{std40}} containing StandardTokenizer40, StandardTokenizerImpl40, UAX29URLEmailTokenizer40, and so on. TestStandardAnalyzer and TestUAX29URLEmailAnalyzer also have a testBackcompat40 which calls {{setVersion}} and ensures it works. Finally, see StandardAnalyzer/TokenizerFactory.java and UAX29URLEmailAnalyzer/TokenizerFactory.java, which conditionally use StandardTokenizer40 depending on version.

So we should do a similar thing with the current stuff in master before modifying the files, and make them {{std55}}. We can just test that it works at all (e.g. foo bar -> foo,bar) initially, and later maybe add a test ensuring "old behavior" stays the same.

Then you can bump the Unicode version and TLD lists, and it won't change any behavior if someone asks for version < 6.0, because they will get the exact same tokenizer as before.
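The save-then-gate approach described above can be illustrated with a toy example. This is only a sketch under loud assumptions: {{ToyUrlTokenizer}}, {{BackCompatSketch}}, and {{forMajorVersion}} are hypothetical names, the tokenizer is a whitespace splitter rather than a JFlex-generated grammar, and {{oldtld}} / {{newtld}} are placeholder labels, not entries from any real TLD list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy tokenizer: splits on whitespace and tags a token with <URL>
// when its last dot-separated label is in the TLD set it was built with.
final class ToyUrlTokenizer {
    private final Set<String> tlds;

    ToyUrlTokenizer(String... tlds) {
        this.tlds = new HashSet<>(Arrays.asList(tlds));
    }

    List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields a single empty string from split
            }
            int dot = tok.lastIndexOf('.');
            boolean url = dot > 0 && tlds.contains(tok.substring(dot + 1));
            out.add(url ? tok + "<URL>" : tok);
        }
        return out;
    }
}

final class BackCompatSketch {
    // Frozen copy: stands in for the tokenizer saved to a back-compat subdirectory.
    static final ToyUrlTokenizer SAVED = new ToyUrlTokenizer("oldtld");
    // Current copy: free to pick up an updated TLD list.
    static final ToyUrlTokenizer CURRENT = new ToyUrlTokenizer("oldtld", "newtld");

    // Hypothetical selection hook: requests below major version 6 get the frozen copy.
    static ToyUrlTokenizer forMajorVersion(int major) {
        return major >= 6 ? CURRENT : SAVED;
    }

    public static void main(String[] args) {
        // Smoke test in the spirit of "foo bar -> foo,bar".
        System.out.println(forMajorVersion(5).tokenize("foo bar"));
        // The same input is tokenized differently only for callers on the new version.
        System.out.println(forMajorVersion(5).tokenize("a.newtld"));
        System.out.println(forMajorVersion(6).tokenize("a.newtld"));
    }
}
```

Because the frozen copy never sees the updated list, later changes to the current tokenizer cannot alter what older compatibility versions produce, which is the property the smoke test and any later "old behavior" test are meant to protect.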
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151375#comment-15151375 ]

Robert Muir commented on LUCENE-6993:

OK, I can look into the ICU part in a separate issue, since it's somewhat unrelated, but I think it's worthwhile for consistency.
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151363#comment-15151363 ]

Mike Drob commented on LUCENE-6993:

That all makes sense. I was looking at the Unicode spec changes between 6.3 and 8.0 and did not really understand what the impact on our grammars is.

I'll add the current grammar to a std55 directory, but will need some help making sure that I've got all the right back-compat hooks. I'll post an updated patch shortly when I get stuck.
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151332#comment-15151332 ]

Robert Muir commented on LUCENE-6993:

I think with a major release looming we should update all this stuff. That includes bumping the Unicode version (and ICU library) to Unicode 8.0, because Java has already done this for JDK 9 (http://openjdk.java.net/jeps/267), and we should not fall so far behind.

We should copy the current generated grammar into a 'std55' subdirectory and hook it in for backwards compatibility before applying grammar changes.

Then I think we just fix all this stuff at once? It sounds worse than it is. I think it can be done today, and I will help.
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151296#comment-15151296 ]

Mike Drob commented on LUCENE-6993:

[~rcmuir] - Do you have any thoughts on this since you were involved in the previous patch too?
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143997#comment-15143997 ]

Mike Drob commented on LUCENE-6993:

Hi Steve - do you have any updates or would you like me to ping somebody else? Thanks!
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126654#comment-15126654 ]

Steve Rowe commented on LUCENE-6993:

Hi [~mdrob], sure, I'll try to look at it some time this week.
[jira] [Commented] (LUCENE-6993) Update TLDs to latest list
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126636#comment-15126636 ]

Mike Drob commented on LUCENE-6993:

[~steve_rowe] - you did the previous incarnation of this fix, do you have time to look at this one?