[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-18 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153131#comment-15153131
 ] 

Steve Rowe commented on LUCENE-6993:


bq. JFlex 1.6.1 currently only supports Unicode 7.0, not 8.0 - Steve Rowe, do 
you know what the jflex timeline for upgrading looks like?

Unicode 8.0 support is committed on JFlex master, but no release includes it 
yet. (So if you want to test I think you could build JFlex locally, change the 
JFlex dependency in Lucene to use the snapshot, then run the Lucene build.) No 
timeline for release has been set.  I'll ping JFlex founder Gerwin Klein, who 
has done all the releases, and get back to you here.

> Update TLDs to latest list
> --
>
> Key: LUCENE-6993
> URL: https://issues.apache.org/jira/browse/LUCENE-6993
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Mike Drob
>Assignee: Robert Muir
> Fix For: 6.0
>
> Attachments: LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, 
> LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the 
> list of TLDs again. Comparing our old list with a new list indicates 800+ new 
> domains, so it would be nice to include them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153017#comment-15153017
 ] 

Robert Muir commented on LUCENE-6993:
-

I can't even imagine us releasing a 5.6 after a 6.0, I really do not think we 
should drag that idea into this issue. It's a bad one.

Let's target 6.0 here for all this stuff: these are major changes that impact 
backwards compatibility. The logic should be:

{code}
if (version.onOrAfter(LUCENE_6_0_0)) {
  // new tokenizer
} else {
  // old tokenizer
}
{code}
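
A self-contained sketch of that check follows. It does not use Lucene's classes: {{ReleaseVersion}} and the two placeholder labels are hypothetical, introduced only to show how an {{onOrAfter}} comparison routes any pre-6.0 request to the old tokenizer.

```java
final class VersionGateSketch {
  /** Hypothetical stand-in for a release version; not Lucene's own Version class. */
  static final class ReleaseVersion {
    final int major;
    final int minor;

    ReleaseVersion(int major, int minor) {
      this.major = major;
      this.minor = minor;
    }

    /** True when this version is the same as, or newer than, {@code other}. */
    boolean onOrAfter(ReleaseVersion other) {
      if (major != other.major) {
        return major > other.major;
      }
      return minor >= other.minor;
    }
  }

  static final ReleaseVersion V6_0 = new ReleaseVersion(6, 0);

  /** Returns a label naming which tokenizer a caller requesting {@code requested} would get. */
  static String chooseTokenizer(ReleaseVersion requested) {
    if (requested.onOrAfter(V6_0)) {
      return "new tokenizer";
    } else {
      return "old tokenizer";
    }
  }

  public static void main(String[] args) {
    System.out.println(chooseTokenizer(new ReleaseVersion(5, 5)));
    System.out.println(chooseTokenizer(new ReleaseVersion(6, 0)));
  }
}
```

Comparing against a single cutoff keeps the branch readable: every 5.x request falls on the old side without listing individual minor versions.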




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151606#comment-15151606
 ] 

Robert Muir commented on LUCENE-6993:
-

I took care of the icu parts here: LUCENE-7035

Please ping me here if you have trouble setting up the back compat. I can 
always do that part, if it gets too frustrating. But it is better if more 
people can do it.




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151461#comment-15151461
 ] 

Robert Muir commented on LUCENE-6993:
-

And I guess really we should call it {{std50}} to keep things simple. If 
someone asks for 5.4 compatibility, they should get this one, and then the logic 
in the Analyzer will be clear that is the case even going forward.




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151457#comment-15151457
 ] 

Robert Muir commented on LUCENE-6993:
-

Basically the old versions of the Tokenizer and Impl are just "saved" to a 
subdirectory, and in the Analyzer and TokenizerFactory we conditionally use 
them, if you request that compatibility version.

Have a look at branch_5x which still has {{std40}} containing 
StandardTokenizer40, StandardTokenizerImpl40, UAX29URLEmailTokenizer40, and so 
on. TestStandardAnalyzer and TestUAX29URLEmailAnalyzer also have a 
testBackcompat40 which calls {{setVersion}} and ensures it works. Finally, see 
StandardAnalyzer/TokenizerFactory.java and 
UAX29URLEmailAnalyzer/TokenizerFactory.java, which conditionally use 
StandardTokenizer40 depending on version.

So we should do a similar thing with the current stuff in master before 
modifying the files, and make them {{std55}}. We can just test that it works at 
all (e.g. foo bar -> foo,bar) initially and later maybe add a test ensuring 
"old behavior" stays the same.

Then you can bump unicode version and tld lists and it won't change any 
behavior if someone asks for version < 6.0, because they will get the exact 
same tokenizer as before.
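
The idea above can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene code: {{FrozenTokenizerSketch}}, {{tokenize}}, and {{tokenizeForVersion}} are hypothetical names, the toy tokenizer stands in for the JFlex-generated grammars, and both TLD sets are invented samples. The frozen list plays the role of the saved copy and the updated list plays the role of the regenerated grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class FrozenTokenizerSketch {
  // Sample TLD lists, invented for this illustration; not the real data.
  static final Set<String> FROZEN_TLDS = Set.of("com", "org");
  static final Set<String> UPDATED_TLDS = Set.of("com", "org", "ninja");

  /**
   * Toy tokenizer: splits on whitespace, keeps "host.tld" whole when the
   * TLD is in the given list, and otherwise splits the word on '.'.
   */
  static List<String> tokenize(String text, Set<String> tlds) {
    List<String> tokens = new ArrayList<>();
    for (String word : text.trim().split("\\s+")) {
      int dot = word.lastIndexOf('.');
      boolean knownTld = dot > 0 && tlds.contains(word.substring(dot + 1));
      if (dot < 0 || knownTld) {
        if (!word.isEmpty()) {
          tokens.add(word);
        }
      } else {
        for (String part : word.split("\\.")) {
          if (!part.isEmpty()) {
            tokens.add(part);
          }
        }
      }
    }
    return tokens;
  }

  /** Requests for a major version below 6 get the frozen list; others get the updated one. */
  static List<String> tokenizeForVersion(String text, int requestedMajor) {
    Set<String> tlds = requestedMajor >= 6 ? UPDATED_TLDS : FROZEN_TLDS;
    return tokenize(text, tlds);
  }

  public static void main(String[] args) {
    System.out.println(tokenizeForVersion("foo bar", 5));
    System.out.println(tokenizeForVersion("example.ninja", 5));
    System.out.println(tokenizeForVersion("example.ninja", 6));
  }
}
```

The first call in {{main}} is the "works at all" check (foo bar -> foo,bar). The other two show the kind of "old behavior stays the same" test that could come later: the same input tokenizes differently only when the newer version is requested.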




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151375#comment-15151375
 ] 

Robert Muir commented on LUCENE-6993:
-

OK, I can look into the icu part in a separate issue, since it's somewhat 
unrelated but I think worthwhile for consistency.




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151363#comment-15151363
 ] 

Mike Drob commented on LUCENE-6993:
---

That all makes sense. I was looking at the unicode spec changes between 6.3 and 
8.0 and did not really understand what the impact to our grammars is.

I'll add the current grammar to a std55 directory, but will need some help 
making sure that I've got all the right back-compat hooks. I'll post an updated 
patch shortly when I get stuck.




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151332#comment-15151332
 ] 

Robert Muir commented on LUCENE-6993:
-

I think with a major release looming we should update all this stuff. That 
includes moving the unicode version (and icu library) to Unicode 8.0, because 
java has already done this for JDK 9 (http://openjdk.java.net/jeps/267), and we 
should not fall so far behind. 

We should copy the current generated grammar to a 'std55' subdirectory and 
hook it in for backwards compatibility before applying grammar changes. Then I 
think we just fix all this stuff at once? It sounds worse than it is. I think it 
can be done today, and I will help.






[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-17 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151296#comment-15151296
 ] 

Mike Drob commented on LUCENE-6993:
---

[~rcmuir] - Do you have any thoughts on this since you were involved in the 
previous patch too?




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-11 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143997#comment-15143997
 ] 

Mike Drob commented on LUCENE-6993:
---

Hi Steve - do you have any updates or would you like me to ping somebody else? 
Thanks!




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-01 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126654#comment-15126654
 ] 

Steve Rowe commented on LUCENE-6993:


Hi [~mdrob], sure, I'll try to look at it some time this week.




[jira] [Commented] (LUCENE-6993) Update TLDs to latest list

2016-02-01 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126636#comment-15126636
 ] 

Mike Drob commented on LUCENE-6993:
---

[~steve_rowe] - you did the previous incarnation of this fix, do you have time 
to look at this one?
