[jira] Commented: (LUCENE-2911) synchronize grammar/token types across StandardTokenizer, UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.

2011-02-08 Thread Steven Rowe (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992014#comment-12992014 ]

Steven Rowe commented on LUCENE-2911:
-

The generated top-level domain macro file has a bunch of new entries when I run 
this, but these are not included in your patch, and I think we should keep this 
list up-to-date.
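
For reference, the kind of generator being discussed can be sketched in Java. This is a minimal, hypothetical sketch, not the actual Lucene generator. It assumes an IANA-style input list with one TLD per line and '#' comment lines, and an ASCIITLD macro whose entries are spelled case-insensitively as character classes. Because the output is derived entirely from the input list, re-running it against a newer list is all it takes to pick up new entries.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene's actual TLD generator: turns an
// IANA-style list (one TLD per line, '#' comments) into a JFlex macro.
// The macro name and layout are illustrative assumptions.
public class TldMacroSketch {

    // Spells a lowercase TLD case-insensitively, e.g. "com" becomes [cC][oO][mM].
    // Non-letters (digits, hyphens) are copied unchanged.
    static String caseInsensitive(String tld) {
        StringBuilder sb = new StringBuilder();
        for (char c : tld.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sb.append('[').append(c).append(Character.toUpperCase(c)).append(']');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Skips blank and '#' lines, lowercases and de-duplicates entries,
    // and emits them sorted as one alternation after a literal ".".
    static String tldMacro(String macroName, List<String> lines) {
        TreeSet<String> tlds = new TreeSet<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue;
            tlds.add(t.toLowerCase(Locale.ROOT));
        }
        StringBuilder sb = new StringBuilder(macroName).append(" = \".\" (\n");
        String sep = "\t  ";
        for (String tld : tlds) {
            sb.append(sep).append(caseInsensitive(tld)).append('\n');
            sep = "\t| ";
        }
        return sb.append(")\n").toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# header comment", "COM", "AC", "ORG");
        System.out.print(tldMacro("ASCIITLD", lines));
    }
}
```

Sorting through a TreeSet keeps the generated file stable between runs, so a regenerated macro only shows real additions or removals in a diff.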

The patch is missing HangulSupp macro generation in 
modules/icu/src/tools/.../GenerateJFlexSupplementaryMacros.java, but since the 
Hangul macro is not used in the jflex grammar, this doesn't cause a problem.
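
Since JFlex does not yet match supplementary characters natively, a supplementary macro such as HangulSupp presumably has to spell each code point as a UTF-16 high/low surrogate pair. The hypothetical Java sketch below groups a code point range by high surrogate and emits one alternative per group; the class and method names and the output layout are assumptions, not the format actually produced by GenerateJFlexSupplementaryMacros.java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the code of GenerateJFlexSupplementaryMacros.java):
// expresses a range of supplementary code points as UTF-16 surrogate pairs,
// so a scanner that only sees 16-bit chars can match them.
public class SurrogateMacroSketch {

    // Returns one "([high][lowStart-lowEnd])" alternative per high surrogate
    // covered by the inclusive code point range [start, end].
    static List<String> surrogateAlternatives(int start, int end) {
        if (start < 0x10000 || end > Character.MAX_CODE_POINT || start > end) {
            throw new IllegalArgumentException("not a supplementary range");
        }
        List<String> alts = new ArrayList<>();
        int cp = start;
        while (cp <= end) {
            // Code points sharing a high surrogate form an aligned block of 0x400.
            int segEnd = Math.min(end, cp | 0x3FF);
            alts.add(String.format("([\\u%04X][\\u%04X-\\u%04X])",
                    (int) Character.highSurrogate(cp),
                    (int) Character.lowSurrogate(cp),
                    (int) Character.lowSurrogate(segEnd)));
            cp = segEnd + 1;
        }
        return alts;
    }

    // Joins the alternatives into a named macro definition.
    static String macro(String name, int start, int end) {
        return name + " = (\n\t  "
                + String.join("\n\t| ", surrogateAlternatives(start, end)) + "\n)";
    }

    public static void main(String[] args) {
        // Illustrative range only: U+1D7F0..U+1D80F straddles two high surrogates.
        System.out.println(macro("ExampleSupp", 0x1D7F0, 0x1D80F));
    }
}
```

For the illustrative range U+1D7F0..U+1D80F this yields two alternatives, ([\uD835][\uDFF0-\uDFFF]) and ([\uD836][\uDC00-\uDC0F]), because the range crosses a 0x400-aligned block boundary.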

It would be nice to remove the hard-coded ranges for the intersection of Hangul
& ALetter, but when I tried to use JFlex negation and union to produce the
equivalent, memory usage exploded and I couldn't get JFlex to generate the
scanner. I guess we'll have to wait for native JFlex supplementary character
support before we can change it.
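
Hard-coded ranges for an intersection can be computed directly from two sorted range lists in a single merge pass, whereas building it from negation and union goes through the identity A ∩ B = ¬(¬A ∪ ¬B), whose intermediate complements cover nearly the whole code space. The minimal Java sketch below shows the direct merge. The ranges in it are placeholders, not real Hangul or ALetter data, and the character-class output assumes BMP code points only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: intersects two sorted, non-overlapping lists of inclusive code
// point ranges ({start, end} pairs). The sample ranges are placeholders,
// not real Hangul or ALetter data.
public class RangeIntersection {

    static List<int[]> intersect(List<int[]> a, List<int[]> b) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int lo = Math.max(a.get(i)[0], b.get(j)[0]);
            int hi = Math.min(a.get(i)[1], b.get(j)[1]);
            if (lo <= hi) out.add(new int[] {lo, hi});
            // Advance whichever range ends first; it cannot overlap anything later.
            if (a.get(i)[1] < b.get(j)[1]) i++; else j++;
        }
        return out;
    }

    // Formats BMP ranges as a JFlex-style character class in backslash-u notation.
    static String toCharClass(List<int[]> ranges) {
        StringBuilder sb = new StringBuilder("[");
        for (int[] r : ranges) {
            sb.append(String.format("\\u%04X", r[0]));
            if (r[1] > r[0]) sb.append(String.format("-\\u%04X", r[1]));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<int[]> setA = Arrays.asList(new int[] {0x41, 0x5A}, new int[] {0x61, 0x7A});
        List<int[]> setB = Arrays.asList(new int[] {0x50, 0x70});
        System.out.println(toCharClass(intersect(setA, setB)));
    }
}
```

With these placeholder inputs the program prints [\u0050-\u005A\u0061-\u0070]. The merge runs in time linear in the number of ranges and never materializes a complement.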


> synchronize grammar/token types across StandardTokenizer, 
> UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.
> --
>
> Key: LUCENE-2911
> URL: https://issues.apache.org/jira/browse/LUCENE-2911
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Analysis
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2911.patch
>
>
> I'd like to do LUCENE-2906 (better CJK support for these tokenizers) for a
> future target such as 3.2.
> But in 3.1 I would like to do a little cleanup first, and synchronize all
> these token types, etc.
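
The idea of synchronized token types plus dedicated CJK types can be illustrated with a single shared table that every tokenizer consults. The Java sketch below is hypothetical: the enum, its labels, and the rule of classifying a token by the script of its first code point are assumptions for illustration (using java.lang.Character.UnicodeScript, available since Java 7), not the constants or logic in the LUCENE-2911 patch.

```java
// Hypothetical sketch: one shared table of token types that several
// tokenizers could reference, plus a script-based rule for the CJK types.
// The enum and its labels are illustrative, not Lucene's actual constants.
public class SharedTokenTypes {

    enum TokenType {
        ALPHANUM("<ALPHANUM>"),
        NUM("<NUM>"),
        IDEOGRAPHIC("<IDEOGRAPHIC>"),
        HIRAGANA("<HIRAGANA>"),
        KATAKANA("<KATAKANA>"),
        HANGUL("<HANGUL>");

        final String label;

        TokenType(String label) { this.label = label; }
    }

    // Chooses a type from the script of a (non-empty) token's first code point.
    static TokenType typeOf(String token) {
        int cp = token.codePointAt(0);
        switch (Character.UnicodeScript.of(cp)) {
            case HAN:      return TokenType.IDEOGRAPHIC;
            case HIRAGANA: return TokenType.HIRAGANA;
            case KATAKANA: return TokenType.KATAKANA;
            case HANGUL:   return TokenType.HANGUL;
            default:
                return Character.isDigit(cp) ? TokenType.NUM : TokenType.ALPHANUM;
        }
    }

    public static void main(String[] args) {
        // Sample tokens: Latin, digits, and one Han, Hiragana, Katakana, Hangul character.
        String[] samples = {"lucene", "2011", "\u4E2D", "\u3042", "\u30AB", "\uD55C"};
        for (String s : samples) {
            System.out.println(String.format("U+%04X -> %s", s.codePointAt(0), typeOf(s).label));
        }
    }
}
```

Keeping the labels in one place means each tokenizer reports the same type string for the same kind of token, which is the property the cleanup is after.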

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2911) synchronize grammar/token types across StandardTokenizer, UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.

2011-02-08 Thread Robert Muir (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992066#comment-12992066 ]

Robert Muir commented on LUCENE-2911:
-

{quote}
The generated top-level domain macro file has a bunch of new entries when I run 
this, but these are not included in your patch, and I think we should keep this 
list up-to-date.
{quote}

Yeah, I would re-run it before committing. In general I didn't regenerate
anything, so you wouldn't see a lot of generated differences in the patch.

{quote}
The patch is missing HangulSupp macro generation in 
modules/icu/src/tools/.../GenerateJFlexSupplementaryMacros.java, but since the 
Hangul macro is not used in the jflex grammar, this doesn't cause a problem.
{quote}

Oh, I did actually mean to include this; sorry, I forgot. It's a one-liner
though, so I can include it easily.



[jira] Commented: (LUCENE-2911) synchronize grammar/token types across StandardTokenizer, UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.

2011-02-09 Thread Steven Rowe (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992531#comment-12992531 ]

Steven Rowe commented on LUCENE-2911:
-

bq. I think this one is ready to commit.

+1 

I applied the patch: JFlex generates properly, and tests pass.

> synchronize grammar/token types across StandardTokenizer, 
> UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.
> --
>
> Key: LUCENE-2911
> URL: https://issues.apache.org/jira/browse/LUCENE-2911
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Analysis
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2911.patch, LUCENE-2911.patch
>
>
> I'd like to do LUCENE-2906 (better CJK support for these tokenizers) for a
> future target such as 3.2.
> But in 3.1 I would like to do a little cleanup first, and synchronize all
> these token types, etc.



[jira] Commented: (LUCENE-2911) synchronize grammar/token types across StandardTokenizer, UAX29EmailURLTokenizer, ICUTokenizer, add CJK types.

2011-02-09 Thread Robert Muir (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992601#comment-12992601 ]

Robert Muir commented on LUCENE-2911:
-

Committed revision 1068979. Now backporting...
