[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794964#action_12794964 ]

Robert Muir commented on LUCENE-2183:
-------------------------------------

I thought about this some, but I am worried about one thing:

Consider LetterTokenizer, which is a non-final subclass of CharTokenizer.
Let's say you create a LetterAndNumberTokenizer that extends LetterTokenizer, but 
you do not implement the int-based method.

{code}
// LetterAndNumberTokenizer overrides only the char-based method
@Override
public boolean isTokenChar(char c) {
  return super.isTokenChar(c) || Character.isDigit(c);
}
{code}

We have fixed LetterTokenizer so that it implements isTokenChar(int), but that 
means if someone uses this LetterAndNumberTokenizer with Version.LUCENE_31, it 
will not work: rather than throwing an UnsupportedOperationException (UOE), it 
will silently discard numbers, because CharTokenizer will call the int-based 
method inherited from LetterTokenizer.
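
To make the failure mode concrete, here is a small self-contained model of the dispatch described above. It is a sketch under stated assumptions, not the patch itself: CharTokenizerModel, LetterTokenizerModel and LetterAndNumberTokenizerModel are hypothetical stand-ins, a boolean flag replaces the Version check, and both isTokenChar overloads in the base class default to throwing UOE (the comment only says this of the int-based one).

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Version-based dispatch; not Lucene's actual CharTokenizer.
abstract class CharTokenizerModel {
  // Stands in for "matchVersion is LUCENE_31 or later".
  private final boolean useCodePoints;

  CharTokenizerModel(boolean useCodePoints) {
    this.useCodePoints = useCodePoints;
  }

  // Old API. Assumed (not from the patch) to default to UOE, so subclasses may implement only the new API.
  protected boolean isTokenChar(char c) {
    throw new UnsupportedOperationException("isTokenChar(char) not implemented");
  }

  // New API. Defaults to UOE so subclasses that were never converted fail loudly.
  protected boolean isTokenChar(int c) {
    throw new UnsupportedOperationException("isTokenChar(int) not implemented");
  }

  // Splits text into maximal runs of token characters.
  List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int i = 0;
    while (i < text.length()) {
      int step;
      boolean keep;
      if (useCodePoints) {
        int cp = text.codePointAt(i);
        step = Character.charCount(cp);
        keep = isTokenChar(cp);             // resolves to the int overload
      } else {
        step = 1;
        keep = isTokenChar(text.charAt(i)); // resolves to the char overload
      }
      if (keep) {
        current.append(text, i, i + step);
      } else if (current.length() > 0) {
        tokens.add(current.toString());
        current.setLength(0);
      }
      i += step;
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }
}

// Already converted: implements both overloads.
class LetterTokenizerModel extends CharTokenizerModel {
  LetterTokenizerModel(boolean useCodePoints) {
    super(useCodePoints);
  }

  @Override
  protected boolean isTokenChar(char c) {
    return Character.isLetter(c);
  }

  @Override
  protected boolean isTokenChar(int c) {
    return Character.isLetter(c);
  }
}

// User subclass that only knows the old API, like the LetterAndNumberTokenizer snippet.
class LetterAndNumberTokenizerModel extends LetterTokenizerModel {
  LetterAndNumberTokenizerModel(boolean useCodePoints) {
    super(useCodePoints);
  }

  @Override
  protected boolean isTokenChar(char c) {
    return super.isTokenChar(c) || Character.isDigit(c);
  }
}
{code}

With the flag off, "abc 123 x9" yields [abc, 123, x9]; with it on, the inherited LetterTokenizerModel.isTokenChar(int) is called, the char override is never reached, and the same input yields [abc, x] with no exception. A direct subclass of CharTokenizerModel that overrides only the char method does still get the UOE.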

Of course it will still work correctly with Version.LUCENE_30, so this is not a 
backwards-compatibility problem; but with LUCENE_31 it will silently behave 
incorrectly, instead of throwing UOE, until the int-based method is implemented.

So I think this is a problem with this design, and I do not know how to fix it 
without reflection.
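
One way a reflection-based check could look, as a hedged sketch only: walk up from the concrete class and count how many superclass steps it takes to reach a declaration of each isTokenChar overload. If the char-based override sits closer to the concrete class than the int-based one, some subclass customized only the old API, and the base class could fall back to it (or throw) instead of silently using the inherited int method. OverrideCheck, BaseTok, LetterTok and LetterAndNumberTok are hypothetical names for illustration.

{code:java}
// Hypothetical helper: reflection-based check for "only the old API was customized".
final class OverrideCheck {
  private OverrideCheck() {}

  // Superclass steps from clazz to the nearest class declaring name(params),
  // or -1 if no class up to and including base declares it.
  static int implementationDepth(Class<?> clazz, Class<?> base, String name, Class<?>... params) {
    int depth = 0;
    for (Class<?> c = clazz; c != null; c = c.getSuperclass(), depth++) {
      try {
        c.getDeclaredMethod(name, params);
        return depth;
      } catch (NoSuchMethodException e) {
        // Not declared in c; keep walking up.
      }
      if (c == base) {
        break;
      }
    }
    return -1;
  }

  // True when isTokenChar(char) is overridden closer to clazz than isTokenChar(int),
  // i.e. the int-based path would bypass the subclass's customization.
  static boolean onlyOldApiCustomized(Class<?> clazz, Class<?> base) {
    int charDepth = implementationDepth(clazz, base, "isTokenChar", char.class);
    int intDepth = implementationDepth(clazz, base, "isTokenChar", int.class);
    if (charDepth < 0) {
      return false; // old API never implemented
    }
    if (intDepth < 0) {
      return true; // only the old API is implemented
    }
    return charDepth < intDepth;
  }
}

// Hypothetical stand-ins for CharTokenizer, LetterTokenizer and LetterAndNumberTokenizer.
abstract class BaseTok {
  protected boolean isTokenChar(char c) { throw new UnsupportedOperationException(); }
  protected boolean isTokenChar(int c) { throw new UnsupportedOperationException(); }
}

class LetterTok extends BaseTok {
  @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
  @Override protected boolean isTokenChar(int c) { return Character.isLetter(c); }
}

class LetterAndNumberTok extends LetterTok {
  @Override protected boolean isTokenChar(char c) { return super.isTokenChar(c) || Character.isDigit(c); }
}
{code}

For LetterAndNumberTok the char override is 0 steps away and the int override 1 step away, so onlyOldApiCustomized returns true; for LetterTok both are 0 steps away and it returns false. A base-class constructor could run such a check once per concrete class, preferably caching the result, since getDeclaredMethod is comparatively expensive.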

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
>                 Key: LUCENE-2183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2183
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a 
> character level. Yet, those tokenizers still use char primitives instead of 
> int codepoints. CharTokenizer should operate on codepoints and preserve bw 
> compatibility. 
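
The char-versus-codepoint problem in the issue description can be seen with plain JDK calls, independently of Lucene. The sketch below (LetterRuns is a hypothetical helper, not part of Lucene) splits text into runs of letters twice: once testing each UTF-16 char, and once testing whole code points via String.codePointAt and Character.charCount. It uses U+1D400 MATHEMATICAL BOLD CAPITAL A, a letter outside the BMP that is encoded as the surrogate pair \uD835\uDC00.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits text into runs of letters, per UTF-16 char and per code point.
final class LetterRuns {
  private LetterRuns() {}

  // char-based: each surrogate half is tested on its own, and surrogates are never letters
  static List<String> byChar(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isLetter(c)) {
        cur.append(c);
      } else if (cur.length() > 0) {
        tokens.add(cur.toString());
        cur.setLength(0);
      }
    }
    if (cur.length() > 0) {
      tokens.add(cur.toString());
    }
    return tokens;
  }

  // code point based: surrogate pairs are combined before the letter test
  static List<String> byCodePoint(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    for (int i = 0; i < text.length(); ) {
      int cp = text.codePointAt(i);
      if (Character.isLetter(cp)) {
        cur.appendCodePoint(cp);
      } else if (cur.length() > 0) {
        tokens.add(cur.toString());
        cur.setLength(0);
      }
      i += Character.charCount(cp);
    }
    if (cur.length() > 0) {
      tokens.add(cur.toString());
    }
    return tokens;
  }
}
{code}

For "a\uD835\uDC00b", byChar returns [a, b], because neither surrogate half is a letter on its own, while byCodePoint returns the single token a𝐀b.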

-- 
This message is automatically generated by JIRA.