[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794886#action_12794886 ]
Robert Muir commented on LUCENE-2183:
-------------------------------------

Hello Simon,

another option, very similar to yours (I am not sure it would really work, I'm just thinking out loud), could be:

{code}
/** this method will be declared abstract in Lucene 4.0 */
public boolean isTokenChar(int ch) {
  throw new UnsupportedOperationException();
}

/** @deprecated will be removed in Lucene 5.0 */
@Deprecated
public boolean isTokenChar(char ch) {
  return isTokenChar((int) ch);
}
{code}

We would do the same for normalize(). The rest would be the same as your patch:
* Use CharacterUtils for I/O buffering.
* Use CharacterUtils for character/code point iteration.
* Use Version, instead of reflection, to decide which method to call. This should not be a check made on every call to isTokenChar(). Instead, it could be handled by, for example, two private inner classes.

The difference is that, in my opinion, the API would look more natural. Once the deprecations are removed, we would end up with an abstract class containing the int-based equivalent of what we have now.

If someone uses a CharTokenizer that does *not* support the int-based methods (i.e. it only implements the char-based methods) with Version.LUCENE_31, this would throw an UnsupportedOperationException. In my opinion that is correct, because such a tokenizer does not support the behavior of that version.

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
>                 Key: LUCENE-2183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2183
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a
> character level. Yet, those tokenizers still use char primitives instead of
> int codepoints. CharTokenizer should operate on codepoints and preserve bw
> compatibility.
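The following is a minimal sketch of how the proposed scheme could behave. It is written against hypothetical stand-ins, not Lucene's real classes. `Version`, `SketchCharTokenizer`, `LetterTokenizer` and `LegacyLetterTokenizer` are illustrative names only. The input is a `String` instead of a `Reader`, and `String.codePointAt` takes the place of `CharacterUtils`.

The sketch shows three things:
* The int-based `isTokenChar()` throws by default.
* The deprecated char-based variant delegates to it.
* The version check happens once in the constructor, which selects one of two private inner classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointTokenizerSketch {

    /** Hypothetical stand-in for Lucene's Version constants (not the real class). */
    enum Version { LUCENE_30, LUCENE_31 }

    /** Hypothetical String-based stand-in for CharTokenizer: no Reader, no CharacterUtils. */
    abstract static class SketchCharTokenizer {
        private final Splitter splitter;

        SketchCharTokenizer(Version matchVersion) {
            // Choose the code path once, here, rather than re-checking the version per character.
            splitter = matchVersion.compareTo(Version.LUCENE_31) >= 0
                    ? new CodePointSplitter() : new CharSplitter();
        }

        /** Would become abstract later; until then, the default signals "unsupported". */
        protected boolean isTokenChar(int c) {
            throw new UnsupportedOperationException("isTokenChar(int) not implemented");
        }

        /** @deprecated char-based variant kept only for backwards compatibility. */
        @Deprecated
        protected boolean isTokenChar(char c) {
            return isTokenChar((int) c); // lets int-only subclasses run on the old path
        }

        final List<String> tokenize(String text) {
            return splitter.split(text);
        }

        private static void flush(StringBuilder current, List<String> tokens) {
            if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }

        private interface Splitter {
            List<String> split(String text);
        }

        /** Old behaviour: one UTF-16 code unit at a time, via isTokenChar(char). */
        private final class CharSplitter implements Splitter {
            public List<String> split(String text) {
                List<String> tokens = new ArrayList<>();
                StringBuilder current = new StringBuilder();
                for (int i = 0; i < text.length(); i++) {
                    char ch = text.charAt(i);
                    if (isTokenChar(ch)) current.append(ch); else flush(current, tokens);
                }
                flush(current, tokens);
                return tokens;
            }
        }

        /** New behaviour: one code point at a time, via isTokenChar(int). */
        private final class CodePointSplitter implements Splitter {
            public List<String> split(String text) {
                List<String> tokens = new ArrayList<>();
                StringBuilder current = new StringBuilder();
                for (int i = 0; i < text.length(); ) {
                    int cp = text.codePointAt(i);
                    i += Character.charCount(cp); // 2 for supplementary code points
                    if (isTokenChar(cp)) current.appendCodePoint(cp); else flush(current, tokens);
                }
                flush(current, tokens);
                return tokens;
            }
        }
    }

    /** New-style subclass: implements only the int-based method. */
    static class LetterTokenizer extends SketchCharTokenizer {
        LetterTokenizer(Version v) { super(v); }
        @Override protected boolean isTokenChar(int c) { return Character.isLetter(c); }
    }

    /** Old-style subclass: implements only the deprecated char-based method. */
    static class LegacyLetterTokenizer extends SketchCharTokenizer {
        LegacyLetterTokenizer(Version v) { super(v); }
        @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
    }
}
```

Consider the input "ab \uD835\uDC00c":
* With `Version.LUCENE_31`, `LetterTokenizer` keeps the supplementary letter U+1D400 inside the second token.
* The char-based path drops that letter, because neither surrogate half is a letter on its own.
* `LegacyLetterTokenizer` with `Version.LUCENE_31` throws `UnsupportedOperationException`, as described in the comment.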