[ 
https://issues.apache.org/jira/browse/LUCENE-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782479#action_12782479
 ] 

Robert Muir commented on LUCENE-2090:
-------------------------------------

bq. I'd actually rather lock it down for now, and then only open up flexibility 
when/if we get there... patch looks good!

Ok, I will commit it.

Just as a side note (I can add a code comment if you need it): the existing 
startsWith() and the new endsWith() are correct against byte[] for any 
Unicode encoding form.
However, some other encodings (including alternate encodings someone might flex 
to) do not have properties such as non-overlap.
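
To make the non-overlap point concrete, here is a minimal sketch of byte-wise 
prefix/suffix checks. The class and method names are hypothetical and are not 
the ones in the patch. It uses one example character, U+30BD, whose Shift_JIS 
encoding (hard-coded below as 0x83 0x5C) ends in the same byte as an ASCII 
backslash.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the code from the LUCENE-2090 patch.
public class ByteAffixSketch {

    // Byte-wise prefix check: true if term begins with the bytes of prefix.
    static boolean startsWith(byte[] term, byte[] prefix) {
        if (prefix.length > term.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (term[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Byte-wise suffix check: true if term ends with the bytes of suffix.
    static boolean endsWith(byte[] term, byte[] suffix) {
        int offset = term.length - suffix.length;
        if (offset < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (term[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] backslash = { 0x5C };

        // UTF-8: U+30BD is E3 82 BD. No byte of a multi-byte sequence is in
        // the ASCII range, so the suffix check cannot match by accident.
        byte[] utf8Term = "\u30BD".getBytes(StandardCharsets.UTF_8);
        System.out.println(endsWith(utf8Term, backslash));

        // Shift_JIS: U+30BD is 83 5C. The trail byte 0x5C overlaps with the
        // single-byte backslash, so the byte-wise check reports a match
        // even though the term does not end with that character.
        byte[] sjisTerm = { (byte) 0x83, 0x5C };
        System.out.println(endsWith(sjisTerm, backslash));
    }
}
```

The first check is false and the second is true, which is the kind of false 
positive a boundary-unaware comparison produces in an overlapping encoding.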

If someone were to implement a codec that stores the index in one of those other 
encodings, they would have to write significantly more complex code that is 
aware of character boundaries, depending on the properties of that encoding.
Their sort order would be different, too. (I suppose we should 
also fix compareTerm here for UTF-16 ordering at some point?)
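
As a rough illustration of how sort order depends on the encoding, the sketch 
below compares the same two strings once by UTF-16 code units (what 
String.compareTo does) and once by unsigned UTF-8 bytes. The class and helper 
names are hypothetical, and this does not reproduce what compareTerm does in 
the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Lucene's compareTerm.
public class TermOrderSketch {

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // U+FFFD lives in the BMP; U+10000 is the first supplementary code point.
        String bmp = "\uFFFD";
        String supplementary = new String(Character.toChars(0x10000));

        // UTF-16 code unit order: 0xFFFD is greater than the lead surrogate
        // 0xD800, so the BMP character sorts AFTER the supplementary one.
        int utf16Order = bmp.compareTo(supplementary);

        // UTF-8 byte order: EF BF BD vs F0 90 80 80, so the BMP character
        // sorts BEFORE the supplementary one (code point order).
        int utf8Order = compareUnsignedBytes(
                bmp.getBytes(StandardCharsets.UTF_8),
                supplementary.getBytes(StandardCharsets.UTF_8));

        System.out.println(utf16Order > 0);
        System.out.println(utf8Order < 0);
    }
}
```

Both lines print true: the two orderings disagree for this pair, so a term 
comparison has to pick one ordering explicitly.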


> convert automaton to char[] based processing and TermRef / TermsEnum api
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-2090
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2090
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2090_TermRef_flex.patch, 
> LUCENE-2090_TermRef_flex2.patch, LUCENE-2090_TermRef_flex3.patch
>
>
> The automaton processing is currently done with String, mostly because 
> TermEnum is based on String.
> It is easy to change the processing to work with char[], since char[] is 
> what is used behind the scenes anyway.
> In general, I think we should make sure char[]-based processing is exposed in 
> the automaton package anyway, for things like pattern-based tokenizers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


