[
https://issues.apache.org/jira/browse/LUCENE-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782479#action_12782479
]
Robert Muir commented on LUCENE-2090:
-------------------------------------
bq. I'd actually rather lock it down for now, and then only open up flexibility
when/if we get there... patch looks good!
Ok, I will commit it.
Just as a side note (maybe I can add a comment if you need it): the existing
startsWith(), and now the new endsWith(), are correct against byte[] for any
Unicode encoding form.
However, some other encodings (including alternate encodings someone might flex
to) do not have properties such as non-overlap.
If someone were to implement a codec to store the index in one of those other
encodings, they would have to write significantly more complex code that is
aware of character boundaries, depending upon the properties of said encoding.
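To make the non-overlap point concrete, here is a rough sketch. The class and
helpers below are hypothetical stand-ins, not the TermRef code from the patch.
It assumes UTF-8 for the Unicode case and uses hard-coded Shift_JIS bytes for
the counterexample, where the trail byte of U+30BD (0x83 0x5C) is the same byte
as an ASCII backslash.

```java
import java.nio.charset.StandardCharsets;

public class ByteAffixDemo {
    // Hypothetical helper: true if term begins with prefix, byte for byte.
    public static boolean startsWith(byte[] term, byte[] prefix) {
        if (prefix.length > term.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (term[i] != prefix[i]) return false;
        }
        return true;
    }

    // Hypothetical helper: true if term ends with suffix, byte for byte.
    public static boolean endsWith(byte[] term, byte[] suffix) {
        int offset = term.length - suffix.length;
        if (offset < 0) return false;
        for (int i = 0; i < suffix.length; i++) {
            if (term[offset + i] != suffix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // UTF-8: lead bytes and trail bytes come from disjoint ranges, so a
        // whole-character suffix can only match on a character boundary.
        byte[] term = "caf\u00E9".getBytes(StandardCharsets.UTF_8);     // ... C3 A9
        byte[] same = "\u00E9".getBytes(StandardCharsets.UTF_8);        // C3 A9
        byte[] other = "\u00A9".getBytes(StandardCharsets.UTF_8);       // C2 A9
        System.out.println(endsWith(term, same));   // true
        System.out.println(endsWith(term, other));  // false: lead byte differs

        // Shift_JIS (bytes hard-coded): U+30BD is 0x83 0x5C, and 0x5C alone is
        // a backslash. The byte-level check reports a match even though the
        // decoded string does not end with a backslash.
        byte[] sjisTerm = {(byte) 0x83, 0x5C};
        byte[] sjisBackslash = {0x5C};
        System.out.println(endsWith(sjisTerm, sjisBackslash)); // true (a false positive)
    }
}
```

A codec for such an encoding would need to walk the bytes from the start to
find character boundaries before trusting a suffix match.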
Oh yeah, and their sort order would be different, too... (I suppose we should
also fix compareTerm here for UTF-16 ordering at some point?)
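For what it's worth, the ordering difference already shows up between UTF-8
bytes and UTF-16 code units once supplementary characters are involved. A small
sketch follows. The class and compareUtf8() are made up for illustration and
are not the compareTerm mentioned above.

```java
import java.nio.charset.StandardCharsets;

public class TermOrderDemo {
    // Hypothetical helper: compares the UTF-8 encodings as unsigned bytes,
    // which is the same as comparing by Unicode code point.
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 code unit
        String supplementary = "\uD800\uDC00"; // U+10000, a surrogate pair

        // UTF-16 code unit order (String.compareTo): 0xFF5E > 0xD800,
        // so the BMP character sorts AFTER the supplementary one.
        System.out.println(bmp.compareTo(supplementary) > 0);    // true

        // UTF-8 byte order: EF BD 9E < F0 90 80 80,
        // so the BMP character sorts BEFORE the supplementary one.
        System.out.println(compareUtf8(bmp, supplementary) < 0); // true
    }
}
```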
> convert automaton to char[] based processing and TermRef / TermsEnum api
> ------------------------------------------------------------------------
>
> Key: LUCENE-2090
> URL: https://issues.apache.org/jira/browse/LUCENE-2090
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2090_TermRef_flex.patch,
> LUCENE-2090_TermRef_flex2.patch, LUCENE-2090_TermRef_flex3.patch
>
>
> The automaton processing is currently done with String, mostly because
> TermEnum is based on String.
> It is easy to change the processing to work with char[], since behind the
> scenes this is used anyway.
> In general I think we should make sure char[] based processing is exposed in
> the automaton pkg anyway, for things like pattern-based tokenizers and such.