[
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053319#comment-13053319
]
Robert Muir commented on LUCENE-3080:
-------------------------------------
Well, personally I am hesitant to introduce any encodings or bytes into our
current analysis chain, because it's unnecessary complexity that will introduce
bugs (at the moment, it's the user's responsibility to create the appropriate
Reader, etc.).
Furthermore, not all character sets can be 'corrected' with a linear conversion
like this: for example, some actually order the text in a different direction,
and so on; there are many quirks to non-Unicode character sets.
Maybe as a start, it would be useful to prototype some simple experiments with
a "binary analysis chain" and hack up a highlighter to work with them? That way
we would get an understanding of what the potential performance gain is.
Here's some example code for a dead-simple binary analysis chain that uses
bytes the whole way through; you could take these ideas and prototype one with
all-ASCII terms, splitting on the space byte and such:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestBinaryTerms.java
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/BinaryTokenStream.java
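As a rough sketch of the "all-ASCII terms, split on the space byte" idea (the class and method names here are hypothetical, not taken from the linked test files), a byte-only tokenizer that never decodes to char[] could look something like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a byte-level tokenizer: the input stays byte[] the
// whole way through, and terms are produced by splitting on the space byte
// (0x20) without ever converting to char[].
public class AsciiSpaceTokenizer {
    public static List<byte[]> tokenize(byte[] input) {
        List<byte[]> terms = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length; i++) {
            // Emit a term at end-of-input or on a space byte.
            if (i == input.length || input[i] == 0x20) {
                if (i > start) {
                    byte[] term = new byte[i - start];
                    System.arraycopy(input, start, term, 0, i - start);
                    terms.add(term);
                }
                start = i + 1;
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        byte[] text = "hello binary world".getBytes(StandardCharsets.US_ASCII);
        for (byte[] term : tokenize(text)) {
            System.out.println(new String(term, StandardCharsets.US_ASCII));
        }
    }
}
```

A real prototype would wrap these byte slices in BytesRef and feed them through a token stream (as BinaryTokenStream does), so a BytesRef-based highlighter could consume them without any char[] round trip.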
> cutover highlighter to BytesRef
> -------------------------------
>
> Key: LUCENE-3080
> URL: https://issues.apache.org/jira/browse/LUCENE-3080
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Michael McCandless
>
> Highlighter still uses char[] terms (consumes tokens from the analyzer as
> char[] not as BytesRef), which is causing problems for merging SOLR-2497 to
> trunk.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]