[ https://issues.apache.org/jira/browse/ACCUMULO-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167552#comment-13167552 ]
Billie Rinaldi commented on ACCUMULO-209:
-----------------------------------------
ByteArrayBackedCharSequence doesn't appear to be used by anything else. Keith
pointed out that the new code copies the data each time, which the old code did
not. We may want to take a look at the performance difference.
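
To make the copy-versus-view distinction concrete, here is a minimal sketch. ByteView is a hypothetical stand-in for a byte-array-backed CharSequence, not the actual Accumulo class, and decoding as UTF-8 is an assumption about what the new code does. It only shows that the view shares the caller's array while decoding allocates a fresh copy; it does not measure anything.

```java
import java.nio.charset.StandardCharsets;

public class CopyVersusViewDemo {

    // Hypothetical stand-in for a byte-array-backed CharSequence: it wraps
    // the caller's array without copying and treats every byte as one char.
    static final class ByteView implements CharSequence {
        private final byte[] data;
        private final int offset;
        private final int len;

        ByteView(byte[] data, int offset, int len) {
            this.data = data;
            this.offset = offset;
            this.len = len;
        }

        @Override
        public int length() {
            return len;
        }

        @Override
        public char charAt(int index) {
            return (char) (0xff & data[offset + index]);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new ByteView(data, offset + start, end - start);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append(charAt(i));
            }
            return sb.toString();
        }
    }

    // Assumed shape of the copying approach: decoding builds a new String,
    // so the bytes are copied on every call.
    static String decode(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        CharSequence view = new ByteView(data, 0, data.length);
        String copy = decode(data, 0, data.length);

        data[0] = (byte) 'x'; // mutate the backing array afterwards

        System.out.println(view.charAt(0)); // prints x: the view shares the array
        System.out.println(copy.charAt(0)); // prints a: the String holds its own copy
    }
}
```

Whether the extra allocation matters in practice would have to be measured against real scan workloads.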
> RegExFilter does not properly regex when using multi-byte characters
> --------------------------------------------------------------------
>
> Key: ACCUMULO-209
> URL: https://issues.apache.org/jira/browse/ACCUMULO-209
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.3.5
> Reporter: Jim Klucar
> Assignee: Billie Rinaldi
> Fix For: 1.4.0, 1.5.0
>
> Attachments: accumulo-209-RegExFilter.patch,
> accumulo-209-RegExFilterTest.patch, accumulo-209.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The current RegExFilter class uses a ByteArrayBackedCharSequence to set the
> data to match against. The ByteArrayBackedCharSequence contains a line of
> code that prevents the matcher from properly matching multi-byte characters.
> Line 49 of ByteArrayBackedCharSequence.java is:
> return (char) (0xff & data[offset + index]);
>
> This widens a single byte from the byte array to a char (2 bytes in Java),
> so each byte of a multi-byte encoded character becomes a separate char. As a
> result, the RegExFilter cannot correctly apply regular expressions to values
> stored in a multi-byte character encoding.
> A patch for the RegExFilter.java file has been created and will be submitted.
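
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the submitted patch: the class and method names are hypothetical, and it assumes the stored value is UTF-8. Casting each byte with 0xff is equivalent to decoding the data as ISO-8859-1, so a two-byte UTF-8 character shows up as two chars and the pattern no longer lines up.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class MultiByteRegexDemo {

    // Mimics the cast quoted in the report: every byte becomes one char,
    // so a multi-byte UTF-8 sequence turns into several unrelated chars.
    static String widenBytes(byte[] data, int offset, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (0xff & data[offset + i]));
        }
        return sb.toString();
    }

    // Assumed alternative: decode the bytes as UTF-8 before matching.
    static String decodeUtf8(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" is 4 characters but 5 bytes in UTF-8 (é is 0xC3 0xA9).
        byte[] value = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // "caf" followed by exactly one character.
        Pattern pattern = Pattern.compile("caf.");

        String widened = widenBytes(value, 0, value.length);
        String decoded = decodeUtf8(value, 0, value.length);

        System.out.println(pattern.matcher(widened).matches()); // prints false
        System.out.println(pattern.matcher(decoded).matches()); // prints true
    }
}
```

Values that contain only ASCII bytes behave the same under both approaches, which is likely why the problem only shows up with multi-byte data.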