[
https://issues.apache.org/jira/browse/FLUME-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505714#comment-14505714
]
Jameel Al-Aziz commented on FLUME-2215:
---------------------------------------
[~sodre], I haven't managed to get an isolated test case where the
BufferUnderflow gets triggered. I can trigger it reliably given a full file,
but when isolating the line where the exception occurs, it seems to work.
That being said, I believe I found an issue with the patches submitted so far.
If you have malformed input where one character is a high surrogate but the
next character is not the matching low surrogate, you end up with two
characters in the buffer that are not part of a surrogate pair.
This seems to work, except that the patches advance past the second character.
Therefore, an error occurring between the first and second getChar() won't
recover from the correct position.
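
To make the case concrete, here is a minimal sketch of the check I have in
mind. It is not taken from any of the attached patches or from Flume itself;
the class name SurrogateStep and the method charsToConsume are hypothetical
and only illustrate how many of the two buffered chars belong to the current
character. It uses only java.lang.Character.

```java
// Hypothetical helper, not part of Flume or of the submitted patches.
final class SurrogateStep {

    /**
     * Given the two chars sitting in a two-char decode buffer, returns how
     * many of them belong to the character that starts at {@code first}.
     */
    static int charsToConsume(char first, char second) {
        if (Character.isHighSurrogate(first) && Character.isLowSurrogate(second)) {
            // Well-formed surrogate pair: both chars form one code point.
            return 2;
        }
        // BMP char, or an unpaired surrogate from malformed input:
        // only the first char is consumed, the second stands on its own.
        return 1;
    }

    public static void main(String[] args) {
        char[] pair = Character.toChars(0x1F600); // a supplementary code point
        System.out.println(charsToConsume(pair[0], pair[1]));
        System.out.println(charsToConsume('\uD83D', 'A')); // high surrogate, then 'A'
    }
}
```

In the malformed case only one char is accounted for, so the position of the
second char still has to be tracked separately if a reset happens between the
two reads.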
It also appears that this problem is further complicated if you choose to
ignore malformed input (although, I'm still testing the edge cases here).
> ResettableFileInputStream can't support ucs-4 character
> --------------------------------------------------------
>
> Key: FLUME-2215
> URL: https://issues.apache.org/jira/browse/FLUME-2215
> Project: Flume
> Issue Type: Bug
> Affects Versions: v1.5.0
> Reporter: syntony liu
> Assignee: Santiago M. Mola
> Priority: Critical
> Labels: patch
> Attachments:
> 0001-FLUME-2215-Fixes-reading-surrogate-based-chars.patch,
> FLUME-2215-0-README.txt, FLUME-2215-0.patch, FLUME-2215-1-README.txt,
> FLUME-2215-1.patch
>
>
> ResettableFileInputStream.java:readChar() does not handle UCS-4 characters.
> Decoding one needs room for 2 chars in charBuf, and this causes an unexpected
> termination.
> A temporary solution:
> if (res.isOverflow() && !charBuf.hasRemaining()) {
>   logger.warn("decoder ucs-4 at position: {}", buf.position());
>   tmpBuf.clear();
>   res = decoder.decode(buf, tmpBuf, isEndOfInput);
>   incrPosition(buf.position() - start, false);
>   return '?';
> }