[ https://issues.apache.org/jira/browse/FLUME-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505714#comment-14505714 ]
Jameel Al-Aziz commented on FLUME-2215:
---------------------------------------

[~sodre], I haven't managed to get an isolated test case where the BufferUnderflow gets triggered. I can trigger it reliably given a full file, but when I isolate the line where the exception occurs, it seems to work.

That being said, I believe I found an issue with the patches submitted so far. If you have malformed input where one character is a high surrogate but the next character is not the complementing low surrogate, you end up with two characters in the buffer that are not part of a surrogate pair. This seems to work, except that the patches advance the position past the second character. Therefore, an error occurring between the first and second getChar() won't recover from the correct position.

It also appears that this problem is further complicated if you choose to ignore malformed input (although I'm still testing the edge cases here).

> ResettableFileInputStream can't support ucs-4 character
> --------------------------------------------------------
>
>                 Key: FLUME-2215
>                 URL: https://issues.apache.org/jira/browse/FLUME-2215
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.5.0
>            Reporter: syntony liu
>            Assignee: Santiago M. Mola
>            Priority: Critical
>              Labels: patch
>         Attachments: 0001-FLUME-2215-Fixes-reading-surrogate-based-chars.patch, FLUME-2215-0-README.txt, FLUME-2215-0.patch, FLUME-2215-1-README.txt, FLUME-2215-1.patch
>
>
> ResettableFileInputStream.java:readChar() does not handle UCS-4 characters. It needs room for 2 chars in charBuf. This causes an unexpected termination.
> A temporary solution:
>     if (res.isOverflow() && !charBuf.hasRemaining()) {
>       logger.warn("decoder ucs-4 at position: {}", buf.position());
>       tmpBuf.clear();
>       res = decoder.decode(buf, tmpBuf, isEndOfInput);
>       incrPosition(buf.position() - start, false);
>       return '?';
>     }

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
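
For readers following this thread, here is a minimal Java sketch of the two situations discussed above: a one-char output buffer overflowing when the decoder meets a supplementary (4-byte UTF-8) character, and a high surrogate followed by a char that is not a low surrogate. It is not taken from any of the attached patches. The class name SurrogateSketch and its methods are hypothetical; only standard java.lang.Character and java.nio.charset calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class SurrogateSketch {

    // True only when hi and lo together form a valid UTF-16 surrogate pair,
    // i.e. hi is a high surrogate AND lo is a low surrogate.
    public static boolean isWellFormedPair(char hi, char lo) {
        return Character.isSurrogatePair(hi, lo);
    }

    // Decodes the given UTF-8 bytes into a char buffer of the given capacity
    // and reports whether the decoder ran out of output space.
    public static boolean overflows(byte[] utf8, int charCapacity) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(utf8);
        CharBuffer out = CharBuffer.allocate(charCapacity);
        CoderResult res = decoder.decode(in, out, true);
        return res.isOverflow();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it needs two Java chars.
        byte[] utf8 = new String(Character.toChars(0x1F600))
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("capacity 1 overflows: " + overflows(utf8, 1));
        System.out.println("capacity 2 overflows: " + overflows(utf8, 2));

        // A proper pair versus a high surrogate followed by an ordinary char.
        System.out.println("well-formed pair: "
                + isWellFormedPair('\uD83D', '\uDE00'));
        System.out.println("high surrogate + 'a': "
                + isWellFormedPair('\uD83D', 'a'));
    }
}
```

With a capacity of 1 the decoder should report overflow without consuming the input, which matches the condition checked in the temporary solution quoted above. A check along the lines of isWellFormedPair is one way a reader could decide whether the second char belongs to the first before advancing past it; whether the patches should do this is a design question left to the issue.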