[ 
https://issues.apache.org/jira/browse/FLUME-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505714#comment-14505714
 ] 

Jameel Al-Aziz commented on FLUME-2215:
---------------------------------------

[~sodre], I haven't managed to get an isolated test case where the 
BufferUnderflow gets triggered. I can trigger it reliably given a full file, 
but when isolating the line where the exception occurs, it seems to work.

That being said, I believe I found an issue with the patches submitted so far. 
If you have malformed input where one character is a high surrogate, but the 
next character is not the matching low surrogate, you get a case where 
we have two characters in the buffer that are not part of a surrogate pair. 
This seems to work, except that the patches advance the position past the 
second character. Therefore, an error occurring between the first and second 
getChar() won't recover from the correct position.
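
To illustrate the malformed case described above, here is a minimal sketch. 
It is not taken from any of the attached patches: the class name 
SurrogateStep and the helper charsToConsume() are hypothetical, and it works 
on an in-memory CharSequence rather than on the decoder's buffers. It only 
shows the check that decides whether the position should move by one or two 
chars.

```java
public class SurrogateStep {
    /**
     * Hypothetical helper: number of UTF-16 chars that make up the next
     * code point starting at index i. Returns 2 only for a well-formed
     * surrogate pair; a lone high surrogate counts as a single char.
     */
    static int charsToConsume(CharSequence s, int i) {
        char first = s.charAt(i);
        boolean pair = Character.isHighSurrogate(first)
                && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1));
        return pair ? 2 : 1;
    }

    public static void main(String[] args) {
        // U+1F600 encoded as a surrogate pair, followed by 'x'.
        String wellFormed = "\uD83D\uDE00x";
        // High surrogate followed by an ordinary char (malformed).
        String malformed = "\uD83Dx";

        System.out.println(charsToConsume(wellFormed, 0));
        System.out.println(charsToConsume(malformed, 0));
    }
}
```

With a check like this, the second character of the malformed input is not 
treated as consumed, so a reset between the two reads would resume at that 
character instead of after it.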

It also appears that this problem is further complicated if you choose to 
ignore malformed input (although I'm still testing the edge cases here).

> ResettableFileInputStream can't support  ucs-4 character
> --------------------------------------------------------
>
>                 Key: FLUME-2215
>                 URL: https://issues.apache.org/jira/browse/FLUME-2215
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.5.0
>            Reporter: syntony liu
>            Assignee: Santiago M. Mola
>            Priority: Critical
>              Labels: patch
>         Attachments: 
> 0001-FLUME-2215-Fixes-reading-surrogate-based-chars.patch, 
> FLUME-2215-0-README.txt, FLUME-2215-0.patch, FLUME-2215-1-README.txt, 
> FLUME-2215-1.patch
>
>
> ResettableFileInputStream.java: readChar() does not handle UCS-4 characters. 
> Such a character needs 2 chars of charBuf, and this causes an unexpected 
> termination.
>  A temporary solution:
>      if (res.isOverflow() && !charBuf.hasRemaining()) {
>          logger.warn("decoder ucs-4 at position: {}", buf.position());
>          tmpBuf.clear();
>          res = decoder.decode(buf, tmpBuf, isEndOfInput);
>          incrPosition(buf.position() - start, false);
>          return '?';
>      }
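
For reference, the overflow mentioned in the issue description can be 
reproduced outside Flume with a small sketch. The class name 
SupplementaryDecodeDemo and the method decodedChars() are hypothetical, and 
the sketch assumes UTF-8 input. It decodes a single supplementary character 
(U+1F600, 4 bytes in UTF-8) into a CharBuffer of a given capacity and reports 
what the decoder did.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class SupplementaryDecodeDemo {
    /**
     * Hypothetical helper: decodes U+1F600 from UTF-8 into a CharBuffer of
     * the given capacity and returns the number of chars written.
     */
    static int decodedChars(int capacity) {
        byte[] utf8 = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        ByteBuffer in = ByteBuffer.wrap(utf8);
        CharBuffer out = CharBuffer.allocate(capacity);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

        // endOfInput = true: the byte buffer holds the complete input.
        CoderResult res = decoder.decode(in, out, true);

        System.out.println("capacity=" + capacity
                + " overflow=" + res.isOverflow()
                + " bytesConsumed=" + in.position()
                + " charsWritten=" + out.position());
        return out.position();
    }

    public static void main(String[] args) {
        decodedChars(1); // room for only one char of the surrogate pair
        decodedChars(2); // room for both chars of the surrogate pair
    }
}
```

A one-char output buffer cannot hold a surrogate pair, so the decoder reports 
overflow instead of writing half of it; a two-char buffer is enough for one 
supplementary character.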



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
