[ 
https://issues.apache.org/jira/browse/FLUME-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199523#comment-14199523
 ] 

Johny Rufus edited comment on FLUME-2538 at 11/12/14 5:09 PM:
--------------------------------------------------------------

One way to see the difference in behavior is to run the code below on
JDK 7 and JDK 8:
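import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import com.google.common.base.Charsets; // assuming Guava's Charsets
import org.junit.Test;

  @Test
  public void testTest() {
    CharsetDecoder decoder = Charsets.UTF_8.newDecoder();

    // Substitute the replacement character instead of throwing.
    decoder.onMalformedInput(CodingErrorAction.REPLACE);
    decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);

    // A 5-byte sequence that is not valid UTF-8.
    ByteBuffer buf = ByteBuffer.allocate(20);
    buf.put(new byte[] { (byte)0xf8, (byte)0xa1, (byte)0xa1, (byte)0xa1,
            (byte)0xa1 });
    buf.flip();

    // Room for a single output char, so only one replacement is performed.
    CharBuffer cbuf = CharBuffer.allocate(1);
    CoderResult res = decoder.decode(buf, cbuf, false);
    System.out.println(decoder.getClass().getName());
    System.out.println("Pos --- " + buf.position()
            + "  cbuf pos --" + cbuf.position());
  }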

Jdk 7 output -->Pos --- 5  cbuf pos --1
Jdk 8 output -->Pos --- 1  cbuf pos --1

In JDK 7: If there is an invalid byte sequence and CodingErrorAction.REPLACE is
specified, the complete invalid byte sequence is treated as one malformed
character and replaced by a single replacement character in the output
buffer. [Hence the position is advanced by 5, as seen in the output, because it
is a 5-byte invalid sequence.]

In JDK 8: Each invalid byte in the sequence is treated as a malformed character,
and hence we see the buffer being advanced by only one position. So for every
such malformed character, a replacement character is written to the output
buffer.
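
To see the effect on the complete output rather than on a single decode call,
the same bytes can be decoded into a buffer large enough for all replacements
and the U+FFFD characters counted. This is only a minimal sketch: the class
name ReplacementCount and the method countReplacements are made up for
illustration and are not part of Flume or of the attached patch.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class ReplacementCount {

  // Decodes the input as UTF-8 with the REPLACE policy and counts how many
  // replacement characters (U+FFFD) end up in the decoded output.
  static int countReplacements(byte[] input) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);

    CharBuffer out;
    try {
      // The one-shot decode sizes the output buffer itself.
      out = decoder.decode(ByteBuffer.wrap(input));
    } catch (CharacterCodingException e) {
      // Not expected for malformed input when REPLACE is configured.
      throw new IllegalStateException(e);
    }

    int count = 0;
    while (out.hasRemaining()) {
      if (out.get() == '\uFFFD') {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    byte[] invalid = { (byte) 0xf8, (byte) 0xa1, (byte) 0xa1, (byte) 0xa1,
        (byte) 0xa1 };
    // Print the runtime version next to the count so runs can be compared.
    System.out.println(System.getProperty("java.version") + " --> "
        + countReplacements(invalid));
  }
}
```

Going by the behavior described above, the count should be 1 on JDK 7 and 5 on
JDK 8, which matches the single vs. five replacement characters in the failing
assertion.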

Attaching a patch that accommodates the changed behavior described above.


was (Author: jrufus):
One way to see the difference in behavior is to run the code below on
JDK 7 and JDK 8:
@Test
  public void testTest() {
    CharsetDecoder decoder = Charsets.UTF_8.newDecoder();

    decoder.onMalformedInput(CodingErrorAction.REPLACE);
    decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);

    ByteBuffer buf = ByteBuffer.allocate(20);
    buf.put(new byte[] { (byte)0xf8, (byte)0xa1, (byte)0xa1, (byte)0xa1,
            (byte)0xa1 });
    buf.flip();

    CharBuffer cbuf = CharBuffer.allocate(1);
    CoderResult res = decoder.decode(buf, cbuf, false);
    System.out.println(decoder.getClass().getName());
    System.out.println("Pos --- " + buf.position()
            + "  cbuf pos --" + cbuf.position());
}

Jdk 7 output -->Pos --- 5  cbuf pos --1
Jdk 8 output -->Pos --- 1  cbuf pos --1

In JDK 7: If there is a group of malformed characters and
CodingErrorAction.REPLACE is specified, then the complete set of adjacent
malformed characters in the buffer is replaced. [Hence the position is advanced
by 5, as seen in the output, because there are 5 malformed chars in the buffer.]

In Jdk8: Each malformed character is treated as a separate entity and hence we 
see the buffer being advanced by only one position. So for every malformed 
character, we see the replacement character included in the output buffer

Attaching a patch that accommodates the above modified behavior.

> TestResettableFileInputStream fails on JDK 8
> --------------------------------------------
>
>                 Key: FLUME-2538
>                 URL: https://issues.apache.org/jira/browse/FLUME-2538
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.5.0.1
>            Reporter: Johny Rufus
>            Assignee: Johny Rufus
>             Fix For: v1.6.0
>
>         Attachments: FLUME-2538.patch
>
>
> TestResettableFileInputStream.testUtf8DecodeErrorHandlingReplace fails in JDK 
> 8
> "testUtf8DecodeErrorHandlingReplace(org.apache.flume.serialization.TestResettableFileInputStream)
>   Time elapsed: 6 sec  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<...(���)
> NonUnicode: (�[])
> > but was:<...(���)
> NonUnicode: (�[����]) "
> CharsetDecoder.decode has changed its behavior in how it handles the
> CodingErrorAction.REPLACE policy.
> Will submit a patch today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
