Mark Payne created NIFI-2874:
--------------------------------

             Summary: StreamDemarcator can return wrong data for token
                 Key: NIFI-2874
                 URL: https://issues.apache.org/jira/browse/NIFI-2874
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
            Reporter: Mark Payne
            Assignee: Mark Payne
            Priority: Critical
             Fix For: 1.1.0, 0.7.1


There is a case where StreamDemarcator can return the wrong data for a token.
If a token ends exactly at the end of the buffer and the next token is shorter
than the previous one, the next token can retain part of the buffer's old
content. The unit test below exposes this:

{code}
    @Test
    public void testOnBufferSplitNoTrailingDelimiter() throws IOException {
        final byte[] inputData = "Yes\nNo".getBytes(StandardCharsets.UTF_8);
        ByteArrayInputStream is = new ByteArrayInputStream(inputData);
        StreamDemarcator scanner = new StreamDemarcator(is, "\n".getBytes(),
                1000, 3);

        final byte[] first = scanner.nextToken();
        final byte[] second = scanner.nextToken();
        assertNotNull(first);
        assertNotNull(second);

        // JUnit expects (expected, actual) so failure messages read correctly.
        assertArrayEquals(new byte[] {'Y', 'e', 's'}, first);
        assertArrayEquals(new byte[] {'N', 'o'}, second);
    }
{code}

In this case, the second token, which should be 'No', comes back as 'Nos'
because it contains the stale 's' left over from the previous token.
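
For illustration only, here is a minimal, self-contained sketch of the general
failure mode. It is not the StreamDemarcator code, and the class and method
names (StaleBufferSketch, demo) are hypothetical. It assumes the symptom comes
from reusing a buffer without tracking how many bytes the latest read actually
filled, which may or may not match the real cause.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not part of NiFi.
class StaleBufferSketch {

    private static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    // Reads "Yes" and then "No" into the same 3-byte buffer and returns the
    // second token. When trackFill is false, the token is copied using the
    // buffer's capacity instead of the number of bytes the read filled.
    static String demo(boolean trackFill) throws IOException {
        byte[] buffer = new byte[3];

        // First token fills the buffer completely: 'Y', 'e', 's'.
        stream("Yes").read(buffer);

        // Second token is shorter: only indices 0 and 1 are overwritten,
        // so index 2 still holds the old 's'.
        int filled = stream("No").read(buffer);

        byte[] token = trackFill
                ? Arrays.copyOfRange(buffer, 0, filled)   // only the fresh bytes
                : Arrays.copyOf(buffer, buffer.length);   // includes stale byte
        return new String(token, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo(false)); // prints "Nos"
        System.out.println(demo(true));  // prints "No"
    }
}
```

The only difference between the two paths is whether the copy is bounded by the
fill count returned from read(), which is the kind of bookkeeping the test
above exercises.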



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
