[ https://issues.apache.org/jira/browse/NIFI-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Payne updated NIFI-2874:
-----------------------------
    Status: Patch Available  (was: Open)

> StreamDemarcator can return wrong data for token
> ------------------------------------------------
>
>                 Key: NIFI-2874
>                 URL: https://issues.apache.org/jira/browse/NIFI-2874
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Critical
>             Fix For: 1.1.0, 0.7.1
>
>
> There is a case where StreamDemarcator can return the wrong data for a token.
> If a token ends exactly at the end of the buffer, and the next token is shorter
> than the previous one, the next token can retain part of the buffer's earlier
> content. The unit test below exposes this:
> {code}
>     @Test
>     public void testOnBufferSplitNoTrailingDelimiter() throws IOException {
>         final byte[] inputData = "Yes\nNo".getBytes(StandardCharsets.UTF_8);
>         ByteArrayInputStream is = new ByteArrayInputStream(inputData);
>         // Max data size of 1000 bytes, initial buffer size of 3 bytes.
>         StreamDemarcator scanner = new StreamDemarcator(is, "\n".getBytes(StandardCharsets.UTF_8), 1000, 3);
>         final byte[] first = scanner.nextToken();
>         final byte[] second = scanner.nextToken();
>         assertNotNull(first);
>         assertNotNull(second);
>         // JUnit expects (expected, actual).
>         assertArrayEquals(new byte[] {'Y', 'e', 's'}, first);
>         assertArrayEquals(new byte[] {'N', 'o'}, second);
>     }
> {code}
> In this case, the second token, which should be 'No', comes back as 'Nos'
> because it contains the 's' from the previous token.
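
The symptom described above ('No' coming back as 'Nos') is what one would expect whenever a reused buffer is read past the number of bytes that were actually refilled. The following is a minimal sketch of that general failure mode, not the StreamDemarcator implementation: the class StaleBufferSketch and the methods readChunkBuggy and readChunkFixed are hypothetical names introduced for illustration, and the delimiter handling is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; this is NOT NiFi's StreamDemarcator.
class StaleBufferSketch {

    // Buggy variant: copies the whole buffer and ignores how many bytes
    // the last read actually filled, so stale bytes can leak through.
    static byte[] readChunkBuggy(ByteArrayInputStream in, byte[] buffer) {
        int n = in.read(buffer, 0, buffer.length);
        if (n < 0) {
            return null; // end of stream
        }
        return Arrays.copyOf(buffer, buffer.length);
    }

    // Fixed variant: copies only the n bytes written by this read.
    static byte[] readChunkFixed(ByteArrayInputStream in, byte[] buffer) {
        int n = in.read(buffer, 0, buffer.length);
        if (n < 0) {
            return null; // end of stream
        }
        return Arrays.copyOf(buffer, n);
    }

    public static void main(String[] args) {
        byte[] data = "YesNo".getBytes(StandardCharsets.UTF_8);

        // A 3-byte buffer: the first chunk "Yes" ends exactly at the buffer's end.
        byte[] buffer = new byte[3];
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        readChunkBuggy(in, buffer);                  // "Yes"
        byte[] second = readChunkBuggy(in, buffer);  // only 2 bytes are refilled
        System.out.println(new String(second, StandardCharsets.UTF_8)); // prints "Nos"

        buffer = new byte[3];
        in = new ByteArrayInputStream(data);
        readChunkFixed(in, buffer);                  // "Yes"
        second = readChunkFixed(in, buffer);
        System.out.println(new String(second, StandardCharsets.UTF_8)); // prints "No"
    }
}
```

Whether the attached patch addresses the problem in this way is not stated in this message; the sketch only shows why tracking the valid length of a reused buffer matters.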