[ https://issues.apache.org/jira/browse/FLINK-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Yao reassigned FLINK-13589:
--------------------------------

    Assignee: Arvid Heise

> DelimitedInputFormat index error on multi-byte delimiters with whole file input splits
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-13589
>                 URL: https://issues.apache.org/jira/browse/FLINK-13589
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.8.1
>            Reporter: Adric Eckstein
>            Assignee: Arvid Heise
>            Priority: Blocker
>             Fix For: 1.9.2, 1.10.0
>
>         Attachments: delimiter-bug.patch
>
>
> The DelimitedInputFormat can drop bytes when using input splits that have a length of -1 (for reading the whole file). It looks like this is a simple bug in handling the delimiter on buffer boundaries, where the logic is inconsistent for different split types.
> Attached is a possible patch with fix and test.
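
For readers unfamiliar with this class of bug, here is a minimal, self-contained sketch of splitting a stream on a multi-byte delimiter that may straddle a read-buffer boundary. It is not Flink's DelimitedInputFormat code and not the attached patch; the class name `DelimiterSketch` and the methods `readRecords` and `endsWith` are hypothetical names chosen for illustration. The sketch assumes a non-empty delimiter and a positive buffer size. It keeps the bytes of the current record in a separate array, so a delimiter split across two reads is still matched and no bytes are lost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Flink.
class DelimiterSketch {

    // Splits the stream into records on a multi-byte delimiter.
    // Assumes delimiter.length > 0 and bufferSize > 0.
    static List<String> readRecords(InputStream in, byte[] delimiter, int bufferSize)
            throws IOException {
        List<String> records = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        // Bytes of the record being assembled; survives buffer refills.
        byte[] pending = new byte[16];
        int pendingLen = 0;

        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (pendingLen == pending.length) {
                    pending = Arrays.copyOf(pending, pendingLen * 2);
                }
                pending[pendingLen++] = buffer[i];
                // The match is checked against the accumulated record, so it
                // does not matter where the read buffer happened to end.
                if (endsWith(pending, pendingLen, delimiter)) {
                    records.add(new String(pending, 0,
                            pendingLen - delimiter.length, StandardCharsets.UTF_8));
                    pendingLen = 0;
                }
            }
        }
        // Trailing record without a terminating delimiter.
        if (pendingLen > 0) {
            records.add(new String(pending, 0, pendingLen, StandardCharsets.UTF_8));
        }
        return records;
    }

    // True if the first len bytes of data end with suffix.
    static boolean endsWith(byte[] data, int len, byte[] suffix) {
        if (len < suffix.length) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (data[len - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Running `readRecords` on the same input with several buffer sizes, including ones that cut the delimiter in half, should give identical records each time. That is the property a regression test for this kind of issue would typically check.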