[ https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026312#comment-15026312 ]
Wilfred Spiegelenburg commented on MAPREDUCE-6549:
--------------------------------------------------

The test failures are unrelated and are tracked in separate JIRAs:
testIpcWithReaderQueuing is tracked in HADOOP-10406
testGangliaMetrics2 is tracked in HADOOP-12588
testDeprecatedUmask is tracked in HDFS-9451

> multibyte delimiters with LineRecordReader cause duplicate records
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6549
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, mrv2
>    Affects Versions: 2.7.2
>            Reporter: Dustin Cote
>            Assignee: Wilfred Spiegelenburg
>         Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch,
> MAPREDUCE-6549.3.patch
>
>
> LineRecordReader currently produces duplicate records in certain
> scenarios, such as:
> 1) input string: "abc+++def++ghi++"
>    delimiter string: "+++"
>    test passes with all split sizes
> 2) input string: "abc++def+++ghi++"
>    delimiter string: "+++"
>    test fails with a split size of 4
> 3) input string: "abc+++def++ghi++"
>    delimiter string: "++"
>    test fails with a split size of 5
> 4) input string: "abc+++defg++hij++"
>    delimiter string: "++"
>    test fails with a split size of 4
> 5) input string: "abc++def+++ghi++"
>    delimiter string: "++"
>    test fails with a split size of 9

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
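Each scenario in the description checks one property. If you concatenate the records from every split in order, the result must equal the records read from the input as a single split. A duplicate record means two splits both claimed it. The Java sketch below models that property. It rests on three assumptions. First, it uses Hadoop's usual line-record convention: a split [start, end) emits each record whose start offset lies in (start, end], and the first split also emits the record at offset 0. Second, no empty record follows a trailing delimiter at end of file. Third, SplitRecordModel and its methods are hypothetical names, not Hadoop APIs. The model also takes a shortcut: it finds record boundaries by scanning from the start of the input. A real split reader has to recover them starting in the middle of the stream, and that is where a multibyte delimiter straddling a split boundary can go wrong.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reference model (not Hadoop code) of how records delimited by a
 * custom, possibly multibyte, delimiter should be distributed across splits.
 */
public class SplitRecordModel {

    /** Record start offsets: 0, plus the offset right after each non-overlapping delimiter match. */
    static List<Integer> recordStarts(String input, String delim) {
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must be non-empty");
        }
        List<Integer> starts = new ArrayList<>();
        if (input.isEmpty()) {
            return starts;
        }
        starts.add(0);
        int from = 0;
        while (true) {
            int hit = input.indexOf(delim, from);
            if (hit < 0) {
                break;
            }
            int next = hit + delim.length();
            // Assumption: a delimiter at end of file does not start an empty record.
            if (next < input.length()) {
                starts.add(next);
            }
            from = next;
        }
        return starts;
    }

    /** The record beginning at offset p runs up to the next delimiter match or end of input. */
    static String recordAt(String input, String delim, int p) {
        int end = input.indexOf(delim, p);
        return end < 0 ? input.substring(p) : input.substring(p, end);
    }

    /** Records owned by split [splitStart, splitEnd): start offset in (splitStart, splitEnd], or 0 for the first split. */
    static List<String> readSplit(String input, String delim, int splitStart, int splitEnd) {
        List<String> out = new ArrayList<>();
        for (int p : recordStarts(input, delim)) {
            boolean owned = (splitStart == 0 && p == 0) || (p > splitStart && p <= splitEnd);
            if (owned) {
                out.add(recordAt(input, delim, p));
            }
        }
        return out;
    }

    /** Concatenates the records of every split when the input is cut into fixed-size splits. */
    static List<String> readAllSplits(String input, String delim, int splitSize) {
        List<String> out = new ArrayList<>();
        for (int s = 0; s < input.length(); s += splitSize) {
            int e = Math.min(s + splitSize, input.length());
            out.addAll(readSplit(input, delim, s, e));
        }
        return out;
    }

    public static void main(String[] args) {
        // The five input/delimiter pairs from the issue description.
        String[][] cases = {
            {"abc+++def++ghi++", "+++"},
            {"abc++def+++ghi++", "+++"},
            {"abc+++def++ghi++", "++"},
            {"abc+++defg++hij++", "++"},
            {"abc++def+++ghi++", "++"},
        };
        for (String[] c : cases) {
            String input = c[0], delim = c[1];
            List<String> whole = readSplit(input, delim, 0, input.length());
            // Every split size must yield the same records as a single split: no duplicates, no losses.
            for (int size = 1; size <= input.length(); size++) {
                if (!readAllSplits(input, delim, size).equals(whole)) {
                    throw new AssertionError(input + " / " + delim + " differs at split size " + size);
                }
            }
            System.out.println(input + " with \"" + delim + "\" -> " + whole);
        }
    }
}
```

To check a real reader, you could keep the same loop over split sizes and replace readSplit with the records the reader actually returns for each split. Any mismatch then points to a duplicated or dropped record at that split size.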