[ https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated MAPREDUCE-6549:
----------------------------------
    Fix Version/s: 2.8.0

> multibyte delimiters with LineRecordReader cause duplicate records
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6549
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, mrv2
>    Affects Versions: 2.7.2
>            Reporter: Dustin Cote
>            Assignee: Wilfred Spiegelenburg
>             Fix For: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
>
>         Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch,
>                      MAPREDUCE-6549.3.patch
>
>
> LineRecordReader currently produces duplicate records when a multibyte
> delimiter is used with certain split sizes. Test scenarios:
> 1) input string: "abc+++def++ghi++"
>    delimiter string: "+++"
>    test passes with all sizes of the split
> 2) input string: "abc++def+++ghi++"
>    delimiter string: "+++"
>    test fails with a split size of 4
> 3) input string: "abc+++def++ghi++"
>    delimiter string: "++"
>    test fails with a split size of 5
> 4) input string: "abc+++defg++hij++"
>    delimiter string: "++"
>    test fails with a split size of 4
> 5) input string: "abc++def+++ghi++"
>    delimiter string: "++"
>    test fails with a split size of 9
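
For anyone checking the scenarios above by hand, here is a minimal Python sketch of a reference model. It is not Hadoop's LineRecordReader code and not the attached patch. The helper names (tokenize, records_per_split) are hypothetical, and the ownership rule is an assumption modelled on the usual convention that a non-first split skips its first partial record while a reader may run past its own end. The idea is to scan the input once, left to right, and assign each record to exactly one split, so that no record can be emitted twice whatever the split size.

```python
def tokenize(data: str, delimiter: str) -> list[tuple[int, str]]:
    """Return (start_offset, record) pairs from one left-to-right scan.

    Delimiters are matched greedily and never overlap. A trailing
    delimiter does not produce an empty final record.
    """
    records = []
    pos = 0
    while pos < len(data):
        hit = data.find(delimiter, pos)
        if hit == -1:
            # No further delimiter: the rest of the input is one record.
            records.append((pos, data[pos:]))
            break
        records.append((pos, data[pos:hit]))
        pos = hit + len(delimiter)
    return records


def records_per_split(data: str, delimiter: str, split_size: int) -> list[list[str]]:
    """Assign every record to exactly one split (assumed ownership rule).

    A record starting at offset `off` belongs to the split [start, end)
    with start < off <= end; the record at offset 0 belongs to the first
    split. This is a reference oracle, not an actual split reader.
    """
    records = tokenize(data, delimiter)
    result = []
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        owned = [
            rec for off, rec in records
            if start < off <= end or (start == 0 and off == 0)
        ]
        result.append(owned)
    return result


# The five (input, delimiter) pairs listed in the issue description.
SCENARIOS = [
    ("abc+++def++ghi++", "+++"),
    ("abc++def+++ghi++", "+++"),
    ("abc+++def++ghi++", "++"),
    ("abc+++defg++hij++", "++"),
    ("abc++def+++ghi++", "++"),
]

if __name__ == "__main__":
    for data, delim in SCENARIOS:
        expected = [rec for _, rec in tokenize(data, delim)]
        print(repr(data), repr(delim), "->", expected)
```

For example, records_per_split("abc++def+++ghi++", "+++", 4) returns [["abc++def"], [], ["ghi++"], []]: two records in total, each owned by one split. A reader that returns more records than tokenize() does for the same input is showing the duplication described in this issue.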