[ https://issues.apache.org/jira/browse/CASSANDRA-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889986#action_12889986 ]
Jeremy Hanna commented on CASSANDRA-1042:
-----------------------------------------

+1

> ColumnFamilyRecordReader returns duplicate rows
> -----------------------------------------------
>
>                 Key: CASSANDRA-1042
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1042
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Joost Ouwerkerk
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.4
>
>         Attachments: 1042-0_6.txt, 1042-test.txt, 1042-v2.txt,
>                      Cassandra-1042-0_6-branch.patch.txt,
>                      CASSANDRA-1042-trunk.patch.txt,
>                      cassandra.tar.gz, duplicate_keys.rtf
>
>
> There's a bug in ColumnFamilyRecordReader that appears when processing a
> single split (which happens in most tests that have a small number of rows),
> and potentially in other cases. When the start and end tokens of the split
> are equal, duplicate rows can be returned.
>
> Example with 5 rows:
>
> token (start and end) = 53193025635115934196771903670925341736
>
> Tokens returned by first get_range_slices iteration (all 5 rows):
>   16955237001963240173058271559858726497
>   40670782773005619916245995581909898190
>   99079589977253916124855502156832923443
>   144992942750327304334463589818972416113
>   166860289390734216023086131251507064403
>
> Tokens returned by next iteration (first token is last token from
> previous, end token is unchanged):
>   16955237001963240173058271559858726497
>   40670782773005619916245995581909898190
>
> Tokens returned by final iteration (first token is last token from
> previous, end token is unchanged):
>   [] (empty)
>
> In this example, the mapper has processed 7 rows in total, 2 of which
> were duplicates.
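
The iteration pattern in the report can be reproduced with a small toy model. The sketch below is not Cassandra code: `get_range_slices` here is a hypothetical stand-in, and its range semantics (equal start and end tokens read as the whole ring in token order, start > end read as a wrapping range) are assumptions chosen because they reproduce the token lists quoted above. `read_split` and `batch_size` are likewise illustrative names.

```python
# Tokens of the five rows listed in the report, in sorted (ring) order.
ROW_TOKENS = sorted([
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
])
SPLIT_TOKEN = 53193025635115934196771903670925341736  # start == end


def get_range_slices(tokens, start, end, count):
    """Toy range query over (start, end]; NOT the real Thrift call."""
    if start == end:
        # Assumption: equal tokens are treated as the full ring.
        matched = list(tokens)
    elif start < end:
        matched = [t for t in tokens if start < t <= end]
    else:
        # Assumption: start > end wraps around the ring.
        matched = [t for t in tokens if t > start] + [t for t in tokens if t <= end]
    return matched[:count]


def read_split(tokens, start, end, batch_size=100):
    """Page through a split, restarting each query at the last token seen."""
    rows = []
    while True:
        batch = get_range_slices(tokens, start, end, batch_size)
        if not batch:
            break
        rows.extend(batch)
        start = batch[-1]  # end token is left unchanged, as in the report
    return rows


rows = read_split(ROW_TOKENS, SPLIT_TOKEN, SPLIT_TOKEN)
duplicates = len(rows) - len(set(rows))
print(len(rows), duplicates)
```

Under these assumptions the second query runs from the highest token back around to the unchanged end token, so it wraps past the start of the ring and re-reads the two rows whose tokens are below the split token.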