[ https://issues.apache.org/jira/browse/CASSANDRA-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872004#action_12872004 ]
Jeremy Hanna commented on CASSANDRA-1042:
-----------------------------------------

To clarify: to fix the problem, this removes some ordering in StorageProxy.getRangeIterator, since getRestrictedRanges should already have returned the right thing.

> ColumnFamilyRecordReader returns duplicate rows
> -----------------------------------------------
>
>                 Key: CASSANDRA-1042
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1042
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Joost Ouwerkerk
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.3
>
>         Attachments: CASSANDRA-1042.patch.txt, cassandra.tar.gz
>
>
> There's a bug in ColumnFamilyRecordReader that appears when processing a
> single split (which happens in most tests that have a small number of rows),
> and potentially in other cases. When the start and end tokens of the split
> are equal, duplicate rows can be returned.
> Example with 5 rows:
> token (start and end) = 53193025635115934196771903670925341736
> Tokens returned by first get_range_slices iteration (all 5 rows):
>  16955237001963240173058271559858726497
>  40670782773005619916245995581909898190
>  99079589977253916124855502156832923443
>  144992942750327304334463589818972416113
>  166860289390734216023086131251507064403
> Tokens returned by next iteration (first token is last token from
> previous, end token is unchanged):
>  16955237001963240173058271559858726497
>  40670782773005619916245995581909898190
> Tokens returned by final iteration (first token is last token from
> previous, end token is unchanged):
>  [] (empty)
> In this example, the mapper has processed 7 rows in total, 2 of which
> were duplicates.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
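
The iteration described in the issue can be reproduced with a small toy model. This is only a sketch, not Cassandra's code: `range_slice`, `read_split` and the `ring_order` flag are hypothetical names, and the assumption that the faulty path hands rows back in plain ascending token order (rather than in ring order starting after the start token) is inferred from the token listing in the report. It does not represent the actual patch.

```python
RING_SIZE = 2 ** 127  # assumed token space for this model

# Tokens and split boundary copied from the issue report.
TOKENS = [
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
]
SPLIT_TOKEN = 53193025635115934196771903670925341736


def in_range(token, start, end):
    """True if token is in (start, end] on the ring; start == end is the whole ring."""
    if start < end:
        return start < token <= end
    return token > start or token <= end  # wrapping range


def range_slice(tokens, start, end, limit, ring_order):
    """Hypothetical stand-in for get_range_slices over an in-memory token list."""
    matches = [t for t in tokens if in_range(t, start, end)]
    if ring_order:
        # Walk the ring starting just after `start`.
        matches.sort(key=lambda t: (t - start - 1) % RING_SIZE)
    else:
        # Assumed faulty behaviour: ascending token order, ignoring the wrap.
        matches.sort()
    return matches[:limit]


def read_split(tokens, start, end, limit, ring_order):
    """Page through a split the way the report describes: the next start
    token is the last token of the previous batch, the end token is fixed."""
    seen = []
    while True:
        batch = range_slice(tokens, start, end, limit, ring_order)
        if not batch:
            return seen
        seen.extend(batch)
        start = batch[-1]
        if start == end:  # guard for this sketch only: avoid re-reading the whole ring
            return seen


buggy = read_split(TOKENS, SPLIT_TOKEN, SPLIT_TOKEN, 100, ring_order=False)
fixed = read_split(TOKENS, SPLIT_TOKEN, SPLIT_TOKEN, 100, ring_order=True)
print(len(buggy), len(buggy) - len(set(buggy)))  # 7 rows read, 2 duplicates
print(len(fixed), len(fixed) - len(set(fixed)))  # 5 rows read, 0 duplicates
```

With ascending order, the first batch ends on the largest token, so the follow-up range wraps past zero and picks up the two smallest tokens a second time. With ring order, the first batch ends on the token just before the split boundary and the next range is empty.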