[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099234#comment-13099234 ]

Mck SembWever commented on CASSANDRA-3150:
------------------------------------------

Here keyRange is startToken to split.getEndToken().
startToken is updated on each iteration to the token of the last row read
(each iteration reads batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the row keys?
As I read it, startToken will hop over split.getEndToken() and
get_range_slices(..) will start returning wrapping ranges. These still return
rows, so the iteration continues, now forever.

The only ways out of this code today are a) startToken equals
split.getEndToken(), or b) get_range_slices(..) is called with startToken equal
to split.getEndToken(), or with a gap so small that no rows exist in between.
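
To make the loop structure concrete, here is a minimal self-contained sketch of
the paging described above. It is not the actual ColumnFormatRecordReader code:
tokens are modelled as plain longs, fetchBatch(..) stands in for
get_range_slices(..), and PagingLoopSketch, BATCH_ROW_COUNT and the row token
values are made up for illustration. In this toy model the end-token bound is
always respected, so the loop does terminate via (a) or (b); the concern above
is about the real token handling, where startToken could end up past
split.getEndToken() and the range would wrap.

{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of the record reader's paging loop. Tokens are longs and
// fetchBatch(..) stands in for get_range_slices(..); all names and values
// here are illustrative only.
public class PagingLoopSketch
{
    static final int BATCH_ROW_COUNT = 2;

    // Row tokens present in the split, in ring order (ByteOrderedPartitioner-style).
    static final long[] ROW_TOKENS = { 10, 20, 30, 40, 50, 60 };

    // Stand-in for get_range_slices(..) over the token range (startToken, endToken].
    // When startToken >= endToken the range wraps around the ring and still yields rows.
    static List<Long> fetchBatch(long startToken, long endToken)
    {
        boolean wraps = startToken >= endToken;
        List<Long> batch = new ArrayList<>();
        for (long token : ROW_TOKENS)
        {
            boolean inRange = wraps ? (token > startToken || token <= endToken)
                                    : (token > startToken && token <= endToken);
            if (inRange && batch.size() < BATCH_ROW_COUNT)
                batch.add(token);
        }
        return batch;
    }

    public static void main(String[] args)
    {
        long startToken = 5;
        long endToken = 35; // falls between two row keys, i.e. matches no row key

        while (true)
        {
            // Exit (a): startToken has landed exactly on the split's end token.
            if (startToken == endToken)
                break;

            List<Long> rows = fetchBatch(startToken, endToken);

            // Exit (b): the remaining gap (startToken, endToken] holds no rows.
            if (rows.isEmpty())
                break;

            // Advance to the token of the last row read.
            startToken = rows.get(rows.size() - 1);
            System.out.println("startToken advanced to " + startToken);
        }
        System.out.println("loop finished");

        // The wrapping behaviour described above: a range whose start is already
        // past its end still yields rows, so the empty-batch exit could not fire.
        System.out.println("rows for wrapped range (40, 35]: " + fetchBatch(40, 35));
    }
}
{code}

Note the wrapped-range call at the end of main(): once startToken is past
endToken, fetchBatch(..) keeps returning rows, which is why neither exit
condition would ever be reached in that situation.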

> ColumnFormatRecordReader loops forever
> --------------------------------------
>
>                 Key: CASSANDRA-3150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.8.4
>            Reporter: Mck SembWever
>            Assignee: Mck SembWever
>            Priority: Critical
>         Attachments: CASSANDRA-3150.patch
>
>
> From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
> {quote}
> bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
> bq. CFIF's inputSplitSize=196608
> bq. 3 map tasks (of 4013) are still running after reading 25 million rows.
> bq. Can this be a bug in StorageService.getSplits(..) ?
> getSplits looks pretty foolproof to me but I guess we'd need to add
> more debug logging to rule out a bug there for sure.
> I guess the main alternative would be a bug in the recordreader paging.
> {quote}
