[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102062#comment-13102062 ]
Mck SembWever edited comment on CASSANDRA-3150 at 9/10/11 6:02 PM:
-------------------------------------------------------------------

Debug from a task that was still running at 1200%

The initial split for this CFRR is
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

Therefore there shouldn't be more than 13 calls to get_range_slices(..) in this task. There were already 166 calls in this log.

What I can see here is that the original split for this task is just way too big, and this comes from {{describe_splits(..)}}, which in turn depends on "index_interval". Reading {{StorageService.getSplits(..)}}, I would guess that the split can in fact contain many more keys with the default sampling of 128. The question is: how low can/should I bring index_interval?

      was (Author: michaelsembwever):
    Debug from a task that was still running at 1200%

The initial split for this CFRR is
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

Therefore there shouldn't be more than 13 calls to get_range_slices(..) in this task. There were already 166 calls in this log.
What I can see here is that the original split for this task is just way too big, and this comes from {{describe_splits(..)}}

> ColumnFormatRecordReader loops forever
> --------------------------------------
>
>                 Key: CASSANDRA-3150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.8.4
>            Reporter: Mck SembWever
>            Assignee: Mck SembWever
>            Priority: Critical
>         Attachments: CASSANDRA-3150.patch, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log
>
>
> From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
> {quote}
> bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
> bq. CFIF's inputSplitSize=196608
> bq. 3 map tasks (from 4013) is still running after read 25 million rows.
> bq. Can this be a bug in StorageService.getSplits(..) ?
> getSplits looks pretty foolproof to me but I guess we'd need to add
> more debug logging to rule out a bug there for sure.
> I guess the main alternative would be a bug in the recordreader paging.
> {quote}
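
For reference, the "no more than 13 calls" figure in the comment is the split size divided by the batch size, rounded up. Below is a minimal sketch of that arithmetic. It assumes every get_range_slices(..) call returns one full batch of rows; the class and method names are hypothetical and are not part of Cassandra.

```java
public class SplitCallEstimate {

    // Fewest full batches needed to cover a split of splitSize rows
    // (ceiling division). Hypothetical helper, not a Cassandra API.
    static long expectedCalls(long splitSize, long batchSize) {
        if (splitSize < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("splitSize must be >= 0 and batchSize > 0");
        }
        return (splitSize + batchSize - 1) / batchSize;
    }

    // Upper bound on rows fetched, ASSUMING every call returned a full batch.
    static long maxRowsFetched(long calls, long batchSize) {
        return calls * batchSize;
    }

    public static void main(String[] args) {
        long splitSize = 196608;   // cassandra.input.split.size from the comment
        long batchSize = 16000;    // cassandra.range.batch.size from the comment
        long observedCalls = 166;  // calls counted in the attached log

        System.out.println("expected calls: " + expectedCalls(splitSize, batchSize));
        System.out.println("rows implied by observed calls (upper bound): "
                + maxRowsFetched(observedCalls, batchSize));
    }
}
```

Under that assumption, 166 calls would correspond to as many as 2,656,000 rows, more than thirteen times the requested split size. That is consistent with the suspicion that the split returned by {{describe_splits(..)}} holds far more keys than requested.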