[ https://issues.apache.org/jira/browse/CASSANDRA-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906578#action_12906578 ]
Philip (flip) Kromer commented on CASSANDRA-1434:
-------------------------------------------------

Right now the code does {
  buffer n mutations, holding each according to its endpoint.
  After n writes, check that all endpoint writes are finished, and dispatch to each endpoint its share of the n mutations
}
This is non-blocking at the socket level but ends up being blocking at the app level, and the wide variance in batch size has bad effects on GC at the Cassandra end.

I think the ColumnFamilyRecordWriter would see a speedup and improved stability with {
  buffer mutations, holding each according to its endpoint.
  When an endpoint has seen n writes, check that any previous write has finished, and dispatch to this endpoint a full buffer of n mutations
}.

> ColumnFamilyOutputFormat performs blocking writes for large batches
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-1434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1434
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.7 beta 2
>
>         Attachments: 0001-Switch-to-TFramedTransport-in-TestRingCache.patch,
>                      0002-Add-kth-endpoint-method-to-RingCache-and-improve-con.patch,
>                      0003-Remove-nesting-in-RingCache.patch,
>                      0004-Fix-regression-introduced-on-1322-add-all-replicas-o.patch
>
>
> By default, ColumnFamilyOutputFormat batches
> {{mapreduce.output.columnfamilyoutputformat.batch.threshold}} or
> {{Long.MAX_VALUE}} mutations, and then performs a blocking write.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
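
The per-endpoint buffering proposed in the comment above could be sketched roughly as follows. This is only an illustration in Python, not the actual ColumnFamilyRecordWriter code: the class name `PerEndpointBuffer`, the `send` callback, and the `batch_size` parameter are all hypothetical. The sketch also assumes `send` is synchronous, so any previous write to an endpoint has necessarily finished before the next one is dispatched.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


class PerEndpointBuffer:
    """Hypothetical sketch: buffer mutations per endpoint and dispatch an
    endpoint's buffer as soon as that endpoint alone has seen batch_size writes."""

    def __init__(self, batch_size: int,
                 send: Callable[[Hashable, List[object]], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        # Assumed stand-in for the client call that writes a batch to one endpoint.
        self.send = send
        self.pending: Dict[Hashable, List[object]] = defaultdict(list)

    def write(self, endpoint: Hashable, mutation: object) -> None:
        """Hold the mutation under its endpoint; flush only that endpoint when full."""
        batch = self.pending[endpoint]
        batch.append(mutation)
        if len(batch) >= self.batch_size:
            self.flush(endpoint)

    def flush(self, endpoint: Hashable) -> None:
        """Dispatch whatever is buffered for one endpoint, if anything."""
        batch = self.pending.pop(endpoint, [])
        if batch:
            self.send(endpoint, batch)

    def close(self) -> None:
        """Dispatch the remaining partial buffers for every endpoint."""
        for endpoint in list(self.pending):
            self.flush(endpoint)
```

With this shape, every dispatched batch except the final ones from `close()` holds exactly `batch_size` mutations, which is the uniform batch size the comment argues for, and a slow endpoint only delays writes destined for itself.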