[ 
https://issues.apache.org/jira/browse/CASSANDRA-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906578#action_12906578
 ] 

Philip (flip) Kromer commented on CASSANDRA-1434:
-------------------------------------------------

Right now the code does { buffer n mutations, grouping each according to its 
endpoint. After n writes, check that all endpoint writes have finished, and 
dispatch to each endpoint its share of the n mutations }.

This is non-blocking at the socket level but ends up blocking at the app 
level, because every dispatch waits on the slowest endpoint. The wide variance 
in per-endpoint batch size also has bad effects on GC at the Cassandra end.

I think the ColumnFamilyRecordWriter would see a speedup and improved stability 
with { buffer mutations, grouping each according to its endpoint. When an 
endpoint has seen n writes, check that any previous write to that endpoint has 
finished, and dispatch to this endpoint a full buffer of n mutations }.
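
A rough sketch of the per-endpoint scheme, to make the proposal concrete. This 
is not the actual ColumnFamilyRecordWriter code: the class name 
PerEndpointBuffer, the String endpoint key, and the sender callback (standing 
in for whatever async client call does the real write) are all made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

/** Hypothetical sketch: one buffer and at most one in-flight write per endpoint. */
class PerEndpointBuffer<M> {
    private final int batchSize; // the "n" from the comment above
    // Assumed stand-in for an asynchronous batch write to one endpoint.
    private final BiFunction<String, List<M>, CompletableFuture<Void>> sender;
    private final Map<String, List<M>> pending = new HashMap<>();
    private final Map<String, CompletableFuture<Void>> inFlight = new HashMap<>();

    PerEndpointBuffer(int batchSize,
                      BiFunction<String, List<M>, CompletableFuture<Void>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffer a mutation; dispatch only when THIS endpoint has a full buffer. */
    void write(String endpoint, M mutation) {
        List<M> buffer = pending.computeIfAbsent(endpoint, e -> new ArrayList<>());
        buffer.add(mutation);
        if (buffer.size() >= batchSize) {
            flush(endpoint);
        }
    }

    /** Wait for this endpoint's previous write (if any), then send its buffer. */
    private void flush(String endpoint) {
        List<M> batch = pending.remove(endpoint);
        if (batch == null || batch.isEmpty()) {
            return;
        }
        CompletableFuture<Void> previous = inFlight.get(endpoint);
        if (previous != null) {
            previous.join(); // blocks on this endpoint only, not on all endpoints
        }
        inFlight.put(endpoint, sender.apply(endpoint, batch));
    }

    /** Send any partial buffers and wait for every outstanding write. */
    void close() {
        for (String endpoint : new ArrayList<>(pending.keySet())) {
            flush(endpoint);
        }
        for (CompletableFuture<Void> write : inFlight.values()) {
            write.join();
        }
    }
}
```

With this shape every dispatched batch (other than the final partial ones sent 
from close()) has exactly n mutations, and a slow endpoint only stalls writes 
destined for that endpoint.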

> ColumnFamilyOutputFormat performs blocking writes for large batches
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-1434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1434
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.7 beta 2
>
>         Attachments: 0001-Switch-to-TFramedTransport-in-TestRingCache.patch, 
> 0002-Add-kth-endpoint-method-to-RingCache-and-improve-con.patch, 
> 0003-Remove-nesting-in-RingCache.patch, 
> 0004-Fix-regression-introduced-on-1322-add-all-replicas-o.patch
>
>
> By default, ColumnFamilyOutputFormat batches 
> {{mapreduce.output.columnfamilyoutputformat.batch.threshold}} or 
> {{Long.MAX_VALUE}} mutations, and then performs a blocking write.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
