[ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138305#comment-16138305 ]
Jason Brown commented on CASSANDRA-13630:
-----------------------------------------

bq. I thought worst case memory amplification from this NIO approach was 2x message size which is worse than our current 1x message size, but it's not, it's cluster size * message size if a message is fanned out to all nodes in the cluster.

We do not have 1x amplification in pre-4.0 code; it has always been messageSize times the number of target peers. In `OutboundTcpConnection` we wrote into a [backing buffer of 64k|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L457] for each outbound peer and flushed when the buffer filled up (see `BufferedDataOutputStreamPlus`). The cost of the amplification is hidden by that reusable backing buffer, but it's still there. With CASSANDRA-8457, each message gets its own distinct buffer, allocated once per message, into which the message is serialized and which is then flushed. With this ticket we'll move back to the previous model, where a backing buffer is used for aggregating small messages or chunks of larger messages. That buffer, of course, is not reused, but that's because of the asynchronous nature of NIO vs. blocking IO.

(FTR, I have thought about moving serialization outside of the "outbound connections" (either `OutboundTcpConnection` or netty handlers), so that we serialize before sending to the outbound channels and send a slice of a buffer to each of those channels. That way you only serialize once (less repetitive CPU work) and potentially consume less memory. But I think that's a different ticket.)

bq. I really wonder if that could be a shared pool of threads that we size generously

Yeah, I thought about this. The problem is that because the deserialization is blocking, you basically need one thread in the pool for each "blocker"; otherwise you starve some deserialization activities. Hence, I just used a background thread.
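A minimal Java sketch of the starvation argument, assuming a `CountDownLatch` is an acceptable stand-in for a blocking read on a connection whose bytes have not arrived yet. `StarvationSketch` and `readyTaskStarves` are hypothetical names for illustration, not Cassandra or netty code. The method parks `blockedReaders` tasks on the latch inside a fixed pool of `poolSize` threads, then submits one task whose input is already available and reports whether it failed to run within the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: models blocking deserialization on a bounded pool.
final class StarvationSketch {

    /**
     * Returns true if a task whose data is already available could not run within
     * timeoutMs because every pool thread was parked in a blocking "read".
     */
    static boolean readyTaskStarves(int poolSize, int blockedReaders, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Stands in for bytes that have not yet arrived on slow connections (assumption).
        CountDownLatch dataArrives = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedReaders; i++) {
                // Each task occupies a pool thread until its "bytes" show up.
                pool.submit(() -> { dataArrives.await(); return null; });
            }
            // A connection whose message is fully buffered and could be deserialized now.
            Future<?> ready = pool.submit(() -> { });
            try {
                ready.get(timeoutMs, TimeUnit.MILLISECONDS);
                return false; // a free thread picked it up
            } catch (TimeoutException e) {
                return true;  // it sat in the queue behind blocked readers
            }
        } finally {
            dataArrives.countDown(); // unblock everything so the pool can shut down
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readyTaskStarves(2, 2, 200));  // true: both threads are blocked
        System.out.println(readyTaskStarves(3, 2, 2000)); // false: one spare thread runs it
    }
}
```

With exactly as many threads as blocked readers, the ready task waits; with one spare thread it runs at once. Because how many connections may be blocked mid-message at the same time is not known up front, this is the sizing problem behind the doubt that a fixed "well-sized" pool would be sufficient.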
Not my favorite choice, but I'm not sure a "well-sized" pool will be sufficient.

Reading over your comments on the code itself this morning.

> support large internode messages with netty
> -------------------------------------------
>
> Key: CASSANDRA-13630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
> Project: Cassandra
> Issue Type: Task
> Components: Streaming and Messaging
> Reporter: Jason Brown
> Assignee: Jason Brown
> Fix For: 4.0
>
>
> As part of CASSANDRA-8457, we decided to punt on large messages to reduce the
> scope of that ticket. However, we still need that functionality to ship a
> correctly operating internode messaging subsystem.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)