[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511331#comment-14511331
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
-------------------------------------------

I was able to reproduce the OOM once in 2.1.2. The mutation stage fills up 
with tasks that look like responses to writes. When the run succeeds in 2.1.2, 
it looks like the messages are simply being dropped. 

The reason it fails at 300k is that some 50k or so get processed and 250k back 
up, causing the OOM. We could try to make this more robust against overload, 
for example by having the producer (IncomingTcpConnection) detect overload and 
start dropping messages itself instead of relying on the consumer 
(MutationStage) to drop them.
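
A minimal sketch of what producer-side dropping could look like, assuming a 
bounded queue stands in for the stage's task queue. SheddingProducer, 
maxPending, and the drop counter are hypothetical names for illustration, not 
Cassandra classes, and this is not how IncomingTcpConnection or MutationStage 
are actually wired together.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the producer sheds load itself once the consumer's
// queue is full, so pending work is bounded no matter how slow the consumer is.
final class SheddingProducer<T> {
    private final BlockingQueue<T> stage;              // stands in for the consumer's queue
    private final AtomicLong dropped = new AtomicLong();

    SheddingProducer(int maxPending) {
        this.stage = new ArrayBlockingQueue<>(maxPending);
    }

    /** Called by the producer for each incoming message; never blocks. */
    boolean submit(T message) {
        // offer() returns false immediately when the queue is at capacity
        if (stage.offer(message)) {
            return true;
        }
        dropped.incrementAndGet();                     // overload detected: drop here
        return false;
    }

    /** Called by the consumer; returns null when nothing is pending. */
    T poll() {
        return stage.poll();
    }

    int pending() {
        return stage.size();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

With a capacity of 2, a third submit() is rejected and counted rather than 
queued, so heap use is capped by maxPending instead of by how far the consumer 
falls behind.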

I am leaning towards not trying to fix this wart because it requires somewhat 
unrealistic conditions. There has to be no load balancing, a heap that is too 
small, and an oversubscribed instance.

[~mkjellman] I created CASSANDRA-9237 for the issue of Gossip sharing a 
connection with most traffic.

> OutboundTcpConnectionPool should route messages to sockets by size not type
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 3.0
>
>         Attachments: 8789.diff
>
>
> I was looking at this trying to understand what messages flow over which 
> connection.
> For reads the request goes out over the command connection and the response 
> comes back over the ack connection.
> For writes the request goes out over the command connection and the response 
> comes back over the command connection.
> Reads get a dedicated socket for responses. Mutation commands and responses 
> both travel over the same socket along with read requests.
> Sockets are used unidirectionally, so there are actually four sockets in 
> play and four threads at each node (2 inbound, 2 outbound).
> CASSANDRA-488 doesn't leave a record of what the impact of this change was. 
> If someone remembers what situations were made better it would be good to 
> know.
> I am not clear on when/how this is helpful. The consumer side shouldn't be 
> blocking so the only head of line blocking issue is the time it takes to 
> transfer data over the wire.
> If message size is the cause of blocking issues then the current design mixes 
> small messages and large messages on the same connection retaining the head 
> of line blocking.
> Read requests share the same connection as write requests (which are large), 
> and write acknowledgments (which are small) share the same connections as 
> write requests. The only winner is read acknowledgements.
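
To illustrate the routing rule the issue title proposes, here is a minimal 
sketch that picks a connection from the serialized size alone. SizeRouter, 
Channel, and the threshold are hypothetical and chosen only for illustration; 
this is not the code in 8789.diff and not the OutboundTcpConnectionPool API.

```java
// Hypothetical sketch: choose the socket by message size rather than by type,
// so small messages never queue behind large ones on the same connection.
final class SizeRouter {
    enum Channel { SMALL, LARGE }

    private final int thresholdBytes;   // assumed cutoff, not a value from the ticket

    SizeRouter(int thresholdBytes) {
        if (thresholdBytes <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.thresholdBytes = thresholdBytes;
    }

    /** Messages strictly larger than the threshold go to the LARGE channel. */
    Channel route(int serializedSizeBytes) {
        return serializedSizeBytes > thresholdBytes ? Channel.LARGE : Channel.SMALL;
    }
}
```

Under this rule read requests and write acknowledgements, both small, would 
share one socket, while large write requests would travel over the other.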



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
