[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900005#comment-15900005 ]

Ariel Weisberg commented on CASSANDRA-13265:
--------------------------------------------

Awesome.
* [This needs to be configurable from the YAML and via JMX. Add it to 
{{StorageProxyMBean}}/{{StorageProxy}}, store the value in {{Config}}, then get 
it from {{DatabaseDescriptor}}; it will need to be a volatile field in 
{{Config}}. Add a setter and a getter. I have the setter return the previous 
value, but a lot of the existing ones don't. 10 seconds is also way too high as 
a default; I propose 200 milliseconds. It's an improvement over today, but 
still aggressive at trying to free memory. Also use 
{{java.util.concurrent.TimeUnit}} to convert to the value you want (see the 
sketch after this list).|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R134]
* [It should include drained messages as well. To be 100% correct you would 
have to count how many messages you have iterated over and sent and then 
subtract.|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R261]
* [Typo, "thus 
letting"|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R604]
* [Extra line 
break|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R625]
* [We don't do/allow author 
tags|https://github.com/apache/cassandra/pull/95/files#diff-c16fff43d2aafe61a4219656d1ab6f9eR26]
* [Use 
TimeUnit|https://github.com/apache/cassandra/pull/95/files#diff-c16fff43d2aafe61a4219656d1ab6f9eR36]
* [It isn't using the 
constant.|https://github.com/apache/cassandra/pull/95/files#diff-c16fff43d2aafe61a4219656d1ab6f9eR118]
* [Just in case maybe assert the droppable/non-droppable status of the verbs. 
Or does it not matter since the tests will fail 
anyways?|https://github.com/apache/cassandra/pull/95/files#diff-c16fff43d2aafe61a4219656d1ab6f9eR33]
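
Roughly the shape I have in mind for the configuration plumbing, as a 
self-contained sketch (the class and field names below, e.g. 
{{otc_backlog_expiration_interval_ms}}, are placeholders rather than final 
names): a volatile field standing in for the one in {{Config}}, accessors 
standing in for {{DatabaseDescriptor}}/{{StorageProxy}}, a setter that returns 
the previous value, and {{TimeUnit}} for the conversion.

{code:java}
import java.util.concurrent.TimeUnit;

// Illustrative sketch only -- the names below are placeholders, not the actual
// Config/DatabaseDescriptor/StorageProxy code.
public class BacklogExpirationConfigSketch
{
    // Stand-in for the field that would live in Config, populated from the YAML.
    // Volatile so a value set over JMX becomes visible to the connection threads.
    static class Config
    {
        public volatile int otc_backlog_expiration_interval_ms = 200; // proposed default
    }

    private static final Config conf = new Config();

    // Stand-in for the DatabaseDescriptor getter
    public static int getBacklogExpirationIntervalMillis()
    {
        return conf.otc_backlog_expiration_interval_ms;
    }

    // Stand-in for the JMX-facing setter on StorageProxy/StorageProxyMBean;
    // it returns the previous value, as suggested above.
    public static int setBacklogExpirationIntervalMillis(int intervalInMillis)
    {
        int previous = conf.otc_backlog_expiration_interval_ms;
        conf.otc_backlog_expiration_interval_ms = intervalInMillis;
        return previous;
    }

    public static void main(String[] args)
    {
        // Convert with TimeUnit instead of hand-rolled arithmetic.
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(getBacklogExpirationIntervalMillis());
        System.out.println("expire backlog every " + intervalNanos + " ns");
        System.out.println("previous interval: " + setBacklogExpirationIntervalMillis(500) + " ms");
    }
}
{code}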

I had a sad and unfortunate thought. We are going about expiration the wrong 
way by counting messages; we really want to expire based on the weight of the 
queue. I also really, really hate that bounded queues that aren't weighted are 
a thing. Let's not do that here though, since I want to backport this to other 
versions.


> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>            Reporter: Christian Esken
>            Assignee: Christian Esken
>         Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak 
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 threads in the 
> OutboundTcpConnection class trying to lock the backlog queue in order to do 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> threads fully locks the queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: only after 262508 locking operations can a thread progress with 
> actually writing to the queue.
> - Reading: is also blocked, as 324 threads try to do iterator.next() and 
> fully lock the queue.
> This means writing blocks the queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -----
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (100000 INSERT statements per second and more during 
> peak times).
>  
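
For reference, here is a minimal, self-contained illustration (not code from 
the patch) of the blocking behavior described in the report above, assuming 
the backlog is a {{java.util.concurrent.LinkedBlockingQueue}}: every call to 
{{iterator.next()}} acquires both the put lock and the take lock, so many 
threads scanning the queue for expiration serialize against every reader and 
writer. This is the contention that rate-limiting the expiration scan is meant 
to avoid.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BacklogContentionDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        final LinkedBlockingQueue<Integer> backlog = new LinkedBlockingQueue<>();
        for (int i = 0; i < 200_000; i++)
            backlog.add(i);

        // Many "expiration" threads iterating the queue; each iterator.next()
        // takes both the put lock and the take lock of the LinkedBlockingQueue.
        for (int t = 0; t < 32; t++)
        {
            Thread expirer = new Thread(() ->
            {
                while (true)
                {
                    for (Integer ignored : backlog)
                    {
                        // just walk the queue, as an expiration scan would
                    }
                }
            });
            expirer.setDaemon(true);
            expirer.start();
        }

        // The single reader now contends with all of the iterating threads.
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++)
            backlog.poll(1, TimeUnit.SECONDS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("drained 10000 messages in " + elapsedMs + " ms under iterator contention");
    }
}
{code}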


