[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924658#comment-15924658 ]

Ariel Weisberg edited comment on CASSANDRA-13265 at 3/14/17 7:31 PM:
---------------------------------------------------------------------

bq. I would like BACKLOG_PURGE_SIZE to be kept hard coded for now. It has been 
hard coded there for quite some time, and in the long term I do not think it 
should be kept as-is. For example, it would be better to purge based on the 
number of actually DROPPABLE messages in the queue (or their weight, if you 
want to extend it even further).
I agree, but I don't want to add more to this change set, and I want to 
backport it to at least 2.2. I suggest it primarily because adding a system 
property is very low effort compared to a full YAML + JMX config option.
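Roughly what I mean by a low-effort property (sketch only; the property name 
is hypothetical, not something defined by this patch):

{code:java}
// Hypothetical sketch: keep the existing hard-coded value as the default and
// let operators override it with -Dcassandra.otc_backlog_purge_size=... at
// startup, without a YAML or JMX option. The property name is invented here.
public class BacklogPurgeSizeSketch
{
    private static final int BACKLOG_PURGE_SIZE =
        Integer.getInteger("cassandra.otc_backlog_purge_size", 1024);

    public static void main(String[] args)
    {
        System.out.println("BACKLOG_PURGE_SIZE = " + BACKLOG_PURGE_SIZE);
    }
}
{code}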

bq. At GitHub? I can do so. But no PR, right? I saw it mentioned that one 
should not open PRs for Cassandra on GitHub, as they cannot be handled (it's 
just a mirror).
Project policy is to not use PRs, even for comments. Code review comments go 
in JIRA. What I and some others do is link to a compare view of what we intend 
to merge. Don't delete the branches your links point to, because that 
invalidates the information in JIRA.

bq. I looked in more detail, and I think a flaw in the original code 
"suggested" that I do this: drainedMessages.clear() is called twice, and once 
would be enough. IMO it would be better to keep only the one at the end of the 
method and also do the drop-counting for the drained messages there. This 
would also cover the rather exotic case of the catch (Exception e) in the 
run() method: if an Exception is thrown, there is a danger of nothing being 
counted.
So there is the issue of not counting drops as they happen. The thread can 
block for a long time writing messages, so if we wait to count drops they 
won't show up as promptly. Not a huge deal, but I think we should log the 
drops, especially those due to timeouts, as they happen rather than at the 
end. It's definitely true that we don't need to clear the backlog twice, and 
we could log the drops due to items remaining in the backlog there. The loop 
decrements a counter every time it runs, so you know how many remaining 
elements are being dropped if you hit break inner.
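As a rough, self-contained sketch of that pattern (simplified stand-ins, not 
the actual OutboundTcpConnection code): count the timeout drops inside the 
loop as they happen, and let the decremented counter cover whatever is left 
when the loop breaks out, right before the single clear() at the end.

{code:java}
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of the drain/drop-counting pattern discussed above.
// QueuedMessage, droppedMessages and connectionHealthy are stand-ins, not the
// real OutboundTcpConnection members.
class DrainSketch
{
    static class QueuedMessage
    {
        final long enqueuedAtNanos;
        final long timeoutNanos;

        QueuedMessage(long enqueuedAtNanos, long timeoutNanos)
        {
            this.enqueuedAtNanos = enqueuedAtNanos;
            this.timeoutNanos = timeoutNanos;
        }

        boolean isTimedOut(long nowNanos)
        {
            return nowNanos - enqueuedAtNanos > timeoutNanos;
        }
    }

    static final AtomicLong droppedMessages = new AtomicLong();

    static void writeDrained(List<QueuedMessage> drainedMessages, boolean connectionHealthy)
    {
        int remaining = drainedMessages.size();
        inner:
        for (QueuedMessage qm : drainedMessages)
        {
            remaining--;
            if (qm.isTimedOut(System.nanoTime()))
            {
                // Count (and log) timeout drops as they happen, not at the end.
                droppedMessages.incrementAndGet();
                continue;
            }
            // (the actual socket write of qm would happen here)
            if (!connectionHealthy)
                break inner; // a failed write stops the loop; leftovers are dropped below
        }
        // The decremented counter says how many elements were left unwritten
        // after a break; they are dropped by the single clear() at the end,
        // so they can be counted here as well.
        droppedMessages.addAndGet(remaining);
        drainedMessages.clear();
    }
}
{code}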



> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>            Reporter: Christian Esken
>            Assignee: Christian Esken
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak 
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 threads in the 
> OutboundTcpConnection class that want to lock the backlog queue to do 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> threads fully locks the queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: only after 262508 locking operations can it progress with 
> actually writing to the queue.
> - Reading: also blocked, as 324 threads try to call iterator.next() and 
> fully lock the queue.
> This means writing blocks the queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -----
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DCs
>  - high write throughput (100000 INSERT statements per second and more during 
> peak times).
>  
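For what it's worth, the locking behaviour described in the report can be 
reproduced in isolation: LinkedBlockingQueue's iterator fully locks the queue 
(both the put and take locks) on every next() call, so many expiration threads 
iterating concurrently can starve the thread that adds or drains messages. The 
demo below is hypothetical and only illustrates that pattern; it is not 
Cassandra code.

{code:java}
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo of the contention pattern: many threads iterating a
// LinkedBlockingQueue for "expiration" while a single writer tries to add.
public class BacklogContentionDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        LinkedBlockingQueue<long[]> backlog = new LinkedBlockingQueue<>();
        for (int i = 0; i < 200_000; i++)
            backlog.add(new long[]{ System.nanoTime() });

        // Simulate the 324 threads from the thread dump, each scanning the
        // backlog for expired entries. Every it.next() fully locks the queue.
        for (int t = 0; t < 324; t++)
        {
            new Thread(() -> {
                Iterator<long[]> it = backlog.iterator();
                while (it.hasNext())
                {
                    long[] msg = it.next();
                    if (System.nanoTime() - msg[0] > 1_000_000_000L)
                        it.remove(); // drop the "expired" message
                }
            }).start();
        }

        Thread.sleep(100); // let the expiration threads start iterating

        // The single writer: measure how long one add() takes under contention.
        long start = System.nanoTime();
        backlog.add(new long[]{ System.nanoTime() });
        System.out.printf("add() took %.3f ms under contention%n",
                          (System.nanoTime() - start) / 1_000_000.0);
    }
}
{code}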


