[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924135#comment-15924135
 ] 

Christian Esken edited comment on CASSANDRA-13265 at 3/14/17 1:55 PM:
----------------------------------------------------------------------

Done. Hint: Not everything is committed yet, as I have to remove my debug code 
from it.
- (/) A smaller value could potentially ...
- (/)  You shouldn't need the check for null? Usually we "just" make sure its 
not null
    OK.  I thought it might be possible to set this to null, but even JConsole 
refuses it.
- (/) Using a boxed integer makes it a bit confusing ...
  ACK. Happily changed that. Looks like I followed bad examples.
- (/) Avoid unrelated whitespace changes.
  OK. I missed that after moving the field.
- (?)  I still think it's a good idea to avoid hard coding this kind of value 
so operators have options without recompiling.
 I would like BACKLOG_PURGE_SIZE to be kept hard coded for now. It has been 
there for quite some time hard coded, and in the long term I do not think it 
should be kept as-is. For example it would better to purge on the number of 
actually DROPPABLE messages in the queue (or their weight if you want to extend 
even further)
- (/) Fun fact. You don't need backlogNextExpirationTime to be volatile. You 
can piggyback on backlogExpirationActive to get the desired effects from the 
Java memory model. [...] I wouldn't change it ...
Yes.  I am  aware of that and using that technique often. Here I did not like 
it as visibility effects would not be obvious, unless explicitly documented. 
You are probably aware what Brian Goetz says about piggybacking in his JCIP 
book. BTW: A more obvious usage is for me in status fields, e.g. to make the 
results of a Future visible. I won't change it either, so marking this as done.
- (?)    Breaking out the uber bike shedding this could be maybeExpireMessages.
Nope, I am not going back that road. I had expireMessagesConditionally() before 
and changed it on request. If we do this, then a Set should not have an add() 
method but only a maybeAdd(), because it might not add the entry. Also I added 
clear documentation, so it should be fine. 
- (/)    Swap the order of these two stores so it doesn't do extra expirations.
  Ouch. That hurts. I wanted to protect from Exceptions inside the throw-Block 
which would disable expiration infinitely. I was quite tired yesterday. I am 
swapping it back, TimeUnit conversions never throw Exceptions, so it is safe. 
:-|
- (/)  This is not quite correct you can't count drainCount as dropped because 
some of the drained messages may have been sent during iteration.
  In progress. I am wondering if we should include fixing the drop count it in 
this patch, as it will likely create even more conflicts. OTOH I have to touch 
some related methods anyhow. I will think about it.



was (Author: cesken):
Done. Hint: Not everything is committed yet, as I have to remove my debug code 
from it.
- (/) A smaller value could potentially ...
- (/)  You shouldn't need the check for null? Usually we "just" make sure its 
not null
    OK.  I thought it might be possible to set this to null, but even JConsole 
refuses it.
- (/) Using a boxed integer makes it a bit confusing ...
  ACK. Happily changed that. Looks like I followed bad examples.
- (/) Avoid unrelated whitespace changes.
  OK. I missed that after moving the field.
- (?)  I still think it's a good idea to avoid hard coding this kind of value 
so operators have options without recompiling.
 I would like BACKLOG_PURGE_SIZE to be kept hard coded for now. It has been 
there for quite some time hard coded, and in the long term I do not think it 
should be kept as-is. For example it would better to purge on the number of 
actually DROPPABLE messages in the queue (or their weight if you want to extend 
even further)
- (/) Fun fact. You don't need backlogNextExpirationTime to be volatile. You 
can piggyback on backlogExpirationActive to get the desired effects from the 
Java memory model. [...] I wouldn't change it ...
Yes.  I am  aware of that and using that technique often. Here I did not like 
it as visibility effects would not be obvious, unless explicitly documented. 
You are probably aware what Brian Goetz says about piggybacking in his JCIP 
book. BTW: A more obvious usage is for me in status fields, e.g. to make the 
results of a Future visible. I won't change it either, so marking this as done.
- (?)    Breaking out the uber bike shedding this could be maybeExpireMessages.
Nope, I am not going back that road. I had expireMessagesConditionally() before 
and changed it on request. If we do this, then a Set should not have an add() 
method but only a maybeAdd(), because it might not add the entry. Also I added 
clear documentation, so it should be fine. 
- (x)    Swap the order of these two stores so it doesn't do extra expirations.
  Ouch. That hurts. I wanted to protect from Exceptions inside the throw-Block 
which would disable expiration infinitely. I was quite tired yesterday. I am 
swapping it back, TimeUnit conversions never throw Exceptions, so it is safe. 
:-|
- (/)  This is not quite correct you can't count drainCount as dropped because 
some of the drained messages may have been sent during iteration.
  In progress. I am wondering if we should include fixing the drop count it in 
this patch, as it will likely create even more conflicts. OTOH I have to touch 
some related methods anyhow. I will think about it.


> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>            Reporter: Christian Esken
>            Assignee: Christian Esken
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going in to details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> amount of queued messages, it starts thrashing itself to death. Each of the 
> Thread fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operation it can progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -----
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (100000 INSERT statements per second and more during 
> peak times).
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to