[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924135#comment-15924135 ]
Christian Esken edited comment on CASSANDRA-13265 at 3/14/17 1:55 PM: ---------------------------------------------------------------------- Done. Hint: Not everything is committed yet, as I have to remove my debug code from it. - (/) A smaller value could potentially ... - (/) You shouldn't need the check for null? Usually we "just" make sure its not null OK. I thought it might be possible to set this to null, but even JConsole refuses it. - (/) Using a boxed integer makes it a bit confusing ... ACK. Happily changed that. Looks like I followed bad examples. - (/) Avoid unrelated whitespace changes. OK. I missed that after moving the field. - (?) I still think it's a good idea to avoid hard coding this kind of value so operators have options without recompiling. I would like BACKLOG_PURGE_SIZE to be kept hard coded for now. It has been there for quite some time hard coded, and in the long term I do not think it should be kept as-is. For example it would better to purge on the number of actually DROPPABLE messages in the queue (or their weight if you want to extend even further) - (/) Fun fact. You don't need backlogNextExpirationTime to be volatile. You can piggyback on backlogExpirationActive to get the desired effects from the Java memory model. [...] I wouldn't change it ... Yes. I am aware of that and using that technique often. Here I did not like it as visibility effects would not be obvious, unless explicitly documented. You are probably aware what Brian Goetz says about piggybacking in his JCIP book. BTW: A more obvious usage is for me in status fields, e.g. to make the results of a Future visible. I won't change it either, so marking this as done. - (?) Breaking out the uber bike shedding this could be maybeExpireMessages. Nope, I am not going back that road. I had expireMessagesConditionally() before and changed it on request. If we do this, then a Set should not have an add() method but only a maybeAdd(), because it might not add the entry. Also I added clear documentation, so it should be fine. - (/) Swap the order of these two stores so it doesn't do extra expirations. Ouch. That hurts. I wanted to protect from Exceptions inside the throw-Block which would disable expiration infinitely. I was quite tired yesterday. I am swapping it back, TimeUnit conversions never throw Exceptions, so it is safe. :-| - (/) This is not quite correct you can't count drainCount as dropped because some of the drained messages may have been sent during iteration. In progress. I am wondering if we should include fixing the drop count it in this patch, as it will likely create even more conflicts. OTOH I have to touch some related methods anyhow. I will think about it. was (Author: cesken): Done. Hint: Not everything is committed yet, as I have to remove my debug code from it. - (/) A smaller value could potentially ... - (/) You shouldn't need the check for null? Usually we "just" make sure its not null OK. I thought it might be possible to set this to null, but even JConsole refuses it. - (/) Using a boxed integer makes it a bit confusing ... ACK. Happily changed that. Looks like I followed bad examples. - (/) Avoid unrelated whitespace changes. OK. I missed that after moving the field. - (?) I still think it's a good idea to avoid hard coding this kind of value so operators have options without recompiling. I would like BACKLOG_PURGE_SIZE to be kept hard coded for now. It has been there for quite some time hard coded, and in the long term I do not think it should be kept as-is. For example it would better to purge on the number of actually DROPPABLE messages in the queue (or their weight if you want to extend even further) - (/) Fun fact. You don't need backlogNextExpirationTime to be volatile. You can piggyback on backlogExpirationActive to get the desired effects from the Java memory model. [...] I wouldn't change it ... Yes. I am aware of that and using that technique often. Here I did not like it as visibility effects would not be obvious, unless explicitly documented. You are probably aware what Brian Goetz says about piggybacking in his JCIP book. BTW: A more obvious usage is for me in status fields, e.g. to make the results of a Future visible. I won't change it either, so marking this as done. - (?) Breaking out the uber bike shedding this could be maybeExpireMessages. Nope, I am not going back that road. I had expireMessagesConditionally() before and changed it on request. If we do this, then a Set should not have an add() method but only a maybeAdd(), because it might not add the entry. Also I added clear documentation, so it should be fine. - (x) Swap the order of these two stores so it doesn't do extra expirations. Ouch. That hurts. I wanted to protect from Exceptions inside the throw-Block which would disable expiration infinitely. I was quite tired yesterday. I am swapping it back, TimeUnit conversions never throw Exceptions, so it is safe. :-| - (/) This is not quite correct you can't count drainCount as dropped because some of the drained messages may have been sent during iteration. In progress. I am wondering if we should include fixing the drop count it in this patch, as it will likely create even more conflicts. OTOH I have to touch some related methods anyhow. I will think about it. > Expiration in OutboundTcpConnection can block the reader Thread > --------------------------------------------------------------- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 > Reporter: Christian Esken > Assignee: Christian Esken > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > Thread fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operation it can progress with actually > writing to the Queue. > - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue > This means: Writing blocks the Queue for reading, and readers might even be > starved which makes the situation even worse. > ----- > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (100000 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)