[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Benedict (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512587#comment-14512587 ]

Benedict commented on CASSANDRA-8789:
-

bq. Gossip always contended with mutation *responses* and read responses.

I suspect there may be an issue with nomenclature here. These statements made 
by Ariel are both true, but the internal nomenclature for both of these is 
REQUEST_RESPONSE.

 OutboundTcpConnectionPool should route messages to sockets by size not type
 ---

 Key: CASSANDRA-8789
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 8789.diff


 I was looking at this trying to understand which messages flow over which 
 connection.
 For reads, the request goes out over the command connection and the response 
 comes back over the ack connection.
 For writes, the request goes out over the command connection and the response 
 comes back over the command connection.
 Reads get a dedicated socket for responses. Mutation commands and responses 
 both travel over the same socket along with read requests.
 Sockets are used unidirectionally, so there are actually four sockets in play 
 and four threads at each node (2 inbound, 2 outbound).
 CASSANDRA-488 doesn't leave a record of what the impact of this change was. 
 If someone remembers which situations were made better, it would be good to 
 know.
 I am not clear on when/how this is helpful. The consumer side shouldn't be 
 blocking, so the only head-of-line blocking issue is the time it takes to 
 transfer data over the wire.
 If message size is the cause of blocking issues, then the current design mixes 
 small messages and large messages on the same connection, retaining the 
 head-of-line blocking.
 Read requests share the same connection as write requests (which are large), 
 and write acknowledgments (which are small) share the same connections as 
 write requests. The only winner is read acknowledgements.
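To put a rough number on "the time it takes to transfer data over the wire": on one FIFO socket, a message cannot start until every byte queued ahead of it has been written. The sketch below is a back-of-the-envelope model, not Cassandra code. `HeadOfLineModel`, the message sizes, and the 100 Mbit/s link are assumptions, and propagation delay and TCP windowing are ignored.

```java
import java.util.List;

// Toy model of head-of-line blocking on a single FIFO socket (not Cassandra code).
// Assumptions: messages are serialized back to back at a fixed link bandwidth;
// propagation delay and TCP windowing are ignored.
final class HeadOfLineModel
{
    // For each queued message, how long (ms) it waits before its first byte is sent.
    static double[] waitMillis(List<Long> messageBytes, double linkBitsPerSecond)
    {
        double[] waits = new double[messageBytes.size()];
        double elapsedSeconds = 0.0;
        for (int i = 0; i < messageBytes.size(); i++)
        {
            waits[i] = elapsedSeconds * 1000.0;
            elapsedSeconds += messageBytes.get(i) * 8.0 / linkBitsPerSecond;
        }
        return waits;
    }

    public static void main(String[] args)
    {
        // Hypothetical traffic: a 1 MiB mutation queued ahead of a 200-byte gossip message.
        double[] waits = waitMillis(List.of(1_048_576L, 200L), 100_000_000.0); // 100 Mbit/s
        System.out.printf("small message waits %.1f ms%n", waits[1]);
    }
}
```

With these assumed figures the 200-byte message waits about 84 ms before its first byte is sent, which is the kind of delay that routing by size is meant to keep away from small messages.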



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Benedict (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512597#comment-14512597 ]

Benedict commented on CASSANDRA-8789:
-

FTR, my current perception of this is:

* it does look to me like the increased throughput of the new code is a 
plausible cause of server degradation in these localhost tests, since we know 
that the server has no extra shedding logic in place beyond the normal timeout. 
** improved shedding should be addressed separately, e.g. CASSANDRA-8518
* that doesn't mean head of line blocking isn't a real concern, especially for 
low bandwidth links
** it already seems likely to be an issue in 2.0/2.1, to a greater or lesser 
degree, given the existing combination of gossip with read response data
** however, this does change the exposure profile, and especially for smart 
routed clients it might exacerbate this problem in certain cases
** I don't think the exposure profile is sufficiently different to consider 
this a regression or to revert the other positive improvements delivered by 
this change
* I do think we can quite easily manage this by opening a new connection, 
managed by netty or raw NIO, over which we communicate only gossip messages (or 
other low frequency, high urgency messages) 
** this would in the typical case mean we are using no more connections than 
2.1 (though with large mutations/responses we may end up using 50% more 
connections), but:
*** these connections would not have significant threading impacts
*** nor would they have any impact on the improved throughput delivered by 
coalescing
* CASSANDRA-9237 is IMO a good place to continue this discussion
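The dedicated-connection idea above can be sketched as a three-way routing decision: gossip goes to its own connection, and everything else is routed by size as in this ticket. This covers only the routing choice, not how the extra connection would be managed (netty or raw NIO). `LaneRouter`, its nested enums, and the 64 KiB threshold are hypothetical illustrations, not Cassandra's types or values.

```java
// Hypothetical router illustrating a dedicated "urgent" connection for gossip
// plus size-based routing for all other traffic. Not Cassandra's actual API.
final class LaneRouter
{
    enum Kind { GOSSIP, READ_REQUEST, READ_RESPONSE, MUTATION, MUTATION_RESPONSE }

    enum Lane { URGENT, SMALL, LARGE }

    // Illustrative threshold, not a value taken from this ticket.
    static final int LARGE_MESSAGE_THRESHOLD = 64 * 1024;

    static Lane route(Kind kind, int payloadBytes)
    {
        // Low-frequency, high-urgency traffic never queues behind data traffic.
        if (kind == Kind.GOSSIP)
            return Lane.URGENT;
        // Everything else is split by serialized size.
        return payloadBytes > LARGE_MESSAGE_THRESHOLD ? Lane.LARGE : Lane.SMALL;
    }

    public static void main(String[] args)
    {
        System.out.println(route(Kind.GOSSIP, 300));        // URGENT
        System.out.println(route(Kind.MUTATION, 200));      // SMALL
        System.out.println(route(Kind.MUTATION, 1 << 20));  // LARGE
    }
}
```

Under these assumptions gossip never waits behind mutation traffic of any size, at the cost of one more connection per peer.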



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Ariel Weisberg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512523#comment-14512523 ]

Ariel Weisberg commented on CASSANDRA-8789:
---

I ran exactly what you suggested, except I routed gossip on the large message 
socket and set the large message threshold to Integer.MAX_VALUE.

getConnection() looked like this:
{noformat}
/**
 * returns the appropriate connection based on message type.
 * returns null if a connection could not be established.
 */
OutboundTcpConnection getConnection(MessageOut msg)
{
    // Experiment: force gossip onto the large-message socket.
    if (msg.getStage() == Stage.GOSSIP) {
        return largeMessages;
    }
    // With LARGE_MESSAGE_THRESHOLD set to Integer.MAX_VALUE, every other
    // message ends up on the small-message socket.
    return msg.payloadSize(smallMessages.getTargetVersion()) > LARGE_MESSAGE_THRESHOLD
           ? largeMessages
           : smallMessages;
}
{noformat}

And it fails in the exact same way. The fact that you have to pull in the 
coalescing fixes to get it to fail further confirms my belief that messaging 
got faster (when there are no network issues) not slower and that is leading to 
the node hanging. 2.0 doesn't log pending tasks in each stage so I would have 
to instrument some more to confirm this is the issue.

Trying to further prove that thesis, I cherry-picked only 
144644bbf77a546c45db384e2dbc18e13f65c9ce, and it ran 10 million writes without 
a problem. That doesn't mean there isn't a head-of-line blocking issue when 
network connections are genuinely slow. That's why I created CASSANDRA-9237, and 
I have a couple of ideas for how to make FD less dependent on heartbeats or how 
to keep gossip messages from being blocked.

Taking it one step further, I added back coalescing, but not the full deal: 
I just fixed a bug in OutboundTcpConnection where it would never write multiple 
messages at once without flushing.
{noformat}
diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
index cddce07..e90cef8 100644
--- a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
+++ b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
@@ -132,7 +132,7 @@ public class OutboundTcpConnection extends Thread
         outer:
         while (true)
         {
-            if (backlog.drainTo(drainedMessages, drainedMessages.size()) == 0)
+            if (backlog.drainTo(drainedMessages, 128) == 0)
             {
                 try
                 {
@@ -142,7 +142,7 @@ public class OutboundTcpConnection extends Thread
                 {
                     throw new AssertionError(e);
                 }
-
+                backlog.drainTo(drainedMessages, 127);
             }
             currentMsgBufferCount = drainedMessages.size();
 {noformat}
With this change it fails.
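The batching pattern this fix restores can be isolated outside Cassandra: block for a single message only when the backlog is empty, then drain whatever else is already queued, up to a cap, so several messages can be written before one flush. `BatchDrainer` is a hypothetical stand-in for the loop in `OutboundTcpConnection`; the cap of 128 mirrors the diff, and the code uses only the standard `BlockingQueue.drainTo(Collection, int)` and `take()` methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the drain loop in the diff above.
final class BatchDrainer
{
    // Cap per batch, mirroring the 128 in the diff.
    static final int MAX_BATCH = 128;

    // Returns the next batch of messages to write before a single flush.
    static <T> List<T> nextBatch(BlockingQueue<T> backlog)
    {
        List<T> drained = new ArrayList<>(MAX_BATCH);
        // Take whatever is already queued, up to the cap, without blocking.
        if (backlog.drainTo(drained, MAX_BATCH) == 0)
        {
            try
            {
                // Backlog was empty: block until at least one message arrives.
                drained.add(backlog.take());
            }
            catch (InterruptedException e)
            {
                throw new AssertionError(e); // same handling as the loop in the diff
            }
            // Then pick up anything that arrived meanwhile, keeping the total at the cap.
            backlog.drainTo(drained, MAX_BATCH - 1);
        }
        return drained;
    }

    public static void main(String[] args)
    {
        BlockingQueue<Integer> backlog = new LinkedBlockingQueue<>();
        for (int i = 0; i < 300; i++)
            backlog.add(i);
        while (!backlog.isEmpty())
            System.out.println("write " + nextBatch(backlog).size() + " messages, then flush");
    }
}
```

With 300 queued messages the sketch returns batches of 128, 128 and 44. Assuming the list is empty at the top of each iteration, the pre-fix limit `drainedMessages.size()` asks `drainTo` for zero elements, so every iteration fell back to `take()` and wrote a single message.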

Fundamentally, as Benedict pointed out, the changes in this ticket are not 
completely new. Gossip always contended with mutation responses and read 
responses. The big change is that small mutations now share a socket with gossip 
messages.
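The change in exposure described here can be made concrete by asking, for each routing rule, which kinds of message share gossip's socket. This is a toy model, not Cassandra code. It assumes, as stated in this comment and confirmed by Benedict earlier in this digest, that gossip previously shared a socket with read and mutation responses (Michael disputes this further down), and that every message is below a hypothetical 64 KiB threshold. `GossipNeighbours` and its `Kind` enum are illustrative names.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BiFunction;

// Toy comparison of type-based vs size-based routing (not Cassandra code).
final class GossipNeighbours
{
    enum Kind { GOSSIP, READ_REQUEST, READ_RESPONSE, MUTATION, MUTATION_RESPONSE }

    // Hypothetical threshold for illustration only.
    static final int LARGE_MESSAGE_THRESHOLD = 64 * 1024;

    // Before: route by type. Assumed: responses and gossip share the "ack" socket,
    // while requests and mutations use the "command" socket.
    static String byType(Kind kind, int payloadBytes)
    {
        switch (kind)
        {
            case GOSSIP:
            case READ_RESPONSE:
            case MUTATION_RESPONSE:
                return "ack";
            default:
                return "command";
        }
    }

    // After: route by serialized size only; the message kind is ignored.
    static String bySize(Kind kind, int payloadBytes)
    {
        return payloadBytes > LARGE_MESSAGE_THRESHOLD ? "large" : "small";
    }

    // Kinds that land on the same socket as gossip when every message has the given size.
    static Set<Kind> sharingWithGossip(BiFunction<Kind, Integer, String> router, int payloadBytes)
    {
        String gossipSocket = router.apply(Kind.GOSSIP, payloadBytes);
        Set<Kind> neighbours = EnumSet.noneOf(Kind.class);
        for (Kind kind : Kind.values())
            if (kind != Kind.GOSSIP && router.apply(kind, payloadBytes).equals(gossipSocket))
                neighbours.add(kind);
        return neighbours;
    }

    public static void main(String[] args)
    {
        System.out.println("by type: " + sharingWithGossip(GossipNeighbours::byType, 200));
        System.out.println("by size: " + sharingWithGossip(GossipNeighbours::bySize, 200));
    }
}
```

Under these assumptions the type-based rule puts only responses next to gossip, while the size-based rule puts every small message there, including mutations and read requests.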



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Benedict (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512525#comment-14512525 ]

Benedict commented on CASSANDRA-8789:
-

I'm confused by that output and your analysis, so I'd appreciate it if you 
could clarify. The message on node1 doesn't indicate anything about the TCP 
connection, only that node1 has 3 gossip messages that have yet to be 
processed, meaning the gossip _stage_ (thread pool) is backed up for some 
reason, possibly because the node is overloaded.

The second set of messages, on the other hand, seems to indicate that node1 
really is in difficulty, though? The other node cannot re-establish its 
connection to node1 after the gossiper forcibly closed it (though it is possible 
we have some other problems wrt reconnection that I'm not aware of).
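The distinction drawn here, between a backed-up gossip _stage_ and a slow TCP connection, can be illustrated with a plain `ThreadPoolExecutor`: "pending tasks" are tasks waiting in the stage's queue because its worker is busy, independent of any socket. `skipStatusCheck` is a loose, hypothetical model of the guard behind the "skipping status check" warning; its name and the 1000 ms staleness window are assumptions, not a quote of `Gossiper.java`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class GossipStageBacklog
{
    // Assumed staleness window; the real threshold in Gossiper may differ.
    static final long STALE_AFTER_MILLIS = 1000;

    // Hypothetical guard: skip marking nodes down while the local gossip stage has
    // queued work and has not processed anything recently.
    static boolean skipStatusCheck(long pendingTasks, long lastProcessedAtMillis, long nowMillis)
    {
        return pendingTasks > 0 && lastProcessedAtMillis < nowMillis - STALE_AFTER_MILLIS;
    }

    // Queue depth of a single-threaded "stage" whose only worker is busy while
    // `arriving` more tasks are submitted. No TCP connection is involved.
    static int pendingWhileWorkerBusy(int arriving)
    {
        ThreadPoolExecutor stage = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch busy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try
        {
            stage.execute(() -> {
                busy.countDown();
                try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            busy.await(); // the worker is now occupied, so new tasks must queue
            for (int i = 0; i < arriving; i++)
                stage.execute(() -> { });
            return stage.getQueue().size();
        }
        catch (InterruptedException e)
        {
            throw new AssertionError(e);
        }
        finally
        {
            release.countDown(); // let the worker finish and drain the queue
            stage.shutdown();
        }
    }

    public static void main(String[] args)
    {
        int pending = pendingWhileWorkerBusy(3);
        System.out.println("pending=" + pending + ", skip=" + skipStatusCheck(pending, 0L, 5_000L));
    }
}
```

Running `main` reports three queued tasks and a skipped check, without any socket being slow.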




[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Michael Kjellman (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512580#comment-14512580 ]

Michael Kjellman commented on CASSANDRA-8789:
-

{quote}
Gossip always contended with mutation responses and read responses.
{quote}

No, they didn't. This is why there were two sockets in the first place: a 
Command socket and a Data socket. I have said since day one, when I raised this 
as a concern, that changes to Gossip (large, and definitely outside the scope of 
3.0) could be made so that this might not be an issue.

With these changes and today's Gossip implementation, this is a regression.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Ariel Weisberg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511879#comment-14511879 ]

Ariel Weisberg commented on CASSANDRA-8789:
---

[~xedin] I tried to reproduce what Michael described and I found a root cause 
that is different and it seems to be an issue across multiple versions. IOW I 
think it is unrelated to this ticket.

It's definitely worth reproducing the problem Michael is talking about, which is 
why I created a ticket for that specific issue. AFAIK no one besides me has 
tested with and without this change on trunk and found that it has an impact.

[~mkjellman] if you run this using your reproducer steps and let it hang long 
enough, do you get the heap dump and OOM? And are you saying that if you revert 
the change, everything starts to work for you?



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Michael Kjellman (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512147#comment-14512147 ]

Michael Kjellman commented on CASSANDRA-8789:
-

I tried to cleanly revert the following commits to demonstrate that stress 
functions as expected without the changes from CASSANDRA-8789, but I got into 
conflict hell.

{noformat}
ebd0ae820a3fc7c13d58b6ddb48ba4d26b3fcd65
144644bbf77a546c45db384e2dbc18e13f65c9ce
1caa4f942662cd49609e86e2cd747421a9d71700
16499ca9b0080ea4d3c4ed3bc55c753bacc3c24e
828496492c51d7437b690999205ecc941f41a0a9
{noformat}

I tried to check out 21bdf8700601f8150e8c13e0b4f71e061822c802; however, the 
build is broken in that commit, and it was reverted by jbellis in 
b25adc765769869d16410f1ca156227745d9b17b. Next I tried to check out 
21bdf8700601f8150e8c13e0b4f71e061822c802-1 
(1279009e0e29267d8fc3300071034e2ede6065ca), which I could build, and unlike a 
few other commits I tried, there were no exceptions logged while inserting data. 
In this commit, though, I do see issues with stress around 300k rows even with 
the OutboundTcpConnection changes backed out.

I next checked out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4, which is the 
commit to OutboundTcpConnection preceding 
828496492c51d7437b690999205ecc941f41a0a9. Testing against that commit, I'm able 
to insert all rows as expected, and Gossiper does not down any nodes for the 
duration of stress.

This commit, however, was logging intermittent NPEs (though otherwise the load 
after stress looks sane...)

{noformat}
ERROR [CompactionExecutor:2] 2015-04-24 17:48:20,741 CassandraDaemon.java:182 - Exception in thread Thread[CompactionExecutor:2,1,main]
java.lang.NullPointerException: null
	at org.apache.cassandra.io.sstable.format.SSTableReader$Tidier.tidy(SSTableReader.java:1798) ~[main/:na]
{noformat}

h4. Stress Output
{noformat}
Michaels-MacBook-Pro:cassandra-aml mkjellman$ tools/bin/cassandra-stress -l 3
null
total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
31769,3176,3176,1.3,100.7,562.8,10
84820,5305,5305,1.1,76.7,452.8,20
152130,6731,6731,2.1,50.0,2137.6,30
267053,11492,11492,2.4,19.2,2137.6,40
346529,7947,7947,2.4,14.9,2159.1,51
455677,10914,10914,2.3,9.6,2131.6,61
619967,16429,16429,1.8,7.2,203.1,71
796739,17677,17677,1.3,5.3,202.4,81
967800,17106,17106,0.9,5.1,202.2,91
100,3220,3220,0.9,4.8,202.2,95


Averages from the middle 80% of values:
interval_op_rate  : 11865
interval_key_rate : 11865
latency median: 2.0
latency 95th percentile   : 17.7
latency 99.9th percentile : 1495.2
Total operation time  : 00:01:35
END
{noformat}

h4. nodetool ring output
{noformat}
Datacenter: datacenter1
==========
Address    Rack   Status  State   Load       Owns  Token
                                                   3074457345618258602
127.0.0.1  rack1  Up      Normal  294.67 MB  ?     -9223372036854775808
127.0.0.2  rack1  Up      Normal  246.91 MB  ?     -3074457345618258603
127.0.0.3  rack1  Up      Normal  247.19 MB  ?     3074457345618258602
{noformat}

So, I would agree that until we find the source of the regression (unrelated to 
this ticket) that causes stress to fail even with the OutboundTcpConnection 
changes reverted, we can't move forward evaluating the merits of changes to the 
actual TCP socket handling that affect the timely delivery of GOSSIP 
messages/verbs.


[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Michael Kjellman (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512166#comment-14512166 ]

Michael Kjellman commented on CASSANDRA-8789:
-

I just tried the following.

Check out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (with which I'm able to 
insert all 100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce.

I started seeing failures 1/3 of the way through stress, with messages like the 
following in the logs:

{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress /127.0.0.1 is now DOWN
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary, I am able to reproduce the failure: 2.0 stress fails with the 
changes to OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce) 
applied on top of 8896a70b015102c212d0a27ed1f4e1f0fabe85c4, a commit against 
which I can successfully run cassandra-stress -l 3 without failure.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Ariel Weisberg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511331#comment-14511331 ]

Ariel Weisberg commented on CASSANDRA-8789:
---

I was able to reproduce the OOM once in 2.1.2. I have found that the mutation 
stage is filling up with tasks, and they look like responses to writes. In 2.1.2, 
when it succeeds, it looks as though it is simply dropping the messages.

The reason it fails at 300k is that some 50k or so get processed and 250k back 
up, causing the OOM. We could do some things to make this more robust against 
overload, for example having the producer (IncomingTcpConnection) detect 
overload and start dropping messages without relying on the consumer 
(MutationStage) to drop them.
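Producer-side shedding of the kind suggested here might look like the following sketch: the receiving side checks the stage's queue before enqueueing and drops the message immediately when the queue is full, so the backlog stays bounded no matter how slowly the consumer drains it. `ShedOnReceive` and its fixed capacity are hypothetical stand-ins for IncomingTcpConnection and MutationStage; a real implementation might use other overload signals, and this only shows where the check sits.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical producer-side load shedding: reject at enqueue time instead of
// letting an unbounded stage queue grow until the heap is exhausted.
final class ShedOnReceive<T>
{
    private final BlockingQueue<T> stageQueue; // stands in for the consumer stage's queue
    private long dropped;

    ShedOnReceive(int capacity)
    {
        this.stageQueue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns true if the message was accepted, false if it was shed.
    boolean receive(T message)
    {
        if (stageQueue.offer(message)) // non-blocking; fails when the queue is full
            return true;
        dropped++;
        return false;
    }

    long dropped() { return dropped; }

    int queued() { return stageQueue.size(); }

    public static void main(String[] args)
    {
        // Scaled-down illustration: room for 50 messages, 300 arrive, nothing consumes.
        ShedOnReceive<Integer> receiver = new ShedOnReceive<>(50);
        for (int i = 0; i < 300; i++)
            receiver.receive(i);
        System.out.println(receiver.queued() + " queued, " + receiver.dropped() + " dropped");
    }
}
```

Fed 300 messages with nothing consuming, a receiver with room for 50 keeps 50 and sheds 250 rather than growing without bound.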

I am leaning towards not trying to fix this wart because it requires somewhat 
unrealistic conditions. There has to be no load balancing, a heap that is too 
small, and an oversubscribed instance.

[~mkjellman] I created CASSANDRA-9237 for the issue of Gossip sharing a 
connection with most traffic.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Pavel Yaskevich (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511595#comment-14511595 ]

Pavel Yaskevich commented on CASSANDRA-8789:


[~aweisberg] 

I kind of lost track of what is going on in this ticket. On one side, Michael is 
saying that the problem is with prioritization and that he never got anything to 
OOM at all, which [~benedict] seems to confirm (?), but you keep saying that this 
is an OOM for you every time. So maybe it's worthwhile to try to figure out how 
to reproduce the exact problem Michael is talking about instead?

Also, CASSANDRA-9237 seems to try to address the same problem, which is caused by 
this ticket, so why do you need a separate ticket for it instead of re-opening 
this one and working here? I would understand opening a separate ticket for the 
OOM, though, which sounds like a different problem...



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509731#comment-14509731
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

[~mkjellman] I tried this with the socket change reverted, and initially I thought 
it mattered, but I think I was swapping when it passed with the change reverted.

I tried it three times and both versions do the same thing. The first node OOMs and the 
heap dump blames tasks sitting in SEPExecutor.

I also ran with flight recorder and checked the node serving client traffic and 
one of the other nodes. There is some significant blocking on the coordinating 
node, but the longest pause was 300 milliseconds and total duration was 2 
seconds for a 1 minute period (200 pauses). If I chased those down I bet they 
are correlated with GC pauses.

I was able to get 2.1.2 to write hints, but not to fail the same way that trunk 
does with SEPExecutor OOM. Still digging into why trunk fares worse.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509954#comment-14509954
 ] 

Michael Kjellman commented on CASSANDRA-8789:
-

I'm less concerned about hints being generated in general. With the old stress 
+ defaults (and RF=3 to generate lots of MUTATIONs between nodes), hints will be 
generated because we will never be able to keep up and send all of the 
REQUEST_RESPONSE messages before we see timeouts.

The real concern I have is that Gossiper/FD will kick in and mark healthy nodes 
DOWN simply because we can't get gossip messages out onto the wire as they 
back up behind all of the REQUEST_RESPONSE messages...



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510016#comment-14510016
 ] 

Benedict commented on CASSANDRA-8789:
-

I should clarify here that I do think MUTATION messages could plausibly delay 
gossip messages where they couldn't before. However REQUEST_RESPONSE messages 
as mentioned above as the potential cause could always cause head of line 
blocking for gossip messages. So my position is only that the head of line 
blocking concern is not a new one, not that its characteristics are identical.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509975#comment-14509975
 ] 

Michael Kjellman commented on CASSANDRA-8789:
-

You should be able to run the old stress for all 1 million rows without 
FD/Gossip marking any nodes DOWN. I generally just tail the logs while stress runs 
to ensure there are no log messages from the Gossiper class (assuming the log level is 
set to the default INFO level).



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509969#comment-14509969
 ] 

Aleksey Yeschenko commented on CASSANDRA-8789:
--

Should we look in the logs before and after for failure detector mentions then?



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509978#comment-14509978
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

This is just an OOM. Nothing special is going on WRT Gossip/FD.

What Benedict and I have been positing is that there is no change in behavior from 
previous versions in terms of which messages contend for access to the 
socket for this workload, and I think that I have confirmed that.

That doesn't mean there aren't some conditions where head of line blocking 
would be an issue for gossip, but I am guessing that they would have to be 
pretty weird. Even then the real solution might actually be to base failure 
detection on all incoming messages and not just Gossip. It's a little weird to 
me that only heartbeats count as liveness/reachability, but it really depends 
on what you are trying to prove.
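The suggestion of counting any incoming message as evidence of liveness could be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's FailureDetector (which is a phi accrual detector fed by gossip); the class name, the fixed timeout, and the injected clock are assumptions made to keep the example small and testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: a peer counts as alive if *any* message
// (gossip, mutation, response, ...) arrived from it within the timeout.
final class AnyMessageLiveness {
    private final Map<String, Long> lastSeenNanos = new ConcurrentHashMap<>();
    private final long timeoutNanos;
    private final LongSupplier clock; // injected so the behaviour is deterministic in tests

    AnyMessageLiveness(long timeoutNanos, LongSupplier clock) {
        this.timeoutNanos = timeoutNanos;
        this.clock = clock;
    }

    // Called for every inbound message, regardless of verb.
    void onMessage(String peer) {
        lastSeenNanos.put(peer, clock.getAsLong());
    }

    // A peer never heard from, or silent for longer than the timeout, is not alive.
    boolean isAlive(String peer) {
        Long last = lastSeenNanos.get(peer);
        return last != null && clock.getAsLong() - last <= timeoutNanos;
    }
}
```

The trade-off is the one hinted at above: a peer whose gossip is stuck behind other traffic still looks alive as long as responses keep arriving, but this only proves the connection is delivering messages, not that gossip state is propagating.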



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509992#comment-14509992
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

I can do that with 2.1.2 (I went to 10 million), and since gossip already shares a 
socket with mutation responses there, it's not a head of line blocking issue. I 
think flight recorder confirms that by showing a rough bound on how long 
threads are waiting to write to sockets.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-20 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503945#comment-14503945
 ] 

Michael Kjellman commented on CASSANDRA-8789:
-

My testing has shown that relying on message size as a heuristic to determine 
the channel/socket to write to has adverse effects under load. The problem is 
that this mixes high priority Command verbs (e.g. 
GOSSIP_DIGEST_SYN/GOSSIP_DIGEST_ACK), which cannot be delayed in any way due to 
the current implementation of FailureDetector, with lower priority 
Response/Data (e.g. MUTATION/READ/REQUEST_RESPONSE) verbs. The effect of this 
is that nodes will flap and be incorrectly considered DOWN due to failures in 
sending Gossip verbs, which are now queued behind lower priority messages.

The implementation of MessagingService is fire and forget; however, we do 
expect some form of ACK for most messages. For instance, each MUTATION expects 
a REQUEST_RESPONSE within a given timeout; otherwise a hint is generated. Here 
lies the problem: the REQUEST_RESPONSE verb is 6 bytes (with no payload, so it is 
now considered small). We also have INTERNAL_RESPONSE (also 6 bytes). Because 
routing now uses size instead of priority, or the old hard-coded Command/Data 
implementation (which sent high priority messages like GOSSIP over one channel 
and normal/low priority messages over another), the REQUEST_RESPONSE 
for each MUTATION will now be sent over the same channel that 
used to be reserved for GOSSIP (or other high priority Command) verbs.

If the kernel buffers back up sufficiently (although we have the NO_DELAY option 
on the socket, it isn't very difficult under moderate/high load to still 
saturate the NIC), we've now moved an ACK message for every MUTATION onto the 
same socket that is sending GOSSIP messages. Eventually, if we back up with 
enough small messages, we will likely end up unable to send *important* messages 
(e.g. GOSSIP_DIGEST_SYN/GOSSIP_DIGEST_ACK), FD will falsely be triggered, and 
nodes will be marked DOWN incorrectly. Additionally, once we hit this 
condition, we end up flapping as GOSSIP messages eventually get through, which 
compounds the problem.
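The head of line effect described here can be put in rough numbers: on a single FIFO socket, a message cannot start transmitting until every byte queued ahead of it has been written. The sketch below computes that delay for a hypothetical queue; the drain rate and message counts are illustrative assumptions, not measurements from this ticket, and the model ignores per-message framing and kernel buffering.

```java
import java.util.List;

// Hypothetical model: one FIFO socket draining at a fixed rate.
final class HolDelay {
    // Seconds a newly enqueued message waits before its first byte is sent,
    // given the sizes (in bytes) of all messages already queued ahead of it.
    static double waitSeconds(List<Integer> queuedBytesAhead, double bytesPerSecond) {
        long total = 0;
        for (int size : queuedBytesAhead) {
            total += size;
        }
        return total / bytesPerSecond;
    }
}
```

For example, 100,000 queued 6-byte REQUEST_RESPONSE messages (the size quoted above) ahead of a gossip message on a link draining 1 MB/s would delay it by about 0.6 s, and real per-message overhead would only make that larger.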

h4. How to reproduce:
I'm unable to figure out the new stress, so I ran the stress from 2.0 against 
trunk (commit sha 1fab7b785dc5e440a773828ff17e927a1f3c2e5f from 4/20/15) with 
all defaults except for changing the replication factor from its default of 1 
to 3. I'm pretty sure the reason I can't easily reproduce with the new stress 
is that I haven't figured out the command line parsing needed to change it 
from its default of 8 threads back to the default of 30 threads that was in the 
old stress. While it's crazy to run with 30 threads, this simulates enough 
traffic on my 2014 MacBook Pro to actually back up the kernel buffers on 
loopback, which will trigger this.

1) Set up a 3-node ccm cluster locally with all defaults (ccm create tcptest 
--install-dir=/Users/username/pathto/cassandra-apache/ && ccm populate -n 3 && 
ccm start)
2) Run stress from 2.0 using all defaults aside from specifying RF=3 
(tools/bin/cassandra-stress -l 3)
3) Monitor FailureDetector messages in the logs, overall load written, etc.

h4. Expected Results:
# Without these changes, stress will not time out while inserting data. With 
this change, I've observed timeouts starting 50% of the way through the 1 
million records. 
{noformat}
Operation [303198] retried 10 times - error inserting key 0303198 
((TTransportException): java.net.SocketException: Broken pipe)
{noformat}

# Although MUTATION messages are expected to be dropped under high load, 
GOSSIP messages should still be written to the socket in a timely 
manner to prevent FD (FailureDetector) from incorrectly marking nodes 
DOWN.
# Amount of inserted load reported in nodetool ring should be ~250MB using the 
2.0 stress tool. On my machine I saw a final load of 1.44MB on node(1), and 
only ~65MB on node(2,3). This is due to FD marking the nodes down, dropping 
mutations, and creating hints. (Additionally, once in this state, memory 
overhead gets even worse as we generate unnecessary hints for messages that the 
prior design was actually able to write to the socket.)

h4. Alternative Proposal
I'm 100% on board with using a more priority-based system to better utilize the 
two channels/sockets we have. For instance: 
MUTATION(2),
READ_REPAIR(3),
REQUEST_RESPONSE(2),
REPLICATION_FINISHED(1),
INTERNAL_RESPONSE(1),
COUNTER_MUTATION(2),
GOSSIP_DIGEST_SYN(1),
GOSSIP_DIGEST_ACK(1),
GOSSIP_DIGEST_ACK2(1),

That way we can use the priorities to route small messages like SNAPSHOT, 
TRUNCATE, GOSSIP_DIGEST_SYN over the high-priority channel and the 
normal-priority messages over the other channel/socket.
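A minimal sketch of this proposal, assuming priority 1 selects the high-priority channel and any other value the normal one (the comment does not say how 2 and 3 differ beyond being "normal-priority"). The Verb enum below is a local stand-in covering only the verbs listed above; it is not Cassandra's MessagingService.Verb.

```java
// Hypothetical sketch of priority-based channel selection.
// Priority values are copied from the proposal; "1 = high-priority channel"
// is an assumption.
enum Verb {
    MUTATION(2),
    READ_REPAIR(3),
    REQUEST_RESPONSE(2),
    REPLICATION_FINISHED(1),
    INTERNAL_RESPONSE(1),
    COUNTER_MUTATION(2),
    GOSSIP_DIGEST_SYN(1),
    GOSSIP_DIGEST_ACK(1),
    GOSSIP_DIGEST_ACK2(1);

    final int priority;

    Verb(int priority) {
        this.priority = priority;
    }
}

enum Channel { HIGH_PRIORITY, NORMAL }

final class PriorityRouter {
    // Route purely on the verb's declared priority, ignoring message size.
    static Channel route(Verb verb) {
        return verb.priority == 1 ? Channel.HIGH_PRIORITY : Channel.NORMAL;
    }
}
```

Under this mapping the per-MUTATION REQUEST_RESPONSE stays off the channel that carries GOSSIP_DIGEST_SYN/ACK, which is the property the proposal is after.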


[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-20 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504050#comment-14504050
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

I can reproduce this using the 2.0 version of stress, which is interesting. It 
didn't reproduce with a write-only workload of stress on trunk. The why of that 
is probably interesting as well. I will look into it more tomorrow.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-20 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503977#comment-14503977
 ] 

Brandon Williams commented on CASSANDRA-8789:
-

I agree on priority-based messaging.  Gossip is fairly low throughput, but also 
very important to get delivered and should take priority.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504005#comment-14504005
 ] 

Benedict commented on CASSANDRA-8789:
-

I don't doubt there are problems with this, but I'm not sure they're 
significantly worse under the new scheme than the old... Currently messages are 
split along the following boundaries:

REQUEST_RESPONSE,
INTERNAL_RESPONSE,
GOSSIP,

READ,
MUTATION,
COUNTER_MUTATION,
ANTI_ENTROPY,
MIGRATION,
MISC,
TRACING,
READ_REPAIR;

READ_RESPONSE is half of the problem messages you highlighted, and in many 
workloads likely significantly more of a problem than mutations (since with 
clustering data they have the potential to deliver much larger payloads), and 
they currently operate on the same channel as gossip. The main difference is 
that you won't see them on a pure stress write workload; with a mixed workload you 
would. So if this is a potentially serious problem, it is likely already being 
exhibited. I should make clear that I'm not disputing there's a problem - this 
seems very clearly something we want to avoid. But I don't think we have made 
matters _worse_ with this ticket (though the profile has perhaps changed).
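The existing split listed above can be sketched as a simple set-membership check. The group names are taken from that list; the enum is a local stand-in, not Cassandra's own type, and "FIRST"/"SECOND" are hypothetical labels for the two outbound sockets.

```java
import java.util.EnumSet;
import java.util.Set;

// Local stand-in for the groups listed above (not Cassandra's own enum).
enum Group {
    REQUEST_RESPONSE, INTERNAL_RESPONSE, GOSSIP,
    READ, MUTATION, COUNTER_MUTATION, ANTI_ENTROPY,
    MIGRATION, MISC, TRACING, READ_REPAIR
}

enum Lane { FIRST, SECOND }

final class TypeSplit {
    // First group in the list: responses and gossip share one socket.
    private static final Set<Group> FIRST_LANE =
            EnumSet.of(Group.REQUEST_RESPONSE, Group.INTERNAL_RESPONSE, Group.GOSSIP);

    // Everything else (reads, mutations, repair, ...) uses the other socket.
    static Lane route(Group group) {
        return FIRST_LANE.contains(group) ? Lane.FIRST : Lane.SECOND;
    }
}
```

This makes the point concrete: since read responses travel as REQUEST_RESPONSE, under the existing split they already share a socket with GOSSIP.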

Introducing extra channels that are managed via NIO, for which we have no 
throughput requirements, only latency, seems like a potential solution to this. 
Or a priority queue and a capped send buffer size (capped low for slow WAN 
connections, for instance). I would quite like to see us abstract 
MessagingService so that not only the transport can be pluggable, but it can be 
different per end-point (e.g. cross-dc), and per message type. I think all of 
these endeavours are orthogonal to this ticket, though, and deserve their own. 
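The "priority queue and a capped send buffer" alternative could look roughly like the sketch below: messages are dequeued most urgent first (FIFO within a priority), and each flush hands at most a fixed number of bytes to the socket, so a newly arrived urgent message only waits for the batch currently being written. Everything here (class names, the byte cap, lower number = more urgent) is a hypothetical illustration, not an existing Cassandra API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: priority-ordered outbound queue with a capped batch size.
final class PrioritizedOutbound {
    static final class Msg {
        final String name;
        final int priority;  // lower value = more urgent (assumption)
        final int sizeBytes;
        final long seq;      // insertion order, keeps FIFO within a priority

        Msg(String name, int priority, int sizeBytes, long seq) {
            this.name = name;
            this.priority = priority;
            this.sizeBytes = sizeBytes;
            this.seq = seq;
        }
    }

    private final PriorityQueue<Msg> queue = new PriorityQueue<>(
            (a, b) -> a.priority != b.priority
                    ? Integer.compare(a.priority, b.priority)
                    : Long.compare(a.seq, b.seq));
    private final int maxBatchBytes;
    private long nextSeq = 0;

    PrioritizedOutbound(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    void enqueue(String name, int priority, int sizeBytes) {
        queue.add(new Msg(name, priority, sizeBytes, nextSeq++));
    }

    // Next batch to hand to the socket: most urgent first, stopping before the
    // batch would exceed maxBatchBytes. A single oversized message is still
    // taken on its own so the loop always makes progress.
    List<Msg> nextBatch() {
        List<Msg> batch = new ArrayList<>();
        int bytes = 0;
        while (!queue.isEmpty()) {
            Msg head = queue.peek();
            if (!batch.isEmpty() && bytes + head.sizeBytes > maxBatchBytes) {
                break;
            }
            batch.add(queue.poll());
            bytes += head.sizeBytes;
        }
        return batch;
    }
}
```

A lower cap would suit slow WAN links, since it bounds how much lower-priority data can be in flight ahead of an urgent message.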



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504158#comment-14504158
 ] 

Benedict commented on CASSANDRA-8789:
-

2.0 stress, AFAICR, does not load balance. By default 2.1 stress does (smart 
Thrift routing round-robins across the nodes that own any given token). So in 
the 2.0 experiment all of the writes to the cluster are likely being piped 
through a single node (and so over just two TCP connections), instead of being 
spread evenly over six.
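A minimal sketch of the token-aware round-robin idea, assuming the client has already computed the list of nodes that own the token being written. The class and method names are hypothetical and this is not the stress tool's actual code; it only illustrates why rotation spreads coordinator work (and therefore inter-node connections) across the cluster rather than funnelling everything through one node.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of token-aware round-robin: the client rotates across
// the nodes that own the token instead of always using the same coordinator.
class RoundRobinReplicaPicker {
    private final AtomicInteger counter = new AtomicInteger();

    // replicas: the owning nodes for the token being written (assumed precomputed).
    String pick(List<String> replicas) {
        // floorMod keeps the index non-negative even after the counter wraps around.
        int i = Math.floorMod(counter.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    public static void main(String[] args) {
        RoundRobinReplicaPicker picker = new RoundRobinReplicaPicker();
        List<String> owners = List.of("node1", "node2", "node3");
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < 6; k++)
            sb.append(picker.pick(owners)).append(' ');
        System.out.println(sb.toString().trim()); // prints node1 node2 node3 node1 node2 node3
    }
}
```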



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-07 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483270#comment-14483270
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

Good catch.

Agreed, it's only used for the size estimate, which we use for nothing other 
than a heuristic, so the current version is fine. I'll set that as the initial 
value for OutboundTcpConnection.targetVersion.
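A minimal sketch of routing by estimated size, assuming the estimate is computed against a per-connection target version that starts at the current messaging version. SizeRouter, the Connection enum, the placeholder version number, and the 64 KiB threshold are all hypothetical illustrations; the thread does not state the actual names or cut-off.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch (names and numbers are illustrative, not Cassandra's):
// pick the outbound connection from an estimated serialized size.
class SizeRouter {
    enum Connection { SMALL_MESSAGES, LARGE_MESSAGES }

    static final int CURRENT_VERSION = 10;                // placeholder messaging version
    static final int LARGE_MESSAGE_THRESHOLD = 64 * 1024; // assumed cut-off in bytes

    // Until the peer's version is known, estimate sizes with the current version.
    // The estimate only drives this routing heuristic, so a small error is harmless.
    volatile int targetVersion = CURRENT_VERSION;

    // sizeForVersion: hypothetical "serialized size at messaging version v" function.
    Connection route(IntUnaryOperator sizeForVersion) {
        int estimated = sizeForVersion.applyAsInt(targetVersion);
        return estimated > LARGE_MESSAGE_THRESHOLD ? Connection.LARGE_MESSAGES
                                                   : Connection.SMALL_MESSAGES;
    }

    public static void main(String[] args) {
        SizeRouter router = new SizeRouter();
        System.out.println(router.route(v -> 120));        // prints SMALL_MESSAGES (e.g. an ack)
        System.out.println(router.route(v -> 512 * 1024)); // prints LARGE_MESSAGES (e.g. a big mutation)
    }
}
```

Because a wrong version only shifts the estimate slightly, the worst case is that a message near the threshold lands on the other connection, which is acceptable for a heuristic.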

[Code on 
github|https://github.com/apache/cassandra/compare/trunk...aweisberg:C-8789-3?expand=1]

Running unit tests now.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-03-05 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349390#comment-14349390
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

Yes, it's in this comment: 
https://issues.apache.org/jira/browse/CASSANDRA-8789?focusedCommentId=14320467page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14320467



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-03-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349381#comment-14349381
 ] 

Aleksey Yeschenko commented on CASSANDRA-8789:
--

Looks neat indeed. Not trying to stall progress here, but do we have numbers on 
this (standalone, and/or with 8692 included)? Mostly just curious.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-02-19 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328285#comment-14328285
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

I added a comment to MessageOut.payloadSize.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-02-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324683#comment-14324683
 ] 

Benedict commented on CASSANDRA-8789:
-

It's a nice, neat patch. It might be worth adding a comment to the payloadSize 
memoization noting that we piggyback on the visibility guarantees of the queue 
we use to pass the message to another thread (since we always do pass it), and 
that once the message has been handed over we should never call payloadSize() 
again on the thread that handed off ownership.
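A minimal sketch of the memoization pattern described above. OutMessage and HandOffDemo are hypothetical stand-ins, not the actual MessageOut class. The size field is deliberately not volatile: the java.util.concurrent memory-consistency rules guarantee that actions in a thread before it places an element into a BlockingQueue happen-before actions after another thread removes that element, so the consumer sees the memoized value. After the put, the producer must not touch the message again.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message with a lazily computed, memoized payload size.
class OutMessage {
    private final byte[] payload;
    // Deliberately not volatile: visibility to the consumer thread comes from the
    // happens-before edge of the BlockingQueue hand-off, not from this field.
    private int payloadSize = -1;

    OutMessage(byte[] payload) { this.payload = payload; }

    // Only the thread that currently owns the message may call this.
    int payloadSize() {
        if (payloadSize < 0)
            payloadSize = payload.length; // stand-in for an expensive serialized-size computation
        return payloadSize;
    }
}

class HandOffDemo {
    // Producer memoizes the size (e.g. to choose a connection), then hands the
    // message to a consumer thread and never touches it again.
    static int sizeSeenByConsumer(int n) {
        BlockingQueue<OutMessage> queue = new LinkedBlockingQueue<>();
        int[] seen = new int[1];
        Thread consumer = new Thread(() -> {
            try {
                seen[0] = queue.take().payloadSize(); // reads the memoized value
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            OutMessage m = new OutMessage(new byte[n]);
            m.payloadSize(); // memoized on the producer thread
            queue.put(m);    // ownership transferred; producer must not call payloadSize() again
            consumer.join(); // join() makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenByConsumer(1234)); // prints 1234
    }
}
```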

When I commit I'll also clean up some legacy cruft, such as some generic 
parameters, and normalise how we operate over both connections: in one place 
we just list them both, and in the other two we construct an array and iterate 
over it. I'd prefer to do it just one way. But these changes are unrelated to 
this patch.
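A minimal sketch of the normalisation, assuming a pool that holds exactly two connections. TwoConnectionPool, Conn, and the field names are hypothetical: a single accessor enumerates the connections, and every bulk operation iterates over it instead of mixing explicit lists with ad-hoc arrays.

```java
import java.util.List;

// Hypothetical sketch: one accessor enumerates both connections, and every
// bulk operation iterates over it.
class TwoConnectionPool {
    static final class Conn {
        final String name;
        boolean open = true;
        Conn(String name) { this.name = name; }
        void close() { open = false; }
    }

    final Conn smallMessages = new Conn("small");
    final Conn largeMessages = new Conn("large");

    // The single place that lists the pool's connections.
    List<Conn> connections() { return List.of(smallMessages, largeMessages); }

    void closeAll() {
        for (Conn c : connections())
            c.close();
    }

    long openCount() {
        return connections().stream().filter(c -> c.open).count();
    }

    public static void main(String[] args) {
        TwoConnectionPool pool = new TwoConnectionPool();
        System.out.println(pool.openCount()); // prints 2
        pool.closeAll();
        System.out.println(pool.openCount()); // prints 0
    }
}
```

Adding a third connection type later then only touches connections().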



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-02-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324597#comment-14324597
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

I should also add that I originally based this on C-8692 so that the 
performance measurements would be meaningful, since coalescing is, to a degree, 
a prerequisite for this.



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-02-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324577#comment-14324577
 ] 

Ariel Weisberg commented on CASSANDRA-8789:
---

Well... I avoid rebasing onto trunk frequently because a good chunk of the 
time I end up with something that does not work, which means I can't run a 
benchmark to evaluate performance. It also makes my baseline slightly more 
suspect as various things change, so I have to take earlier performance 
numbers with a grain of salt.

I rebased onto trunk: 
https://github.com/aweisberg/cassandra/compare/C-8789-2?expand=1



[jira] [Commented] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-02-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324537#comment-14324537
 ] 

Benedict commented on CASSANDRA-8789:
-

Is this based on the latest trunk? I got a failed apply. I much prefer GitHub 
links so this isn't a problem :)
