[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512587#comment-14512587
 ] 

Benedict edited comment on CASSANDRA-8789 at 4/25/15 4:50 PM:
--

bq. Gossip always contended with mutation *responses* and read responses.

I suspect there may be an issue with nomenclature here. These statements made 
by Ariel are both true, but the internal nomenclature for both of these is 
REQUEST_RESPONSE. To clarify further: the distinction has never been 
command/data, but command/acknowledgement, where the acknowledgement in the 
case of a read request includes the entire data for serving that request.


was (Author: benedict):
bq. Gossip always contended with mutation *responses* and read responses.

I suspect there may be an issue with nomenclature here. These statements made 
by Ariel are both true, but the internal nomenclature for both of these is 
REQUEST_RESPONSE.

 OutboundTcpConnectionPool should route messages to sockets by size not type
 ---

 Key: CASSANDRA-8789
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 8789.diff


 I was looking at this trying to understand what messages flow over which 
 connection.
 For reads the request goes out over the command connection and the response 
 comes back over the ack connection.
 For writes the request goes out over the command connection and the response 
 comes back over the command connection.
 Reads get a dedicated socket for responses. Mutation commands and responses 
 both travel over the same socket along with read requests.
 Sockets are used unidirectionally, so there are actually four sockets in play 
 and four threads at each node (2 inbound, 2 outbound).
 CASSANDRA-488 doesn't leave a record of what the impact of this change was. 
 If someone remembers what situations were made better, it would be good to 
 know.
 I am not clear on when/how this is helpful. The consumer side shouldn't be 
 blocking, so the only head-of-line blocking issue is the time it takes to 
 transfer data over the wire.
 If message size is the cause of blocking issues, then the current design mixes 
 small messages and large messages on the same connection, retaining the 
 head-of-line blocking.
 Read requests share the same connection as write requests (which are large), 
 and write acknowledgements (which are small) share the same connection as 
 write requests. The only winner is read acknowledgements.
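
 To make the routing described above concrete, here is a minimal sketch 
 contrasting the current type-based selection with the size-based selection 
 this ticket proposes. All names (ConnectionPoolSketch, OutboundConnection, 
 SMALL_MESSAGE_THRESHOLD) are illustrative assumptions, not the actual 
 OutboundTcpConnectionPool source:
{noformat}
// Hypothetical sketch -- not the actual Cassandra code. It contrasts the
// current type-based socket selection with the proposed size-based one.
final class ConnectionPoolSketch
{
    interface Message
    {
        boolean isResponse();      // acks / read responses
        int serializedSize();      // bytes on the wire
    }

    static final class OutboundConnection
    {
        final String name;
        OutboundConnection(String name) { this.name = name; }
    }

    // 64 KiB threshold is an illustrative guess, not a tuned value.
    private static final int SMALL_MESSAGE_THRESHOLD = 64 * 1024;

    private final OutboundConnection ackCon = new OutboundConnection("ack");
    private final OutboundConnection cmdCon = new OutboundConnection("cmd");

    // Current behaviour: responses go to the ack socket; everything else
    // (read commands, mutations, gossip) shares the cmd socket.
    OutboundConnection byType(Message msg)
    {
        return msg.isResponse() ? ackCon : cmdCon;
    }

    // Proposed behaviour: small messages never queue behind large ones,
    // regardless of verb.
    OutboundConnection bySize(Message msg)
    {
        return msg.serializedSize() <= SMALL_MESSAGE_THRESHOLD ? ackCon : cmdCon;
    }
}
{noformat}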



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-25 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512580#comment-14512580
 ] 

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 4:37 PM:
--

{quote}
Gossip always contended with mutation responses and read responses.
{quote}

No, they didn't. This is why there were two sockets in the first place: a 
Command socket and a Data socket. I have said since day one, when I raised this 
as a concern, that changes to Gossip (large and definitely outside the scope of 
3.0) could be made so this might not be an issue.

With these changes and today's Gossip implementation, this is a regression.


was (Author: mkjellman):
{quote}
Gossip always contended with mutation responses and read responses.
{quote}

No, they didn't. This is why there were two sockets in the first place. A 
Command and socket and a Data socket. I have said since day one when I raised 
this as a concern that with changes to Gossip (large and definitely outside the 
scope of 3.0) could be made so this might not be an issue.

Today with these changes and today's Gossip implementation -- this is a 
regression.



[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511879#comment-14511879
 ] 

Ariel Weisberg edited comment on CASSANDRA-8789 at 4/24/15 10:09 PM:
-

[~xedin] I tried to reproduce what Michael described, and I found a different 
root cause that seems to be an issue across multiple versions, both with and 
without the changes to OTCP. IOW I think it is unrelated to this ticket.

It's definitely worth reproducing the problem Michael is talking about, which 
is why I created a ticket for that specific issue. AFAIK no one besides myself 
has tested with and without this change on trunk and found that it has an 
impact.

[~mkjellman] if you run this using your reproducer steps and let it hang long 
enough, do you get the heap dump and the OOM? If you revert the change, are 
you saying everything starts to work for you?

The reason I think you are seeing the same thing I am is that it flakes out at 
300k for the reason I mentioned earlier (only 250k fits on heap).


was (Author: aweisberg):
[~xedin] I tried to reproduce what Michael described and I found a root cause 
that is different and it seems to be an issue across multiple versions, both 
with and without the changes to OTCP. IOW I think it is unrelated to this 
ticket.

It's definitely worth reproducing the problem Michael is talking about which is 
why I created a ticket for that specific issue. AFAIK no one besides myself has 
tested with and without this change on trunk and found that it has an impact.

[~mkjellman] if you try and run this using your reproducer steps if you let it 
hang long enough do you get the heap dump and OOM? If you revert the change are 
you saying everything starts to work for you?



[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511879#comment-14511879
 ] 

Ariel Weisberg edited comment on CASSANDRA-8789 at 4/24/15 10:09 PM:
-

[~xedin] I tried to reproduce what Michael described, and I found a different 
root cause that seems to be an issue across multiple versions, both with and 
without the changes to OTCP. IOW I think it is unrelated to this ticket.

It's definitely worth reproducing the problem Michael is talking about, which 
is why I created a ticket for that specific issue. AFAIK no one besides myself 
has tested with and without this change on trunk and found that it has an 
impact.

[~mkjellman] if you run this using your reproducer steps and let it hang long 
enough, do you get the heap dump and the OOM? If you revert the change, are 
you saying everything starts to work for you?


was (Author: aweisberg):
[~xedin] I tried to reproduce what Michael described and I found a root cause 
that is different and it seems to be an issue across multiple versions. IOW I 
think it is unrelated to this ticket.

It's definitely worth reproducing the problem Michael is talking about which is 
why I created a ticket for that specific issue. AFAIK no one besides myself has 
tested with and without this change on trunk and found that it has an impact.

[~mkjellman] if you try and run this using your reproducer steps if you let it 
hang long enough do you get the heap dump and OOM? If you revert the change are 
you saying everything starts to work for you?



[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512166#comment-14512166
 ] 

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 1:41 AM:
--

I just tried the following.

Check out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (at which I'm able to insert 
all 100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce.

I started seeing failures 1/3 of the way thru stress with messages like the 
following in the logs

h4. ccm node1 showlog
{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage 
has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress 
/127.0.0.1 is now DOWN
{noformat}

h4. ccm node2 showlog
{noformat}
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary: with the changes to 
OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
 applied against 8896a70b015102c212d0a27ed1f4e1f0fabe85c4, I am able to cause 
Gossiper/FD to mark nodes DOWN and make 2.0 stress fail, whereas against the 
unpatched commit (as detailed in my previous comment on this ticket) I was able 
to run cassandra-stress -l 3 without failure.


was (Author: mkjellman):
I just tried the following.

Checkout 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (which I'm able to insert all 
100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce

I started seeing failures 1/3 of the way thru stress with messages like the 
following in the logs

{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage 
has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress 
/127.0.0.1 is now DOWN
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary, I am able to cause Gossiper/FD to DOWN nodes and have 2.0 
stress fail with the changes to OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
 applied against (8896a70b015102c212d0a27ed1f4e1f0fabe85c4) which (detailed in 
previous comment on this ticket) I was able to successfully run 
cassandra-stress -l 3 against without failure.

[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512166#comment-14512166
 ] 

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 1:46 AM:
--

I just tried the following.

Check out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (at which I'm able to insert 
all 100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce.

I started seeing failures 1/3 of the way thru stress with messages like the 
following in the logs

h4. ccm node1 showlog
{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage 
has 3 pending tasks; skipping status check (no nodes will be marked down)
{noformat}

h4. ccm node2 showlog
{noformat}
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress 
/127.0.0.1 is now DOWN
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary: with the changes to 
OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
 applied against 8896a70b015102c212d0a27ed1f4e1f0fabe85c4, I am able to cause 
Gossiper/FD to mark nodes DOWN and make 2.0 stress fail, whereas against the 
unpatched commit (as detailed in my previous comment on this ticket) I was able 
to run cassandra-stress -l 3 without failure.


was (Author: mkjellman):
I just tried the following.

Checkout 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (which I'm able to insert all 
100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce

I started seeing failures 1/3 of the way thru stress with messages like the 
following in the logs

h4. ccm node1 showlog
{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage 
has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress 
/127.0.0.1 is now DOWN
{noformat}

h4. ccm node2 showlog
{noformat}
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary, I am able to cause Gossiper/FD to DOWN nodes and have 2.0 
stress fail with the changes to OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
 applied against (8896a70b015102c212d0a27ed1f4e1f0fabe85c4) which (detailed in 
previous comment on this ticket) I was able to successfully run 
cassandra-stress -l 3 against without failure.

[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512166#comment-14512166
 ] 

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 1:39 AM:
--

I just tried the following.

Check out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (at which I'm able to insert 
all 100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce.

I started seeing failures 1/3 of the way thru stress with messages like the 
following in the logs

{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage 
has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress 
/127.0.0.1 is now DOWN
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary: with the changes to 
OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
 applied against 8896a70b015102c212d0a27ed1f4e1f0fabe85c4, I am able to cause 
Gossiper/FD to mark nodes DOWN and make 2.0 stress fail, whereas against the 
unpatched commit (as detailed in my previous comment on this ticket) I was able 
to run cassandra-stress -l 3 without failure.


was (Author: mkjellman):
I just tried the following.

Checkout 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (which I'm able to insert all 
100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce

I started seeing failures 1/3 of the way thru stress with messages like the 
following in the logs

{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage 
has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress 
/127.0.0.1 is now DOWN
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 
OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 
OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary, I am able to reproduce this and have 2.0 stress fail with the 
changes to OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
 applied against (8896a70b015102c212d0a27ed1f4e1f0fabe85c4), against which I 
can successfully run cassandra-stress -l 3 without failure.


[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511331#comment-14511331
 ] 

Ariel Weisberg edited comment on CASSANDRA-8789 at 4/24/15 4:58 PM:


I was able to reproduce the OOM once in 2.1.2. I have found that the mutation 
stage is filling up with tasks that look like responses to writes. In 2.1.2, 
when it succeeds, it looks like it is just dropping the messages.

The reason it fails at 300k is that some 50k or so get processed and 250k back 
up, causing the OOM. We could do some things to make this more robust against 
overload, say by having the producer (IncomingTcpConnection) detect overload 
and start dropping messages without relying on the consumer (MutationStage) to 
drop them; see the sketch below.

I am leaning towards not trying to fix this wart because it requires somewhat 
unrealistic conditions: there has to be no load balancing, a heap that is too 
small, and an oversubscribed instance. The appropriate (if flawed) load 
shedding mechanism is in place, and there are already tickets to deal with the 
issue of having too much in-flight data.
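
For illustration only, producer-side shedding could look something like this 
sketch; SheddingProducer, MAX_PENDING, and the queue are assumptions for the 
example, not existing Cassandra classes:
{noformat}
// Hypothetical sketch of producer-side load shedding: the socket-reading
// thread drops mutation work itself once the consumer's queue is saturated,
// instead of enqueueing tasks that will eventually OOM the heap.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class SheddingProducer
{
    // Roughly "what fits on heap" per the comment above; illustrative only.
    private static final int MAX_PENDING = 250_000;

    private final BlockingQueue<Runnable> mutationStage = new LinkedBlockingQueue<>();
    private final AtomicLong dropped = new AtomicLong();

    void deliver(Runnable mutationTask)
    {
        // Detect overload here, at the producer, rather than relying on
        // the consumer (the mutation stage) to drop expired tasks.
        if (mutationStage.size() >= MAX_PENDING)
            dropped.incrementAndGet();   // shed: count and discard
        else
            mutationStage.add(mutationTask);
    }
}
{noformat}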

[~mkjellman] I created CASSANDRA-9237 for the issue of Gossip sharing a 
connection with most traffic.


was (Author: aweisberg):
I was able to reproduce the OOM once in 2.1.2. I have found that the mutation 
stage is filling up with tasks and they look like responses to writes. In 2.1.2 
when it succeeds it kind of looks like it is just dropping the messages. 

The reason it fails at 300k is that some 50k or so get processed and 250k back 
up causing OOM. We could try and do some things to make this more robust 
against overload. Say by having the producer (IncomingTcpConnection) detect 
overload and start dropping messages without relying on the consumer 
(MutationStage) to drop them.

I am leaning towards not trying to fix this wart because it requires somewhat 
unrealistic conditions. There has to be no load balancing, a heap that is too 
small, and an oversubscribed instance.

[~mkjellman] I created a CASSANDRA-9237 for the issue of Gossip sharing a 
connection with most traffic.



[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509731#comment-14509731
 ] 

Ariel Weisberg edited comment on CASSANDRA-8789 at 4/23/15 8:46 PM:


[~mkjellman] I tried this after reverting the socket change, and initially I 
thought it mattered, but I think I was swapping when it passed with the change 
reverted.

I tried it three times and each run does the same thing. The first node OOMs 
and the heap dump blames tasks sitting in SEPExecutor.

I also ran with flight recorder and checked the node serving client traffic and 
one of the other nodes. There is some significant blocking on the coordinating 
node, but the longest pause was 300 milliseconds and total duration was 2 
seconds for a 1 minute period (200 pauses). If I chased those down I bet they 
are correlated with GC pauses.

I was able to get 2.1.2 to write hints, but not to fail the same way that trunk 
does with SEPExecutor OOM. Still digging into why trunk fares worse.

I checked, and disabling coalescing and reverting the change to 
OutboundTcpConnectionPool doesn't make things better.


was (Author: aweisberg):
[~mkjellman] I tried this reverting the socket change and initially I thought 
it mattered, but I think I was swapping when it passed with the change reverted.

I tried it three times and they do the same thing. The first node OOMs and the 
heap dump blames tasks sitting in SEPExecutor.

I also ran with flight recorder and checked the node serving client traffic and 
one of the other nodes. There is some significant blocking on the coordinating 
node, but the longest pause was 300 milliseconds and total duration was 2 
seconds for a 1 minute period (200 pauses). If I chased those down I bet they 
are correlated with GC pauses.

I was able to get 2.1.2 to write hints, but not to fail the same way that trunk 
does with SEPExecutor OOM. Still digging into why trunk fares worse.



[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510016#comment-14510016
 ] 

Benedict edited comment on CASSANDRA-8789 at 4/23/15 10:54 PM:
---

I should clarify here that I do think MUTATION messages could plausibly delay 
gossip messages where they couldn't before. However, REQUEST_RESPONSE messages, 
mentioned above as the potential cause, could always cause head-of-line 
blocking for gossip messages. So my position is only that the head-of-line 
blocking concern is not a new one, not that its characteristics are identical. 
I don't, however, have any data/position on what bearing these theoretical 
analyses have on the perceived issue.


was (Author: benedict):
I should clarify here that I do think MUTATION messages could plausibly delay 
gossip messages where they couldn't before. However REQUEST_RESPONSE messages 
as mentioned above as the potential cause could always cause head of line 
blocking for gossip messages. So my position is only that the head of line 
blocking concern is not a new one, not that its characteristics are identical.



[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type

2015-04-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504158#comment-14504158
 ] 

Benedict edited comment on CASSANDRA-8789 at 4/21/15 2:05 AM:
--

2.0 stress, AFAICR, does not load balance. By default 2.1 does (smart thrift 
routing round-robins the owning nodes for any token). So all of the writes to 
the cluster are likely being piped through a single node in the 2.0 experiment 
(so over just two tcp connections), instead of being evenly spread across all 
three (i.e. over six tcp connections).
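
For illustration, the routing difference amounts to something like the 
following sketch (TokenAwareRouter and its method names are invented for the 
example, not the actual stress/driver code):
{noformat}
// Hypothetical sketch of the two client routing policies described above.
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class TokenAwareRouter
{
    private final AtomicInteger counter = new AtomicInteger();

    // 2.1-style smart routing: round-robin over the nodes owning the token,
    // spreading inter-node traffic over all replicas' tcp connections.
    String pickSmart(List<String> replicasForToken)
    {
        int i = Math.floorMod(counter.getAndIncrement(), replicasForToken.size());
        return replicasForToken.get(i);
    }

    // 2.0-style behaviour by contrast: every request hits the same node,
    // funnelling all traffic through that node's two outbound connections.
    String pickFixed(List<String> replicasForToken)
    {
        return replicasForToken.get(0);
    }
}
{noformat}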


was (Author: benedict):
2.0 stress, AFAICR, does not load balance. By default 2.1 does (smart thrift 
routing round-robins the owning nodes for any token). So all of the writes to 
the cluster are likely being piped through a single node in the 2.0 experiment 
(so over just two tcp connections), instead of evenly spread over six.
