[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512587#comment-14512587 ]

Benedict edited comment on CASSANDRA-8789 at 4/25/15 4:50 PM:
--

bq. Gossip always contended with mutation *responses* and read responses.

I suspect there may be an issue with nomenclature here. Both of Ariel's statements are true, but the internal nomenclature for both of these is REQUEST_RESPONSE. To further clarify: the distinction has never been command/data, but command/acknowledgement, where the acknowledgement in the case of a read request includes the entire data for serving that read request.

OutboundTcpConnectionPool should route messages to sockets by size not type
---
Key: CASSANDRA-8789
URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
Fix For: 3.0
Attachments: 8789.diff

I was looking at this trying to understand what messages flow over which connection. For reads, the request goes out over the command connection and the response comes back over the ack connection. For writes, the request goes out over the command connection and the response comes back over the command connection. Reads get a dedicated socket for responses; mutation commands and responses both travel over the same socket, along with read requests. Sockets are used uni-directionally, so there are actually four sockets in play and four threads at each node (2 inbound, 2 outbound). CASSANDRA-488 doesn't leave a record of what the impact of this change was. If someone remembers what situations were made better, it would be good to know.

I am not clear on when/how this is helpful. The consumer side shouldn't be blocking, so the only head-of-line blocking issue is the time it takes to transfer data over the wire. If message size is the cause of blocking issues, then the current design mixes small messages and large messages on the same connection, retaining the head-of-line blocking. Read requests share the same connection as write requests (which are large), and write acknowledgments (which are small) share the same connections as write requests. The only winner is read acknowledgements.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
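The routing-by-size idea in the ticket title can be illustrated with a minimal sketch. The class, method, and the 64 KB threshold below are hypothetical illustrations of the concept, not Cassandra's actual OutboundTcpConnectionPool API:

```java
// Hypothetical sketch: pick an outbound socket by serialized message size
// rather than by verb/type, so small messages (acks, gossip) never queue
// behind large payloads. Threshold is an assumed value for illustration.
class SizeBasedRouter {
    static final int LARGE_THRESHOLD_BYTES = 64 * 1024; // assumed cutoff

    enum Socket { SMALL_MESSAGES, LARGE_MESSAGES }

    // Route by payload size: everything over the threshold goes to the
    // large-message socket, everything else to the small-message socket.
    static Socket route(int serializedSizeBytes) {
        return serializedSizeBytes > LARGE_THRESHOLD_BYTES
                ? Socket.LARGE_MESSAGES
                : Socket.SMALL_MESSAGES;
    }
}
```

Under this scheme a write acknowledgement and a large mutation would land on different sockets regardless of their verb, which is the contrast with the existing command/ack split described above.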
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512580#comment-14512580 ]

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 4:37 PM:
--

{quote}
Gossip always contended with mutation responses and read responses.
{quote}

No, they didn't. This is why there were two sockets in the first place: a Command socket and a Data socket. I have said since day one, when I raised this as a concern, that changes to Gossip (large, and definitely outside the scope of 3.0) could be made so this might not be an issue. With these changes and today's Gossip implementation, this is a regression.
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511879#comment-14511879 ]

Ariel Weisberg edited comment on CASSANDRA-8789 at 4/24/15 10:09 PM:
-

[~xedin] I tried to reproduce what Michael described and found a root cause that is different; it seems to be an issue across multiple versions, both with and without the changes to OTCP. In other words, I think it is unrelated to this ticket. It's definitely worth reproducing the problem Michael is talking about, which is why I created a ticket for that specific issue. AFAIK no one besides myself has tested with and without this change on trunk and found that it has an impact.

[~mkjellman] if you run this using your reproducer steps and let it hang long enough, do you get the heap dump and OOM? If you revert the change, are you saying everything starts to work for you? The reason I think you are seeing the same thing I am is that it flakes out at 300k for the reason I mentioned earlier (only 250k fits on heap).
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512166#comment-14512166 ]

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 1:46 AM:
--

I just tried the following: check out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (at which I'm able to insert all 100k records without issue) and then apply 828496492c51d7437b690999205ecc941f41a0a9 and 144644bbf77a546c45db384e2dbc18e13f65c9ce. I started seeing failures a third of the way through stress, with messages like the following in the logs:

h4. ccm node1 showlog
{noformat}
WARN [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage has 3 pending tasks; skipping status check (no nodes will be marked down)
{noformat}

h4. ccm node2 showlog
{noformat}
INFO [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress /127.0.0.1 is now DOWN
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary: I am able to cause Gossiper/FD to DOWN nodes and have 2.0 stress fail with the changes to OutboundTcpConnection/OutboundTcpConnectionPool (828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce) applied against 8896a70b015102c212d0a27ed1f4e1f0fabe85c4, against which (as detailed in a previous comment on this ticket) I was able to run cassandra-stress -l 3 without failure.
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511331#comment-14511331 ]

Ariel Weisberg edited comment on CASSANDRA-8789 at 4/24/15 4:58 PM:

I was able to reproduce the OOM once in 2.1.2. I have found that the mutation stage is filling up with tasks, and they look like responses to writes. In 2.1.2, when it succeeds, it looks like it is just dropping the messages. The reason it fails at 300k is that some 50k or so get processed and 250k back up, causing the OOM.

We could try to make this more robust against overload, say by having the producer (IncomingTcpConnection) detect overload and start dropping messages without relying on the consumer (MutationStage) to drop them. I am leaning towards not trying to fix this wart because it requires somewhat unrealistic conditions: there has to be no load balancing, a heap that is too small, and an oversubscribed instance. The appropriate (if flawed) load shedding mechanism is in place, and there are already tickets to deal with the issue of having too much in-flight data.

[~mkjellman] I created CASSANDRA-9237 for the issue of Gossip sharing a connection with most traffic.
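The producer-side shedding suggested above can be sketched minimally. `SheddingProducer`, its queue capacity, and `deliver` are hypothetical stand-ins for an IncomingTcpConnection-style reader handing work to a bounded MutationStage queue, not Cassandra code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the producer drops a message itself when the
// consumer's queue is saturated, instead of enqueueing unboundedly and
// relying on the consumer to shed load (the pattern that lets 250k
// responses back up on the heap and OOM).
class SheddingProducer {
    private final BlockingQueue<byte[]> mutationStage;
    private long dropped;

    SheddingProducer(int capacity) {
        // Bounded queue models the consumer's limited in-flight budget.
        this.mutationStage = new ArrayBlockingQueue<>(capacity);
    }

    // Returns true if accepted, false if shed due to overload.
    boolean deliver(byte[] message) {
        boolean accepted = mutationStage.offer(message); // non-blocking
        if (!accepted) {
            dropped++; // shed at the producer; nothing accumulates on heap
        }
        return accepted;
    }

    long droppedCount() { return dropped; }
}
```

The key design point is that `offer` never blocks and never grows the queue past its bound, so overload shows up as a drop counter rather than as heap pressure.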
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509731#comment-14509731 ] Ariel Weisberg edited comment on CASSANDRA-8789 at 4/23/15 8:46 PM: [~mkjellman] I tried this reverting the socket change and initially I thought it mattered, but I think I was swapping when it passed with the change reverted. I tried it three times and they do the same thing. The first node OOMs and the heap dump blames tasks sitting in SEPExecutor. I also ran with flight recorder and checked the node serving client traffic and one of the other nodes. There is some significant blocking on the coordinating node, but the longest pause was 300 milliseconds and total duration was 2 seconds for a 1 minute period (200 pauses). If I chased those down I bet they are correlated with GC pauses. I was able to get 2.1.2 to write hints, but not to fail the same way that trunk does with SEPExecutor OOM. Still digging into why trunk fares worse. I checked and disabling coalescing and reverting the change to OutboundTcpConnectionPool doesn't make things better. was (Author: aweisberg): [~mkjellman] I tried this reverting the socket change and initially I thought it mattered, but I think I was swapping when it passed with the change reverted. I tried it three times and they do the same thing. The first node OOMs and the heap dump blames tasks sitting in SEPExecutor. I also ran with flight recorder and checked the node serving client traffic and one of the other nodes. There is some significant blocking on the coordinating node, but the longest pause was 300 milliseconds and total duration was 2 seconds for a 1 minute period (200 pauses). If I chased those down I bet they are correlated with GC pauses. I was able to get 2.1.2 to write hints, but not to fail the same way that trunk does with SEPExecutor OOM. Still digging into why trunk fares worse. 
OutboundTcpConnectionPool should route messages to sockets by size not type
---
Key: CASSANDRA-8789
URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
Fix For: 3.0
Attachments: 8789.diff

I was looking at this trying to understand which messages flow over which connection. For reads, the request goes out over the command connection and the response comes back over the ack connection. For writes, the request goes out over the command connection and the response comes back over the command connection.

Reads get a dedicated socket for responses. Mutation commands and responses both travel over the same socket along with read requests. Sockets are used unidirectionally, so there are actually four sockets in play and four threads at each node (2 inbound, 2 outbound).

CASSANDRA-488 doesn't leave a record of what the impact of this change was. If someone remembers what situations were made better it would be good to know. I am not clear on when/how this is helpful. The consumer side shouldn't be blocking, so the only head of line blocking issue is the time it takes to transfer data over the wire. If message size is the cause of blocking issues, then the current design mixes small messages and large messages on the same connection, retaining the head of line blocking: read requests share the same connection as write requests (which are large), and write acknowledgments (which are small) share the same connection as write requests. The only winner is read acknowledgements.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
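The routing-by-size idea in the ticket title can be sketched as follows. This is a hypothetical illustration, not Cassandra's actual OutboundTcpConnectionPool API; the 64 KB threshold and the class and method names are assumptions for the example.

```java
// Hypothetical sketch: route each outbound message to one of two lanes
// (each backed by its own socket) based on serialized size rather than
// message type, so small messages (acks, gossip) never queue behind
// large mutations on a shared connection.
public class SizeBasedPool {
    // Assumed cutoff between "small" and "large" messages; tunable.
    static final int LARGE_THRESHOLD = 64 * 1024;

    enum Lane { SMALL, LARGE }

    static Lane laneFor(int serializedSizeBytes) {
        return serializedSizeBytes >= LARGE_THRESHOLD ? Lane.LARGE : Lane.SMALL;
    }

    public static void main(String[] args) {
        System.out.println(laneFor(100));     // a write ack -> SMALL
        System.out.println(laneFor(1 << 20)); // a 1 MB mutation -> LARGE
    }
}
```

The point of the sketch is that the routing key is the size of the payload, so a small read request and a small write ack share one lane regardless of their types, which is exactly the property the type-based split lacks.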
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510016#comment-14510016 ] Benedict edited comment on CASSANDRA-8789 at 4/23/15 10:54 PM:

I should clarify here that I do think MUTATION messages could plausibly delay gossip messages where they couldn't before. However, REQUEST_RESPONSE messages, mentioned above as the potential cause, could always cause head of line blocking for gossip messages. So my position is only that the head of line blocking concern is not a new one, not that its characteristics are identical. I don't, however, have any data or position on what bearing these theoretical analyses have on the perceived issue.

was (Author: benedict):
I should clarify here that I do think MUTATION messages could plausibly delay gossip messages where they couldn't before. However, REQUEST_RESPONSE messages, mentioned above as the potential cause, could always cause head of line blocking for gossip messages. So my position is only that the head of line blocking concern is not a new one, not that its characteristics are identical.
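The head of line blocking being discussed is purely transfer-time delay on the wire, as the ticket description notes. A back-of-envelope calculation makes the magnitude concrete; the link speed and message size here are illustrative assumptions, not measurements from this ticket.

```java
// Illustration of head-of-line blocking on a shared socket: a small ack
// queued behind a large mutation must wait for the large message's entire
// transfer time before it can be sent.
public class HolBlocking {
    static double transferSeconds(long bytes, double linkBytesPerSec) {
        return bytes / linkBytesPerSec;
    }

    public static void main(String[] args) {
        double gbitLink = 125_000_000.0;        // 1 Gbit/s expressed in bytes/sec
        long largeMutation = 10L * 1024 * 1024; // an assumed 10 MB write
        // The ack behind it is delayed by the whole transfer, roughly 84 ms.
        System.out.printf("ack delayed %.1f ms%n",
                1000 * transferSeconds(largeMutation, gbitLink));
    }
}
```

An ~84 ms stall is large relative to gossip intervals and typical response latencies, which is why whether small messages share a socket with large ones matters more than which message *types* share a socket.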
[jira] [Comment Edited] (CASSANDRA-8789) OutboundTcpConnectionPool should route messages to sockets by size not type
[ https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504158#comment-14504158 ] Benedict edited comment on CASSANDRA-8789 at 4/21/15 2:05 AM:

2.0 stress, AFAICR, does not load balance. By default 2.1 does (smart thrift routing round-robins the owning nodes for any token). So all of the writes to the cluster are likely being piped through a single node in the 2.0 experiment (so over just two tcp connections), instead of evenly spread over all three (i.e. six tcp connections).

was (Author: benedict):
2.0 stress, AFAICR, does not load balance. By default 2.1 does (smart thrift routing round-robins the owning nodes for any token). So all of the writes to the cluster are likely being piped through a single node in the 2.0 experiment (so over just two tcp connections), instead of evenly spread over six.
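The round-robin routing Benedict describes can be sketched as below. This is a simplified illustration of the idea (cycling requests across the replicas that own a token), not the actual 2.1 stress or thrift-routing code; the class and method names are invented for the example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: token-aware round-robin routing. Each request for a
// token is sent to the next owning replica in turn, so load (and the tcp
// connections carrying it) spreads over all owners instead of one node.
public class RoundRobinReplicaRouter {
    private final AtomicInteger counter = new AtomicInteger();

    public String pick(List<String> owningReplicas) {
        int i = Math.floorMod(counter.getAndIncrement(), owningReplicas.size());
        return owningReplicas.get(i);
    }

    public static void main(String[] args) {
        RoundRobinReplicaRouter router = new RoundRobinReplicaRouter();
        List<String> replicas = List.of("node1", "node2", "node3");
        for (int n = 0; n < 4; n++) {
            System.out.println(router.pick(replicas)); // cycles node1,node2,node3,node1
        }
    }
}
```

With three owning nodes this produces the "six tcp connections" distribution in the comment (two unidirectional command/ack sockets per coordinator), versus only two when every write is coordinated by one node.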