[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512166#comment-14512166
 ] 

Michael Kjellman edited comment on CASSANDRA-8789 at 4/25/15 1:41 AM:
----------------------------------------------------------------------

I just tried the following.

Check out 8896a70b015102c212d0a27ed1f4e1f0fabe85c4 (on which I'm able to insert 
all 100k records without issue) and then apply 
828496492c51d7437b690999205ecc941f41a0a9 and 
144644bbf77a546c45db384e2dbc18e13f65c9ce.

I started seeing failures about 1/3 of the way through stress, with messages 
like the following in the logs:

h4. ccm node1 showlog
{noformat}
WARN  [GossipTasks:1] 2015-04-24 18:32:16,832 Gossiper.java:685 - Gossip stage has 3 pending tasks; skipping status check (no nodes will be marked down)
INFO  [GossipTasks:1] 2015-04-24 18:32:40,995 Gossiper.java:938 - InetAddress /127.0.0.1 is now DOWN
{noformat}

h4. ccm node2 showlog
{noformat}
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:42,002 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:47,004 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,005 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:52,010 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,010 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:32:57,011 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,012 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:02,022 OutboundTcpConnection.java:485 - Handshaking version with /127.0.0.1
INFO  [HANDSHAKE-/127.0.0.1] 2015-04-24 18:33:07,023 OutboundTcpConnection.java:494 - Cannot handshake version with /127.0.0.1
{noformat}

So, in summary, I am able to cause the Gossiper/FD to mark nodes DOWN and have 
2.0 stress fail with the changes to 
OutboundTcpConnection/OutboundTcpConnectionPool 
(828496492c51d7437b690999205ecc941f41a0a9/144644bbf77a546c45db384e2dbc18e13f65c9ce)
applied on top of 8896a70b015102c212d0a27ed1f4e1f0fabe85c4. Against that base 
commit alone (as detailed in the previous comment on this ticket) I was able to 
run cassandra-stress -l 3 without failure.


> OutboundTcpConnectionPool should route messages to sockets by size not type
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 3.0
>
>         Attachments: 8789.diff
>
>
> I was looking at this trying to understand what messages flow over which 
> connection.
> For reads the request goes out over the command connection and the response 
> comes back over the ack connection.
> For writes the request goes out over the command connection and the response 
> comes back over the command connection.
> Reads get a dedicated socket for responses. Mutation commands and responses 
> both travel over the same socket along with read requests.
> Sockets are used unidirectionally, so there are actually four sockets in play 
> and four threads at each node (2 inbound, 2 outbound).
> CASSANDRA-488 doesn't leave a record of what the impact of this change was. 
> If someone remembers what situations were made better it would be good to 
> know.
> I am not clear on when/how this is helpful. The consumer side shouldn't be 
> blocking, so the only head-of-line blocking issue is the time it takes to 
> transfer data over the wire.
> If message size is the cause of blocking issues, then the current design mixes 
> small messages and large messages on the same connection, retaining the 
> head-of-line blocking.
> Read requests share the same connection as write requests (which are large), 
> and write acknowledgments (which are small) share the same connections as 
> write requests. The only winner is read acknowledgements.
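
For readers following the ticket title, here is a minimal sketch of what "route by size, not type" could look like. It is not the attached 8789.diff and does not use Cassandra's actual classes. The class name SizeBasedRouter, the Channel enum, and the 64 KiB threshold are all assumptions made up for illustration; the ticket does not specify a cutoff.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: none of these names come from Cassandra's code base.
class SizeBasedRouter {
    // Two logical outbound connections per peer, chosen by payload size.
    enum Channel { SMALL, LARGE }

    // Assumed cutoff in bytes; the ticket does not specify a value.
    static final int LARGE_THRESHOLD_BYTES = 64 * 1024;

    private final Map<Channel, Queue<byte[]>> queues = new EnumMap<>(Channel.class);

    SizeBasedRouter() {
        for (Channel c : Channel.values()) {
            queues.put(c, new ArrayDeque<>());
        }
    }

    // Pick a channel from the serialized size alone, ignoring the message type.
    static Channel route(int serializedSize) {
        return serializedSize > LARGE_THRESHOLD_BYTES ? Channel.LARGE : Channel.SMALL;
    }

    // Queue the payload on the channel chosen for its size and return that channel.
    Channel enqueue(byte[] payload) {
        Channel c = route(payload.length);
        queues.get(c).add(payload);
        return c;
    }

    // Number of payloads currently waiting on the given channel.
    int pending(Channel c) {
        return queues.get(c).size();
    }

    public static void main(String[] args) {
        SizeBasedRouter router = new SizeBasedRouter();
        router.enqueue(new byte[128]);          // e.g. a small acknowledgement
        router.enqueue(new byte[256 * 1024]);   // e.g. a large mutation
        System.out.println("small=" + router.pending(Channel.SMALL)
                + " large=" + router.pending(Channel.LARGE));
    }
}
```

The point of the sketch is only that small messages (acknowledgements, read requests) never queue behind a large payload on the same socket, whatever their verb. How the real patch picks its threshold and connections is not shown here.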



