[ https://issues.apache.org/jira/browse/CASSANDRA-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444266#comment-16444266 ]
Dinesh Joshi commented on CASSANDRA-14389:
------------------------------------------

I found the issue. When you leave the local side of the socket unbound, the kernel will prefer the IP address that matches the remote IP. Say node1 with IP {{127.0.0.1}} wants to open a connection to node2 with IP {{127.0.0.2}}; the socket would look like {{<127.0.0.2:61002, 127.0.0.2:7000>}} on node1. This seems to confuse the streaming code.

Here's how. Say we have three nodes, node1, node2 and node3, with IPs {{127.0.0.1}}, {{127.0.0.2}} and {{127.0.0.3}}. node1 has data and node3 is bootstrapping. node3 requests a stream from node1, so node3 is the {{peer}} in this case. node1's code execution is described below:

* node1 receives the request ({{StreamingInboundHandler#deriveSession}}), and {{StreamResultFuture#initReceivingSide}} creates a new {{StreamResultFuture}} and calls {{attachConnection()}}. At this point it has two sets of IPs and ports for the peer. They are identified by the variable {{from}} and the expression {{channel.remoteAddress()}} (a.k.a. {{connecting}}).
* {{StreamResultFuture#attachConnection}} calls {{StreamCoordinator#getOrCreateSessionById}}, passing the {{from}} IP and {{InetAddressAndPort.getByAddressOverrideDefaults(connecting, from.port)}} (!!!)
* The key observation here is that {{from}} is the IP that the peer sent in the {{StreamMessageHeader}}, while {{connecting}} is the remote IP of the peer as seen on the socket.
* {{StreamCoordinator#getOrCreateSessionById}} subsequently calls {{StreamCoordinator#getOrCreateHostData(peer)}}, so we're indexing {{peerSessions}} by the {{peer}} IP address. We also end up creating a {{StreamSession}} in the process.
* During {{StreamSession}} creation, we end up passing both the {{peer}} and {{connecting}} IPs. We use the {{connecting}} IP to establish the outbound connection to the peer ({{NettyStreamingMessageSender}} is now connected to the {{connecting}} IP on port {{7000}}).
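The address handling in the steps above can be reproduced outside Cassandra with plain sockets. The following is a minimal sketch, not the streaming code itself: the class {{ConnectingVsFromDemo}}, its {{Observed}} holder and the {{observe}} method are hypothetical names, the listener uses an ephemeral port instead of {{7000}}, and combining the socket's remote IP with the header's port only imitates what the comment describes for {{getByAddressOverrideDefaults(connecting, from.port)}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical stand-alone demo; none of these names exist in Cassandra.
final class ConnectingVsFromDemo {

    /** The addresses the receiving node ends up holding for one inbound connection. */
    static final class Observed {
        final InetSocketAddress from;       // address the peer advertised (stand-in for the header value)
        final InetSocketAddress connecting; // remote address of the accepted socket
        final InetSocketAddress outbound;   // connecting IP combined with the advertised port

        Observed(InetSocketAddress from, InetSocketAddress connecting) {
            this.from = from;
            this.connecting = connecting;
            this.outbound = new InetSocketAddress(connecting.getAddress(), from.getPort());
        }
    }

    /** Accepts one connection from an unbound client and records what the receiver sees. */
    static Observed observe(InetSocketAddress advertisedFrom) {
        try {
            InetAddress receiverIp = InetAddress.getByName("127.0.0.1"); // plays the role of node1
            try (ServerSocket server = new ServerSocket(0, 1, receiverIp);
                 Socket client = new Socket()) {
                // No client.bind(...) before connect: the kernel picks the local address.
                client.connect(new InetSocketAddress(receiverIp, server.getLocalPort()), 2000);
                try (Socket accepted = server.accept()) {
                    InetSocketAddress connecting =
                        (InetSocketAddress) accepted.getRemoteSocketAddress();
                    return new Observed(advertisedFrom, connecting);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The client claims to be 127.0.0.3:7000, as node3 would in its header.
        Observed o = observe(new InetSocketAddress("127.0.0.3", 7000));
        System.out.println("from       = " + o.from);
        System.out.println("connecting = " + o.connecting);
        System.out.println("outbound   = " + o.outbound);
    }
}
```

Because the client never binds its local side, the receiver's {{connecting}} address carries the receiver's own loopback IP, and pairing it with the advertised port yields an outbound target that differs from {{from}}. Calling {{client.bind(new InetSocketAddress(localIp, 0))}} before {{connect}} would pin the local side instead, provided {{localIp}} is configured on the host.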
In our case, since we leave the local side of the socket unbound, although the {{peer}} correctly sets its IP to {{127.0.0.3}} in the {{StreamMessageHeader}}, the {{localAddress}} that the kernel chooses for it is {{127.0.0.1}}. On the inbound side, node1 thinks that the {{peer}} is {{127.0.0.3}}, but the {{connecting}} IP address it sees is {{127.0.0.1}}. It therefore prefers that IP when trying to establish an outbound session. In fact it establishes a connection to itself, leading to the {{Unknown peer requested: 127.0.0.1:7000}} exception. Note that along the way it drops the ephemeral port and instead uses the port returned by {{MessagingService#portFor}}.

Streaming code seems to rely on the perceived remote IP address of the host rather than the one that is set in the message header. I am not sure if preferring the IP address set in the header is the correct approach.

> Resolve local address binding in 4.0
> ------------------------------------
>
>                 Key: CASSANDRA-14389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14389
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.x
>
>
> CASSANDRA-8457/CASSANDRA-12229 introduced a regression against
> CASSANDRA-12673. This was discovered with CASSANDRA-14362 and moved here for
> resolution independent of that ticket.