[ 
https://issues.apache.org/jira/browse/CASSANDRA-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444266#comment-16444266
 ] 

Dinesh Joshi commented on CASSANDRA-14389:
------------------------------------------

I found the issue. When you leave the local side of the socket unbound, the 
kernel will prefer the local IP address that matches the remote IP. Say node1 with IP 
{{127.0.0.1}} wants to open a connection to node2 with IP {{127.0.0.2}}: the 
socket would then look like {{<127.0.0.2:61002, 127.0.0.2:7000>}} on node1. This 
seems to confuse the streaming code. Here's how -

Say we have three nodes node1, node2 & node3 with IPs {{127.0.0.1, 127.0.0.2, 
127.0.0.3}}. node1 has data and node3 is bootstrapping. It requests a stream 
from node1. So node3 is the `peer` in this case and node1's code execution is 
described below -

* node1 receives the request ({{StreamingInboundHandler#deriveSession}}) and 
{{StreamResultFuture#initReceivingSide}} creates a new {{StreamResultFuture}} 
and calls {{attachConnection()}}. At this point it has two IP and port pairs 
for the peer: the variable `{{from}}` and the expression 
`{{channel.remoteAddress()}}` (a.k.a. `{{connecting}}`).
* {{StreamResultFuture#attachConnection}} calls 
{{StreamCoordinator#getOrCreateSessionById}}, passing the `{{from}}` IP and 
{{InetAddressAndPort.getByAddressOverrideDefaults(connecting, from.port)}} (!!!)
* The key observation here is `from` is the IP that the peer sent in the 
`{{StreamMessageHeader}}` while `connecting` is the remote IP of the peer.
* {{StreamCoordinator#getOrCreateSessionById}} subsequently calls 
{{StreamCoordinator#getOrCreateHostData(peer)}}. So we're indexing the 
{{peerSessions}} by the `{{peer}}` IP address. We also end up creating a 
`{{StreamSession}}` in the process.
* During `{{StreamSession}}` creation, we end up passing the `{{peer}}` and 
`{{connecting}}` IPs. We use the `connecting` IP to establish the outbound 
connection to the peer. ({{NettyStreamingMessageSender}} is now connected to 
`{{connecting}}` IP on port {{7000}}).
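
The bookkeeping in the steps above can be sketched as a small standalone model. This is not the Cassandra code: {{StreamSessionSketch}}, {{Session}}, {{getOrCreateSession}} and {{outboundTarget}} are hypothetical names, and plain strings stand in for {{InetAddressAndPort}}. It only illustrates that the session is indexed by the `peer` IP while the outbound connection is aimed at the `connecting` IP.

```java
import java.util.HashMap;
import java.util.Map;

class StreamSessionSketch {
    /** Hypothetical stand-in for a session: it remembers both addresses. */
    static final class Session {
        final String peer;        // IP announced in the message header ("from")
        final String connecting;  // IP observed on the inbound channel

        Session(String peer, String connecting) {
            this.peer = peer;
            this.connecting = connecting;
        }

        /** The outbound connection goes to the connecting IP, not to the peer IP. */
        String outboundTarget(int storagePort) {
            return connecting + ":" + storagePort;
        }
    }

    /** Sessions are indexed by the peer IP, mirroring the peerSessions lookup. */
    private final Map<String, Session> peerSessions = new HashMap<>();

    Session getOrCreateSession(String from, String connecting) {
        // Creates the session on first sight of this peer, otherwise returns the existing one.
        return peerSessions.computeIfAbsent(from, peer -> new Session(peer, connecting));
    }

    public static void main(String[] args) {
        StreamSessionSketch node1 = new StreamSessionSketch();
        // Healthy case: header IP and channel IP agree.
        Session s = node1.getOrCreateSession("127.0.0.3", "127.0.0.3");
        System.out.println("indexed by " + s.peer + ", outbound to " + s.outboundTarget(7000));
    }
}
```

When both addresses agree, the distinction is invisible; it only matters once `from` and `connecting` differ.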

In our case, since we leave the local side of the socket unbound, although the 
`{{peer}}` correctly sets its IP to {{127.0.0.3}} in the 
{{StreamMessageHeader}}, the {{localAddress}} that the kernel chooses for it is 
{{127.0.0.1}}. On the inbound side, node1 sees the `peer` as 
{{127.0.0.3}} but the connecting IP address as {{127.0.0.1}}. 
It therefore prefers the connecting IP when trying to establish an outbound 
session. In fact, it establishes a connection to itself, leading to the `{{Unknown peer 
requested: 127.0.0.1:7000}}` exception. Note that along the way it actually 
drops the ephemeral port and instead uses the port returned by 
{{MessagingService#portFor}}.
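
The kernel's choice of local address can be observed with a few lines of plain {{java.net}} code. The sketch below is independent of Cassandra; {{UnboundLocalAddressDemo}} and {{localAddressChosen}} are hypothetical names, and an ephemeral listener port stands in for {{7000}}. It uses only {{127.0.0.1}} so that it runs anywhere. Reproducing the exact scenario above would additionally require {{127.0.0.3}} to be usable as a local address, which I am assuming is the case on a typical Linux loopback interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class UnboundLocalAddressDemo {
    /**
     * Connects to a listener on serverIp and returns the client's local address.
     * If bindTo is null the client socket is left unbound and the kernel picks the source IP.
     */
    static InetAddress localAddressChosen(InetAddress bindTo, InetAddress serverIp) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, serverIp); // ephemeral port stands in for 7000
             Socket client = new Socket()) {
            if (bindTo != null) {
                // Pin the source IP explicitly; port 0 still means "any ephemeral port".
                client.bind(new InetSocketAddress(bindTo, 0));
            }
            client.connect(new InetSocketAddress(serverIp, server.getLocalPort()), 2000);
            return client.getLocalAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress node1 = InetAddress.getByName("127.0.0.1");
        // Unbound: the kernel chooses the local address for us.
        System.out.println("unbound -> " + localAddressChosen(null, node1));
        // Bound: the local address is whatever we asked for.
        System.out.println("bound   -> " + localAddressChosen(node1, node1));
    }
}
```

Passing node3's own address as {{bindTo}} is the explicit-binding case; leaving it {{null}} is the unbound case that the paragraph above describes.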

Streaming code seems to rely on the perceived remote IP address of the host 
rather than the one that is set in the message header. I am not sure if 
preferring the IP address set in the header is the correct approach.
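
To make the two options concrete, here is a minimal comparison of the outbound target derived from the channel address versus the one derived from the header address, using the values from the scenario above. {{OutboundTargetComparison}} and {{withStoragePort}} are hypothetical names, {{7000}} is assumed to be what {{MessagingService#portFor}} returns here, and {{61002}} is just an example ephemeral port. It is an illustration of the difference, not a proposed fix.

```java
import java.net.InetSocketAddress;

class OutboundTargetComparison {
    static final int STORAGE_PORT = 7000; // assumed storage port for this illustration

    /** Keeps the IP but replaces whatever port came in with the storage port. */
    static InetSocketAddress withStoragePort(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getAddress(), STORAGE_PORT);
    }

    public static void main(String[] args) {
        InetSocketAddress node1Listen = new InetSocketAddress("127.0.0.1", STORAGE_PORT);
        // What node3 announced in the message header.
        InetSocketAddress from = new InetSocketAddress("127.0.0.3", STORAGE_PORT);
        // What node1 saw on the channel because node3 left its socket unbound.
        InetSocketAddress connecting = new InetSocketAddress("127.0.0.1", 61002);

        InetSocketAddress channelBased = withStoragePort(connecting);
        InetSocketAddress headerBased = withStoragePort(from);

        System.out.println("channel-based target " + channelBased
                + " is node1 itself: " + channelBased.equals(node1Listen));
        System.out.println("header-based target  " + headerBased
                + " is node1 itself: " + headerBased.equals(node1Listen));
    }
}
```

The channel-based target collapses onto node1's own listen address once the ephemeral port is dropped, which matches the self-connection described above; the header-based target does not.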

> Resolve local address binding in 4.0
> ------------------------------------
>
>                 Key: CASSANDRA-14389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14389
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.x
>
>
> CASSANDRA-8457/CASSANDRA-12229 introduced a regression against 
> CASSANDRA-12673. This was discovered with CASSANDRA-14362 and moved here for 
> resolution independent of that ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

