[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair

J.B. Langston (JIRA) Wed, 08 Oct 2014 09:18:32 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163680#comment-14163680
 ]


J.B. Langston commented on CASSANDRA-8084:
------------------------------------------

Here is the AWS cluster used to reproduce this:

{code}
automaton@ip-172-31-0-237:~$ nodetool status
Note: Ownership information does not include topology; for complete 
information, specify a keyspace

Datacenter: aws_east

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.86.246   304.01 MB  256     26.8%  1042deb8-5395-42b1-adf4-2a373149b052  rack1
UN  54.209.121.225  302.82 MB  256     21.8%  7e7499c2-acfb-4eda-b786-7878907038b8  rack1

Datacenter: aws_west

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.246.79   79.01 MB   256     24.7%  9a4450a4-d00b-407c-8217-464ca5d3d74c  rack1
UN  54.183.249.149  319.14 MB  256     26.7%  cb6579d4-3eac-48c6-a8c0-ca30071a97e8  rack1
{code}
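
To check later which peers share a datacenter, the status output above can be grouped by DC. The following is a minimal Python sketch and was not part of the original reproduction. The function name `addresses_by_datacenter` is hypothetical, and the parser assumes the row layout shown above: a two-letter status/state code followed by the address.

```python
import re

# Node rows start with a status letter (Up/Down) and a state letter
# (Normal/Leaving/Joining/Moving), followed by the node address.
NODE_ROW = re.compile(r"^[UD][NLJM]\s+(\S+)")


def addresses_by_datacenter(status_text):
    """Group node addresses by datacenter from `nodetool status` text."""
    datacenters = {}
    current = None
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Datacenter:"):
            current = line.split(":", 1)[1].strip()
            datacenters[current] = []
        elif current is not None:
            match = NODE_ROW.match(line)
            if match:
                datacenters[current].append(match.group(1))
    return datacenters


# Rows copied from the status output in this comment.
SAMPLE_STATUS = """\
Datacenter: aws_east

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 54.165.86.246 304.01 MB 256 26.8% 1042deb8-5395-42b1-adf4-2a373149b052 rack1
UN 54.209.121.225 302.82 MB 256 21.8% 7e7499c2-acfb-4eda-b786-7878907038b8 rack1

Datacenter: aws_west

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 54.183.246.79 79.01 MB 256 24.7% 9a4450a4-d00b-407c-8217-464ca5d3d74c rack1
UN 54.183.249.149 319.14 MB 256 26.7% cb6579d4-3eac-48c6-a8c0-ca30071a97e8 rack1
"""

if __name__ == "__main__":
    for dc, addresses in addresses_by_datacenter(SAMPLE_STATUS).items():
        print(dc, addresses)
```

The addresses listed by nodetool here are the broadcast (public) addresses, so the aws_west list gives the public IPs that should not appear as intra-DC peers.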

Here is the test case I ran to reproduce this:

1) Run cassandra-stress once to create the Keyspace1 keyspace and the 
Standard1 column family.

2) Alter the keyspace so that it replicates to all nodes:

{code}
ALTER KEYSPACE "Keyspace1" WITH replication = {
  'class': 'NetworkTopologyStrategy', 'aws_east': '2', 'aws_west': '2'
};
{code}

3) Shut down one of the nodes in aws_west.

4) Run cassandra-stress on the other node in aws_west (just cassandra-stress 
with no options). Let it finish.

5) Start the stopped node back up.

6) Run nodetool repair -local.

7) The repair and streaming messages in system.log will show the broadcast IP 
being used for nodes in the same DC.  You can also watch the connections 
being established over the broadcast IP with this command:

{code}
sudo netstat -anp | grep 7000 | sort -k5
{code}
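
The system.log check in step 7 can also be scripted. The following is a minimal Python sketch, assuming the repair messages have the shape of the Validator.java and RepairSession.java lines quoted in the issue description further down ("... merkle tree to /IP ..." and "... merkle tree ... from /IP"). The names `repair_peers` and `public_repair_peers` are hypothetical, and other repair or streaming messages may use a different wording that this pattern does not cover.

```python
import ipaddress
import re

# Captures the peer address from messages such as
#   "... Sending completed merkle tree to /54.172.118.222 for system_traces/sessions"
#   "... Received merkle tree for events from /54.172.118.222"
PEER_PATTERN = re.compile(
    r"merkle tree .*?(?:to|from) /(\d{1,3}(?:\.\d{1,3}){3})"
)


def repair_peers(log_text):
    """Return the set of peer IPs named in merkle tree log messages."""
    return {match.group(1) for match in PEER_PATTERN.finditer(log_text)}


def public_repair_peers(log_text):
    """Return, sorted, the peers whose address is not in a private range."""
    return sorted(
        ip for ip in repair_peers(log_text)
        if not ipaddress.ip_address(ip).is_private
    )


# Log lines taken from the issue description, joined back onto single lines.
SAMPLE_LOG = (
    " INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) "
    "[repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree "
    "to /54.172.118.222 for system_traces/sessions\n"
    " INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) "
    "[repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for "
    "events from /54.172.118.222\n"
)

if __name__ == "__main__":
    print(public_repair_peers(SAMPLE_LOG))
```

Any address this reports for a node in the local DC indicates repair traffic going over the broadcast IP.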

This test was run on DSE with GPFS (GossipingPropertyFileSnitch). We should 
repeat it with EC2MRS (EC2MultiRegionSnitch) on DSE and with GPFS on Apache 
Cassandra/DSC.

Here is the netstat output showing connections being established to the 
broadcast address of the other node in the same DC (54.183.249.149). The 
command was run on 54.183.246.79, which should have used that node's private 
172.x address instead.

{code}
automaton@ip-172-31-0-237:~$ sudo netstat -anp | grep 7000 | sort -k5
tcp 0 0 172.31.0.237:7000 0.0.0.0:* LISTEN 8959/java
tcp 0 0 172.31.0.237:7000 172.31.0.237:54148 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:7000 172.31.0.237:54149 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:7000 172.31.0.237:54150 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:54148 172.31.0.237:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:54149 172.31.0.237:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:54150 172.31.0.237:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:7000 172.31.4.163:56894 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:7000 172.31.4.163:56895 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:55510 172.31.4.163:7000 ESTABLISHED 8959/java
tcp 0 35 172.31.0.237:55504 172.31.4.163:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:7000 54.165.86.246:36101 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:50600 54.165.86.246:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:50606 54.165.86.246:7000 ESTABLISHED 8959/java
tcp 1 0 172.31.0.237:60588 54.183.249.149:7000 CLOSE_WAIT 8959/java
tcp 0 0 172.31.0.237:60587 54.183.249.149:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:60505 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60508 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60509 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60511 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60513 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60514 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60515 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60517 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60521 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60523 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60524 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60527 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60528 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60532 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60534 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60536 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60538 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60544 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60546 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60552 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60554 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60560 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60562 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60564 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60565 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60566 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60568 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60570 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60572 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60578 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60580 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60582 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60584 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60586 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:7000 54.209.121.225:51246 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:52200 54.209.121.225:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:52209 54.209.121.225:7000 ESTABLISHED 8959/java
{code}
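
The same check can be made mechanically on the netstat output. The following is a minimal Python sketch, not something that was run as part of this test. It assumes the `netstat -anp` column order shown above (protocol, Recv-Q, Send-Q, local address, remote address, state), and the function names and the `same_dc_broadcast_ips` argument are hypothetical. It counts outbound connections to the storage port per remote address and state, then reports the same-DC peers reached over a non-private address.

```python
import ipaddress
from collections import Counter

STORAGE_PORT = 7000  # the port grepped for in the netstat command above


def outbound_storage_peers(netstat_text, port=STORAGE_PORT):
    """Count connections whose remote port is `port`, keyed by (IP, state)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote_ip, _, remote_port = fields[4].rpartition(":")
        if remote_port != str(port):
            continue  # skips the LISTEN row and inbound connections
        counts[(remote_ip, fields[5])] += 1
    return counts


def same_dc_public_peers(counts, same_dc_broadcast_ips):
    """Return same-DC peers that were contacted on a non-private address."""
    return sorted(
        {
            ip
            for ip, _state in counts
            if ip in same_dc_broadcast_ips
            and not ipaddress.ip_address(ip).is_private
        }
    )


# A subset of the rows from the netstat output above.
SAMPLE_NETSTAT = """\
tcp 0 0 172.31.0.237:7000 0.0.0.0:* LISTEN 8959/java
tcp 0 0 172.31.0.237:55510 172.31.4.163:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:7000 54.165.86.246:36101 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:50600 54.165.86.246:7000 ESTABLISHED 8959/java
tcp 1 0 172.31.0.237:60588 54.183.249.149:7000 CLOSE_WAIT 8959/java
tcp 0 0 172.31.0.237:60587 54.183.249.149:7000 ESTABLISHED 8959/java
tcp 0 0 172.31.0.237:60505 54.183.249.149:7000 TIME_WAIT -
tcp 0 0 172.31.0.237:60508 54.183.249.149:7000 TIME_WAIT -
"""

if __name__ == "__main__":
    counts = outbound_storage_peers(SAMPLE_NETSTAT)
    # 54.183.249.149 is the broadcast address of the other aws_west node.
    print(same_dc_public_peers(counts, {"54.183.249.149"}))
```

Connections to the aws_east nodes are expected to use public addresses, so only addresses passed in as same-DC peers are reported.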

> GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE 
> clusters doesnt use the PRIVATE IPS for Intra-DC communications - When 
> running nodetool repair
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8084
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Config
>         Environment: Tested this in GCE and AWS clusters. Created multi 
> region and multi dc cluster once in GCE and once in AWS and ran into the same 
> problem. 
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.3 LTS"
> NAME="Ubuntu"
> VERSION="12.04.3 LTS, Precise Pangolin"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu precise (12.04.3 LTS)"
> VERSION_ID="12.04"
> Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also 
> latest DSE version which is 4.5 and which corresponds to 2.0.8.39.
>            Reporter: Jana
>              Labels: features
>             Fix For: 2.0.10
>
>
> Neither of these snitches (GossipFilePropertySnitch and EC2MultiRegionSnitch) 
> used the private IPs for communication between intra-DC nodes in my 
> multi-region, multi-DC cloud cluster (on both AWS and GCE) when I ran 
> "nodetool repair -local". It works fine during regular reads.
> Here are the cluster configurations I tried, all of which showed the problem: 
> AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + 
> (Prefer_local=true) in rackdc-properties file. 
> AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in 
> rackdc-properties file. 
> GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + 
> (Prefer_local=true) in rackdc-properties file. 
> GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in 
> rackdc-properties file. 
> With the above setup, I expect all nodes in a given DC to communicate via 
> private IPs, since the cloud providers charge for traffic over public IPs 
> but not for traffic over private IPs.
> Public IPs can still be used for inter-DC communication, which is working as 
> expected. 
> Here is a snippet from my log files when I ran "nodetool repair -local". 
> Node responding to the node running repair: 
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events
> Node running repair: 
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222
> Note: The IPs it is communicating with are all public IPs; it should have 
> used the private IPs starting with 172.x.x.x.
> YAML file values : 
> The listen address is set to: PRIVATE IP
> The broadcast address is set to: PUBLIC IP
> The SEEDs address is set to: PUBLIC IPs from both DCs
> The SNITCHES tried: GPFS and EC2MultiRegionSnitch
> RACK-DC: Had prefer_local set to true. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
