Re: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-10-22 Thread João Reis
 Hi,

We have received another report of this issue and this time we were able to
identify the bug and fix it. Today's release of the driver (version 3.16.1)
contains this fix. The JIRA issue is CSHARP-943 [1]

Thanks,
João Reis

[1] https://datastax-oss.atlassian.net/browse/CSHARP-943

Gediminas Blazys wrote on Monday, 18/05/2020 at 07:16:

> Hey,
>
>
>
> Apologies for the late reply João.
>
>
>
> We really, really appreciate your interest and likewise we could not
> reproduce this issue anywhere else but in production where it occurred,
> which is slightly undesirable. As we could not afford to keep the DC in
> this state we have removed it from our cluster. I’m afraid we cannot
> provide you with the info you’ve requested.
>
>
>
> Gediminas
>
>
>
> *From:* João Reis 
> *Sent:* Tuesday, May 12, 2020 19:58
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Adding new DC results in clients failing to
> connect
>
>
>
> Unfortunately I'm not able to reproduce this.
>
>
>
> Would it be possible for you to run a couple of queries and give us the
> results? The queries are "SELECT * FROM system.peers" and "SELECT * FROM
> system_schema.keyspaces". You should run both of these queries on any node
> that the driver uses to set up the control connection when that error
> occurs. To determine the node you can look for this driver log message:
> "Connection established to [NODE_ADDRESS] using protocol version [VERSION]."
>
>
>
> It should be easier to reproduce the issue with the results of those
> queries.
>
>
>
> Thanks,
>
> João Reis
>
>
>
> Gediminas Blazys wrote on Friday, 8/05/2020 at 08:27:
>
> Hello,
>
>
>
> Thanks for looking into this. As far as the time for token map calculation
> goes, we are considering reducing the number of vnodes for future DCs.
> However, in the mean time we were able to deploy another DC8  (testing the
> hypothesis that this may be isolated to DC7 only) and the deployment
> worked. DC8 is part of the cluster now, currently being rebuilt and we did
> not notice login issues with this expansion. So the topology now is this:
>
>
>
> DC1 - 18 nodes - 256 vnodes - working
>
> DC2 - 18 nodes - 256 vnodes - working
>
> DC3 - 18 nodes - 256 vnodes - working
>
> DC4 - 18 nodes - 256 vnodes - working
>
> DC5 - 18 nodes - 256 vnodes - working
>
> DC6 - 60 nodes - 256 vnodes - working
>
> DC7 - 60 nodes - 256 vnodes - once added to replication, clients can't
> connect to any DC
>
> DC8 - 60 nodes - 256 vnodes - rebuilding at the moment, including this DC
> into replication did not cause login issues.
>
>
>
> The major difference between DC7 and other DCs is that in DC7 we only have
> two racks while in other locations we use three, the replication factor
> however for all keyspaces remains the same – 3 for all user defined
> keyspaces. Maybe this is something that could cause issues with duplicates?
> It's theoretical, but Cassandra, having to place two replicas on the same
> rack, may have placed both the primary and a backup replica on the same node.
> Hence a duplicate...
>
>
>
> Gediminas
>
>
>
> *From:* João Reis 
> *Sent:* Thursday, May 7, 2020 19:22
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Adding new DC results in clients failing to
> connect
>
>
>
> Hi,
>
>
>
> I don't believe that the peers entry is responsible for that exception.
> Looking at the driver code, I can't even think of a scenario where that
> exception would be thrown... I will run some tests in the next couple of
> days to try and figure something out.
>
>
>
> One thing that is certain from those log messages is that the tokenmap
> computation is very slow (20 seconds). With 100+ nodes and 256 vnodes per
> node, we should expect the token map computation to be a bit slower but 20
> seconds is definitely too much. I've opened CSHARP-901 to track this. [1]
>
>
>
> João Reis
>
>
>
> [1] https://datastax-oss.atlassian.net/browse/CSHARP-901
>
>
>
> Gediminas Blazys wrote on Monday, 4/05/2020 at 11:13:
>
> Hello again,
>
>
>
> Looking into system.peers we found that some nodes contain entries about
> themse

RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-17 Thread Gediminas Blazys
Hey,

Apologies for the late reply João.

We really, really appreciate your interest and likewise we could not reproduce 
this issue anywhere else but in production where it occurred, which is slightly 
undesirable. As we could not afford to keep the DC in this state we have 
removed it from our cluster. I’m afraid we cannot provide you with the info 
you’ve requested.

Gediminas

From: João Reis 
Sent: Tuesday, May 12, 2020 19:58
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Unfortunately I'm not able to reproduce this.

Would it be possible for you to run a couple of queries and give us the 
results? The queries are "SELECT * FROM system.peers" and "SELECT * FROM 
system_schema.keyspaces". You should run both of these queries on any node that 
the driver uses to set up the control connection when that error occurs. To 
determine the node you can look for this driver log message: "Connection 
established to [NODE_ADDRESS] using protocol version [VERSION]."

It should be easier to reproduce the issue with the results of those queries.

Thanks,
João Reis

Gediminas Blazys wrote on Friday, 8/05/2020 at 08:27:
Hello,

Thanks for looking into this. As far as the time for token map calculation 
goes, we are considering reducing the number of vnodes for future DCs. However, 
in the mean time we were able to deploy another DC8  (testing the hypothesis 
that this may be isolated to DC7 only) and the deployment worked. DC8 is part 
of the cluster now, currently being rebuilt and we did not notice login issues 
with this expansion. So the topology now is this:

DC1 - 18 nodes - 256 vnodes - working
DC2 - 18 nodes - 256 vnodes - working
DC3 - 18 nodes - 256 vnodes - working
DC4 - 18 nodes - 256 vnodes - working
DC5 - 18 nodes - 256 vnodes - working
DC6 - 60 nodes - 256 vnodes - working
DC7 - 60 nodes - 256 vnodes - once added to replication, clients can't connect 
to any DC
DC8 - 60 nodes - 256 vnodes - rebuilding at the moment, including this DC into 
replication did not cause login issues.

The major difference between DC7 and other DCs is that in DC7 we only have two 
racks while in other locations we use three, the replication factor however for 
all keyspaces remains the same – 3 for all user defined keyspaces. Maybe this 
is something that could cause issues with duplicates? It's theoretical, but
Cassandra, having to place two replicas on the same rack, may have placed both
the primary and a backup replica on the same node. Hence a duplicate...

Gediminas

From: João Reis <joao.r.r...@outlook.com>
Sent: Thursday, May 7, 2020 19:22
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hi,

I don't believe that the peers entry is responsible for that exception. Looking 
at the driver code, I can't even think of a scenario where that exception would 
be thrown... I will run some tests in the next couple of days to try and figure 
something out.

One thing that is certain from those log messages is that the tokenmap 
computation is very slow (20 seconds). With 100+ nodes and 256 vnodes per node, 
we should expect the token map computation to be a bit slower but 20 seconds is 
definitely too much. I've opened CSHARP-901 to track this. [1]

João Reis

[1] https://datastax-oss.atlassian.net/browse/CSHARP-901

Gediminas Blazys wrote on Monday, 4/05/2020 at 11:13:
Hello again,

Looking into system.peers we found that some nodes contain entries about 
themselves with null values. Not sure if this could be an issue, maybe someone 
saw something similar? This state is there before including the funky DC into 
replication.
Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
Row: null, null, 192.168.104.111, null, null, null, null, null

Have a wonderful day 😊

Gediminas

From: Gediminas Blazys <gediminas.bla...@microsoft.com.INVALID>
Sent: Monday, May 4, 2020 10:09
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hello,

Thanks for the reply.

Following your advice we took a look at system.local for seed nodes and 
compared that data with nodetool ring. Both sources contain the same tokens for 
these 

Re: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-12 Thread João Reis
Unfortunately I'm not able to reproduce this.

Would it be possible for you to run a couple of queries and give us the
results? The queries are "SELECT * FROM system.peers" and "SELECT * FROM
system_schema.keyspaces". You should run both of these queries on any node
that the driver uses to set up the control connection when that error
occurs. To determine the node you can look for this driver log message:
"Connection established to [NODE_ADDRESS] using protocol version [VERSION]."

It should be easier to reproduce the issue with the results of those
queries.
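
If it helps, a minimal sketch of pulling those two result sets with the C# driver (the contact point below is a placeholder, only a few columns are printed, and the cast assumes the replication column is a map<text, text>):

using System;
using System.Collections.Generic;
using Cassandra;

class ControlConnectionDump
{
    static void Main()
    {
        // Point this at the node named in the "Connection established to ..." log message.
        var cluster = Cluster.Builder().AddContactPoint("10.0.0.1").Build();
        var session = cluster.Connect();

        foreach (var row in session.Execute("SELECT * FROM system.peers"))
        {
            Console.WriteLine("peer={0} dc={1} rack={2} host_id={3}",
                row["peer"], row["data_center"], row["rack"], row["host_id"]);
        }

        foreach (var row in session.Execute("SELECT * FROM system_schema.keyspaces"))
        {
            var replication = (IDictionary<string, string>)row["replication"];
            Console.WriteLine("keyspace={0} replication={1}",
                row["keyspace_name"], string.Join(", ", replication));
        }
    }
}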

Thanks,
João Reis

Gediminas Blazys wrote on Friday, 8/05/2020 at 08:27:

> Hello,
>
>
>
> Thanks for looking into this. As far as the time for token map calculation
> goes, we are considering reducing the number of vnodes for future DCs.
> However, in the mean time we were able to deploy another DC8  (testing the
> hypothesis that this may be isolated to DC7 only) and the deployment
> worked. DC8 is part of the cluster now, currently being rebuilt and we did
> not notice login issues with this expansion. So the topology now is this:
>
>
>
> DC1 - 18 nodes - 256 vnodes - working
>
> DC2 - 18 nodes - 256 vnodes - working
>
> DC3 - 18 nodes - 256 vnodes - working
>
> DC4 - 18 nodes - 256 vnodes - working
>
> DC5 - 18 nodes - 256 vnodes - working
>
> DC6 - 60 nodes - 256 vnodes - working
>
> DC7 - 60 nodes - 256 vnodes - once added to replication, clients can't
> connect to any DC
>
> DC8 - 60 nodes - 256 vnodes - rebuilding at the moment, including this DC
> into replication did not cause login issues.
>
>
>
> The major difference between DC7 and other DCs is that in DC7 we only have
> two racks while in other locations we use three, the replication factor
> however for all keyspaces remains the same – 3 for all user defined
> keyspaces. Maybe this is something that could cause issues with duplicates?
> It's theoretical, but Cassandra, having to place two replicas on the same
> rack, may have placed both the primary and a backup replica on the same node.
> Hence a duplicate...
>
>
>
> Gediminas
>
>
>
> *From:* João Reis 
> *Sent:* Thursday, May 7, 2020 19:22
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Adding new DC results in clients failing to
> connect
>
>
>
> Hi,
>
>
>
> I don't believe that the peers entry is responsible for that exception.
> Looking at the driver code, I can't even think of a scenario where that
> exception would be thrown... I will run some tests in the next couple of
> days to try and figure something out.
>
>
>
> One thing that is certain from those log messages is that the tokenmap
> computation is very slow (20 seconds). With 100+ nodes and 256 vnodes per
> node, we should expect the token map computation to be a bit slower but 20
> seconds is definitely too much. I've opened CSHARP-901 to track this. [1]
>
>
>
> João Reis
>
>
>
> [1] https://datastax-oss.atlassian.net/browse/CSHARP-901
>
>
>
> Gediminas Blazys wrote on Monday, 4/05/2020 at 11:13:
>
> Hello again,
>
>
>
> Looking into system.peers we found that some nodes contain entries about
> themselves with null values. Not sure if this could be an issue, maybe
> someone saw something similar? This state is there before including the
> funky DC into replication.
>
> Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
> Row: null, null, 192.168.104.111, null, null, null, null, null
>
>
> Have a wonderful day 😊
>
>
>
> Gediminas
>
>
>
> *From:* Gediminas Blazys 
> *Sent:* Monday, May 4, 2020 10:09
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] Re: Adding new DC results in clients failing to
> connect
>
>
>
> Hello,
>
>
>
> Thanks for the reply.
>
>
>
> Following your advice we took a look at system.local for seed nodes and
> compared that data with nodetool ring. Both sources contain the same tokens
> for these specific hosts. Will continue looking into system.peers.
&

RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-08 Thread Gediminas Blazys
Hello,

Thanks for looking into this. As far as the time for token map calculation 
goes, we are considering reducing the number of vnodes for future DCs. However, 
in the mean time we were able to deploy another DC8  (testing the hypothesis 
that this may be isolated to DC7 only) and the deployment worked. DC8 is part 
of the cluster now, currently being rebuilt and we did not notice login issues 
with this expansion. So the topology now is this:

DC1 - 18 nodes - 256 vnodes - working
DC2 - 18 nodes - 256 vnodes - working
DC3 - 18 nodes - 256 vnodes - working
DC4 - 18 nodes - 256 vnodes - working
DC5 - 18 nodes - 256 vnodes - working
DC6 - 60 nodes - 256 vnodes - working
DC7 - 60 nodes - 256 vnodes - once added to replication, clients can't connect 
to any DC
DC8 - 60 nodes - 256 vnodes - rebuilding at the moment, including this DC into 
replication did not cause login issues.

The major difference between DC7 and other DCs is that in DC7 we only have two 
racks while in other locations we use three, the replication factor however for 
all keyspaces remains the same – 3 for all user defined keyspaces. Maybe this 
is something that could cause issues with duplicates? It's theoretical, but
Cassandra, having to place two replicas on the same rack, may have placed both
the primary and a backup replica on the same node. Hence a duplicate...
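
Purely as an illustration of the step that triggers the problem, here is roughly what the replication change looks like when issued from the C# driver. The keyspace name and contact point are placeholders; the DC names follow the list above.

using Cassandra;

class AddDc7ToReplication
{
    static void Main()
    {
        var cluster = Cluster.Builder().AddContactPoint("10.0.0.1").Build();  // placeholder contact point
        var session = cluster.Connect();

        // RF 3 per DC, even though DC7 only has two racks - the scenario described above.
        session.Execute(
            "ALTER KEYSPACE some_keyspace WITH replication = {" +
            " 'class': 'NetworkTopologyStrategy'," +
            " 'DC1': 3, 'DC2': 3, 'DC3': 3, 'DC4': 3, 'DC5': 3, 'DC6': 3, 'DC7': 3 }");
    }
}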

Gediminas

From: João Reis 
Sent: Thursday, May 7, 2020 19:22
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hi,

I don't believe that the peers entry is responsible for that exception. Looking 
at the driver code, I can't even think of a scenario where that exception would 
be thrown... I will run some tests in the next couple of days to try and figure 
something out.

One thing that is certain from those log messages is that the tokenmap 
computation is very slow (20 seconds). With 100+ nodes and 256 vnodes per node, 
we should expect the token map computation to be a bit slower but 20 seconds is 
definitely too much. I've opened CSHARP-901 to track this. [1]

João Reis

[1] https://datastax-oss.atlassian.net/browse/CSHARP-901

Gediminas Blazys wrote on Monday, 4/05/2020 at 11:13:
Hello again,

Looking into system.peers we found that some nodes contain entries about 
themselves with null values. Not sure if this could be an issue, maybe someone 
saw something similar? This state is there before including the funky DC into 
replication.
Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
Row: null, null, 192.168.104.111, null, null, null, null, null

Have a wonderful day 😊

Gediminas

From: Gediminas Blazys <gediminas.bla...@microsoft.com.INVALID>
Sent: Monday, May 4, 2020 10:09
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hello,

Thanks for the reply.

Following your advice we took a look at system.local for seed nodes and 
compared that data with nodetool ring. Both sources contain the same tokens for 
these specific hosts. Will continue looking into system.peers.

We have enabled more verbosity on the C# driver and this is the message that we 
get now:


ControlConnection: 05/03/2020 14:28:42.346 +03:00 : Updating keyspaces metadata

ControlConnection: 05/03/2020 14:28:42.377 +03:00 : Rebuilding token map

ControlConnection: 05/03/2020 14:29:03.837 +03:00 : Finished building TokenMap 
for 7 keyspaces and 210 hosts. It took 19403 milliseconds.

ControlConnection: 05/03/2020 14:29:03.901 +03:00 ALARMA: ENDPOINT: 
<>:9042 EXCEPTION: System.ArgumentException: The source argument 
contains duplicate keys.

   at 
System.Collections.Concurrent.ConcurrentDictionary`2.InitializeFromCollection(IEnumerable`1
 collection)

   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 
collection, IEqualityComparer`1 comparer)

   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 
collection)

   at Cassandra.TokenMap..ctor(TokenFactory factory, IReadOnlyDictionary`2 
tokenToHostsByKeyspace, List`1 ring, IReadOnlyDictionary`2 primaryReplicas, 
IReadOnlyDictionary`2 keyspaceTokensCache, IReadOnlyDictionary`2 datacenters, 
Int32 numberOfHostsWithTokens)

   at Cassandra.TokenMap.Build(String partitioner, ICollection`1 hosts, 
ICollection`1 keyspaces)

   at Cassandra.Metadata.d__59.MoveNext()

--- End of stack

Re: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-07 Thread João Reis
Hi,

I don't believe that the peers entry is responsible for that exception.
Looking at the driver code, I can't even think of a scenario where that
exception would be thrown... I will run some tests in the next couple of
days to try and figure something out.

One thing that is certain from those log messages is that the tokenmap
computation is very slow (20 seconds). With 100+ nodes and 256 vnodes per
node, we should expect the token map computation to be a bit slower but 20
seconds is definitely too much. I've opened CSHARP-901 to track this. [1]

João Reis

[1] https://datastax-oss.atlassian.net/browse/CSHARP-901
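
A rough way to see that cost from the client side (a sketch only, default settings assumed; the address is a placeholder):

using System;
using System.Diagnostics;
using Cassandra;

class ConnectTiming
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        var cluster = Cluster.Builder().AddContactPoint("10.0.0.1").Build();
        var session = cluster.Connect();   // control connection, schema metadata and token map are built here
        sw.Stop();

        Console.WriteLine("Connect took {0} ms for {1} hosts",
            sw.ElapsedMilliseconds, cluster.Metadata.AllHosts().Count);
    }
}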

Gediminas Blazys wrote on Monday, 4/05/2020 at 11:13:

> Hello again,
>
>
>
> Looking into system.peers we found that some nodes contain entries about
> themselves with null values. Not sure if this could be an issue, maybe
> someone saw something similar? This state is there before including the
> funky DC into replication.
>
> Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
> Row: null, null, 192.168.104.111, null, null, null, null, null
>
>
>
> Have a wonderful day 😊
>
>
>
> Gediminas
>
>
>
> *From:* Gediminas Blazys 
> *Sent:* Monday, May 4, 2020 10:09
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] Re: Adding new DC results in clients failing to
> connect
>
>
>
> Hello,
>
>
>
> Thanks for the reply.
>
>
>
> Following your advice we took a look at system.local for seed nodes and
> compared that data with nodetool ring. Both sources contain the same tokens
> for these specific hosts. Will continue looking into system.peers.
>
>
>
> We have enabled more verbosity on the C# driver and this is the message
> that we get now:
>
>
>
> ControlConnection: 05/03/2020 14:28:42.346 +03:00 : Updating keyspaces
> metadata
>
> ControlConnection: 05/03/2020 14:28:42.377 +03:00 : Rebuilding token map
>
> ControlConnection: 05/03/2020 14:29:03.837 +03:00 : Finished building
> TokenMap for 7 keyspaces and 210 hosts. It took 19403 milliseconds.
>
> ControlConnection: 05/03/2020 14:29:03.901 +03:00 ALARMA: ENDPOINT:
> <>:9042 EXCEPTION: System.ArgumentException: The source argument
> contains duplicate keys.
>
>at
> System.Collections.Concurrent.ConcurrentDictionary`2.InitializeFromCollection(IEnumerable`1
> collection)
>
>at
> System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1
> collection, IEqualityComparer`1 comparer)
>
>at
> System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1
> collection)
>
>at Cassandra.TokenMap..ctor(TokenFactory factory, IReadOnlyDictionary`2
> tokenToHostsByKeyspace, List`1 ring, IReadOnlyDictionary`2 primaryReplicas,
> IReadOnlyDictionary`2 keyspaceTokensCache, IReadOnlyDictionary`2
> datacenters, Int32 numberOfHostsWithTokens)
>
>at Cassandra.TokenMap.Build(String partitioner, ICollection`1 hosts,
> ICollection`1 keyspaces)
>
>at Cassandra.Metadata.d__59.MoveNext()
>
> --- End of stack trace from previous location where exception was thrown
> ---
>
>at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task
> task)
>
>at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)
>
>at
> System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
>
>at Cassandra.Connections.ControlConnection.d__44.MoveNext()
>
>
>
> The error occurs on Cassandra.TokenMap. We are analyzing objects that the
> driver initializes during the token map creation but we are yet to find
> that dictionary with duplicated keys.
>
> Just to note, once this new DC is added to replication, the Python driver is
> unable to establish a connection either; cqlsh, though, seems to be OK. It
> is hard to say for sure, but for now at least, this issue seems to be
> pointing to Cassandra.
>
>
>
> Gediminas
>
>
>
> *From:* Jorge Bay Gondra 
> *Sent:* Thursday, April 30, 2020 11:45
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Adding new DC results in clients failing to
> connect
>
>
>
> Hi,
>
> You can enable logging at driver to see what's happening under the hood:
> https://docs.datastax.com/en/developer/csharp-driver/3.14/faq/#how-can-i-enable-logging-in-the-driver

RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-04 Thread Gediminas Blazys
Hello again,

Looking into system.peers we found that some nodes contain entries about 
themselves with null values. Not sure if this could be an issue, maybe someone 
saw something similar? This state is there before including the funky DC into 
replication.
Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
Row: null, null, 192.168.104.111, null, null, null, null, null
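
For what it's worth, a small client-side check for rows like that one could look roughly like this (a sketch only; the contact point is a placeholder):

using System;
using Cassandra;

class PeersSanityCheck
{
    static void Main()
    {
        var cluster = Cluster.Builder().AddContactPoint("10.0.0.1").Build();
        var session = cluster.Connect();

        foreach (var row in session.Execute("SELECT peer, data_center, rack, host_id FROM system.peers"))
        {
            // A peers row describing a live node should have at least a data_center, rack and host_id.
            if (row["host_id"] == null || row["data_center"] == null || row["rack"] == null)
            {
                Console.WriteLine("Suspicious system.peers entry: peer={0}", row["peer"]);
            }
        }
    }
}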

Have a wonderful day 😊

Gediminas

From: Gediminas Blazys 
Sent: Monday, May 4, 2020 10:09
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hello,

Thanks for the reply.

Following your advice we took a look at system.local for seed nodes and 
compared that data with nodetool ring. Both sources contain the same tokens for 
these specific hosts. Will continue looking into system.peers.

We have enabled more verbosity on the C# driver and this is the message that we 
get now:


ControlConnection: 05/03/2020 14:28:42.346 +03:00 : Updating keyspaces metadata

ControlConnection: 05/03/2020 14:28:42.377 +03:00 : Rebuilding token map

ControlConnection: 05/03/2020 14:29:03.837 +03:00 : Finished building TokenMap 
for 7 keyspaces and 210 hosts. It took 19403 milliseconds.

ControlConnection: 05/03/2020 14:29:03.901 +03:00 ALARMA: ENDPOINT: 
<>:9042 EXCEPTION: System.ArgumentException: The source argument 
contains duplicate keys.

   at 
System.Collections.Concurrent.ConcurrentDictionary`2.InitializeFromCollection(IEnumerable`1
 collection)

   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 
collection, IEqualityComparer`1 comparer)

   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 
collection)

   at Cassandra.TokenMap..ctor(TokenFactory factory, IReadOnlyDictionary`2 
tokenToHostsByKeyspace, List`1 ring, IReadOnlyDictionary`2 primaryReplicas, 
IReadOnlyDictionary`2 keyspaceTokensCache, IReadOnlyDictionary`2 datacenters, 
Int32 numberOfHostsWithTokens)

   at Cassandra.TokenMap.Build(String partitioner, ICollection`1 hosts, 
ICollection`1 keyspaces)

   at Cassandra.Metadata.d__59.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

   at 
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
 task)

   at 
System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()

   at Cassandra.Connections.ControlConnection.d__44.MoveNext()

The error occurs on Cassandra.TokenMap. We are analyzing objects that the 
driver initializes during the token map creation but we are yet to find that 
dictionary with duplicated keys.
Just to note, once this new DC is added to replication, the Python driver is unable
to establish a connection either; cqlsh, though, seems to be OK. It is hard to
say for sure, but for now at least, this issue seems to be pointing to
Cassandra.

Gediminas

From: Jorge Bay Gondra <jorgebaygon...@gmail.com>
Sent: Thursday, April 30, 2020 11:45
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hi,
You can enable logging at driver to see what's happening under the hood: 
https://docs.datastax.com/en/developer/csharp-driver/3.14/faq/#how-can-i-enable-logging-in-the-driver
With logging information, it should be easy to track the issue down.

Can you query system.local and system.peers on a seed node / contact point to 
see if all the node list / token info is expected. You can compare it to 
nodetool ring info.

Not directly related: 256 vnodes is probably more than you want.

Thanks,
Jorge

On Thu, Apr 30, 2020 at 9:48 AM Gediminas Blazys wrote:
Hello,

We have run into a very interesting issue and maybe some of you have 
encountered it or just have an idea where to look.

We are working towards adding new dcs into our cluster, here's the current 
topology:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes

Recently we introduced a new DC6 (60 nodes) into our cluster. The joining and 
rebuilding of DC6 went smoothly, clients are using it without issue. This is 
how it looked after joining DC6:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes
DC6 - 60 nodes

Next we wanted to add another DC7 (also 60 nodes) makin

RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-04 Thread Gediminas Blazys
Hello,

Thanks for the reply.

Following your advice we took a look at system.local for seed nodes and 
compared that data with nodetool ring. Both sources contain the same tokens for 
these specific hosts. Will continue looking into system.peers.

We have enabled more verbosity on the C# driver and this is the message that we 
get now:


ControlConnection: 05/03/2020 14:28:42.346 +03:00 : Updating keyspaces metadata

ControlConnection: 05/03/2020 14:28:42.377 +03:00 : Rebuilding token map

ControlConnection: 05/03/2020 14:29:03.837 +03:00 : Finished building TokenMap 
for 7 keyspaces and 210 hosts. It took 19403 milliseconds.

ControlConnection: 05/03/2020 14:29:03.901 +03:00 ALARMA: ENDPOINT: 
<>:9042 EXCEPTION: System.ArgumentException: The source argument 
contains duplicate keys.

   at 
System.Collections.Concurrent.ConcurrentDictionary`2.InitializeFromCollection(IEnumerable`1
 collection)

   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 
collection, IEqualityComparer`1 comparer)

   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 
collection)

   at Cassandra.TokenMap..ctor(TokenFactory factory, IReadOnlyDictionary`2 
tokenToHostsByKeyspace, List`1 ring, IReadOnlyDictionary`2 primaryReplicas, 
IReadOnlyDictionary`2 keyspaceTokensCache, IReadOnlyDictionary`2 datacenters, 
Int32 numberOfHostsWithTokens)

   at Cassandra.TokenMap.Build(String partitioner, ICollection`1 hosts, 
ICollection`1 keyspaces)

   at Cassandra.Metadata.d__59.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

   at 
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
 task)

   at 
System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()

   at Cassandra.Connections.ControlConnection.d__44.MoveNext()

The error occurs on Cassandra.TokenMap. We are analyzing objects that the 
driver initializes during the token map creation but we are yet to find that 
dictionary with duplicated keys.
Just to note, once this new DC is added to replication, the Python driver is unable
to establish a connection either; cqlsh, though, seems to be OK. It is hard to
say for sure, but for now at least, this issue seems to be pointing to
Cassandra.
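
Not the driver's actual code path, but a minimal illustration of where that exact message comes from - the ConcurrentDictionary constructor that takes a collection of key/value pairs:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

class DuplicateKeyRepro
{
    static void Main()
    {
        var pairs = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("token-1", "host-a"),
            new KeyValuePair<string, string>("token-1", "host-b")   // duplicate key
        };

        try
        {
            var map = new ConcurrentDictionary<string, string>(pairs);
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);   // "The source argument contains duplicate keys."
        }
    }
}

So whatever the driver feeds into that constructor while building the token map apparently contains the same key twice.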

Gediminas

From: Jorge Bay Gondra 
Sent: Thursday, April 30, 2020 11:45
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hi,
You can enable logging at driver to see what's happening under the hood: 
https://docs.datastax.com/en/developer/csharp-driver/3.14/faq/#how-can-i-enable-logging-in-the-driver
With logging information, it should be easy to track the issue down.
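
A minimal sketch of turning that on, assuming the Trace-based switch described in that FAQ page (newer driver versions can also plug into Microsoft.Extensions.Logging); the contact point is a placeholder:

using System.Diagnostics;
using Cassandra;

class EnableDriverLogging
{
    static void Main()
    {
        Cassandra.Diagnostics.CassandraTraceSwitch.Level = TraceLevel.Verbose;  // driver log verbosity
        Trace.Listeners.Add(new ConsoleTraceListener());                        // send driver traces to stdout

        var cluster = Cluster.Builder().AddContactPoint("10.0.0.1").Build();
        var session = cluster.Connect();   // ControlConnection messages now show up on the console
    }
}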

Can you query system.local and system.peers on a seed node / contact point to 
see if all the node list / token info is expected. You can compare it to 
nodetool ring info.

Not directly related: 256 vnodes is probably more than you want.

Thanks,
Jorge

On Thu, Apr 30, 2020 at 9:48 AM Gediminas Blazys wrote:
Hello,

We have run into a very interesting issue and maybe some of you have 
encountered it or just have an idea where to look.

We are working towards adding new dcs into our cluster, here's the current 
topology:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes

Recently we introduced a new DC6 (60 nodes) into our cluster. The joining and 
rebuilding of DC6 went smoothly, clients are using it without issue. This is 
how it looked after joining DC6:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes
DC6 - 60 nodes

Next we wanted to add another DC7 (also 60 nodes) making it a total of 210 
nodes in the cluster, and while joining new nodes went smoothly, once we 
changed the replication of user defined keyspaces to include DC7, no clients 
were able to connect to Cassandra (regardless of which DC is being addressed). 
They would throw an exception that I have provided at the end of the email.

Cassandra version 3.11.4.
C# driver version 3.12.0. Also tested with 3.14.0. We use dc round robin policy 
and update ring metadata for connecting clients.
Amount of vnodes per node: 256

The stack trace starts with an exception 'The source argument contains 
duplicate keys.'. Maybe you know what kind of data is in this dictionary? What 
data can be duplicated here?

Clients are unable to connect until the moment we remove DC7 from replication. 
Once replication is adjus

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

Also to check:

 

You should use the same list of seeds in all the yaml files, probably two in each 
data center if you will have five nodes in each.  All the seed node addresses 
from all the data centers should be listed in each yaml file where it says 
“-seeds:”.  I’m not sure from your previous replies if you’re doing that.

 

Let us know your results.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Monday, March 12, 2018 7:14 PM
To: 'user@cassandra.apache.org'
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

Kunal,

 

Sorry for asking you things you already answered.  You provided a lot of good 
information and you know what you’re are doing.  It’s going to be something 
really simple to figure out.  While I read through the thread more closely, I’m 
guessing we are right on top of it so could I ask you:

 

Please read through 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configMultiNetworks.html
 as it probably has the answer.  

 

One of things it says specifically is: 

Additional cassandra.yaml configuration for non-EC2 implementations

If multiple network interfaces are used in a non-EC2 implementation, enable 
the listen_on_broadcast_address option.

listen_on_broadcast_address: true

In non-EC2 environments, the public address to private address routing is not 
automatically enabled. Enabling listen_on_broadcast_address allows DSE to 
listen on both listen_address and broadcast_address with two network interfaces.

 

Please consider that specially and be sure everything else it mentions is done

 

You said you changed the broadcast_rpc_address in one of the instances in GCE 
and saw a change.  Did you update the other nodes in GCE?  And then restarted 
each one (in a rolling manner)?

 

Did you restart each node in each datacenter starting with the seed nodes since 
you last updated a yaml file?

 

Could the client in your application be causing the problem?  

 

Kenneth Brotman

 

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:43 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

Yes, that's correct. The customer wants us to migrate the cassandra setup in 
their AWS account.

 

Thanks,


Kunal

 

On 13 March 2018 at 04:56, Kenneth Brotman  wrote:

I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman  wrote:

You can’t migrate and upgrade at the same time perhaps but you could do one and 
then the other so as to end up on new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh from the AWS node into the GCE node(s).

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

I mentioned earlier - there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x)

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: 

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

While we are looking into all this I feel compelled to ask you to check your 
security configurations now that you are using public addresses to communicate 
inter-node across data centers.  Are you sure you are using best practices?  

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Monday, March 12, 2018 7:14 PM
To: 'user@cassandra.apache.org'
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

Kunal,

 

Sorry for asking you things you already answered.  You provided a lot of good 
information and you know what you’re are doing.  It’s going to be something 
really simple to figure out.  While I read through the thread more closely, I’m 
guessing we are right on top of it so could I ask you:

 

Please read through 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configMultiNetworks.html
 as it probably has the answer.  

 

One of things it says specifically is: 

Additional cassandra.yaml configuration for non-EC2 implementations

If multiple network interfaces are used in a non-EC2 implementation, enable 
the listen_on_broadcast_address option.

listen_on_broadcast_address: true

In non-EC2 environments, the public address to private address routing is not 
automatically enabled. Enabling listen_on_broadcast_address allows DSE to 
listen on both listen_address and broadcast_address with two network interfaces.

 

Please consider that specially and be sure everything else it mentions is done

 

You said you changed the broadcast_rpc_address in one of the instances in GCE 
and saw a change.  Did you update the other nodes in GCE?  And then restarted 
each one (in a rolling manner)?

 

Did you restart each node in each datacenter starting with the seed nodes since 
you last updated a yaml file?

 

Could the client in your application be causing the problem?  

 

Kenneth Brotman

 

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:43 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

Yes, that's correct. The customer wants us to migrate the cassandra setup in 
their AWS account.

 

Thanks,


Kunal

 

On 13 March 2018 at 04:56, Kenneth Brotman  wrote:

I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman  wrote:

You can’t migrate and upgrade at the same time perhaps but you could do one and 
then the other so as to end up on new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh from the AWS node into the GCE node(s).

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

I mentioned earlier - there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x)

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenne

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

Sorry for asking you things you already answered.  You provided a lot of good 
information and you know what you’re are doing.  It’s going to be something 
really simple to figure out.  While I read through the thread more closely, I’m 
guessing we are right on top of it so could I ask you:

 

Please read through 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configMultiNetworks.html
 as it probably has the answer.  

 

One of things it says specifically is: 

Additional cassandra.yaml configuration for non-EC2 implementations

If multiple network interfaces are used in a non-EC2 implementation, enable 
the listen_on_broadcast_address option.

listen_on_broadcast_address: true

In non-EC2 environments, the public address to private address routing is not 
automatically enabled. Enabling listen_on_broadcast_address allows DSE to 
listen on both listen_address and broadcast_address with two network interfaces.

 

Please consider that specially and be sure everything else it mentions is done

 

You said you changed the broadcast_rpc_address in one of the instances in GCE 
and saw a change.  Did you update the other nodes in GCE?  And then restarted 
each one (in a rolling manner)?

 

Did you restart each node in each datacenter starting with the seed nodes since 
you last updated a yaml file?

 

Could the client in your application be causing the problem?  

 

Kenneth Brotman

 

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:43 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

Yes, that's correct. The customer wants us to migrate the cassandra setup in 
their AWS account.

 

Thanks,


Kunal

 

On 13 March 2018 at 04:56, Kenneth Brotman  wrote:

I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman  wrote:

You can’t migrate and upgrade at the same time perhaps but you could do one and 
then the other so as to end up on new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh from the AWS node into the GCE node(s).

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

I mentioned earlier - there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x)

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0x if you’re going through all 
the trouble of migrating anyway?  

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is p

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
Yes, that's correct. The customer wants us to migrate the cassandra setup
in their AWS account.

Thanks,
Kunal

On 13 March 2018 at 04:56, Kenneth Brotman 
wrote:

> I didn’t understand something.  Are you saying you are using one data
> center on Google and one on Amazon?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Monday, March 12, 2018 4:24 PM
> *To:* user@cassandra.apache.org
> *Cc:* Nikhil Soman
> *Subject:* Re: [EXTERNAL] RE: Adding new DC?
>
>
>
>
>
> On 13 March 2018 at 03:28, Kenneth Brotman 
> wrote:
>
> You can’t migrate and upgrade at the same time perhaps but you could do
> one and then the other so as to end up on new version.  I’m guessing it’s
> an error in the yaml file or a port not open.  Is there any good reason for
> a production cluster to still be on version 2.1x?
>
>
>
> I'm not trying to migrate AND upgrade at the same time. However, the apt
> repo shows only 2.1.20 as the available version.
>
> This is the output from the new node in AWS
>
>
>
> ubuntu@ip-10-0-43-213:*~*$ apt-cache policy cassandra
> cassandra:
>  Installed: 2.1.20
>  Candidate: 2.1.20
>  Version table:
> *** 2.1.20 500
>500 http://www.apache.org/dist/cassandra/debian 21x/main amd64
> Packages
>100 /var/lib/dpkg/status
>
> Regarding open ports, I can cqlsh from the AWS node into the GCE node(s).
>
> As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE
> firewall for the public IP of the AWS instance.
>
>
>
> I mentioned earlier - there are some differences in the column types - for
> example, date (>= 2.2) vs. timestamp (2.1.x)
>
> The application has not been updated yet.
>
> Hence sticking to 2.1.x for now.
>
>
>
> And, so far, 2.1.x has been serving the purpose.
>
> Kunal
>
>
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Durity, Sean R [mailto:sean_r_dur...@homedepot.com]
> *Sent:* Monday, March 12, 2018 11:36 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] RE: Adding new DC?
>
>
>
> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>
>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>
>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com
> ]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wondering why you wouldn’t want to end up on 3.0x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 04:54, Kenneth Brotman 
wrote:

> Kunal,
>
>
>
> Please provide the following setting from the yaml files you  are using:
>
>
>
> seeds:
>

In GCE: seeds: "10.142.14.27"
In AWS (new node being added): seeds:
"35.196.96.247,35.227.127.245,35.196.241.232" (these are the public IP
addresses of 3 nodes from GCE)

 I have verified that I am able to do cqlsh from the AWS instance to all 3
ip addresses.


> listen_address:
>

We use the listen_interface setting instead of listen_address.

In GCE: listen_interface: eth0 (running ubuntu 14.04 LTS)
In AWS: listen_interface: ens3 (running ubuntu 16.04 LTS)


> broadcast_address:
>

I tried setting broadcast_address to one instance in GCE: broadcast_address:
35.196.96.247

In AWS: broadcast_address: 13.127.89.251 (this is the public/elastic IP of
the node in AWS)

rpc_address:
>

Like listen_address, we use rpc_interface.
In GCE: rpc_interface:  eth0
In AWS: rpc_interface:  ens3


> endpoint_snitch:
>

In both setups, we currently use GossipingPropertyFileSnitch.
The cassandra-rackdc.properties files from both setups:
GCE:
dc=DC1
rack=RAC1

AWS:
dc=DC2
rack=RAC1



> auto_bootstrap:
>

When the google cloud instances started up, we hadn't set this explicitly -
so, they started off with default value (auto_bootstrap: true)
However, as outlined in the datastax doc for adding new dc, I had added
'auto_bootstrap: false' to the google cloud instances (not restarted the
service as per the doc).

In the AWS instance, I had added 'auto_bootstrap: false' - the doc says we
need to do "nodetool rebuild" and hence no automatic bootstrapping.
But, haven't gotten to that step yet.
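
For reference, pulling those settings together, the AWS node's cassandra.yaml would look roughly like this. It is a sketch only: the addresses are the ones quoted in this thread, and listen_on_broadcast_address is the non-EC2 multi-interface option from the DataStax doc quoted elsewhere in the thread.

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "35.196.96.247,35.227.127.245,35.196.241.232"   # public IPs of the GCE seed nodes
listen_interface: ens3                         # private interface on the AWS instance
broadcast_address: 13.127.89.251               # public/elastic IP of this AWS node
listen_on_broadcast_address: true              # listen on both addresses (non-EC2, two interfaces)
rpc_interface: ens3
endpoint_snitch: GossipingPropertyFileSnitch   # with dc=DC2, rack=RAC1 in cassandra-rackdc.properties
auto_bootstrap: false                          # the new DC gets its data later via "nodetool rebuild"

The GCE side would presumably need the matching change (each node's own public IP as broadcast_address plus listen_on_broadcast_address: true), rolled out with a rolling restart as Sean suggested.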

Thanks,
Kunal


>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Monday, March 12, 2018 4:13 PM
> *To:* user@cassandra.apache.org
> *Cc:* Nikhil Soman
> *Subject:* Re: [EXTERNAL] RE: Adding new DC?
>
>
>
>
>
> On 13 March 2018 at 00:06, Durity, Sean R 
> wrote:
>
> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>
>
>
> I'm not trying to upgrade as of now - first priority is the migration.
>
> We can look at version upgrade later on.
>
>
>
>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>
>
>
> I just tried setting the broadcast_address in one of the instances in GCE
> to its public IP and restarted the service.
>
> However, it now shows all other nodes (in GCE) as DN in nodetool status
> output and the other nodes also report this node as DN with its
> internal/private IP address.
>
>
>
> I also tried setting the broadcast_rpc_address to the internal/private IP
> address - still the same.
>
>
>
>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>
>
>
> No. of nodes: 5
>
> RF: 3
>
> Data size (as reported by the load factor in nodetool status output):
> ~30GB per node
>
>
>
> Thanks,
> Kunal
>
>
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wonder why you wouldn’t want to end up on 3.0x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk.
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman  wrote:

You can’t migrate and upgrade at the same time perhaps, but you could do one and 
then the other so as to end up on the new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1.x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh into the GCE node(s) from the AWS node.

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

I mentioned earlier - there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x)

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0.x if you’re going through 
all the trouble of migrating anyway?

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is production setup - so, cannot take risk

Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?

 

 

No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

 

Thanks,

Kunal

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

 

Hi all,

 

We currently have a cluster in GCE for one of the customers.

They want it to be migrated to AWS.

 

I have setup one node in AWS to join into the cluster by following:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 
 

 

Will add more nodes once the first one joins successfully.

 

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

 

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

 

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:

OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 03:28, Kenneth Brotman 
wrote:

> You can’t migrate and upgrade at the same time perhaps, but you could do
> one and then the other so as to end up on the new version.  I’m guessing it’s
> an error in the yaml file or a port not open.  Is there any good reason for
> a production cluster to still be on version 2.1.x?
>

I'm not trying to migrate AND upgrade at the same time. However, the apt
repo shows only 2.1.20 as the available version.
This is the output from the new node in AWS

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra
cassandra:
 Installed: 2.1.20
 Candidate: 2.1.20
 Version table:
*** 2.1.20 500
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64
Packages
   100 /var/lib/dpkg/status

Regarding open ports, I can cqlsh into the GCE node(s) from the AWS node.
As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE
firewall for the public IP of the AWS instance.

I mentioned earlier - there are some differences in the column types - for
example, date (>= 2.2) vs. timestamp (2.1.x)
The application has not been updated yet.
Hence sticking to 2.1.x for now.

And, so far, 2.1.x has been serving the purpose.

Kunal


>
> Kenneth Brotman
>
>
>
> *From:* Durity, Sean R [mailto:sean_r_dur...@homedepot.com]
> *Sent:* Monday, March 12, 2018 11:36 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [EXTERNAL] RE: Adding new DC?
>
>
>
> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>
>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>
>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wondering why you wouldn’t want to end up on 3.0.x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk.
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
>
>
>
> Will add more nodes once the first one joins successfully.
>
>
>
> The node in AWS has an elastic IP - which is white-listed for ports
> 7000-7001, 7199, 9042 in GCE firewall.
>
>
>
> The snitch is set to GossipingPropertyFileSnitch. The GCE setup has
> dc=DC1, rack=RAC1 while on AWS, I changed the DC to dc=DC2.
>
>
>
> When I start cassandra service on the AWS instance, I see the version
> handshake msgs in the logs trying to connect to the public IPs of the GCE
> nodes:
>
> OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

Please provide the following settings from the yaml files you are using:

 

seeds: 

listen_address: 

broadcast_address: 

rpc_address: 

endpoint_snitch: 

auto_bootstrap: 

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:13 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 00:06, Durity, Sean R  wrote:

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

I'm not trying to upgrade as of now - first priority is the migration.

We can look at version upgrade later on.

 

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

I just tried setting the broadcast_address in one of the instances in GCE to 
its public IP and restarted the service.

However, it now shows all other nodes (in GCE) as DN in nodetool status output 
and the other nodes also report this node as DN with its internal/private IP 
address.

 

I also tried setting the broadcast_rpc_address to the internal/private IP 
address - still the same.

 

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

No. of nodes: 5

RF: 3

Data size (as reported by the load factor in nodetool status output): ~30GB per 
node

 

Thanks,
Kunal

 

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0.x if you’re going through 
all the trouble of migrating anyway?

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is production setup - so, cannot take risk.

Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?

 

 

No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

 

Thanks,

Kunal

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

 

Hi all,

 

We currently have a cluster in GCE for one of the customers.

They want it to be migrated to AWS.

 

I have setup one node in AWS to join into the cluster by following:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 
 

 

Will add more nodes once the first one joins successfully.

 

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

 

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

 

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:

OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

 

In cassandra.yaml file, I'm only using listen_interface and rpc_interface 
settings - no explicit IP addresses used - so, ends up using the internal 
private IP ranges.

 

Do I need to explicitly add the broadcast_address for both sides?

Would that require restarting the cassandra service on the GCE side? Or is it 
possible to change that setting on-the-fly without a restart?

 

I would prefer a non-restart option.

 

PS: The cassandra version running in GCE is 2.1.18 while the new node setup in 
AWS is running 2.1.20 - just in case if that's relevant

 

Thanks,


Kunal

 

 


Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 00:06, Durity, Sean R 
wrote:

> You cannot migrate and upgrade at the same time across major versions.
> Streaming is (usually) not compatible between versions.
>

I'm not trying to upgrade as of now - first priority is the migration.
We can look at version upgrade later on.


>
>
> As to the migration question, I would expect that you may need to put the
> external-facing ip addresses in several places in the cassandra.yaml file.
> And, yes, it would require a restart. Why is a non-restart more desirable?
> Most Cassandra changes require a restart, but you can do a rolling restart
> and not impact your application. This is fairly normal admin work and
> can/should be automated.
>

I just tried setting the broadcast_address in one of the instances in GCE
to its public IP and restarted the service.
However, it now shows all other nodes (in GCE) as DN in nodetool status
output and the other nodes also report this node as DN with its
internal/private IP address.

I also tried setting the broadcast_rpc_address to the internal/private IP
address - still the same.
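
For what it's worth, one configuration pattern sometimes used with
GossipingPropertyFileSnitch for this kind of private/public address split is sketched
below. It is not something confirmed in this thread, and the prefer_local option should
be checked against the 2.1.18 documentation before relying on it:

GCE node, cassandra.yaml excerpt (illustrative):
  listen_interface: eth0               # keep listening on the private interface
  broadcast_address: 35.196.96.247     # public IP advertised to the remote DC

GCE node, cassandra-rackdc.properties (shown for context):
  dc=DC1
  rack=RAC1
  prefer_local=true                    # intra-DC traffic stays on the private address

One possible explanation for the DN state above is that the other GCE nodes now try to
reach this node on its public IP over port 7000, which may not be open from inside GCE;
prefer_local is meant to keep that intra-DC traffic on the private address.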


>
>
> How large is the cluster to migrate (# of nodes and size of data). The
> preferred method might depend on how much data needs to move. Is any
> application outage acceptable?
>

No. of nodes: 5
RF: 3
Data size (as reported by the load factor in nodetool status output): ~30GB
per node

Thanks,
Kunal


>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 10:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Adding new DC?
>
>
>
> Hi Kenneth,
>
>
>
> Replies inline below.
>
>
>
> On 12-Mar-2018 3:40 AM, "Kenneth Brotman" 
> wrote:
>
> Hi Kunal,
>
>
>
> That version of Cassandra is too far before me so I’ll let others answer.
> I was wondering why you wouldn’t want to end up on 3.0.x if you’re going
> through all the trouble of migrating anyway?
>
>
>
>
>
> Application side constraints - some data types are different between 2.1.x
> and 3.x (for example, date vs. timestamp).
>
>
>
> Besides, this is production setup - so, cannot take risk.
>
> Are both data centers in the same region on AWS?  Can you provide yaml
> file for us to see?
>
>
>
>
>
> No, they are in different regions - GCE setup is in us-east while AWS
> setup is in Asia-south (Mumbai)
>
>
>
> Thanks,
>
> Kunal
>
> Kenneth Brotman
>
>
>
> *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
> *Sent:* Sunday, March 11, 2018 2:32 PM
> *To:* user@cassandra.apache.org
> *Subject:* Adding new DC?
>
>
>
> Hi all,
>
>
>
> We currently have a cluster in GCE for one of the customers.
>
> They want it to be migrated to AWS.
>
>
>
> I have setup one node in AWS to join into the cluster by following:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
> 
>
>
>
> Will add more nodes once the first one joins successfully.
>
>
>
> The node in AWS has an elastic IP - which is white-listed for ports
> 7000-7001, 7199, 9042 in GCE firewall.
>
>
>
> The snitch is set to GossipingPropertyFileSnitch. The GCE setup has
> dc=DC1, rack=RAC1 while on AWS, I changed the DC to dc=DC2.
>
>
>
> When I start cassandra service on the AWS instance, I see the version
> handshake msgs in the logs trying to connect to the public IPs of the GCE
> nodes:
>
> OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
>
> However, nodetool status output on both sides don't show the other side at
> all. That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS
> setup doesn't show old DC (dc=DC1).
>
>
>
> In cassandra.yaml file, I'm only using listen_interface and rpc_interface
> settings - no explicit IP addresses used - so, ends up using the internal
> private IP ranges.
>
>
>
> Do I need to explicitly add the broadcast_address for both sides?
>
> Would that require restarting the cassandra service on the GCE side? Or is it
> possible to change that setting on-the-fly without a restart?
>
>
>
> I would prefer a non-restart option.
>
>
>
> PS: The cassandra version running in GCE is 2.1.18 while the new node
> setup in AWS is running 2.1.20 - just in case if that's relevant
>
>
>
> Thanks,
>
> Kunal
>
>
>

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
You can’t migrate and upgrade at the same time perhaps, but you could do one and 
then the other so as to end up on the new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1.x?

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

 

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman"  wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0.x if you’re going through 
all the trouble of migrating anyway?

 

 

Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

 

Besides, this is production setup - so, cannot take risk.

Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?

 

 

No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

 

Thanks,

Kunal

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

 

Hi all,

 

We currently have a cluster in GCE for one of the customers.

They want it to be migrated to AWS.

 

I have setup one node in AWS to join into the cluster by following:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 
 

 

Will add more nodes once the first one joins successfully.

 

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

 

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

 

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:

OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx

However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

 

In cassandra.yaml file, I'm only using listen_interface and rpc_interface 
settings - no explicit IP addresses used - so, ends up using the internal 
private IP ranges.

 

Do I need to explicitly add the broadcast_address for both sides?

Would that require restarting the cassandra service on the GCE side? Or is it 
possible to change that setting on-the-fly without a restart?

 

I would prefer a non-restart option.

 

PS: The cassandra version running in GCE is 2.1.18 while the new node setup in 
AWS is running 2.1.20 - just in case if that's relevant

 

Thanks,


Kunal

 

 


RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Durity, Sean R
You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.

How large is the cluster to migrate (# of nodes and size of data). The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

Sean Durity
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com]
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

Hi Kenneth,

Replies inline below.

On 12-Mar-2018 3:40 AM, "Kenneth Brotman" <mailto:kenbrot...@yahoo.com.invalid> wrote:
Hi Kunal,

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0.x if you’re going through 
all the trouble of migrating anyway?


Application side constraints - some data types are different between 2.1.x and 
3.x (for example, date vs. timestamp).

Besides, this is production setup - so, cannot take risk.
Are both data centers in the same region on AWS?  Can you provide yaml file for 
us to see?


No, they are in different regions - GCE setup is in us-east while AWS setup is 
in Asia-south (Mumbai)

Thanks,
Kunal
Kenneth Brotman

From: Kunal Gangakhedkar 
[mailto:kgangakhed...@gmail.com]
Sent: Sunday, March 11, 2018 2:32 PM
To: user@cassandra.apache.org
Subject: Adding new DC?

Hi all,

We currently have a cluster in GCE for one of the customers.
They want it to be migrated to AWS.

I have setup one node in AWS to join into the cluster by following:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Will add more nodes once the first one joins successfully.

The node in AWS has an elastic IP - which is white-listed for ports 7000-7001, 
7199, 9042 in GCE firewall.

The snitch is set to GossipingPropertyFileSnitch. The GCE setup has dc=DC1, 
rack=RAC1 while on AWS, I changed the DC to dc=DC2.

When I start cassandra service on the AWS instance, I see the version handshake 
msgs in the logs trying to connect to the public IPs of the GCE nodes:
OutboundTcpConnection.java:496 - Handshaking version with /xx.xx.xx.xx
However, nodetool status output on both sides don't show the other side at all. 
That is, the GCE setup doesn't show the new DC (dc=DC2) and the AWS setup 
doesn't show old DC (dc=DC1).

In cassandra.yaml file, I'm only using listen_interface and rpc_interface 
settings - no explicit IP addresses used - so, ends up using the internal 
private IP ranges.

Do I need to explicitly add the broadcast_address for both sides?
Would that require restarting the cassandra service on the GCE side? Or is it 
possible to change that setting on-the-fly without a restart?

I would prefer a non-restart option.

PS: The cassandra version running in GCE is 2.1.18 while the new node setup in 
AWS is running 2.1.20 - just in case if that's relevant

Thanks,
Kunal



