Re: Should replica placement change after a topology change?

2015-09-16 Thread Richard Dawe
Hi Rob,

On 11/09/2015 18:27, "Robert Coli" <rc...@eventbrite.com> wrote:
On Fri, Sep 11, 2015 at 7:24 AM, Richard Dawe <rich.d...@messagesystems.com> wrote:
Thanks, Nate and Rob. We are going to have to migrate some installations from 
SimpleSnitch to Ec2Snitch, others to GossipingPropertyFileSnitch. Your help is 
much appreciated!

If I were operating in a hybrid ec2/non-ec2 environment, I'd use GPFS 
everywhere, FWIW.

Right now we don’t have this mix — it’s either EC2 or non-EC2 — but who knows 
what the future holds?

In that mixed non-EC2/EC2 environment, with GossipingPropertyFileSnitch, it 
seems like you would need to simulate what Ec2Snitch does, and manually 
configure GPFS to treat each Availability Zone as a rack.
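A minimal sketch of that manual mapping, assuming a node in a hypothetical 
us-east-1b availability zone and a stock config location (both are assumptions; 
set the values per node and adjust the path to your install):

# Make GPFS report the EC2 region as the data centre and the AZ as the rack,
# roughly what Ec2Snitch derives automatically.
cat > /etc/cassandra/conf/cassandra-rackdc.properties <<'EOF'
dc=us-east
rack=1b
EOF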

Thanks, best regards, Rich



Re: Should replica placement change after a topology change?

2015-09-11 Thread Richard Dawe
Thanks, Nate and Rob. We are going to have to migrate some installations from 
SimpleSnitch to Ec2Snitch, others to GossipingPropertyFileSnitch. Your help is 
much appreciated!

Best regards, Rich

On 10/09/2015 20:33, "Nate McCall" <n...@thelastpickle.com> wrote:


So if you have a topology that would change if you switched from SimpleStrategy 
to NetworkTopologyStrategy plus multiple racks, it sounds like a different 
migration strategy would be needed?

I am imagining:

  1.  Switch to a different snitch, and change the keyspace from SimpleStrategy to 
NTS, but keep it all in one rack. So effectively the same topology, but with a 
different snitch.
  2.  Set up a new data centre with the desired topology.
  3.  Change the keyspace to have replicas in the new DC.
  4.  Rebuild all the nodes in the new DC.
  5.  Flip all your clients over to the new DC.
  6.  Decommission your original DC.

That would work, yes. I would add:

- 4.5. Repair all nodes.

I can confirm that the above process works (definitely include Rob's repair 
suggestion, though). It is really the only way we've found to safely go from 
SimpleSnitch to rack-aware NTS.

The same process works/is required for SimpleSnitch to Ec2Snitch fwiw.
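As a rough sketch, steps 4 and 4.5 boil down to the following, run on each node 
in the new data centre once it has joined (the source DC name DC1 is a 
placeholder):

# Step 4: stream all data for this node's ranges from the original DC.
nodetool rebuild DC1
# Step 4.5: repair once the rebuilds have finished (Rob suggests repairing
# all nodes, not just the new ones).
nodetool repair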



Re: Should replica placement change after a topology change?

2015-09-10 Thread Richard Dawe
Hi Robert,

Firstly, thank you very much for your help. I have some comments inline below.

On 10/09/2015 01:26, "Robert Coli" <rc...@eventbrite.com> wrote:

On Wed, Sep 9, 2015 at 7:52 AM, Richard Dawe <rich.d...@messagesystems.com> wrote:
I am investigating various topology changes, and their effect on replica 
placement. As far as I can tell, replica placement is not changing after I’ve 
changed the topology and run nodetool repair + cleanup. I followed the 
procedure described at 
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_switch_snitch.html

That's probably a good thing. I'm going to be modifying the warning in the 
cassandra.yaml to advise users that in practice the only change of snitch or 
replication strategy one can safely do is one in which replica placement does 
not change. It currently says that you need to repair, but there are plenty of 
scenarios where you lose all existing replicas for a given datum, and are 
therefore unable to repair. The key is that you need at least one replica to 
stay the same or repair is worthless. And if you only have one replica staying 
the same, you lose any consistency contract you might have been 
operating under. One ALMOST NEVER ACTUALLY WANTS TO DO ANYTHING BUT A NO-OP 
HERE.

So if you have a topology that would change if you switched from SimpleStrategy 
to NetworkTopologyStrategy plus multiple racks, it sounds like a different 
migration strategy would be needed?

I am imagining:

  1.  Switch to a different snitch, and change the keyspace from SimpleStrategy to 
NTS, but keep it all in one rack. So effectively the same topology, but with a 
different snitch.
  2.  Set up a new data centre with the desired topology.
  3.  Change the keyspace to have replicas in the new DC.
  4.  Rebuild all the nodes in the new DC.
  5.  Flip all your clients over to the new DC.
  6.  Decommission your original DC.

Or something like that.
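As a rough sketch, the keyspace changes in steps 1 and 3 might look like this, 
assuming a keyspace called my_ks with RF 3 and data centres named DC1 (existing) 
and DC2 (new); all of those names are placeholders:

# Step 1: same placement, different strategy -- still a single DC with RF 3.
echo "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3};" | cqlsh
# Step 3: add replicas in the new DC, then rebuild its nodes (step 4).
echo "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};" | cqlsh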


Here is my test scenario:


  1.  To determine the token range ownership, I used “nodetool ring ” 
and “nodetool info -T ”. I saved the output of those commands with 
the original topology, after changing the topology, after repairing, after 
changing the replication strategy, and then again after repairing. In no cases 
did the tokens change. It looks like nodetool ring and nodetool info -T show 
the owner but not the replicas for a particular range.

The tokens and ranges shouldn't be changing; the replica placement should be. 
AFAIK neither of those commands show you replica placement, they show you 
primary range ownership.

Use getendpoints to determine replica placement before and after.
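For example (the keyspace, table and key below are placeholders standing in for 
the test setup described earlier):

# Which nodes hold replicas of the row with partition key 1? Run this before
# and after the snitch/strategy change and diff the output.
nodetool getendpoints test mytable 1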


Thanks, I will play with that when I have a chance next week.


I was expecting the replica placement to change. Because the racks were 
assigned in groups (rather than alternating), I was expecting the original 
replica placement with SimpleStrategy to be non-optimal after switching to 
NetworkTopologyStrategy. E.g.: if some data was replicated to nodes 1, 2 and 3, 
then after the topology change there would be 2 replicas in RAC1, 1 in RAC2 and 
none in RAC3. And hence when the repair ran, it would remove one replica from 
RAC1 and make sure that there was a replica in RAC3.

I would expect this to be the case.

However, when I did a query using cqlsh at consistency QUORUM, I saw that it 
was hitting two replicas in the same rack, and a replica in a different rack. 
This suggests that the replica placement did not change after the topology 
change.

Perhaps you are seeing the quirks of the current rack-aware implementation, 
explicated here?

https://issues.apache.org/jira/browse/CASSANDRA-3810


Thanks. I need to re-read that a few times to understand it.

Is there some way I can see which nodes have a replica for a given token range?

Not for a range, but for a given key with nodetool getendpoints.

I wonder if there would be value to the range... in the pre-vnode past I have 
merely generated a key for each range. With the number of ranges increased so 
dramatically by vnodes, it might be easier to have an endpoint that works on 
ranges...

Thank you again. Best regards, Rich


=Rob



Should replica placement change after a topology change?

2015-09-09 Thread Richard Dawe
Good afternoon,

I am investigating various topology changes, and their effect on replica 
placement. As far as I can tell, replica placement is not changing after I’ve 
changed the topology and run nodetool repair + cleanup. I followed the 
procedure described at 
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_switch_snitch.html

Here is my test scenario:

  1.  Cassandra 2.0.15
  2.  6 nodes, initially set up with SimpleSnitch, vnodes enabled, all in one 
data centre.
  3.  Keyspace set up with SimpleStrategy, replication factor 3.
  4.  Four rows inserted into table in keyspace, integer primary key, text 
value.
  5.  I shut down the cluster and switch to GossipingPropertyFileSnitch. I set up 
nodes 1+2 in RAC1, 3+4 in RAC2, 5+6 in RAC3, all in data centre DC1 (see the 
sketch after this list).
  6.  Restart C* on all nodes.
  7.  Run a nodetool repair plus cleanup.
  8.  Change the keyspace to use replication strategy NetworkTopologyStrategy, 
RF 3 in DC1.
  9.  Run a nodetool repair plus cleanup.
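A sketch of steps 5 and 8 in command form (the keyspace name test and the config 
path are assumptions):

# Step 5, on nodes 1 and 2; repeat with rack=RAC2 on nodes 3+4 and rack=RAC3
# on nodes 5+6.
cat > /etc/cassandra/conf/cassandra-rackdc.properties <<'EOF'
dc=DC1
rack=RAC1
EOF
# Step 8, once the cluster is back up:
echo "ALTER KEYSPACE test WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3};" | cqlsh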

To determine the token range ownership, I used “nodetool ring ” and 
“nodetool info -T ”. I saved the output of those commands with the 
original topology, after changing the topology, after repairing, after changing 
the replication strategy, and then again after repairing. In no cases did the 
tokens change. It looks like nodetool ring and nodetool info -T show the owner 
but not the replicas for a particular range.

I was expecting the replica placement to change. Because the racks were 
assigned in groups (rather than alternating), I was expecting the original 
replica placement with SimpleStrategy to be non-optimal after switching to 
NetworkTopologyStrategy. E.g.: if some data was replicated to nodes 1, 2 and 3, 
then after the topology change there would be 2 replicas in RAC1, 1 in RAC2 and 
none in RAC3. And hence when the repair ran, it would remove one replica from 
RAC1 and make sure that there was a replica in RAC3.

However, when I did a query using cqlsh at consistency QUORUM, I saw that it 
was hitting two replicas in the same rack, and a replica in a different rack. 
This suggests that the replica placement did not change after the topology 
change.
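One way to make that kind of check (a sketch only; the table and key are 
placeholders, and CONSISTENCY/TRACING are cqlsh shell commands rather than CQL):

echo -e 'CONSISTENCY QUORUM\nTRACING ON\nSELECT * FROM test.mytable WHERE id = 1;' \
  | cqlsh --debug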

Am I missing something?

Is there some way I can see which nodes have a replica for a given token range?

Any help/insight appreciated.

Thanks, best regards, Rich



Re: Schema changes: where in Java code are they sent?

2015-02-25 Thread Richard Dawe
Good morning,

Sorry for the slow reply here. I finally had some time to test cqlsh tracing on 
a ccm cluster with 2 of 3 nodes down, to see if the unavailable error was due 
to cqlsh or my query. Reply inline below.

On 15/01/2015 12:46, "Tyler Hobbs" <ty...@datastax.com> wrote:

On Thu, Jan 15, 2015 at 6:30 AM, Richard Dawe <rich.d...@messagesystems.com> wrote:

I thought it might be quorum consistency level, because of what I was 
seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes 
enabled ("ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, 
my schema operations were working fine. When I took down two nodes using “ccm 
node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm 
node1 cqlsh” were failing like this:

  cqlsh> ALTER TABLE test.test3 ADD fred text;
  Unable to complete request: one or more nodes were unavailable.

That’s the full output — I had enabled tracing, but only that error came back.

After reading your reply, I went back and re-ran my tests with cqlsh, and it 
seems like the “one or more nodes were unavailable” may be due to cqlsh’s error 
handling.

If I wait a bit, and re-run my schema operations, they work fine with only one 
node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) 
to make the schema modifications.

Is this a known issue in cqlsh? If it helps I can send the full command-line 
session log.

That Unavailable error may actually be from the tracing-related queries failing 
(that's what I suspect, at least).  Starting cqlsh with --debug might show you 
a stacktrace in that case, but I'm not 100% sure.

Yes, it does seem to be cqlsh tracing. The debug output below was generated 
with:

 * A 3 node ccm cluster, running Cassandra 2.0.8 on Ubuntu 14.10 x86_64.
 * I took down 2 of the 3 nodes.
 * Table test5 has a replication factor of 3, primary key is “id text”.
 * cqlsh session was started after 2 of the 3 nodes had been shut down.

Debug output:

rdawe@cstar:~$ ccm node1 cqlsh --debug
Using CQL driver: 
Using thrift lib: 
Connected to test at 127.0.0.1:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 
19.39.0]
Use HELP for help.
cqlsh> USE test;
cqlsh:test> TRACING ON
Now tracing requests.
cqlsh:test> SELECT * FROM test5;

 id| foo
---+---
 blarg |  ness
 hello | world

(2 rows)

Traceback (most recent call last):
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 827, in onecmd
self.handle_statement(st, statementtext)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 865, in 
handle_statement
return custom_handler(parsed)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 901, in do_select
with_default_limit=with_default_limit)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/cqlsh", line 910, in 
perform_statement
print_trace_session(self, self.cursor, session_id)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py", 
line 26, in print_trace_session
rows  = fetch_trace_session(cursor, session_id)
  File "/home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py", 
line 47, in fetch_trace_session
consistency_level='ONE')
  File 
"/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/cursor.py",
 line 80, in execute
response = self.get_response(prepared_q, cl)
  File 
"/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py",
 line 77, in get_response
return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File 
"/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py",
 line 102, in handle_cql_execution_errors
raise cql.OperationalError("Unable to complete request: one or "
OperationalError: Unable to complete request: one or more nodes were 
unavailable.

Sometimes I get a different error:

rdawe@cstar:~$ echo -e 'TRACING ON\nSELECT * FROM test.test5;\n' | ccm node1 
cqlsh --debug
Using CQL driver: 
Using thrift lib: 
Now tracing requests.

 id| foo
---+---
 blarg |  ness
 hello | world

(2 rows)

<stdin>:3:Session edc8c010-bcd5-11e4-a008-1dd7f4de70a1 wasn't found.

I notice that the system_traces keyspace has replication factor 2. Since 2 
nodes are down, perhaps sometimes the tracing session would be stored on nodes 
that are down. And other times one of the two replicas for system_traces would 
be on the node that’s up, but for some reason storing the data in 
system_traces.sessions fails?
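One way to check that speculation (a sketch; raising the RF is optional and 
cluster-specific):

# system_traces defaults to SimpleStrategy with RF 2; confirm what this
# cluster actually has.
echo "DESCRIBE KEYSPACE system_traces;" | cqlsh
# If traces should survive more node failures, the RF can be raised like any
# other keyspace.
echo "ALTER KEYSPACE system_traces WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 3};" | cqlsh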

Thanks, best regards, Rich



Re: Schema changes: where in Java code are they sent?

2015-01-15 Thread Richard Dawe
Hi Tyler,

Thank you for your quick reply; follow-up inline below.

On 14/01/2015 19:36, "Tyler Hobbs" <ty...@datastax.com> wrote:

On Wed, Jan 14, 2015 at 5:13 PM, Richard Dawe <rich.d...@messagesystems.com> wrote:

I’ve been trying to find the Java code where the schema migration is sent to 
the other nodes in the cluster, to understand what the requirements are for 
successfully applying the update. E.g.: is QUORUM consistency level applied?

A quorum isn't required.  Schema changes are simply applied against the local 
node (whichever node the client sends the query to) and then are pushed out to 
the other nodes.  Nodes will also pull the latest schema from other nodes as 
needed (for example, if a node was down during a schema change).
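A handy way to watch that convergence, for example after bringing a downed node 
back up, is to compare schema versions across the cluster:

# Every node should eventually report the same schema version UUID; more than
# one UUID listed means the nodes still disagree on the schema.
nodetool describecluster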

I thought it might be quorum consistency level, because of what I was 
seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes 
enabled ("ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, 
my schema operations were working fine. When I took down two nodes using “ccm 
node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm 
node1 cqlsh” were failing like this:

  cqlsh> ALTER TABLE test.test3 ADD fred text;
  Unable to complete request: one or more nodes were unavailable.

That’s the full output — I had enabled tracing, but only that error came back.

After reading your reply, I went back and re-ran my tests with cqlsh, and it 
seems like the “one or more nodes were unavailable” may be due to cqlsh’s error 
handling.

If I wait a bit, and re-run my schema operations, they work fine with only one 
node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) 
to make the schema modifications.

Is this a known issue in cqlsh? If it helps I can send the full command-line 
session log.


I spent an hour looking through the Java code last night, with no luck. I 
thought this code would be in StorageProxy.java, but I have not found it there, 
or in any of the other classes I looked at.

MigrationManager is probably the most central class for this stuff.

Thank you. That code makes a lot more sense now. :)

Best regards, Rich



Schema changes: where in Java code are they sent?

2015-01-14 Thread Richard Dawe
Hello,

I’m doing some research on schema migrations for Cassandra.

I’ve been playing with cqlsh with TRACING ON, and I can see that a schema 
change like “CREATE TABLE” is sent to all nodes in the cluster. And also that 
“CREATE TABLE” fails if only one of my three nodes is up (with replication 
factor = 3).

I’ve been trying to find the Java code where the schema migration is sent to 
the other nodes in the cluster, to understand what the requirements are for 
successfully applying the update. E.g.: is QUORUM consistency level applied?

I spent an hour looking through the Java code last night, with no luck. I 
thought this code would be in StorageProxy.java, but I have not found it there, 
or in any of the other classes I looked at.

Any pointers would be appreciated.

Thanks, best regards, Rich