Decommissioning a datacenter deletes the data (on decommissioned datacenter)
tl;dr: Decommissioning a datacenter by running nodetool decommission on its nodes deletes the data on the decommissioned nodes - is this expected?

I am trying out some tests on my multi-datacenter setup. Somewhere in the docs I read that decommissioning a node will stream its data to other nodes but that the node still retains its own copy of the data. I was expecting the same behavior with multiple datacenters. I am using Cassandra 1.2.12. Here are my observations:

Let's say I have a datacenter DC1 which has keyspace keyspace_dc_1, and another datacenter DC2 which has keyspace keyspace_dc_2. They already have some data in them. I add DC2 to DC1 and update the replication factors on both keyspaces. Looking at gossipinfo, I can see that the schemas are synced. I then look at the cfstats output and I can see that both keyspaces are replicated on both datacenters (also on disk, as I can see a non-zero sstable count).

Now, I decommission DC2: 1) Update the replication factors for the keyspaces. 2) Run nodetool decommission on all the nodes. I see that I have lost all my keyspaces (and data), the keyspaces from DC1 and DC2. This does not seem normal to me - is this expected?

Thanks, Sandeep
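A rough sketch of the retire-a-datacenter sequence being attempted here, in the order the 1.2-era docs describe it (keyspace names, DC names and replication counts are placeholders from this thread, not a verified procedure):

  # 1. make sure the surviving DC has all the data while DC2 can still serve as a source
  nodetool repair
  # 2. drop DC2 from every keyspace's replication (run in cqlsh)
  ALTER KEYSPACE keyspace_dc_1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
  # 3. only then, on each DC2 node in turn
  nodetool decommission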
Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)
Hello Rob, Sorry for being ambiguous. By "deletes" I mean that after running decommission I can no longer see, using the cfstats command, any keyspaces owned by this node or replicated on other nodes. I am also seeing the same behavior when I remove a single node from a cluster (without datacenters). On Thu, Aug 7, 2014 at 11:43 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 7, 2014 at 8:26 AM, srmore comom...@gmail.com wrote: tl;dr: Decommissioning datacenters by running nodetool decommission on a node deletes the data on the decommissioned node - is this expected? What does "deletes" mean? What does "lost all my keyspaces (and data)" mean? =Rob
Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)
On Thu, Aug 7, 2014 at 12:27 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 7, 2014 at 10:04 AM, srmore comom...@gmail.com wrote: Sorry for being ambiguous. By "deletes" I mean that after running decommission I can no longer see any keyspaces owned by this node or replicated on other nodes using the cfstats command. I am also seeing the same behavior when I remove a single node from a cluster (without datacenters).

I'm still not fully parsing you, but clusters should never forget schema as a result of decommission. Is that what you are saying is happening?

Yes, this is what is happening.

(In fact, even the decommissioned node itself does not forget its schema, which I personally consider a bug.)

Ok, so I am assuming this is not normal behavior and is possibly a bug - is this correct?

=Rob
Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)
Thanks for the detailed reply Ken, this really helps. I also realized, after reading your email, that I wasn't doing a 'nodetool rebuild'. I was following the steps mentioned here: http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_decomission_dc_t.html I'll do a test with nodetool rebuild and see what happens.

On Thu, Aug 7, 2014 at 1:27 PM, Ken Hancock ken.hanc...@schange.com wrote:

My reading is it didn't forget the schema. It lost the data. My reading is decommissioning worked fine. Possibly when you changed the replication on a keyspace to include a second data center, the data didn't get replicated.

Probably not, because I could see the sstables for the keyspace from the other datacenter being created. My understanding could be wrong though.

When you ADD a datacenter, you need to do a nodetool rebuild to get the data streamed to the new data center. When you alter a keyspace to include another datacenter in its replication schema, a nodetool repair is required -- was this done? http://www.datastax.com/documentation/cql/3.0/cql/cql_using/update_ks_rf_t.html

I missed the 'nodetool rebuild' step, that could be my issue; yes, I did run repair.

When you use nodetool decommission, you're effectively deleting the partitioning token from the cluster. The node being decommissioned will stream its data to the new owners of its original token range. This streaming should in no way affect any other datacenter, because you have not changed the tokens or data ownership for any datacenter but the one in which you are decommissioning a node.

That is what my understanding was, but when I decommission it does clear out (remove) all the keyspaces.

When you eventually decommission the last node in the datacenter, all data is gone as there are no tokens in that datacenter to own any data. If you had a keyspace that was only replicated within that datacenter, that data is gone (though you could probably add nodes back in and resurrect it).

The (now outdated) documentation [1] says that data remains on the node even after decommissioning. So I do not understand why the data would go away.

If you had a keyspace where you changed the replication to include another datacenter, and that datacenter had never received the data, then it may have the schema but would have none of the data (other than new data that was written AFTER you changed the replication).

I would expect the repair to fix this, i.e. to stream the old data to the newly added datacenter. So, does nodetool rebuild help here?

[1] https://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely

On Thu, Aug 7, 2014 at 2:11 PM, srmore comom...@gmail.com wrote: On Thu, Aug 7, 2014 at 12:27 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 7, 2014 at 10:04 AM, srmore comom...@gmail.com wrote: Sorry for being ambiguous. By "deletes" I mean that after running decommission I can no longer see any keyspaces owned by this node or replicated on other nodes using the cfstats command. I am also seeing the same behavior when I remove a single node from a cluster (without datacenters). I'm still not fully parsing you, but clusters should never forget schema as a result of decommission. Is that what you are saying is happening? Yes, this is what is happening. (In fact, even the decommissioned node itself does not forget its schema, which I personally consider a bug.) Ok, so I am assuming this is not normal behavior and is possibly a bug - is this correct?
=Rob
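A minimal sketch of the add-a-datacenter path Ken describes (replication counts are illustrative; run the ALTER in cqlsh):

  ALTER KEYSPACE keyspace_dc_1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};
  # on every node in the newly added DC2, stream the pre-existing data from DC1:
  nodetool rebuild DC1
  # if replication of an existing keyspace is changed afterwards, repair back-fills the old data:
  nodetool repair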
Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)
I tried using 'nodetool rebuild' after I add the datacenters - same outcome: after I decommission, my keyspaces are getting wiped out. I don't understand this.
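Before and after each step, it may help to confirm where the data actually lives; on 1.2 something like (keyspace, table and key are placeholders):

  nodetool status                              # ownership and load per datacenter
  nodetool getendpoints <ks> <table> <key>     # nodes holding replicas of a given key
  nodetool cfstats                             # per-node memtable and sstable counts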
Re: Question 1: JMX binding, Question 2: Logging
Hello Kyle, For your first question, you need to create aliases to localhost, e.g. 127.0.0.2, 127.0.0.3, etc.; that should get you going. About the logging issue, I think your instance is failing before it gets to log anything; as a check, you can start one instance and make sure it logs correctly. Hope that helps. Sandeep

On Tue, Feb 4, 2014 at 4:25 PM, Kyle Crumpton (kcrumpto) kcrum...@cisco.com wrote: Hi all, I'm fairly new to Cassandra. I'm deploying it to a PaaS. One thing this entails is that it must be able to have more than one instance on a single node. I'm running into the problem that JMX binds to 0.0.0.0:7199. My question is this: Is there a way to configure this? I have actually found the post that said to change the following: JVM_OPTS=$JVM_OPTS -Djava.rmi.server.hostname=127.1.246.3, where 127.1.246.3 is the IP I want to bind to. This actually did not change the JMX binding at all for me. I saw a post about a JMX listen address in cassandra.yaml and this also did not work. Any clarity on whether this is bindable at all? Or if there are plans for it? Also - I have logging turned on. For some reason, though, my Cassandra is not actually logging as intended. My log folder is actually empty after each (failed) run (due to the port being taken by my other cassandra process). Here is an actual copy of my log4j-server.properties file: http://fpaste.org/74470/15510941/ Any idea why this might not be logging? Thank you and best regards Kyle
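A minimal sketch of running two instances side by side on one host: give each a loopback alias and its own JMX port (addresses and ports below are placeholders):

  # loopback aliases, one per instance (Linux):
  ip addr add 127.0.0.2/8 dev lo
  ip addr add 127.0.0.3/8 dev lo
  # in each instance's conf/cassandra-env.sh, use a distinct JMX port:
  JMX_PORT="7199"   # instance 1
  JMX_PORT="7299"   # instance 2
  # and set listen_address / rpc_address in each instance's cassandra.yaml to its alias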
Re: Lots of deletions results in death by GC
Sorry to hear that Robert, I ran into a similar issue a while ago. I had an extremely heavy write and update load; as a result Cassandra (1.2.9) was constantly flushing to disk and constantly GCing. I tried exactly the same steps you tried (tuning memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8)) with no luck. Almost all of the issues went away when I migrated to 1.2.13; this release also had some fixes which I badly needed. What version are you running? (I tried to look in the thread but couldn't find one, sorry if this is a repeat question.) Dropped messages are a sign that Cassandra is taking heavy load - that's the load shedding mechanism. I would love to see some sort of back-pressure implemented. -sandeep

On Tue, Feb 4, 2014 at 6:10 PM, Robert Wille rwi...@fold3.com wrote:

I ran my test again, and Flush Writer's "All time blocked" increased to 2 and then shortly thereafter GC went into its death spiral. I doubled memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried again. This time, the table that always sat with Memtable data size = 0 now showed increases in Memtable data size. That was encouraging. It never flushed, which isn't too surprising, because that table has relatively few rows and they are pretty wide. However, on the fourth table to clean, Flush Writer's "All time blocked" went to 1, and then there were no more completed events, and about 10 minutes later GC went into its death spiral. I assume that each time Flush Writer completes an event, that means a table was flushed. Is that right? Also, I got two dropped mutation messages at the same time that Flush Writer's "All time blocked" incremented. I then increased the writers and queue size to 3 and 12, respectively, and ran my test again. This time "All time blocked" remained at 0, but I still suffered death by GC. I would almost think that this is caused by high load on the server, but I've never seen CPU utilization go above about two of my eight available cores. If high load triggers this problem, then that is very disconcerting. That means that a CPU spike could permanently cripple a node. Okay, not permanently, but until a manual flush occurs. If anyone has any further thoughts, I'd love to hear them. I'm quite at the end of my rope. Thanks in advance Robert

From: Nate McCall n...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Saturday, February 1, 2014 at 9:25 AM To: Cassandra Users user@cassandra.apache.org Subject: Re: Lots of deletions results in death by GC

What's the output of 'nodetool tpstats' while this is happening? Specifically, is Flush Writer "All time blocked" increasing? If so, play around with turning up memtable_flush_writers and memtable_flush_queue_size and see if that helps.

On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille rwi...@fold3.com wrote:

A few days ago I posted about an issue I'm having where GC takes a long time (20-30 seconds), and it happens repeatedly and basically no work gets done. I've done further investigation, and I now believe that I know the cause. If I do a lot of deletes, it creates memory pressure until the memtables are flushed, but Cassandra doesn't flush them. If I manually flush, then life is good again (although that takes a very long time because of the GC issue). If I just leave the flushing to Cassandra, then I end up with death by GC. I believe that when the memtables are full of tombstones, Cassandra doesn't realize how much memory the memtables are actually taking up, and so it doesn't proactively flush them in order to free up heap.
As I was deleting records out of one of my tables, I was watching it via nodetool cfstats, and I found a very curious thing:

Memtable cell count: 1285
Memtable data size, bytes: 0
Memtable switch count: 56

As the deletion process was chugging away, the memtable cell count increased, as expected, but the data size stayed at 0. No flushing occurred. Here's the schema for this table:

CREATE TABLE bdn_index_pub (
    tshard VARCHAR,
    pord INT,
    ord INT,
    hpath VARCHAR,
    page BIGINT,
    PRIMARY KEY (tshard, pord)
) WITH gc_grace_seconds = 0
  AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I have a few tables that I run this cleaning process on, and not all of them exhibit this behavior. One of them reported an increasing number of bytes, as expected, and it also flushed as expected. Here's the schema for that table:

CREATE TABLE bdn_index_child (
    ptshard VARCHAR,
    ord INT,
    hpath VARCHAR,
    PRIMARY KEY (ptshard, ord)
) WITH gc_grace_seconds = 0
  AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

In both cases, I'm deleting the entire record (i.e. specifying just the first component of the primary key in the delete statement). Most records in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
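The knobs discussed in this thread can be exercised and observed with the following (a sketch; keyspace and table names are placeholders):

  nodetool flush <keyspace> <table>   # force the memtable to disk manually
  nodetool tpstats                    # watch FlushWriter "All time blocked" and the dropped-message counters
  nodetool cfstats                    # memtable cell count / data size per column family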
Re: MUTATION messages dropped
What version of Cassandra are you running? I used to see these a lot with 1.2.9; I could correlate the dropped messages with the heap usage almost every time, so check in the logs whether you are getting GC'd. In this respect 1.2.12 appears to be more stable; moving to 1.2.12 took care of this for us. Thanks, Sandeep

On Thu, Dec 19, 2013 at 6:12 AM, Alexander Shutyaev shuty...@gmail.com wrote: Hi all! We've had a problem with cassandra recently. We had 2 one-minute periods when we got a lot of timeouts on the client side (the only timeouts during the 9 days we have been using cassandra in production). In the logs we've found corresponding messages saying something about MUTATION messages dropped. Now, the official FAQ [1] says that this is an indicator that the load is too high. We've checked our monitoring and found out that the 1-minute average cpu load had a local peak at the time of the problem, but it was like 0.8 against the usual 0.2, which I guess is nothing for a 2-core virtual machine. We've also checked java threads - there was no peak there and their count was reasonable, ~240-250. Can anyone give us a hint - what should we monitor to see this high load and what should we tune to make it acceptable? Thanks in advance, Alexander [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages
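A quick way to check the GC correlation suggested above (the log path is the packaged default and may differ on your install):

  grep GCInspector /var/log/cassandra/system.log | tail   # long ParNew / CMS pauses get logged here
  nodetool tpstats                                         # dropped counts per message type at the bottom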
Re: Write performance with 1.2.12
On Wed, Dec 11, 2013 at 10:49 PM, Aaron Morton aa...@thelastpickle.comwrote: It is the write latency, read latency is ok. Interestingly the latency is low when there is one node. When I join other nodes the latency drops about 1/3. To be specific, when I start sending traffic to the other nodes the latency for all the nodes increases, if I stop traffic to other nodes the latency drops again, I checked, this is not node specific it happens to any node. Is this the local write latency or the cluster wide write request latency ? This is a cluster wide write latency. What sort of numbers are you seeing ? I have a custom application that writes data to the cassandra node, so the numbers might be different than the standard stress test but it should be good enough for comparison. With the previous release 1.0.12 I was getting around 10K requests/ sec and with 1.2.12 I am getting around 6K requests/ sec. Everything else is the same. This is a three node cluster. With a single node I get 3K for cassandra 1.0.12 and 1.2.12. So I suspect there is some network chatter. I have started looking at the sources, hoping to find something. -sandeep Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 3:39 pm, srmore comom...@gmail.com wrote: Thanks Aaron On Wed, Dec 11, 2013 at 8:15 PM, Aaron Morton aa...@thelastpickle.comwrote: Changed memtable_total_space_in_mb to 1024 still no luck. Reducing memtable_total_space_in_mb will increase the frequency of flushing to disk, which will create more for compaction to do and result in increased IO. You should return it to the default. You are right, had to revert it back to default. when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. What are you measuring, request latency or local read/write latency ? If it’s write latency it’s probably GC, if it’s read is probably IO or data model. It is the write latency, read latency is ok. Interestingly the latency is low when there is one node. When I join other nodes the latency drops about 1/3. To be specific, when I start sending traffic to the other nodes the latency for all the nodes increases, if I stop traffic to other nodes the latency drops again, I checked, this is not node specific it happens to any node. I don't see any GC activity in logs. Tried to control the compaction by reducing the number of threads, did not help much. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 7/12/2013, at 8:05 am, srmore comom...@gmail.com wrote: Changed memtable_total_space_in_mb to 1024 still no luck. On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote: Can you set the memtable_total_space_in_mb value, it is defaulting to 1/3 which is 8/3 ~ 2.6 gb in capacity http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management The flushing of 2.6 gb to the disk might slow the performance if frequently called, may be you have lots of write operations going on. On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which is in cassandra.yaml. Apologies, was tuning JVM and that's what was in my mind. 
Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% low. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.comwrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing
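To separate the cluster-wide (coordinator) latency from the local write latency being discussed, these two views are worth comparing, if available on your version (keyspace and table are placeholders):

  nodetool proxyhistograms              # request latency as seen by the coordinator
  nodetool cfhistograms <ks> <table>    # local read/write latency for one table on this node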
Re: Write performance with 1.2.12
On Thu, Dec 12, 2013 at 11:15 AM, J. Ryan Earl o...@jryanearl.us wrote: Why did you switch to RandomPartitioner away from Murmur3Partitioner? Have you tried with Murmur3? 1. # partitioner: org.apache.cassandra.dht.Murmur3Partitioner 2. partitioner: org.apache.cassandra.dht.RandomPartitioner Since I am comparing between the two versions I am keeping all the settings same. I see Murmur3Partitioner has some performance improvement but then switching back to RandomPartitioner should not cause performance to tank, right ? or am I missing something ? Also, is there an easier way to update the data from RandomPartitioner to Murmur3 ? (upgradesstable ?) On Fri, Dec 6, 2013 at 10:36 AM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which is in cassandra.yaml. Apologies, was tuning JVM and that's what was in my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% low. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80 8.00 0.011.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? 
I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep
Re: Write performance with 1.2.12
Thanks Aaron On Wed, Dec 11, 2013 at 8:15 PM, Aaron Morton aa...@thelastpickle.comwrote: Changed memtable_total_space_in_mb to 1024 still no luck. Reducing memtable_total_space_in_mb will increase the frequency of flushing to disk, which will create more for compaction to do and result in increased IO. You should return it to the default. You are right, had to revert it back to default. when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. What are you measuring, request latency or local read/write latency ? If it’s write latency it’s probably GC, if it’s read is probably IO or data model. It is the write latency, read latency is ok. Interestingly the latency is low when there is one node. When I join other nodes the latency drops about 1/3. To be specific, when I start sending traffic to the other nodes the latency for all the nodes increases, if I stop traffic to other nodes the latency drops again, I checked, this is not node specific it happens to any node. I don't see any GC activity in logs. Tried to control the compaction by reducing the number of threads, did not help much. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 7/12/2013, at 8:05 am, srmore comom...@gmail.com wrote: Changed memtable_total_space_in_mb to 1024 still no luck. On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote: Can you set the memtable_total_space_in_mb value, it is defaulting to 1/3 which is 8/3 ~ 2.6 gb in capacity http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management The flushing of 2.6 gb to the disk might slow the performance if frequently called, may be you have lots of write operations going on. On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which is in cassandra.yaml. Apologies, was tuning JVM and that's what was in my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% low. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. 
following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80
Write performance with 1.2.12
We have a 3 node cluster running cassandra 1.2.12; they are pretty big machines, 64G RAM with 16 cores, and the cassandra heap is 8G. The interesting observation is that when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same boxes and we observed a slight dip, but not half as with 1.2.12. In both cases we were writing with LOCAL_QUORUM. Changing CL to ONE makes a slight improvement but not much. The read_repair_chance is 0.1. We see some compactions running.

Following is my iostat -x output; sda is the ssd (for commit log) and sdb is the spinner.

avg-cpu:  %user %nice %system %iowait %steal %idle
          66.46  0.00    8.95    0.01   0.00 24.58

Device: rrqm/s wrqm/s  r/s   w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda       0.00  27.60 0.00  4.40   0.00 256.00    58.18     0.01  2.55  1.32  0.58
sda1      0.00   0.00 0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda2      0.00  27.60 0.00  4.40   0.00 256.00    58.18     0.01  2.55  1.32  0.58
sdb       0.00   0.00 0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sdb1      0.00   0.00 0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-0      0.00   0.00 0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-1      0.00   0.00 0.00  0.60   0.00   4.80     8.00     0.00  5.33  2.67  0.16
dm-2      0.00   0.00 0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-3      0.00   0.00 0.00 24.80   0.00 198.40     8.00     0.24  9.80  0.13  0.32
dm-4      0.00   0.00 0.00  6.60   0.00  52.80     8.00     0.01  1.36  0.55  0.36
dm-5      0.00   0.00 0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-6      0.00   0.00 0.00 24.80   0.00 198.40     8.00     0.29 11.60  0.13  0.32

I can see I am cpu bound here but couldn't figure out exactly what is causing it; is this caused by GC or compaction? I am thinking it is compaction - I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this? Or know what can be done to free up the CPU?

Thanks, Sandeep
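A couple of ways to tell GC from compaction load here (a sketch; the log path is the packaged default):

  nodetool compactionstats                         # pending tasks and currently running compactions
  nodetool setcompactionthroughput 8               # temporarily throttle compaction to see if CPU drops
  grep GCInspector /var/log/cassandra/system.log   # GC pauses, if any are long enough to be logged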
Re: Write performance with 1.2.12
On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80 8.00 0.011.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep
Re: Write performance with 1.2.12
On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which is in cassandra.yaml. Apologies, was tuning JVM and that's what was in my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% low. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80 8.00 0.011.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep
Re: Write performance with 1.2.12
Looks like I am spending some time in GC. java.lang:type=GarbageCollector,name=ConcurrentMarkSweep CollectionTime = 51707; CollectionCount = 103; java.lang:type=GarbageCollector,name=ParNew CollectionTime = 466835; CollectionCount = 21315; On Fri, Dec 6, 2013 at 9:58 AM, Jason Wee peich...@gmail.com wrote: Hi srmore, Perhaps if you use jconsole and connect to the jvm using jmx. Then uner MBeans tab, start inspecting the GC metrics. /Jason On Fri, Dec 6, 2013 at 11:40 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80 8.00 0.011.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep
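To watch GC continuously instead of reading the cumulative JMX counters, something like (the pid is a placeholder):

  jstat -gcutil <cassandra-pid> 5000   # heap-region occupancy and GC time, sampled every 5 seconds
  # or enable GC logging via cassandra-env.sh: -Xloggc:/var/log/cassandra/gc.log -XX:+PrintGCDetails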
Re: Write performance with 1.2.12
Not long: Uptime (seconds) : 6828 Token: 56713727820156410577229101238628035242 ID : c796609a-a050-48df-bf56-bb09091376d9 Gossip active: true Thrift active: true Native Transport active: false Load : 49.71 GB Generation No: 1386344053 Uptime (seconds) : 6828 Heap Memory (MB) : 2409.71 / 8112.00 Data Center : DC Rack : RAC-1 Exceptions : 0 Key Cache: size 56154704 (bytes), capacity 104857600 (bytes), 27 hits, 155669426 requests, 0.000 recent hit rate, 14400 save period in seconds Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds On Fri, Dec 6, 2013 at 11:15 AM, Vicky Kak vicky@gmail.com wrote: Since how long the server had been up, hours,days,months? On Fri, Dec 6, 2013 at 10:41 PM, srmore comom...@gmail.com wrote: Looks like I am spending some time in GC. java.lang:type=GarbageCollector,name=ConcurrentMarkSweep CollectionTime = 51707; CollectionCount = 103; java.lang:type=GarbageCollector,name=ParNew CollectionTime = 466835; CollectionCount = 21315; On Fri, Dec 6, 2013 at 9:58 AM, Jason Wee peich...@gmail.com wrote: Hi srmore, Perhaps if you use jconsole and connect to the jvm using jmx. Then uner MBeans tab, start inspecting the GC metrics. /Jason On Fri, Dec 6, 2013 at 11:40 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80 8.00 0.011.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? 
I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep
Re: Write performance with 1.2.12
Changed memtable_total_space_in_mb to 1024 still no luck. On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote: Can you set the memtable_total_space_in_mb value, it is defaulting to 1/3 which is 8/3 ~ 2.6 gb in capacity http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management The flushing of 2.6 gb to the disk might slow the performance if frequently called, may be you have lots of write operations going on. On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which is in cassandra.yaml. Apologies, was tuning JVM and that's what was in my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% low. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes compactions/GC's could skipe the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.460.008.950.010.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.0027.60 0.00 4.40 0.00 256.00 58.18 0.012.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.005.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.249.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.0052.80 8.00 0.011.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep
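For reference, the cassandra.yaml knobs from this exchange (1.2.x names; the values shown are illustrative, not recommendations):

  memtable_total_space_in_mb: 2048   # defaults to one third of the heap when left commented out
  memtable_flush_writers: 2          # defaults to the number of data directories
  memtable_flush_queue_size: 4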
Cassandra high heap utilization under heavy reads and writes.
Hello, We moved to cassandra 1.2.9 from 1.0.11 to take advantage of the off-heap bloom filters and other improvements. We see a lot of messages dropped under high load conditions. We noticed that when we do heavy reads AND writes simultaneously (we read first and check whether the key exists; if not, we write it), the Cassandra heap increases dramatically and then gossip marks the node down (as a result of the high load on the node). Under heavy 'reads only' we don't see this behavior. Has anyone seen this behavior? Any suggestions? Thanks!
Re: java.io.FileNotFoundException when setting up internode_compression
Thanks Christopher! I don't think glibc is the issue (as it did get that far). /usr/tmp/snappy-1.0.5-libsnappyjava.so is not there, and permissions look ok; are there any special settings (like JVM args) that I should be using? I can see libsnappyjava.so in the jar though (snappy-java-1.0.5.jar\org\xerial\snappy\native\Linux\i386\). One other thing: I am using RedHat 6. I will try updating glibc and see what happens. Thanks!

On Mon, Nov 11, 2013 at 5:01 PM, Christopher Wirt chris.w...@struq.com wrote: I had this the other day when we were accidentally provisioned a centos5 machine (instead of 6). Think it relates to the version of glibc. Notice it wants the native binary .so, not the .jar. So maybe update to a newer version of glibc? Or possibly make sure the .so exists at /usr/tmp/snappy-1.0.5-libsnappyjava.so? I was lucky and just did an OS reload to centos6. Here is someone having a similar issue. http://mail-archives.apache.org/mod_mbox/cassandra-commits/201307.mbox/%3CJIRA.12616012.1352862646995.6820.1373083550278@arcas%3E

From: srmore [mailto:comom...@gmail.com] Sent: 11 November 2013 21:32 To: user@cassandra.apache.org Subject: java.io.FileNotFoundException when setting up internode_compression

I might be missing something obvious here; for some reason I cannot seem to get internode_compression = all to work. I am getting the following exception. I am using cassandra 1.2.9 and have snappy-java-1.0.5.jar in my classpath. A Google search did not return any useful result; has anyone seen this before?

java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.init(FileOutputStream.java:194)
at java.io.FileOutputStream.init(FileOutputStream.java:145)
at org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394)
at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.clinit(Snappy.java:48)
at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at org.apache.cassandra.io.compress.SnappyCompressor.clinit(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.clinit(CFMetaData.java:82)
at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81)
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471)
at org.apache.cassandra.config.DatabaseDescriptor.clinit(DatabaseDescriptor.java:123)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1028)
at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 18 more
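If the default temp location is the problem (e.g. /usr/tmp missing, not writable, or mounted noexec), snappy-java can be pointed at another directory; the property name comes from the snappy-java library, so verify it against the bundled version:

  # in conf/cassandra-env.sh
  JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/var/tmp"
  # the directory must exist, be writable, and allow executing the extracted .so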
Re: A lot of MUTATION and REQUEST_RESPONSE messages dropped
The problem was cross_node_timeout value,I had it set to true and my ntp clocks were not synchronized as a result, some of the requests were dropped. Thanks, Sandeep On Sat, Nov 9, 2013 at 6:02 PM, srmore comom...@gmail.com wrote: I recently upgraded to 1.2.9 and I am seeing a lot of REQUEST_RESPONSE and MUTATION messages are being dropped. This happens when I have multiple nodes in the cluster (about 3 nodes) and I send traffic to only one node. I don't think the traffic is that high, it is around 400 msg/sec with 100 threads. When I take down other two nodes I don't see any errors (at least on the client side) I am using Pelops. On the client I get UnavailableException, but the nodes are up. Initially I thought I am hitting CASSANDRA-6297 (gossip thread blocking) so I changed memtable_flush_writers to 3. Still no luck. UnavailableException: org.scale7.cassandra.pelops.exceptions.UnavailableException: null at org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:61) ~[na:na] at In the debug log on the cassandra node this is the exception I see DEBUG [Thrift:78] 2013-11-09 16:47:28,212 CustomTThreadPoolServer.java Thrift transport error occurred during processing of message. org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Could this be because of high load ? with Cassandra 1.0.011 I did not see this issue. Thanks, Sandeep
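For reference, the setting involved and a quick clock-sync check (the 1.2 default for this option is false):

  # cassandra.yaml - only enable if NTP keeps all nodes' clocks in sync
  cross_node_timeout: false
  # verify NTP peers and offsets on each node:
  ntpq -p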
java.io.FileNotFoundException when setting up internode_compression
I might be missing something obvious here, for some reason I cannot seem to get internode_compression = all to work. I am getting the following exception. I am using cassandra 1.2.9 and have snappy-java-1.0.5.jar in my classpath. Google search did not return any useful result, has anyone seen this before ? java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No such file or directory) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:194) at java.io.FileOutputStream.init(FileOutputStream.java:145) at org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394) at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468) at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318) at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229) at org.xerial.snappy.Snappy.clinit(Snappy.java:48) at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45) at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55) at org.apache.cassandra.io.compress.SnappyCompressor.clinit(SnappyCompressor.java:37) at org.apache.cassandra.config.CFMetaData.clinit(CFMetaData.java:82) at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81) at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471) at org.apache.cassandra.config.DatabaseDescriptor.clinit(DatabaseDescriptor.java:123) Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738) at java.lang.Runtime.loadLibrary0(Runtime.java:823) at java.lang.System.loadLibrary(System.java:1028) at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52) ... 18 more
A lot of MUTATION and REQUEST_RESPONSE messages dropped
I recently upgraded to 1.2.9 and I am seeing a lot of REQUEST_RESPONSE and MUTATION messages being dropped. This happens when I have multiple nodes in the cluster (about 3 nodes) and I send traffic to only one node. I don't think the traffic is that high, it is around 400 msg/sec with 100 threads. When I take down the other two nodes I don't see any errors (at least on the client side). I am using Pelops. On the client I get UnavailableException, but the nodes are up. Initially I thought I was hitting CASSANDRA-6297 (gossip thread blocking), so I changed memtable_flush_writers to 3. Still no luck. UnavailableException: org.scale7.cassandra.pelops.exceptions.UnavailableException: null at org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:61) ~[na:na] at In the debug log on the cassandra node this is the exception I see DEBUG [Thrift:78] 2013-11-09 16:47:28,212 CustomTThreadPoolServer.java Thrift transport error occurred during processing of message. org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Could this be because of high load? With Cassandra 1.0.11 I did not see this issue. Thanks, Sandeep
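Dropped MUTATION and REQUEST_RESPONSE counts can be watched without grepping the logs; in 1.2, nodetool tpstats prints a dropped-message summary per message type, which makes it easier to correlate drops with load or clock problems. For example:

    nodetool -h localhost tpstats
    # The output ends with a "Message type / Dropped" section listing MUTATION,
    # REQUEST_RESPONSE, READ, etc. Non-zero, steadily growing counters indicate
    # requests that timed out (rpc_timeout / cross_node_timeout) before being served.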
Re: heap issues - looking for advices on gc tuning
We ran into similar heap issues a while ago on 1.0.11; I am not sure whether you have the luxury of upgrading to at least 1.2.9, we did not. After a lot of painful attempts and weeks of testing (just as in your case) the following settings worked (they did not completely relieve the heap pressure but helped a lot). We still see some heap issues but at least it is a bit more stable. Unlike in your case we had very heavy reads and writes. But it's good to know that this happens for light load, I was thinking this was a symptom of heavy load. -XX:NewSize=1200M -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 Not sure whether this will help you or not but I think it's worth a try. -sandeep On Wed, Oct 30, 2013 at 4:34 AM, Jason Tang ares.t...@gmail.com wrote: What's the configuration of the following parameters? memtable_flush_queue_size: concurrent_compactors: 2013/10/30 Piavlo lolitus...@gmail.com Hi, Below I try to give a full picture of the problem I'm facing. This is a 12 node cluster, running on ec2 with m2.xlarge instances (17G ram, 2 cpus). Cassandra version is 1.0.8. The cluster normally has between 3000 - 1500 reads per second (depending on time of the day) and 1700 - 800 writes per second - according to OpsCenter. RF=3, no row caches are used. Memory relevant configs from cassandra.yaml: flush_largest_memtables_at: 0.85 reduce_cache_sizes_at: 0.90 reduce_cache_capacity_to: 0.75 commitlog_total_space_in_mb: 4096 relevant JVM options used are: -Xms8000M -Xmx8000M -Xmn400M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly Now what happens is that with these settings, after a cassandra process restart, the GC is working fine at the beginning, and heap used looks like a saw with perfect teeth; eventually the teeth size starts to diminish until the teeth become not noticeable, and then cassandra starts to spend lots of CPU time doing gc. It takes about 2 weeks for such a cycle, and then I need to restart the cassandra process to improve performance. During all this time there are no memory related messages in cassandra system.log, except a GC for ParNew a little above 200ms once in a while. Things I've already done trying to reduce this eventual heap pressure: 1) reducing bloom_filter_fp_chance, resulting in a reduction from ~700MB to ~280MB total per node based on all Filter.db files on the node. 2) reducing key cache sizes, and dropping key_caches for CFs which do not have many reads 3) the heap size was increased from 7000M to 8000M All these have not really helped; just the increase from 7000M to 8000M helped to stretch the cycle until excessive gc from ~9 days to ~14 days. I've tried to graph over time the data that is supposed to be in heap vs actual heap size, by summing up all CFs' bloom filter sizes + all CFs' key cache capacities multiplied by average key size + all CFs' memtables data size reported (I've overestimated the data size a bit on purpose to be on the safe side). Here is a link to a graph showing the last 2 days of metrics for a node which could not effectively do GC, and then the cassandra process was restarted: http://awesomescreenshot.com/0401w5y534 You can clearly see that before and after restart, the size of data that is supposed to be in heap is pretty much the same, which makes me think that what I really need is GC tuning.
Also I suppose that this is not due to the number of total keys each node has, which is between 300 - 200 million keys for all CF key estimates summed on a node. The nodes have data sizes between 75G to 45G, corresponding to the millions of keys. And all nodes are starting to have heavy GC load after about 14 days. Also the excessive GC and heap usage are not affected by load, which varies depending on time of the day (see read/write rates at the beginning of the mail). So again based on this, I assume this is not due to a large number of keys or too much load on the cluster, but due to a pure GC misconfiguration issue. Things I remember that I've tried for GC tuning: 1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help. 2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1 - this actually made things worse. 3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help. Also since it takes like 2 weeks to verify that changing a GC setting did not help, the process is painfully slow to try all the possibilities :) I'd highly appreciate any help and hints on the GC tuning. tnx Alex
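For anyone wanting to try the new-generation settings suggested at the top of this thread, they are normally applied through cassandra-env.sh rather than cassandra.yaml. A sketch, with the values taken from the suggestion above (treat them as a starting point, not a recommendation):

    # cassandra-env.sh -- illustrative
    HEAP_NEWSIZE="1200M"          # sets the new generation size (equivalent to -Xmn)
    # Adjust the existing survivor/tenuring flags; the stock file ships with
    # SurvivorRatio=8 and MaxTenuringThreshold=1:
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"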
Re: Query a datacenter
Thanks Rob that helps ! On Fri, Oct 25, 2013 at 7:34 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Oct 25, 2013 at 2:47 PM, srmore comom...@gmail.com wrote: I don't know whether this is possible but was just curious, can you query for the data in the remote datacenter with a CL.ONE ? A coordinator at CL.ONE picks which replica(s) to query based in large part on the dynamic snitch. If your remote data center has a lower badness score from the perspective of the dynamic snitch, a CL.ONE request might go there. 1.2.11 adds [1] a LOCAL_ONE consistencylevel which does the opposite of what you are asking, restricting CL.ONE from going cross-DC. There could be a case where one might not have a QUORUM and would like to read the most recent data which includes the data from the other datacenter. AFAIK to reliably read the data from other datacenter we only have CL.EACH_QUORUM. Using CL.QUORUM requires a QUORUM number of responses, it does not care from which data center those responses come. Also, is there a way one can control how frequently the data is replicated across the datacenters ? Data centers don't really exist in this context [2], so your question is can one control how frequently data is replicated between replicas and the answer is no. All replication always goes to every replica. =Rob [1] https://issues.apache.org/jira/browse/CASSANDRA-6202 [2] this is slightly glib/reductive/inaccurate, but accurate enough for the purposes of this response.
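As an illustration of the LOCAL_ONE option Rob mentions (available from 1.2.11 per CASSANDRA-6202), the consistency level can be set per session in cqlsh; the keyspace and table below are placeholders:

    cqlsh> CONSISTENCY LOCAL_ONE;
    cqlsh> SELECT * FROM my_keyspace.my_table WHERE key = 'abc';
    -- reads now stay inside the coordinator's datacenter instead of possibly
    -- being routed cross-DC by the dynamic snitch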
Query a datacenter
I don't know whether this is possible but was just curious, can you query for the data in the remote datacenter with a CL.ONE ? There could be a case where one might not have a QUORUM and would like to read the most recent data which includes the data from the other datacenter. AFAIK to reliably read the data from other datacenter we only have CL.EACH_QUORUM. Also, is there a way one can control how frequently the data is replicated across the datacenters ? Thanks !
Re: Cassandra Heap Size for data more than 1 TB
Thanks Mohit and Michael, That's what I thought. I have tried all the avenues, will give ParNew a try. With the 1.0.xx I have issues when data sizes go up, hopefully that will not be the case with 1.2. Just curious, has anyone tried 1.2 with large data set, around 1 TB ? Thanks ! On Thu, Oct 3, 2013 at 7:20 AM, Michał Michalski mich...@opera.com wrote: I was experimenting with 128 vs. 512 some time ago and I was unable to see any difference in terms of performance. I'd probably check 1024 too, but we migrated to 1.2 and heap space was not an issue anymore. M. W dniu 02.10.2013 16:32, srmore pisze: I changed my index_interval from 128 to index_interval: 128 to 512, does it make sense to increase more than this ? On Wed, Oct 2, 2013 at 9:30 AM, cem cayiro...@gmail.com wrote: Have a look to index_interval. Cem. On Wed, Oct 2, 2013 at 2:25 PM, srmore comom...@gmail.com wrote: The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X though. We had tuned bloom filters (0.1) and AFAIK making it lower than this won't matter. Thanks ! On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Which Cassandra version are you on? Essentially heap size is function of number of keys/metadata. In Cassandra 1.2 lot of the metadata like bloom filters were moved off heap. On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote: Does anyone know what would roughly be the heap size for cassandra with 1TB of data ? We started with about 200 G and now on one of the nodes we are already on 1 TB. We were using 8G of heap and that served us well up until we reached 700 G where we started seeing failures and nodes flipping. With 1 TB of data the node refuses to come back due to lack of memory. needless to say repairs and compactions takes a lot of time. We upped the heap from 8 G to 12 G and suddenly everything started moving rapidly i.e. the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we started seeing the same symptoms as we were seeing with 8 G. So my question is how do I determine what is the optimal size of heap for data around 1 TB ? Following are some of my JVM settings -Xms8G -Xmx8G -Xmn800m -XX:NewSize=1200M XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=4 Thanks !
Re: Cassandra Heap Size for data more than 1 TB
The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X though. We had tuned bloom filters (0.1) and AFAIK making it lower than this won't matter. Thanks ! On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.comwrote: Which Cassandra version are you on? Essentially heap size is function of number of keys/metadata. In Cassandra 1.2 lot of the metadata like bloom filters were moved off heap. On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote: Does anyone know what would roughly be the heap size for cassandra with 1TB of data ? We started with about 200 G and now on one of the nodes we are already on 1 TB. We were using 8G of heap and that served us well up until we reached 700 G where we started seeing failures and nodes flipping. With 1 TB of data the node refuses to come back due to lack of memory. needless to say repairs and compactions takes a lot of time. We upped the heap from 8 G to 12 G and suddenly everything started moving rapidly i.e. the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we started seeing the same symptoms as we were seeing with 8 G. So my question is how do I determine what is the optimal size of heap for data around 1 TB ? Following are some of my JVM settings -Xms8G -Xmx8G -Xmn800m -XX:NewSize=1200M XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=4 Thanks !
Re: Cassandra Heap Size for data more than 1 TB
I changed my index_interval from 128 to 512, does it make sense to increase it more than this ? On Wed, Oct 2, 2013 at 9:30 AM, cem cayiro...@gmail.com wrote: Have a look to index_interval. Cem. On Wed, Oct 2, 2013 at 2:25 PM, srmore comom...@gmail.com wrote: The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X though. We had tuned bloom filters (0.1) and AFAIK making it lower than this won't matter. Thanks ! On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.comwrote: Which Cassandra version are you on? Essentially heap size is function of number of keys/metadata. In Cassandra 1.2 lot of the metadata like bloom filters were moved off heap. On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote: Does anyone know what would roughly be the heap size for cassandra with 1TB of data ? We started with about 200 G and now on one of the nodes we are already on 1 TB. We were using 8G of heap and that served us well up until we reached 700 G where we started seeing failures and nodes flipping. With 1 TB of data the node refuses to come back due to lack of memory. needless to say repairs and compactions takes a lot of time. We upped the heap from 8 G to 12 G and suddenly everything started moving rapidly i.e. the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we started seeing the same symptoms as we were seeing with 8 G. So my question is how do I determine what is the optimal size of heap for data around 1 TB ? Following are some of my JVM settings -Xms8G -Xmx8G -Xmn800m -XX:NewSize=1200M XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=4 Thanks !
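For context, in the 1.0/1.1 line index_interval is a global cassandra.yaml setting (it later became a per-table property). A minimal illustrative fragment:

    # cassandra.yaml -- illustrative
    # Default is 128. Larger values shrink the index samples kept on the heap at the
    # cost of slightly more disk I/O per key lookup; 256-512 is a common compromise
    # when heap is tight. A restart is needed for the change to take effect.
    index_interval: 512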
Re: Cassandra Heap Size for data more than 1 TB
Sure, I was testing using high traffic with about 6K - 7K req/sec reads and writes combined I added a node and ran repair, at this time the traffic was stopped and heap was 8G. I saw a lot of flushing and GC activity and finally it died saying out of memory. So I gave it more memory 12 G and started the nodes. This sped up the compactions and validations for around 12 hours and now I am back to the flushing and high GC activity at this point there was no traffic for more than 24 hours. Again, thanks for the help ! On Wed, Oct 2, 2013 at 10:19 AM, cem cayiro...@gmail.com wrote: I think 512 is fine. Could you tell more about your traffic characteristics? Cem On Wed, Oct 2, 2013 at 4:32 PM, srmore comom...@gmail.com wrote: I changed my index_interval from 128 to index_interval: 128 to 512, does it make sense to increase more than this ? On Wed, Oct 2, 2013 at 9:30 AM, cem cayiro...@gmail.com wrote: Have a look to index_interval. Cem. On Wed, Oct 2, 2013 at 2:25 PM, srmore comom...@gmail.com wrote: The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X though. We had tuned bloom filters (0.1) and AFAIK making it lower than this won't matter. Thanks ! On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.comwrote: Which Cassandra version are you on? Essentially heap size is function of number of keys/metadata. In Cassandra 1.2 lot of the metadata like bloom filters were moved off heap. On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote: Does anyone know what would roughly be the heap size for cassandra with 1TB of data ? We started with about 200 G and now on one of the nodes we are already on 1 TB. We were using 8G of heap and that served us well up until we reached 700 G where we started seeing failures and nodes flipping. With 1 TB of data the node refuses to come back due to lack of memory. needless to say repairs and compactions takes a lot of time. We upped the heap from 8 G to 12 G and suddenly everything started moving rapidly i.e. the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we started seeing the same symptoms as we were seeing with 8 G. So my question is how do I determine what is the optimal size of heap for data around 1 TB ? Following are some of my JVM settings -Xms8G -Xmx8G -Xmn800m -XX:NewSize=1200M XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=4 Thanks !
Cassandra Heap Size for data more than 1 TB
Does anyone know what would roughly be the heap size for cassandra with 1TB of data ? We started with about 200 G and now on one of the nodes we are already on 1 TB. We were using 8G of heap and that served us well up until we reached 700 G where we started seeing failures and nodes flipping. With 1 TB of data the node refuses to come back due to lack of memory. Needless to say, repairs and compactions take a lot of time. We upped the heap from 8 G to 12 G and suddenly everything started moving rapidly, i.e. the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we started seeing the same symptoms as we were seeing with 8 G. So my question is how do I determine what the optimal size of heap is for data around 1 TB ? Following are some of my JVM settings: -Xms8G -Xmx8G -Xmn800m -XX:NewSize=1200M -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=4 Thanks !
Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread
I hit this issue again today and looks like changing -Xss option does not work :( I am on 1.0.11 (I know its old, we are upgrading to 1.2.9 right now) and have about 800-900GB of data. I can see cassandra is spending a lot of time reading the data files before it quits with java.lang.OutOfMemoryError: unable to create new native thread error. My hard and soft limits seems to be ok as well Datastax recommends [1] * soft nofile 32768 * hard nofile 32768 and I have hardnofile 65536 softnofile 65536 My ulimit -u output is 515038 (which again should be sufficient) complete output ulimit -a core file size (blocks, -c)0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 515038 max locked memory (kbytes, -l) 32 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 515038 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Has anyone run into this ? [1] http://www.datastax.com/docs/1.1/troubleshooting/index On Wed, Sep 11, 2013 at 8:47 AM, srmore comom...@gmail.com wrote: Thanks Viktor, - check (cassandra-env.sh) -Xss size, you may need to increase it for your JVM; This seems to have done the trick ! Thanks ! On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: For start: - check (cassandra-env.sh) -Xss size, you may need to increase it for your JVM; - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase it for your data load/bloom filter/index sizes. ** ** ** ** ** ** Best regards / Pagarbiai *Viktor Jevdokimov* Senior Developer [image: Adform News] http://www.adform.com *Visit us at Dmexco: *Hall 6 Stand B-52 September 18-19 Cologne, Germany Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-03163 Vilnius, Lithuania Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider Take a ride with Adform's Rich Media Suitehttp://vimeo.com/adform/richmedia [image: Dmexco 2013] http://www.dmexco.de/ Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. *From:* srmore [mailto:comom...@gmail.com] *Sent:* Tuesday, September 10, 2013 6:16 AM *To:* user@cassandra.apache.org *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to create new native thread [heur] ** ** I have a 5 node cluster with a load of around 300GB each. A node went down and does not come up. I can see the following exception in the logs. 
ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[main,5,main] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:640) at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703) at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:77) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:65) at org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.init(JMXConfigurableThreadPoolExecutor.java:34) at org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68) at org.apache.cassandra.concurrent.StageManager.clinit(StageManager.java:42) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) The *ulimit -u* output is *515042*, which is far more than what is recommended [1] (10240) and I am skeptical to set it to unlimited as recommended here [2]. Any pointers as to what could be the issue and how to get the node up. [1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename
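The ulimit -a output quoted above is worth a second look: max user processes is high, but open files is only 1024, which suggests the limits.conf entries may not be applied to the session that actually starts Cassandra. A hedged example of the usual limits.conf entries (the user name is a placeholder for whatever account runs Cassandra):

    # /etc/security/limits.conf -- illustrative
    cassandra  soft  nofile  65536
    cassandra  hard  nofile  65536
    cassandra  soft  nproc   32768
    cassandra  hard  nproc   32768
    # On RHEL6-style systems also check /etc/security/limits.d/90-nproc.conf, which
    # can silently cap nproc, and restart Cassandra from a fresh login session so
    # the new limits actually apply to the daemon.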
Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread
Was too fast on the send button, sorry. The thing I wanted to add was the pending signals (-i) 515038 that looks odd to me, could that be related. On Thu, Sep 19, 2013 at 4:53 PM, srmore comom...@gmail.com wrote: I hit this issue again today and looks like changing -Xss option does not work :( I am on 1.0.11 (I know its old, we are upgrading to 1.2.9 right now) and have about 800-900GB of data. I can see cassandra is spending a lot of time reading the data files before it quits with java.lang.OutOfMemoryError: unable to create new native thread error. My hard and soft limits seems to be ok as well Datastax recommends [1] * soft nofile 32768 * hard nofile 32768 and I have hardnofile 65536 softnofile 65536 My ulimit -u output is 515038 (which again should be sufficient) complete output ulimit -a core file size (blocks, -c)0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 515038 max locked memory (kbytes, -l) 32 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 515038 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Has anyone run into this ? [1] http://www.datastax.com/docs/1.1/troubleshooting/index On Wed, Sep 11, 2013 at 8:47 AM, srmore comom...@gmail.com wrote: Thanks Viktor, - check (cassandra-env.sh) -Xss size, you may need to increase it for your JVM; This seems to have done the trick ! Thanks ! On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: For start: - check (cassandra-env.sh) -Xss size, you may need to increase it for your JVM; - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase it for your data load/bloom filter/index sizes. ** ** ** ** ** ** Best regards / Pagarbiai *Viktor Jevdokimov* Senior Developer [image: Adform News] http://www.adform.com *Visit us at Dmexco: *Hall 6 Stand B-52 September 18-19 Cologne, Germany Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-03163 Vilnius, Lithuania Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider Take a ride with Adform's Rich Media Suitehttp://vimeo.com/adform/richmedia [image: Dmexco 2013] http://www.dmexco.de/ Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. *From:* srmore [mailto:comom...@gmail.com] *Sent:* Tuesday, September 10, 2013 6:16 AM *To:* user@cassandra.apache.org *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to create new native thread [heur] ** ** I have a 5 node cluster with a load of around 300GB each. A node went down and does not come up. I can see the following exception in the logs. 
ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[main,5,main] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:640) at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703) at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:77) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:65) at org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.init(JMXConfigurableThreadPoolExecutor.java:34) at org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68) at org.apache.cassandra.concurrent.StageManager.clinit(StageManager.java:42) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) The *ulimit -u* output is *515042*, which is far more than what
Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread
Thanks Viktor, - check (cassandra-env.sh) -Xss size, you may need to increase it for your JVM; This seems to have done the trick ! Thanks ! On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: For start: - check (cassandra-env.sh) -Xss size, you may need to increase it for your JVM; - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase it for your data load/bloom filter/index sizes. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer, Adform http://www.adform.com From: srmore [mailto:comom...@gmail.com] Sent: Tuesday, September 10, 2013 6:16 AM To: user@cassandra.apache.org Subject: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread I have a 5 node cluster with a load of around 300GB each. A node went down and does not come up. I can see the following exception in the logs. ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[main,5,main] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:640) at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703) at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:77) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:65) at org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.init(JMXConfigurableThreadPoolExecutor.java:34) at org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68) at org.apache.cassandra.concurrent.StageManager.clinit(StageManager.java:42) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) The *ulimit -u* output is *515042*, which is far more than what is recommended [1] (10240) and I am skeptical to set it to unlimited as recommended here [2]. Any pointers as to what could be the issue and how to get the node up. [1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=install/recommended_settings#cassandra/install/installRecommendSettings.html [2] http://mail-archives.apache.org/mod_mbox/cassandra-user/201303.mbox/%3CCAPqEvGE474Omea1BFLJ6U_pbAkOwWxk=dwo35_pc-atwb4_...@mail.gmail.com%3E Thanks !
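The -Xss change Viktor suggests is made in cassandra-env.sh; a sketch of what it could look like (the value is only an example, not a recommendation from the thread):

    # cassandra-env.sh -- illustrative
    # Per-thread stack size. The 1.0-era default (e.g. 128k-180k) can be too small
    # for some JVM builds and can surface as thread-creation or stack errors at
    # startup; raising it slightly increases per-thread memory use.
    JVM_OPTS="$JVM_OPTS -Xss256k"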
Error during startup - java.lang.OutOfMemoryError: unable to create new native thread
I have a 5 node cluster with a load of around 300GB each. A node went down and does not come up. I can see the following exception in the logs. ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[main,5,main] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:640) at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703) at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:77) at org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:65) at org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.init(JMXConfigurableThreadPoolExecutor.java:34) at org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68) at org.apache.cassandra.concurrent.StageManager.clinit(StageManager.java:42) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) The *ulimit -u* output is *515042* Which is far more than what is recommended [1] (10240) and I am skeptical to set it to unlimited as recommended here [2] Any pointers as to what could be the issue and how to get the node up. [1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=install/recommended_settings#cassandra/install/installRecommendSettings.html [2] http://mail-archives.apache.org/mod_mbox/cassandra-user/201303.mbox/%3CCAPqEvGE474Omea1BFLJ6U_pbAkOwWxk=dwo35_pc-atwb4_...@mail.gmail.com%3E Thanks !
Re: Best way to track backups/delays for cross DC replication
I would be interested to know that too, it would be great if anyone can share how they do (or do not) track or monitor cross datacenter migrations. Thanks ! On Wed, Sep 4, 2013 at 10:13 AM, Anand Somani meatfor...@gmail.com wrote: Hi, Scenario is a cluster spanning across datacenters and we use Local_quorum and want to know when things are not getting replicated across data centers. What is the best way to track/alert on that? I was planning on using the HintedHandOffManager (JMX) = org.apache.cassandra.db:type=HintedHandoffManager countPendingHints. Are there other metrics (maybe exposed via nodetool) I should be looking at. At this point we are on 1.1.6 cassandra. Thanks Anand
Distributed lock for cassandra
All, There are some operations that demand the use of a lock and I was wondering whether Cassandra has a built-in locking mechanism. After hunting the web for a while it appears that the answer is no, although I found this outdated wiki page which describes an algorithm: http://wiki.apache.org/cassandra/Locking - was this implemented ? It would be great if people on the list could share their experiences / best practices around locking. Does anyone use cages https://code.google.com/p/cages/ ? If yes, it would be nice if you could share your experiences. Thanks, Sandeep
Re: Distributed lock for cassandra
On Mon, Aug 12, 2013 at 2:49 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Aug 12, 2013 at 12:31 PM, srmore comom...@gmail.com wrote: There are some operations that demand the use lock and I was wondering whether Cassandra has a built in locking mechanism. After hunting the web for a while it appears that the answer is no, although I found this outdated wiki page which describes the algorithm http://wiki.apache.org/cassandra/Locking was this implemented ? It would be great if people on the list can share their experiences / best practices about locking. If your application needs a lot of locking, it is probably not ideal for a distributed, log structured database with immutable data files. This was the answer I was afraid of ... , not a lot of locking but now and then I do need it, that said creating the username problem described in the bug pretty much describes my problem. That said, Cassandra 2.0 will support CAS via Paxos. Presumably at a much, much lower throughput than the base system. https://issues.apache.org/jira/browse/CASSANDRA-5062 Thanks a lot for the pointers I will look at some of the solutions described there. =Rob
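For anyone landing on this thread later, the CAS support Rob refers to surfaced in Cassandra 2.0 as CQL lightweight transactions. A sketch against a hypothetical users table (the unique-username case discussed in the JIRA ticket):

    -- CQL 3, Cassandra 2.0+; table and column names are placeholders
    INSERT INTO users (username, email) VALUES ('sandeep', 'sandeep@example.com')
    IF NOT EXISTS;

    UPDATE users SET email = 'new@example.com'
    WHERE username = 'sandeep'
    IF email = 'sandeep@example.com';
    -- Each such statement runs a Paxos round, so expect noticeably lower throughput
    -- than plain writes, as Rob notes above.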
Re: Alternate major compaction
Thanks Takenori, Looks like the tool provides some good info that people can use. It would be great if you can share it with the community. On Thu, Jul 11, 2013 at 6:51 AM, Takenori Sato ts...@cloudian.com wrote: Hi, I think it is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause, but more. For example, I see two typical scenario. 1. backup use case 2. active wide row In the case of 1, say, one data is removed a year later. This means, tombstone on the row is 1 year away from the original row. To remove an expired row entirely, a compaction set has to include all the rows. So, when do the original, 1 year old row, and the tombstoned row are included in a compaction set? It is likely to take one year. In the case of 2, such an active wide row exists in most of sstable files. And it typically contains many expired columns. But none of them wouldn't be removed entirely because a compaction set practically do not include all the row fragments. Btw, there is a very convenient MBean API is available. It is CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files. Then, I wrote a tool to check garbage, and print outs some useful information to find such an optimal set. Here's a simple log output. # /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)] === ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES === hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db --- TOTAL, 40, 40 === REMAINNING_SSTABLE_FILES means any other sstable files that contain the respective row. So, the following is an optimal set. # /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)] === ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES === hello5/100.txt.1373502926003, 223, 0, YES, YES --- TOTAL, 223, 0 === This tool relies on SSTableReader and an aggregation iterator as Cassandra does in compaction. I was considering to share this with the community. So let me know if anyone is interested. Ah, note that it is based on 1.0.7. So I will need to check and update for newer versions. Thanks, Takenori On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez tomas.nu...@groupalia.comwrote: Hi About a year ago, we did a major compaction in our cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using SizeTieredCompaction strategy, and we've not avaluated yet LeveledCompaction, because it has its downsides, and we've had no time to test all of them in our environment). I was trying to find a way to solve this situation (that is, do something like a major compaction that writes small sstables, not huge as major compaction does), and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as documentation states). 
Then I tried deleting all data in a node and then bootstrapping it (or nodetool rebuild-ing it), hoping that this way the sstables would get cleaned from deleted records and updates. But the deleted node just copied the sstables from another node as they were, cleaning nothing. So I tried a new approach: I switched the sstable compaction strategy (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch, and then switching it back (Leveled to SizeTiered). It took a while (but so do the major compaction process) and it worked, I have smaller sstables, and I've regained a lot of disk space. I'm happy with the results, but it doesn't seem a orthodox way of cleaning the sstables. What do you think, is it something wrong or crazy? Is there a different way to achieve the same thing? Let's put an example: Suppose you have a
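For reference, the strategy flip Tomàs describes can be expressed in CQL 3 on 1.2+; keyspace/table names and the sstable size are placeholders, and older clusters may need the equivalent cassandra-cli "update column family" command instead:

    -- switch to Leveled so every SSTable gets rewritten into small, fixed-size files
    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};

    -- once the rewrite has finished, switch back if SizeTiered is still preferred
    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'SizeTieredCompactionStrategy'};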
Re: Migrating data from 2 node cluster to a 3 node cluster
On Fri, Jul 5, 2013 at 6:08 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 4, 2013 at 10:03 AM, srmore comom...@gmail.com wrote: We are planning to move data from a 2 node cluster to a 3 node cluster. We are planning to copy the data from the two nodes (snapshot) to the new 2 nodes and hoping that Cassandra will sync it to the third node. Will this work ? are there any other commands to run after we are done migrating, like nodetool repair. What RF are old and new cluster? RF of old and new cluster is the same RF=3. Keyspaces and schema info is also same. What are the tokens of old and new nodes? tokens for old cluster ( 2-node ) node 0 - 0 node 1 - 85070591730234615865843651857942052864 Tokens for new cluster (3-node) node 0 - 0 node 1 - 56713727820156407428984779325531226112 node 2 - 113427455640312814857969558651062452224 http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra Thanks this helps a lot ! =Rob
Migrating data from 2 node cluster to a 3 node cluster
We are planning to move data from a 2 node cluster to a 3 node cluster. We are planning to copy the data from the two nodes (snapshot) to the new 2 nodes and hoping that Cassandra will sync it to the third node. Will this work ? are there any other commands to run after we are done migrating, like nodetool repair. Thanks all.
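A sketch of the post-copy steps implied by this plan, assuming the snapshots have been copied into the matching data directories on the new nodes and the schema has been recreated there (host names are placeholders; whether this is sufficient depends on the token layout discussed in the reply above):

    # run against each new node after it has joined the new ring
    nodetool -h new-node-1 repair     # rebuilds missing replicas, including on the third node
    nodetool -h new-node-1 cleanup    # afterwards, drops data outside the node's new ranges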
Re: Heap is not released and streaming hangs at 0%
On Wed, Jun 26, 2013 at 12:16 AM, aaron morton aa...@thelastpickle.comwrote: bloom_filter_fp_chance value that was changed from default to 0.1, looked at the filters and they are about 2.5G on disk and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. You need to re-write the sstables on disk using nodetool upgradesstables. Otherwise only the new tables with have the 0.1 setting. I will try increasing the value to 0.7 and report my results. No need to, it will probably be something like Oh no, really, what, how, please make it stop :) 0.7 will mean reads will hit most / all of the SSTables for the CF. Changing the bloom_filter_fp_chance to 0.7 did seem to correct the problem in short run. I do not see the out of heap errors but I am taking a bit of a performance hit. Planning to run some more tests, also my BloomFilterFalseRatio is 0.8367977262013025 this was the reason behind bumping bloom_filter_fp_chance. I covered a high row situation in on of my talks at the summit this month, the slide deck is here http://www.slideshare.net/aaronmorton/cassandra-sf-2013-in-case-of-emergency-break-glass and the videos will soon be up at Planet Cassandra. This was/is extremely helpful Aaron, cannot thank you enough for sharing this with the community, eagerly looking forward for the video. Rebuild the sstables, then reduce the index_interval if you still need to reduce mem pressure. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 22/06/2013, at 1:17 PM, sankalp kohli kohlisank...@gmail.com wrote: I will take a heap dump and see whats in there rather than guessing. On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.comwrote: bloom_filter_fp_chance = 0.7 is probably way too large to be effective and you'll probably have issues compacting deleted rows and get poor read performance with a value that high. I'd guess that anything larger than 0.1 might as well be 1.0. -Bryan On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote: On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.comwrote: nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows ? If so see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. Yes, I have 100's of millions of rows. If this is an old schema you may be using the very old setting of 0.000744 which creates a lot of bloom filters. bloom_filter_fp_chance value that was changed from default to 0.1, looked at the filters and they are about 2.5G on disk and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. It also appears to be a case of hard GC failure (as Rob mentioned) as the heap is never released, even after 24+ hours of idle time, the JVM needs to be restarted to reclaim the heap. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote: If you want, you can try to force the GC through Jconsole. Memory-Perform GC. It theoretically triggers a full GC and when it will happen depends on the JVM -Wei -- *From: *Robert Coli rc...@eventbrite.com *To: *user@cassandra.apache.org *Sent: *Tuesday, June 18, 2013 10:43:13 AM *Subject: *Re: Heap is not released and streaming hangs at 0% On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote: But then shouldn't JVM C G it eventually ? 
I can still see Cassandra alive and kicking but looks like the heap is locked up even after the traffic is long stopped. No, when GC system fails this hard it is often a permanent failure which requires a restart of the JVM. nodetool -h localhost flush didn't do much good. This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you. =Rob
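To make the bloom filter advice above concrete: changing bloom_filter_fp_chance only affects newly written SSTables, so the existing ones have to be rewritten, as Aaron notes. A sketch using the cassandra-cli syntax of that era (keyspace and column family names are placeholders):

    [cassandra-cli]
    update column family MyCF with bloom_filter_fp_chance = 0.1;

    [shell] rewrite the existing sstables so the new setting takes effect on disk
    nodetool -h localhost upgradesstables MyKeyspace MyCF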
Re: Heap is not released and streaming hangs at 0%
On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.comwrote: nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows ? If so see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. Yes, I have 100's of millions of rows. If this is an old schema you may be using the very old setting of 0.000744 which creates a lot of bloom filters. bloom_filter_fp_chance value that was changed from default to 0.1, looked at the filters and they are about 2.5G on disk and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. It also appears to be a case of hard GC failure (as Rob mentioned) as the heap is never released, even after 24+ hours of idle time, the JVM needs to be restarted to reclaim the heap. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote: If you want, you can try to force the GC through Jconsole. Memory-Perform GC. It theoretically triggers a full GC and when it will happen depends on the JVM -Wei -- *From: *Robert Coli rc...@eventbrite.com *To: *user@cassandra.apache.org *Sent: *Tuesday, June 18, 2013 10:43:13 AM *Subject: *Re: Heap is not released and streaming hangs at 0% On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote: But then shouldn't JVM C G it eventually ? I can still see Cassandra alive and kicking but looks like the heap is locked up even after the traffic is long stopped. No, when GC system fails this hard it is often a permanent failure which requires a restart of the JVM. nodetool -h localhost flush didn't do much good. This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you. =Rob
Heap is not released and streaming hangs at 0%
I see an issue when I run high traffic to the Cassandra nodes: the heap gets full to about 94% (which is expected), but the thing that confuses me is that the heap usage never goes down after the traffic is stopped (at least, it appears to be so). I kept the nodes up for a day after stopping the traffic and the logs still tell me Heap is 0.9430032942657169 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically Things go back to normal when I restart Cassandra. nodetool netstats tells me the following: Mode: Normal Not sending streams and a bunch of keyspaces streaming from other nodes which are at 0%, and this stays this way until I restart Cassandra. Also I see this at the bottom: Pool Name / Active / Pending / Completed: Commands - n/a, 0, 8267930; Responses - n/a, 0, 15184810 Any ideas as to how I can speed this up and reclaim the heap ? Thanks !
Re: Heap is not released and streaming hangs at 0%
Thanks Rob, But then shouldn't JVM C G it eventually ? I can still see Cassandra alive and kicking but looks like the heap is locked up even after the traffic is long stopped. nodetool -h localhost flush didn't do much good. the version I am running is 1.0.12 (I know its due for a upgrade but gotto work with this for now). On Tue, Jun 18, 2013 at 12:13 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jun 18, 2013 at 8:25 AM, srmore comom...@gmail.com wrote: I see an issues when I run high traffic to the Cassandra nodes, the heap gets full to about 94% (which is expected) Which is expected to cause GC failure? ;) But seriously, the reason your node is unable to GC is that you have filled your heap too fast for it to keep up. The JVM has seized up like Joe Namath with vapor lock. Any ideas as to how I can speed up this up and reclaim the heap ? Don't exhaust the ability of GC to C G. :) =Rob PS - What version of cassandra? If you nodetool -h localhost flush does it help?
Re: Multiple data center performance
I am seeing the similar behavior, in my case I have 2 nodes in each datacenter and one node always has high latency (equal to the latency between the two datacenters). When one of the datacenters is shutdown the latency drops. I am curious to know whether anyone else has these issues and if yes how did to get around it. Thanks ! On Fri, Jun 7, 2013 at 11:49 PM, Daning Wang dan...@netseer.com wrote: We have deployed multi-center but got performance issue. When the nodes on other center are up, the read response time from clients is 4 or 5 times higher. when we take those nodes down, the response time becomes normal(compare to the time before we changed to multi-center). We have high volume on the cluster, the consistency level is one for read. so my understanding is most of traffic between data center should be read repair. but seems that could not create much delay. What could cause the problem? how to debug this? Here is the keyspace, [default@dsat] describe dsat; Keyspace: dsat: Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy Durable Writes: true Options: [dc2:1, dc1:3] Column Families: ColumnFamily: categorization_cache Ring Datacenter: dc1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN xx.xx.xx..111 59.2 GB256 37.5% 4d6ed8d6-870d-4963-8844-08268607757e rac1 DN xx.xx.xx..121 99.63 GB 256 37.5% 9d0d56ce-baf6-4440-a233-ad6f1d564602 rac1 UN xx.xx.xx..120 66.32 GB 256 37.5% 0fd912fb-3187-462b-8c8a-7d223751b649 rac1 UN xx.xx.xx..118 63.61 GB 256 37.5% 3c6e6862-ab14-4a8c-9593-49631645349d rac1 UN xx.xx.xx..117 68.16 GB 256 37.5% ee6cdf23-d5e4-4998-a2db-f6c0ce41035a rac1 UN xx.xx.xx..116 32.41 GB 256 37.5% f783eeef-1c51-4f91-ab7c-a60669816770 rac1 UN xx.xx.xx..115 64.24 GB 256 37.5% e75105fb-b330-4f40-aa4f-8e6e11838e37 rac1 UN xx.xx.xx..112 61.32 GB 256 37.5% 2547ee54-88dd-4994-a1ad-d9ba367ed11f rac1 Datacenter: dc2 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack DN xx.xx.xx.19958.39 GB 256 50.0% 6954754a-e9df-4b3c-aca7-146b938515d8 rac1 DN xx.xx.xx..61 33.79 GB 256 50.0% 91b8d510-966a-4f2d-a666-d7edbe986a1c rac1 Thank you in advance, Daning
Cassandra optimizations for multi-core machines
Hello All, We are thinking of going with Cassandra on a 8 core machine, are there any optimizations that can help us here ? I have seen that during startup stage Cassandra uses only one core, is there a way we can speed up the startup process ? Thanks !
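On the multi-core question, the knobs that scale with cores and disks live in cassandra.yaml; the stock file's own guidance is roughly 16 x number of data drives for reads and 8 x number of cores for writes. An illustrative fragment for an 8-core box (values are examples, not measurements):

    # cassandra.yaml -- illustrative sizing for 8 cores
    concurrent_reads: 32        # ~16 x number of data drives
    concurrent_writes: 64       # ~8 x number of cores
    concurrent_compactors: 4    # optional; caps how many compactions run in parallel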
Re: Cassandra performance decreases drastically with increase in data size.
Thanks all for the help. I ran the traffic over the weekend surprisingly, my heap was doing OK (around 5.7G of 8G) but GC activity went nuts and dropped the throughput. I will probably increase the number of nodes. The other interesting thing I noticed was that there were some objects with finalize() methods, this could potentially cause GC issues. On Fri, May 31, 2013 at 1:47 AM, Aiman Parvaiz ai...@grapheffect.comwrote: I believe you should roll out more nodes as a temporary fix to your problem, 400GB on all nodes means (as correctly mentioned in other mails of this thread) you are spending more time on GC. Check out the second comment in this link by Aaron Morton, he says the more than 300GB can be problematic, though this post is about older version of cassandra but I believe concept still stands true: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html Thanks On May 29, 2013, at 9:32 PM, srmore comom...@gmail.com wrote: Hello, I am observing that my performance is drastically decreasing when my data size grows. I have a 3 node cluster with 64 GB of ram and my data size is around 400GB on all the nodes. I also see that when I re-start Cassandra the performance goes back to normal and then again starts decreasing after some time. Some hunting landed me to this page http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks about the large data sets and explains that it might be because I am going through multiple layers of OS cache, but does not tell me how to tune it. So, my question is, are there any optimizations that I can do to handle these large datatasets ? and why does my performance go back to normal when I restart Cassandra ? Thanks !
Consistency level for multi-datacenter setup
I am a bit confused about using consistency levels with a multi-datacenter setup. Following is my setup: I have 4 nodes, set up as follows: Node 1 DC 1 - N1DC1 Node 2 DC 1 - N2DC1 Node 1 DC 2 - N1DC2 Node 2 DC 2 - N2DC2 I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one way). I am observing that when I use consistency level 2, for some reason the coordinator node is picking nodes from the other datacenter. My understanding was that Cassandra picks nodes which are close by (from the local datacenter), determined by gossip, but it looks like that's not the case. I found the following comment on the DataStax website: If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator node must respond to the client request in order for the request to succeed. Does this mean that for multiple datacenters we can only use ONE or LOCAL_QUORUM if we want to use the local datacenter to avoid cross-datacenter latency? I am using the GossipingPropertyFileSnitch. Thanks !
Re: Consistency level for multi-datacenter setup
With CL=TWO it appears that one node randomly picks the node from other datacenter to get the data. i.e. one node in the datacenter consistently underperforms. On Mon, Jun 3, 2013 at 3:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: What happens when you use CL=TWO. Dean From: srmore comom...@gmail.commailto:comom...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Monday, June 3, 2013 2:09 PM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Consistency level for multi-datacenter setup I am a bit confused when using the consistency level for multi datacenter setup. Following is my setup: I have 4 nodes the way these are set up are Node 1 DC 1 - N1DC1 Node 2 DC 1 - N2DC1 Node 1 DC 2 - N1DC2 Node 2 DC 2 - N2DC2 I setup a delay in between two datacenters (DC1 and DC2 around 1 sec one way) I am observing that when I use consistency level 2 for some reason the coordinate node is picking up the nodes from other datacenter. My understanding was that Cassandra picks up nodes which are close by (from local datacenter), determined by Gossip but looks like that's not the case. I found the following comment on Datastax website : If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator node must respond to the client request in order for the request to succeed. Does this mean that for multi datacenter we can only use ONE or LOCAL_QUORUM if we want to use the local datacenter to avoid cross datacenter latency. I am using the GossipingPropertyFileSnitch. Thanks !
Re: Consistency level for multi-datacenter setup
We observed that as well, please let us know what you find out it would be extremely helpful. There is also this property that you can play with to take care of slow nodes *dynamic_snitch_badness_threshold*. http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold Thanks ! On Mon, Jun 3, 2013 at 3:24 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Also, we had to put a fix into cassandra so it removed slow nodes from the list of nodes to read from. With that fix our QUOROM(not local quorom) started working again and would easily take the other DC nodes out of the list of reading from for you as well. I need to circle back to with my teammate to check if he got his fix posted to the dev list or not. Later, Dean From: srmore comom...@gmail.commailto:comom...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Monday, June 3, 2013 2:09 PM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Consistency level for multi-datacenter setup I am a bit confused when using the consistency level for multi datacenter setup. Following is my setup: I have 4 nodes the way these are set up are Node 1 DC 1 - N1DC1 Node 2 DC 1 - N2DC1 Node 1 DC 2 - N1DC2 Node 2 DC 2 - N2DC2 I setup a delay in between two datacenters (DC1 and DC2 around 1 sec one way) I am observing that when I use consistency level 2 for some reason the coordinate node is picking up the nodes from other datacenter. My understanding was that Cassandra picks up nodes which are close by (from local datacenter), determined by Gossip but looks like that's not the case. I found the following comment on Datastax website : If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator node must respond to the client request in order for the request to succeed. Does this mean that for multi datacenter we can only use ONE or LOCAL_QUORUM if we want to use the local datacenter to avoid cross datacenter latency. I am using the GossipingPropertyFileSnitch. Thanks !
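The snitch setting mentioned above sits in cassandra.yaml; an illustrative fragment (0.1 is the value Dean reports using later in this thread):

    # cassandra.yaml -- illustrative
    # How much worse a replica's dynamic score must be before the dynamic snitch
    # stops preferring the statically closest (e.g. local) replica; 0 routes purely
    # by measured latency, higher values pin reads to the preferred replicas more.
    dynamic_snitch_badness_threshold: 0.1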
Re: Consistency level for multi-datacenter setup
Yup, RF is 2 for both the datacenters. On Mon, Jun 3, 2013 at 3:36 PM, Sylvain Lebresne sylv...@datastax.com wrote: What's your replication factor? Do you have RF=2 on both datacenters? On Mon, Jun 3, 2013 at 10:09 PM, srmore comom...@gmail.com wrote: I am a bit confused about using the consistency level for a multi-datacenter setup. Following is my setup: I have 4 nodes, set up as follows: Node 1 DC 1 - N1DC1, Node 2 DC 1 - N2DC1, Node 1 DC 2 - N1DC2, Node 2 DC 2 - N2DC2. I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one way). I am observing that when I use consistency level TWO, for some reason the coordinator node is picking up nodes from the other datacenter. My understanding was that Cassandra picks nodes which are close by (from the local datacenter), determined by gossip, but it looks like that's not the case. I found the following comment on the DataStax website: If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator node must respond to the client request in order for the request to succeed. Does this mean that for a multi-datacenter setup we can only use ONE or LOCAL_QUORUM if we want to stay within the local datacenter and avoid cross-datacenter latency? I am using the GossipingPropertyFileSnitch. Thanks!
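For completeness, RF=2 in each datacenter would look something like the following in CQL3 (a sketch; the keyspace name is a placeholder, and 'DC1'/'DC2' must match whatever datacenter names your snitch actually reports):

    ALTER KEYSPACE my_ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};

followed by a repair so existing data is actually streamed to the new replicas.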
Re: Consistency level for multi-datacenter setup
After some more investigation it does not appear to be a CL issue. Every time I start up the node in the other datacenter with the 1 sec delay, my throughput starts degrading, even with CL=ONE and CL=LOCAL_QUORUM. I will put the logs on debug, investigate more, and report back the findings. On Mon, Jun 3, 2013 at 3:37 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Our badness threshold is 0.1 currently (just checked). Our website used to get slow during a slow-node period until we rolled our own patch out. Dean From: srmore comom...@gmail.com Reply-To: user@cassandra.apache.org Date: Monday, June 3, 2013 2:31 PM To: user@cassandra.apache.org Subject: Re: Consistency level for multi-datacenter setup We observed that as well; please let us know what you find out, it would be extremely helpful. There is also this property that you can play with to take care of slow nodes: dynamic_snitch_badness_threshold. http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold Thanks! On Mon, Jun 3, 2013 at 3:24 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Also, we had to put a fix into cassandra so it removed slow nodes from the list of nodes to read from. With that fix our QUORUM (not LOCAL_QUORUM) started working again and would easily take the other DC nodes out of the read list for you as well. I need to circle back with my teammate to check whether he got his fix posted to the dev list or not. Later, Dean From: srmore comom...@gmail.com Reply-To: user@cassandra.apache.org Date: Monday, June 3, 2013 2:09 PM To: user@cassandra.apache.org Subject: Consistency level for multi-datacenter setup I am a bit confused about using the consistency level for a multi-datacenter setup. Following is my setup: I have 4 nodes, set up as follows: Node 1 DC 1 - N1DC1, Node 2 DC 1 - N2DC1, Node 1 DC 2 - N1DC2, Node 2 DC 2 - N2DC2. I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one way). I am observing that when I use consistency level TWO, for some reason the coordinator node is picking up nodes from the other datacenter. My understanding was that Cassandra picks nodes which are close by (from the local datacenter), determined by gossip, but it looks like that's not the case. I found the following comment on the DataStax website: If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator node must respond to the client request in order for the request to succeed. Does this mean that for a multi-datacenter setup we can only use ONE or LOCAL_QUORUM if we want to stay within the local datacenter and avoid cross-datacenter latency? I am using the GossipingPropertyFileSnitch. Thanks!
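If it helps anyone trying to reproduce this, one way to watch what the dynamic snitch is doing on a 1.2-era node is to raise its log level in conf/log4j-server.properties (a sketch only; I am assuming that class emits something useful at DEBUG on this build, so verify before relying on it):

    # conf/log4j-server.properties
    log4j.logger.org.apache.cassandra.locator.DynamicEndpointSnitch=DEBUG

and then tail system.log while the remote-DC node is up, to see how the coordinator is ordering replicas as latency changes.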
Re: Cassandra performance decreases drastically with increase in data size.
You are right, it looks like I am doing a lot of GC. Is there any short-term solution for this other than bumping up the heap? Because even if I increase the heap I will run into the same issue; only the time before I hit OOM will be lengthened. It will be a while before we go to the latest and greatest Cassandra. Thanks! On Thu, May 30, 2013 at 12:05 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like you're spending all your time in GC, which you can verify by checking what GCInspector and StatusLogger say in the log. The fix is to increase your heap size or upgrade to 1.2: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote: Hello, I am observing that my performance decreases drastically as my data size grows. I have a 3 node cluster with 64 GB of RAM and my data size is around 400 GB on each of the nodes. I also see that when I restart Cassandra the performance goes back to normal and then starts decreasing again after some time. Some hunting landed me on this page http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks about large data sets and explains that it might be because I am going through multiple layers of OS cache, but it does not tell me how to tune for it. So, my question is, are there any optimizations that I can do to handle these large datasets? And why does my performance go back to normal when I restart Cassandra? Thanks! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
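For what it's worth, the heap is usually adjusted in conf/cassandra-env.sh; a minimal sketch (the numbers below are placeholders, not a recommendation, and the usual guidance for CMS-era heaps was to stay around 8 GB or below even on a 64 GB box):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"    # commonly sized at roughly 100 MB per CPU core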
Cassandra performance decreases drastically with increase in data size.
Hello, I am observing that my performance decreases drastically as my data size grows. I have a 3 node cluster with 64 GB of RAM and my data size is around 400 GB on each of the nodes. I also see that when I restart Cassandra the performance goes back to normal and then starts decreasing again after some time. Some hunting landed me on this page http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks about large data sets and explains that it might be because I am going through multiple layers of OS cache, but it does not tell me how to tune for it. So, my question is, are there any optimizations that I can do to handle these large datasets? And why does my performance go back to normal when I restart Cassandra? Thanks!
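A quick way to confirm the GC theory mentioned in the reply above is to grep the server log for the GC and status messages Cassandra emits (a sketch; the log path is the common packaged-install default and may differ on your hosts):

    grep -E 'GCInspector|StatusLogger' /var/log/cassandra/system.log | tail -50

Long or frequent GCInspector pauses around the time throughput drops would point at heap pressure rather than the OS cache layers.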
Re: Cannot resolve schema disagreement
Thanks Rob! Tried the steps; that did not work, however I was able to resolve the problem by syncing the clocks. The thing that confuses me is that the FAQ says "Before 0.7.6, this can also be caused by cluster system clocks being substantially out of sync with each other." The version I am using is 1.0.12. This raises an important question: where does Cassandra get the time information from? And is it required (I know it is highly advisable) to keep clocks in sync? Any suggestions/best practices on how to keep the clocks in sync? /srm On Thu, May 9, 2013 at 1:58 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, May 8, 2013 at 5:40 PM, srmore comom...@gmail.com wrote: After running the commands, I get back to the same issue. I cannot afford to lose the data so I guess this is the only option for me. And unfortunately I am using 1.0.12 (cannot upgrade as of now). Any ideas on what might be happening or any pointers will be greatly appreciated. If you can afford downtime on the cluster, the solution to this problem with the highest chance of success is: 1) dump the existing schema from a good node 2) nodetool drain on all nodes 3) stop cluster 4) move schema and migration CF tables out of the way on all nodes 5) start cluster 6) re-load schema, being careful to explicitly check for schema agreement on all nodes between schema-modifying statements. In many/most cases of schema disagreement, people try the FAQ approach and it doesn't work, and they end up being forced to do the above anyway. In general, if you can tolerate the downtime, you should save yourself the effort and just do the above process. =Rob
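For anyone else hitting this, Rob's procedure translates to roughly the following shell steps (a sketch only; the schema/migration file locations, the service commands, and the cassandra-cli invocation are from memory for a 1.0.x packaged install, so verify paths on your own nodes before moving anything):

    # 1) dump the schema from a node whose schema you trust
    echo 'show schema;' | cassandra-cli -h good-node > /tmp/schema.txt

    # 2-3) flush and stop every node
    nodetool -h <node> drain
    sudo service cassandra stop

    # 4) move the schema/migration system sstables out of the way (1.0.x layout assumed)
    mkdir -p /var/lib/cassandra/schema-backup
    mv /var/lib/cassandra/data/system/Schema*     /var/lib/cassandra/schema-backup/
    mv /var/lib/cassandra/data/system/Migrations* /var/lib/cassandra/schema-backup/

    # 5) start the cluster, then 6) reload the schema statement by statement,
    #    checking 'describe cluster;' for agreement between statements
    sudo service cassandra start
    cassandra-cli -h good-node -f /tmp/schema.txt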
Re: Cannot resolve schema disagreement
Thought so. Thanks Aaron! On Thu, May 9, 2013 at 6:09 PM, aaron morton aa...@thelastpickle.com wrote: This raises an important question: where does Cassandra get the time information from? http://docs.oracle.com/javase/6/docs/api/java/lang/System.html Normally milliseconds; not sure if 1.0.12 may use nanoTime(), which is less reliable on some VMs. And is it required (I know it is highly advisable) to keep clocks in sync? Any suggestions/best practices on how to keep the clocks in sync? http://en.wikipedia.org/wiki/Network_Time_Protocol Hope that helps. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 10/05/2013, at 9:16 AM, srmore comom...@gmail.com wrote: Thanks Rob! Tried the steps; that did not work, however I was able to resolve the problem by syncing the clocks. The thing that confuses me is that the FAQ says "Before 0.7.6, this can also be caused by cluster system clocks being substantially out of sync with each other." The version I am using is 1.0.12. This raises an important question: where does Cassandra get the time information from? And is it required (I know it is highly advisable) to keep clocks in sync? Any suggestions/best practices on how to keep the clocks in sync? /srm On Thu, May 9, 2013 at 1:58 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, May 8, 2013 at 5:40 PM, srmore comom...@gmail.com wrote: After running the commands, I get back to the same issue. I cannot afford to lose the data so I guess this is the only option for me. And unfortunately I am using 1.0.12 (cannot upgrade as of now). Any ideas on what might be happening or any pointers will be greatly appreciated. If you can afford downtime on the cluster, the solution to this problem with the highest chance of success is: 1) dump the existing schema from a good node 2) nodetool drain on all nodes 3) stop cluster 4) move schema and migration CF tables out of the way on all nodes 5) start cluster 6) re-load schema, being careful to explicitly check for schema agreement on all nodes between schema-modifying statements. In many/most cases of schema disagreement, people try the FAQ approach and it doesn't work, and they end up being forced to do the above anyway. In general, if you can tolerate the downtime, you should save yourself the effort and just do the above process. =Rob
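On keeping the clocks in sync: running an NTP daemon on every node is the usual answer. A minimal sketch for a Debian/Ubuntu-style host (package names and commands vary by distro, so treat this as illustrative):

    sudo apt-get install ntp        # install and start the NTP daemon
    ntpq -p                         # list peers and current offsets
    ntpdate -q pool.ntp.org         # one-off offset query without stepping the clock

Offsets of a few milliseconds are fine; anything drifting toward seconds is worth fixing, since Cassandra resolves conflicting writes by timestamp.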
Cannot resolve schema disagreement
Hello, I have a cluster of 4 nodes and two of them are on a different schema. I tried running the commands described in the FAQ section (http://wiki.apache.org/cassandra/FAQ#schema_disagreement) but no luck. After running the commands, I get back to the same issue. I cannot afford to lose the data so I guess this is the only option for me. And unfortunately I am using 1.0.12 (cannot upgrade as of now). Any ideas on what might be happening or any pointers will be greatly appreciated. /srm
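One quick check before going further is to see exactly which nodes disagree. A sketch for a 1.0.x cluster (on later versions "nodetool describecluster" reports the same information; the cli syntax here is from memory, so double-check it):

    echo 'describe cluster;' | cassandra-cli -h <any-node>

The output should list each schema version UUID together with the hosts holding it, which tells you which two of the four nodes are the outliers.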