Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Alexander Dejanovski
Shalom,

you may have a high trace probability which could explain what you're
observing :
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
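
If tracing is on somewhere, setting the probability back to 0 on every node
should stop the writes to the system_traces tables. A minimal sketch
(<node_address> is a placeholder, and the setting is not persisted across
restarts):

nodetool -h <node_address> settraceprobability 0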

On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <clohfin...@gmail.com> wrote:

> count(*) actually pages through all the data, so a select count(*) without
> a limit would be expected to cause a lot of load on the system. The hit is
> more than just I/O load and CPU; it also creates a lot of garbage that can
> cause pauses, slowing down the entire JVM. Some details here:
> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>
> You may want to consider maintaining the count yourself, using Spark, or,
> if you just want a ballpark number, you can grab it from JMX.
>
> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
> actually has nothing to do with flushes. A flush is the operation of moving
> data from memory (memtable) to disk (SSTable).
>
> FWIW in 2.0 that's not completely accurate. Before 2.1, the process of
> memtable flushing acquired a switchLock that blocks mutations during the
> flush (the "pending task" metric is the measure of how many mutations are
> blocked by this lock).
>
> Chris
>
> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <shal...@liveperson.com>
> wrote:
>
> Hi Alexander,
>
> I'm referring to Writes Count generated from JMX:
> [image: Inline image 1]
>
> The higher curve shows the total write count per second for all nodes in
> the cluster and the lower curve is the average write count per second per
> node.
> The drop in the end is the result of shutting down one application node
> that performed this kind of query (we still haven't removed the query
> itself in this cluster).
>
>
> On a different cluster, where we already removed the "select count(*)"
> query completely, we can see that the issue was resolved (also verified
> this with running nodetool cfstats a few times and checked the write count
> difference):
> [image: Inline image 2]
>
>
> Naturally I asked how a select query can affect the write count of a node,
> but weird as it seems, the issue was resolved once the query was removed
> from the code.
>
> Another side note: one of our developers who wrote the query in the code
> thought it would be nice to limit the query results to 560,000,000.
> Perhaps the ridiculously high limit might have caused this?
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi Shalom,
>
> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
> has nothing to do with flushes. A flush is the operation of moving data
> from memory (memtable) to disk (SSTable).
>
> The Cassandra write path and read path are two different things and, as
> far as I know, I see no way for a select count(*) to increase your write
> count (if you are indeed talking about actual Cassandra writes, and not I/O
> operations).
>
> Cheers,
>
> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <shal...@liveperson.com>
> wrote:
>
> Yes, I know it's obsolete, but unfortunately this takes time.
> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
> As I said I'm not sure about it, but it would be interesting to check the
> heap memory state with any JMX tool, e.g.
> https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
> Even in the 2.0 branch there is 2.0.17 available.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
> On Thu, 10 Nov 2016 05:47:37 -0500, Shalom Sagges <shal...@liveperson.com> wrote:

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Alexander Dejanovski
Could you check the write count on a per-table basis in order to see which
specific table is actually receiving writes ?
Check the OneMinuteRate attribute of
org.apache.cassandra.metrics:type=ColumnFamily,keyspace=keyspace1,scope=standard1,name=WriteLatency
(make sure you replace the keyspace and table name here).

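As an illustration, here are two ways to read that value (just a sketch: the
jmxterm jar name/path is an assumption, and plain nodetool works too if you
don't want extra tooling):

# read the OneMinuteRate attribute over JMX (jmxterm jar path is illustrative)
echo "get -b org.apache.cassandra.metrics:type=ColumnFamily,keyspace=keyspace1,scope=standard1,name=WriteLatency OneMinuteRate" \
  | java -jar jmxterm-uber.jar -l localhost:7199 -n

# or sample the per-table write count a minute apart and compare
# (keyspace.table filtering may require a recent enough nodetool)
nodetool cfstats keyspace1.standard1 | grep -i "write count"
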
Also, check if you have tracing turned on, as it can indeed generate writes
to the sessions and events tables for every query you send :
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tracing_r.html

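A quick way to confirm it (a sketch, run from cqlsh on any node) is to watch
whether the trace tables keep growing while the count(*) query is running:

cqlsh <node_address>
cqlsh> SELECT session_id, started_at, request FROM system_traces.sessions LIMIT 10;
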
Cheers,

On Thu, Nov 10, 2016 at 3:11 PM Shalom Sagges <shal...@liveperson.com>
wrote:

> Hi Alexander,
>
> I'm referring to Writes Count generated from JMX:
> [image: Inline image 1]
>
> The higher curve shows the total write count per second for all nodes in
> the cluster and the lower curve is the average write count per second per
> node.
> The drop in the end is the result of shutting down one application node
> that performed this kind of query (we still haven't removed the query
> itself in this cluster).
>
>
> On a different cluster, where we already removed the "select count(*)"
> query completely, we can see that the issue was resolved (also verified
> this with running nodetool cfstats a few times and checked the write count
> difference):
> [image: Inline image 2]
>
>
> Naturally I asked how a select query can affect the write count of a node,
> but weird as it seems, the issue was resolved once the query was removed
> from the code.
>
> Another side note: one of our developers who wrote the query in the code
> thought it would be nice to limit the query results to 560,000,000.
> Perhaps the ridiculously high limit might have caused this?
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi Shalom,
>
> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
> has nothing to do with flushes. A flush is the operation of moving data
> from memory (memtable) to disk (SSTable).
>
> The Cassandra write path and read path are two different things and, as
> far as I know, I see no way for a select count(*) to increase your write
> count (if you are indeed talking about actual Cassandra writes, and not I/O
> operations).
>
> Cheers,
>
> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <shal...@liveperson.com>
> wrote:
>
> Yes, I know it's obsolete, but unfortunately this takes time.
> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
> As I said I'm not sure about it, but it would be interesting to check the
> heap memory state with any JMX tool, e.g.
> https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
> Even in the 2.0 branch there is 2.0.17 available.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
> On Thu, 10 Nov 2016 05:47:37 -0500, Shalom Sagges <shal...@liveperson.com> wrote:
>
> Thanks for the quick reply Vladimir.
> Is it really possible that ~12,500 writes per second (per node in a
> 12-node DC) are caused by memory flushes?
>
>
>
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
>
> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
>
>

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Alexander Dejanovski
Hi Shalom,

Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
has nothing to do with flushes. A flush is the operation of moving data
from memory (memtable) to disk (SSTable).

The Cassandra write path and read path are two different things and, as far
as I know, I see no way for a select count(*) to increase your write count
(if you are indeed talking about actual Cassandra writes, and not I/O
operations).

Cheers,

On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <shal...@liveperson.com>
wrote:

> Yes, I know it's obsolete, but unfortunately this takes time.
> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
> As I said I'm not sure about it, but it would be interesting to check the
> heap memory state with any JMX tool, e.g.
> https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
> Even in the 2.0 branch there is 2.0.17 available.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
> On Thu, 10 Nov 2016 05:47:37 -0500, Shalom Sagges <shal...@liveperson.com> wrote:
>
> Thanks for the quick reply Vladimir.
> Is it really possible that ~12,500 writes per second (per node in a
> 12-node DC) are caused by memory flushes?
>
>
>
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
>
> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
>
>
>
>
> Hi Shalom,
>
> I'm not sure, but probably excessive memory consumption by this SELECT
> causes C* to flush tables to free memory.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
> On Thu, 10 Nov 2016 03:36:59 -0500, Shalom Sagges <shal...@liveperson.com> wrote:
>
> Hi There!
>
> I'm using C* 2.0.14.
> I experienced a scenario where a "select count(*)" that ran every minute
> on a table with practically no results limit (yes, this should definitely
> be avoided) caused a huge increase in Cassandra writes, to around 150
> thousand writes per second for that particular table.
>
> Can anyone explain this behavior? Why would a Select query significantly
> increase write count in Cassandra?
>
> Thanks!
>
>
> Shalom Sagges
>
>
>
>
>
>
>
>
>
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra reaper

2016-11-01 Thread Alexander Dejanovski
If you run Reaper with INFO level logging (which can be configured in the yaml
file), you should have console output telling you what's going on.

If you started Reaper with the memory back end, restarting it will reset it and
you'll have to register your cluster again, but if you used Postgres it will
resume tasks where they were left off.

Please restart Reaper to at least have an output we can get information
from, otherwise we're blind.
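
For example, something along these lines keeps a copy of the console output we
can look at (a sketch; the jar and yaml names are taken from your earlier
steps, adjust paths to your install):

java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server cassandra-reaper.yaml 2>&1 | tee reaper.log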

Since you're using Cassandra 2.1, I'd advise switching to our fork since
the original one is compiled against Cassandra 2.0 libraries. If you switch
and use postgres, make sure you update the schema accordingly as we added
fields for incremental repair support.

Cheers,

On Tue, Nov 1, 2016 at 18:31, Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com>
wrote:

> Cassandra version is 2.1.16
>
> In my setup I don't see it writing to any logs
>
> On Tue, Nov 1, 2016 at 10:25 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Do you have anything in the reaper logs that would show a failure of some
> sort ?
> Also, can you tell me which version of Cassandra you're using ?
>
> Thanks
>
> On Tue, Nov 1, 2016 at 6:15 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
> Thanks Alex,
>
> Forgot to mention, but I did add the cluster. See the status below. It says
> the status is running but I don't see any repair happening. It has been in
> the same state for the past day.
> BTW, there's not much data in the cluster.
>
> [root@machine cassandra-reaper]#  ./bin/spreaper status-repair 3
> # Report improvements/bugs at
> https://github.com/spotify/cassandra-reaper/issues
> #
> --
> # Repair run with id '3':
> {
>   "cause": "manual spreaper run",
>   "cluster_name": "production",
>   "column_families": [],
>   "creation_time": "2016-11-01T00:39:15Z",
>   "duration": null,
>   "end_time": null,
>   "estimated_time_of_arrival": null,
>   "id": 3,
>   "intensity": 0.900,
>   "keyspace_name": "users",
> *  "last_event": "no events",*
>   "owner": "root",
>   "pause_time": null,
>   "repair_parallelism": "DATACENTER_AWARE",
>   "segments_repaired": 0,
>   "start_time": "2016-11-01T00:39:15Z",
> *  "state": "RUNNING",*
>   "total_segments": 301
> }
> [root@ machine cassandra-reaper]#
>
> On Tue, Nov 1, 2016 at 9:24 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi,
>
> The first step in using reaper is to add a cluster to it, as it is a tool
> that can manage multiple clusters and does not need to be executed on a
> Cassandra node (you can run it on any edge node you want).
>
> You should run : ./bin/spreaper add-cluster 127.0.0.1
> Where you'll replace 127.0.0.1 with the address of one of the nodes of your
> cluster.
>
> Then you can run : ./bin/spreaper repair cluster_name keyspace_name
> to start repairing a keyspace.
>
> You might want to drop in the UI made by Stefan Podkowinski which might
> ease things up for you, at least at the beginning :
> https://github.com/spodkowinski/cassandra-reaper-ui
>
> Worth mentioning that at The Last Pickle we maintain a fork of Reaper that
> handles incremental repair, works with C* 2.x and 3.0, and bundles the UI :
> https://github.com/thelastpickle/cassandra-reaper
> We have a branch that allows using Cassandra as a storage backend instead
> of Postgres :
> https://github.com/thelastpickle/cassandra-reaper/tree/add-cassandra-storage
> It should be merged to master really soon and should be ready to use.
>
> Cheers,
>
>
> On Tue, Nov 1, 2016 at 1:45 AM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
> Hello,
>
> Has anyone played around with the cassandra reaper (
> https://github.com/spotify/cassandra-reaper)?
>
> if so, can someone please help me with the setup? I can't get it working. I
> used the steps below:
>
> 1. create jar file using maven
> 2. java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server
> cassandra-reaper.yaml
> 3. ./bin/spreaper repair production users
>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra reaper

2016-11-01 Thread Alexander Dejanovski
Do you have anything in the reaper logs that would show a failure of some
sort ?
Also, can you tell me which version of Cassandra you're using ?

Thanks

On Tue, Nov 1, 2016 at 6:15 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thanks Alex,
>
> Forgot to mention, but I did add the cluster. See the status below. It says
> the status is running but I don't see any repair happening. It has been in
> the same state for the past day.
> BTW, there's not much data in the cluster.
>
> [root@machine cassandra-reaper]#  ./bin/spreaper status-repair 3
> # Report improvements/bugs at
> https://github.com/spotify/cassandra-reaper/issues
> #
> --
> # Repair run with id '3':
> {
>   "cause": "manual spreaper run",
>   "cluster_name": "production",
>   "column_families": [],
>   "creation_time": "2016-11-01T00:39:15Z",
>   "duration": null,
>   "end_time": null,
>   "estimated_time_of_arrival": null,
>   "id": 3,
>   "intensity": 0.900,
>   "keyspace_name": "users",
> *  "last_event": "no events",*
>   "owner": "root",
>   "pause_time": null,
>   "repair_parallelism": "DATACENTER_AWARE",
>   "segments_repaired": 0,
>   "start_time": "2016-11-01T00:39:15Z",
> *  "state": "RUNNING",*
>   "total_segments": 301
> }
> [root@ machine cassandra-reaper]#
>
> On Tue, Nov 1, 2016 at 9:24 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi,
>
> The first step in using reaper is to add a cluster to it, as it is a tool
> that can manage multiple clusters and does not need to be executed on a
> Cassandra node (you can run it on any edge node you want).
>
> You should run : ./bin/spreaper add-cluster 127.0.0.1
> Where you'll replace 127.0.0.1 with the address of one of the nodes of your
> cluster.
>
> Then you can run : ./bin/spreaper repair cluster_name keyspace_name
> to start repairing a keyspace.
>
> You might want to drop in the UI made by Stefan Podkowinski which might
> ease things up for you, at least at the beginning :
> https://github.com/spodkowinski/cassandra-reaper-ui
>
> Worth mentioning that at The Last Pickle we maintain a fork of Reaper that
> handles incremental repair, works with C* 2.x and 3.0, and bundles the UI :
> https://github.com/thelastpickle/cassandra-reaper
> We have a branch that allows using Cassandra as a storage backend instead
> of Postgres :
> https://github.com/thelastpickle/cassandra-reaper/tree/add-cassandra-storage
> It should be merged to master really soon and should be ready to use.
>
> Cheers,
>
>
> On Tue, Nov 1, 2016 at 1:45 AM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
> Hello,
>
> Has anyone played around with the cassandra reaper (
> https://github.com/spotify/cassandra-reaper)?
>
> if so, can someone please help me with the setup? I can't get it working. I
> used the steps below:
>
> 1. create jar file using maven
> 2. java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server
> cassandra-reaper.yaml
> 3. ./bin/spreaper repair production users
>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra reaper

2016-11-01 Thread Alexander Dejanovski
Hi,

The first step in using reaper is to add a cluster to it, as it is a tool
that can manage multiple clusters and does not need to be executed on a
Cassandra node (you can run it on any edge node you want).

You should run : ./bin/spreaper add-cluster 127.0.0.1
Where you'll replace 127.0.0.1 with the address of one of the nodes of your
cluster.

Then you can run : ./bin/spreaper repair cluster_name keyspace_name
to start repairing a keyspace.

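Putting it together, a typical first session looks like this (a sketch:
addresses, names and the run id are placeholders, and the sub-commands are the
same ones used elsewhere in this thread):

./bin/spreaper add-cluster <node_address>         # register the cluster with Reaper
./bin/spreaper repair <cluster_name> <keyspace>   # start a repair run
./bin/spreaper status-repair <run_id>             # check progress of that run
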
You might want to drop in the UI made by Stefan Podkowinski which might
ease things up for you, at least at the beginning :
https://github.com/spodkowinski/cassandra-reaper-ui

Worth mentioning that at The Last Pickle we maintain a fork of Reaper that
handles incremental repair, works with C* 2.x and 3.0, and bundles the UI :
https://github.com/thelastpickle/cassandra-reaper
We have a branch that allows using Cassandra as a storage backend instead
of Postgres :
https://github.com/thelastpickle/cassandra-reaper/tree/add-cassandra-storage
It should be merged to master really soon and should be ready to use.

Cheers,


On Tue, Nov 1, 2016 at 1:45 AM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello,
>
> Has anyone played around with the cassandra reaper (
> https://github.com/spotify/cassandra-reaper)?
>
> if so, can someone please help me with the setup? I can't get it working. I
> used the steps below:
>
> 1. create jar file using maven
> 2. java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server
> cassandra-reaper.yaml
> 3. ./bin/spreaper repair production users
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Tools to manage repairs

2016-10-28 Thread Alexander Dejanovski
Hi Eric,

that would be https://issues.apache.org/jira/browse/CASSANDRA-9754 by
Michael Kjellman and https://issues.apache.org/jira/browse/CASSANDRA-11206 by
Robert Stupp.
If you haven't seen it yet, Robert's summit talk on big partitions is
totally worth it :
Video : https://www.youtube.com/watch?v=N3mGxgnUiRY
Slides :
http://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016

Cheers,


On Fri, Oct 28, 2016 at 4:09 PM Eric Evans <john.eric.ev...@gmail.com>
wrote:

> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
> <a...@thelastpickle.com> wrote:
> > A few patches are pushing the limits of partition sizes so we may soon be
> > more comfortable with big partitions.
>
> You don't happen to have Jira links to these handy, do you?
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Tools to manage repairs

2016-10-27 Thread Alexander Dejanovski
The "official" recommendation would be 100MB, but it's hard to give a
precise answer.
Keeping it under a GB seems like a good target.
A few patches are pushing the limits of partition sizes so we may soon be
more comfortable with big partitions.

Cheers

On Thu, Oct 27, 2016 at 21:28, Vincent Rischmann <m...@vrischmann.me> wrote:

> Yeah that particular table is badly designed, I intend to fix it, when the
> roadmap allows us to do it :)
> What is the recommended maximum partition size ?
>
> Thanks for all the information.
>
>
> On Thu, Oct 27, 2016, at 08:14 PM, Alexander Dejanovski wrote:
>
> 3.3GB is already too high, and it's surely not good to have well
> performing compactions. Still I know changing a data model is no easy thing
> to do, but you should try to do something here.
>
> Anticompaction is a special type of compaction and if an sstable is being
> anticompacted, then any attempt to run validation compaction on it will
> fail, telling you that you cannot have an sstable being part of 2 repair
> sessions at the same time, so incremental repair must be run one node at a
> time, waiting for anticompactions to end before moving from one node to the
> other.
>
> Be mindful of running incremental repair on a regular basis once you
> started as you'll have two separate pools of sstables (repaired and
> unrepaired) that won't get compacted together, which could be a problem if
> you want tombstones to be purged efficiently.
>
> Cheers,
>
> On Thu, Oct 27, 2016 at 17:57, Vincent Rischmann <m...@vrischmann.me> wrote:
>
>
> Ok, I think we'll give incremental repairs a try on a limited number of
> CFs first and then if it goes well we'll progressively switch more CFs to
> incremental.
>
> I'm not sure I understand the problem with anticompaction and validation
> running concurrently. As far as I can tell, right now when a CF is repaired
> (either via reaper, or via nodetool) there may be compactions running at
> the same time. In fact, it happens very often. Is it a problem ?
>
> As far as big partitions, the biggest one we have is around 3.3Gb. Some
> less big partitions are around 500Mb and less.
>
>
> On Thu, Oct 27, 2016, at 05:37 PM, Alexander Dejanovski wrote:
>
> Oh right, that's what they advise :)
> I'd say that you should skip the full repair phase in the migration
> procedure as that will obviously fail, and just mark all sstables as
> repaired (skip 1, 2 and 6).
> Anyway you can't do better, so take a leap of faith there.
>
> Intensity is already very low and 1 segments is a whole lot for 9
> nodes, you should not need that many.
>
> You can definitely pick which CF you'll run incremental repair on, and
> still run full repair on the rest.
> If you pick our Reaper fork, watch out for schema changes that add
> incremental repair fields, and I do not advise to run incremental repair
> without it, otherwise you might have issues with anticompaction and
> validation compactions running concurrently from time to time.
>
> One last thing : can you check if you have particularly big partitions in
> the CFs that fail to get repaired ? You can run nodetool cfhistograms to
> check that.
>
> Cheers,
>
>
>
> On Thu, Oct 27, 2016 at 5:24 PM Vincent Rischmann <m...@vrischmann.me>
> wrote:
>
>
> Thanks for the response.
>
> We do break up repairs between tables, we also tried our best to have no
> overlap between repair runs. Each repair has 1 segments (purely
> arbitrary number, seemed to help at the time). Some runs have an intensity
> of 0.4, some have as low as 0.05.
>
> Still, sometimes one particular app (which does a lot of read/modify/write
> batches in quorum) gets slowed down to the point we have to stop the repair
> run.
>
> But more annoyingly, since 2 to 3 weeks as I said, it looks like runs
> don't progress after some time. Every time I restart reaper, it starts to
> repair correctly again, up until it gets stuck. I have no idea why that
> happens now, but it means I have to baby sit reaper, and it's becoming
> annoying.
>
> Thanks for the suggestion about incremental repairs. It would probably be
> a good thing but it's a little challenging to setup I think. Right now
> running a full repair of all keyspaces (via nodetool repair) is going to
> take a lot of time, probably like 5 days or more. We were never able to run
> one to completion. I'm not sure it's a good idea to disable autocompaction
> for that long.
>
> But maybe I'm wrong. Is it possible to use incremental repairs on some
> column family only ?
>
>
> On Thu, Oct 27, 2016, at 05:02 PM, Alexander Dejanovski wrote:
>
> Hi Vincent,
>
> most people handle repair with :
> - p

Re: Tools to manage repairs

2016-10-27 Thread Alexander Dejanovski
3.3GB is already too high, and it surely doesn't help compactions perform
well. Still, I know changing a data model is no easy thing to do, but you
should try to do something here.

Anticompaction is a special type of compaction: if an sstable is being
anticompacted, any attempt to run a validation compaction on it will fail,
telling you that an sstable cannot be part of 2 repair sessions at the same
time. So incremental repair must be run one node at a time, waiting for
anticompactions to end before moving from one node to the other.

Be mindful of running incremental repair on a regular basis once you
started as you'll have two separate pools of sstables (repaired and
unrepaired) that won't get compacted together, which could be a problem if
you want tombstones to be purged efficiently.
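
If you want to see the two pools on disk, sstablemetadata prints a "Repaired
at" line for each sstable (0 means unrepaired). Just a sketch, paths depend on
your install:

sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db | grep "Repaired at"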

Cheers,

On Thu, Oct 27, 2016 at 17:57, Vincent Rischmann <m...@vrischmann.me> wrote:

> Ok, I think we'll give incremental repairs a try on a limited number of
> CFs first and then if it goes well we'll progressively switch more CFs to
> incremental.
>
> I'm not sure I understand the problem with anticompaction and validation
> running concurrently. As far as I can tell, right now when a CF is repaired
> (either via reaper, or via nodetool) there may be compactions running at
> the same time. In fact, it happens very often. Is it a problem ?
>
> As far as big partitions, the biggest one we have is around 3.3Gb. Some
> less big partitions are around 500Mb and less.
>
>
> On Thu, Oct 27, 2016, at 05:37 PM, Alexander Dejanovski wrote:
>
> Oh right, that's what they advise :)
> I'd say that you should skip the full repair phase in the migration
> procedure as that will obviously fail, and just mark all sstables as
> repaired (skip 1, 2 and 6).
> Anyway you can't do better, so take a leap of faith there.
>
> Intensity is already very low and 1 segments is a whole lot for 9
> nodes, you should not need that many.
>
> You can definitely pick which CF you'll run incremental repair on, and
> still run full repair on the rest.
> If you pick our Reaper fork, watch out for schema changes that add
> incremental repair fields, and I do not advise to run incremental repair
> without it, otherwise you might have issues with anticompaction and
> validation compactions running concurrently from time to time.
>
> One last thing : can you check if you have particularly big partitions in
> the CFs that fail to get repaired ? You can run nodetool cfhistograms to
> check that.
>
> Cheers,
>
>
>
> On Thu, Oct 27, 2016 at 5:24 PM Vincent Rischmann <m...@vrischmann.me>
> wrote:
>
>
> Thanks for the response.
>
> We do break up repairs between tables, we also tried our best to have no
> overlap between repair runs. Each repair has 1 segments (purely
> arbitrary number, seemed to help at the time). Some runs have an intensity
> of 0.4, some have as low as 0.05.
>
> Still, sometimes one particular app (which does a lot of read/modify/write
> batches in quorum) gets slowed down to the point we have to stop the repair
> run.
>
> But more annoyingly, since 2 to 3 weeks as I said, it looks like runs
> don't progress after some time. Every time I restart reaper, it starts to
> repair correctly again, up until it gets stuck. I have no idea why that
> happens now, but it means I have to baby sit reaper, and it's becoming
> annoying.
>
> Thanks for the suggestion about incremental repairs. It would probably be
> a good thing but it's a little challenging to setup I think. Right now
> running a full repair of all keyspaces (via nodetool repair) is going to
> take a lot of time, probably like 5 days or more. We were never able to run
> one to completion. I'm not sure it's a good idea to disable autocompaction
> for that long.
>
> But maybe I'm wrong. Is it possible to use incremental repairs on some
> column family only ?
>
>
> On Thu, Oct 27, 2016, at 05:02 PM, Alexander Dejanovski wrote:
>
> Hi Vincent,
>
> most people handle repair with :
> - pain (by hand running nodetool commands)
> - cassandra range repair :
> https://github.com/BrianGallew/cassandra_range_repair
> - Spotify Reaper
> - and OpsCenter repair service for DSE users
>
> Reaper is a good option I think and you should stick to it. If it cannot
> do the job here then no other tool will.
>
> You have several options from here :
>
>- Try to break up your repair table by table and see which ones
>actually get stuck
>- Check your logs for any repair/streaming error
>- Avoid repairing everything :
>- you may have expendable tables
>   - you may have TTLed only tables with no deletes, accessed with
>   QUORUM CL only
>   - You can try to r

Re: Tools to manage repairs

2016-10-27 Thread Alexander Dejanovski
Oh right, that's what they advise :)
I'd say that you should skip the full repair phase in the migration
procedure as that will obviously fail, and just mark all sstables as
repaired (skip 1, 2 and 6).
Anyway you can't do better, so take a leap of faith there.
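
For reference, marking everything as repaired boils down to something like this
for each table on each node (a sketch: the tool ships with Cassandra, the path
is illustrative, and the node should be stopped while the metadata is
rewritten):

sstablerepairedset --really-set --is-repaired /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db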

Intensity is already very low and 1 segments is a whole lot for 9
nodes, you should not need that many.

You can definitely pick which CF you'll run incremental repair on, and
still run full repair on the rest.
If you pick our Reaper fork, watch out for schema changes that add
incremental repair fields, and I do not advise to run incremental repair
without it, otherwise you might have issues with anticompaction and
validation compactions running concurrently from time to time.

One last thing : can you check if you have particularly big partitions in
the CFs that fail to get repaired ? You can run nodetool cfhistograms to
check that.
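
For example (keyspace and table are placeholders), the "Partition Size" column
shows percentiles and the max in bytes:

nodetool cfhistograms <keyspace> <table>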

Cheers,



On Thu, Oct 27, 2016 at 5:24 PM Vincent Rischmann <m...@vrischmann.me> wrote:

> Thanks for the response.
>
> We do break up repairs between tables, we also tried our best to have no
> overlap between repair runs. Each repair has 1 segments (purely
> arbitrary number, seemed to help at the time). Some runs have an intensity
> of 0.4, some have as low as 0.05.
>
> Still, sometimes one particular app (which does a lot of read/modify/write
> batches in quorum) gets slowed down to the point we have to stop the repair
> run.
>
> But more annoyingly, since 2 to 3 weeks as I said, it looks like runs
> don't progress after some time. Every time I restart reaper, it starts to
> repair correctly again, up until it gets stuck. I have no idea why that
> happens now, but it means I have to baby sit reaper, and it's becoming
> annoying.
>
> Thanks for the suggestion about incremental repairs. It would probably be
> a good thing but it's a little challenging to setup I think. Right now
> running a full repair of all keyspaces (via nodetool repair) is going to
> take a lot of time, probably like 5 days or more. We were never able to run
> one to completion. I'm not sure it's a good idea to disable autocompaction
> for that long.
>
> But maybe I'm wrong. Is it possible to use incremental repairs on some
> column family only ?
>
>
> On Thu, Oct 27, 2016, at 05:02 PM, Alexander Dejanovski wrote:
>
> Hi Vincent,
>
> most people handle repair with :
> - pain (by hand running nodetool commands)
> - cassandra range repair :
> https://github.com/BrianGallew/cassandra_range_repair
> - Spotify Reaper
> - and OpsCenter repair service for DSE users
>
> Reaper is a good option I think and you should stick to it. If it cannot
> do the job here then no other tool will.
>
> You have several options from here :
>
>- Try to break up your repair table by table and see which ones
>actually get stuck
>- Check your logs for any repair/streaming error
>- Avoid repairing everything :
>- you may have expendable tables
>   - you may have TTLed only tables with no deletes, accessed with
>   QUORUM CL only
>   - You can try to relieve repair pressure in Reaper by lowering
>repair intensity (on the tables that get stuck)
>- You can try adding steps to your repair process by putting a higher
>segment count in reaper (on the tables that get stuck)
>- And lastly, you can turn to incremental repair. As you're familiar
>with Reaper already, you might want to take a look at our Reaper fork that
>handles incremental repair :
>https://github.com/thelastpickle/cassandra-reaper
>If you go down that way, make sure you first mark all sstables as
>repaired before you run your first incremental repair, otherwise you'll end
>up in anticompaction hell (bad bad place) :
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html
>Even if people say that's not necessary anymore, it'll save you from a
>very bad first experience with incremental repair.
>Furthermore, make sure you run repair daily after your first inc
>repair run, in order to work on small sized repairs.
>
>
> Cheers,
>
>
> On Thu, Oct 27, 2016 at 4:27 PM Vincent Rischmann <m...@vrischmann.me>
> wrote:
>
>
> Hi,
>
> we have two Cassandra 2.1.15 clusters at work and are having some trouble
> with repairs.
>
> Each cluster has 9 nodes, and the amount of data is not gigantic but some
> column families have 300+Gb of data.
> We tried to use `nodetool repair` for these tables but at the time we
> tested it, it made the whole cluster load too much and it impacted our
> production apps.
>
> Next we saw https://github.com/spotify/cassandra-reaper , tried it and
> had some success until recen

Re: Tools to manage repairs

2016-10-27 Thread Alexander Dejanovski
Hi Vincent,

most people handle repair with :
- pain (by hand running nodetool commands)
- cassandra range repair :
https://github.com/BrianGallew/cassandra_range_repair
- Spotify Reaper
- and OpsCenter repair service for DSE users

Reaper is a good option I think and you should stick to it. If it cannot do
the job here then no other tool will.

You have several options from here :

   - Try to break up your repair table by table and see which ones actually
   get stuck
   - Check your logs for any repair/streaming error
   - Avoid repairing everything :
  - you may have expendable tables
  - you may have TTLed only tables with no deletes, accessed with
  QUORUM CL only
   - You can try to relieve repair pressure in Reaper by lowering repair
   intensity (on the tables that get stuck)
   - You can try adding steps to your repair process by putting a higher
   segment count in reaper (on the tables that get stuck)
   - And lastly, you can turn to incremental repair. As you're familiar
   with Reaper already, you might want to take a look at our Reaper fork that
   handles incremental repair :
   https://github.com/thelastpickle/cassandra-reaper
   If you go down that way, make sure you first mark all sstables as
   repaired before you run your first incremental repair, otherwise you'll end
   up in anticompaction hell (bad bad place) :
   
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html
   Even if people say that's not necessary anymore, it'll save you from a
   very bad first experience with incremental repair.
   Furthermore, make sure you run repair daily after your first inc repair
   run, in order to work on small sized repairs.


Cheers,


On Thu, Oct 27, 2016 at 4:27 PM Vincent Rischmann <m...@vrischmann.me> wrote:

Hi,

we have two Cassandra 2.1.15 clusters at work and are having some trouble
with repairs.

Each cluster has 9 nodes, and the amount of data is not gigantic but some
column families have 300+Gb of data.
We tried to use `nodetool repair` for these tables but at the time we
tested it, it made the whole cluster load too much and it impacted our
production apps.

Next we saw https://github.com/spotify/cassandra-reaper , tried it and had
some success until recently. Since 2 to 3 weeks it never completes a repair
run, deadlocking itself somehow.

I know DSE includes a repair service but I'm wondering how do other
Cassandra users manage repairs ?

Vincent.

-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: incremental repairs with -pr flag?

2016-10-24 Thread Alexander Dejanovski
Hi Sean,

In order to mitigate its impact, anticompaction is not fully executed when
incremental repair is run with -pr. What you'll observe is that running
repair on all nodes with -pr will leave sstables marked as unrepaired on
all of them.

Then, if you think about it, you realize it's no big deal as -pr is useless
with incremental repair : data is repaired only once with incremental
repair, which is what -pr was intended to fix for full repair, by repairing
each token range only once instead of as many times as the replication factor.

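In practice that means something like this (a sketch; on 2.2+/3.x, where
incremental repair is the default, -full forces a full repair):

nodetool repair -full -pr <keyspace>   # full repair: run with -pr on every node
nodetool repair <keyspace>             # incremental repair: no -pr needed
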
Cheers,

On Mon, Oct 24, 2016 at 18:05, Sean Bridges <sean.brid...@globalrelay.net>
wrote:

> Hey,
>
> In the datastax documentation on repair [1], it says,
>
> "The partitioner range option is recommended for routine maintenance. Do
> not use it to repair a downed node. Do not use with incremental repair
> (default for Cassandra 3.0 and later)."
>
> Why is it not recommended to use -pr with incremental repairs?
>
> Thanks,
>
> Sean
>
> [1]
> https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsRepairNodesManualRepair.html
> --
>
> Sean Bridges
>
> senior systems architect
> Global Relay
>
> sean.brid...@globalrelay.net
> 866.484.6630
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: non incremental repairs with cassandra 2.2+

2016-10-19 Thread Alexander Dejanovski
Hi Kurt,

we're not actually.
Reaper performs full repair by subrange but does incremental repair on all
ranges at once, node by node.
Subrange is incompatible with incremental repair anyway.

Cheers,

On Thu, Oct 20, 2016 at 5:24 AM kurt Greaves <k...@instaclustr.com> wrote:

>
> On 19 October 2016 at 17:13, Alexander Dejanovski <a...@thelastpickle.com>
> wrote:
>
> There aren't that many tools I know to orchestrate repairs and we maintain
> a fork of Reaper, that was made by Spotify, and handles incremental repair
> : https://github.com/thelastpickle/cassandra-reaper
>
>
> Looks like you're using subranges with incremental repairs. This will
> generate a lot of anticompactions as you'll only repair a portion of the
> SSTables. You should use forceRepairAsync for incremental repairs so that
> it's possible for the repair to act on the whole SSTable, minimising
> anticompactions.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: non incremental repairs with cassandra 2.2+

2016-10-19 Thread Alexander Dejanovski
There aren't that many tools I know of to orchestrate repairs. We maintain
a fork of Reaper, originally made by Spotify, that handles incremental repair :
https://github.com/thelastpickle/cassandra-reaper

We just added Cassandra as a storage back end (only Postgres currently) in
one of the branches, which should soon be merged to master.

On Wed, Oct 19, 2016 at 19:03, Kant Kodali <k...@peernova.com> wrote:

Also any suggestions on a tool to orchestrate the incremental repair? Like
say most commonly used

Sent from my iPhone

On Oct 19, 2016, at 9:54 AM, Alexander Dejanovski <a...@thelastpickle.com>
wrote:

Hi Kant,

subrange is a form of full repair, so it will just split the repair process
in smaller yet sequential pieces of work (repair is started giving a start
and end token). Overall, you should not expect improvements other than
having less overstreaming and better chances of success if your cluster is
dense.

You can try to use incremental repair if you know what the caveats are and
use a proper tool to orchestrate it, that would save you from repairing all
10TB each time.
CASSANDRA-12580 might help too as Romain showed us :
https://www.mail-archive.com/user@cassandra.apache.org/msg49344.html

Cheers,



On Wed, Oct 19, 2016 at 6:42 PM Kant Kodali <k...@peernova.com> wrote:

Another question on a same note would be what would be the fastest way to
do repairs of size 10TB cluster ? Full repairs are taking days. So among
repair parallel or repair sub range which is faster in the case of say
adding a new node to the cluster?

Sent from my iPhone

On Oct 19, 2016, at 9:30 AM, Sean Bridges <sean.brid...@globalrelay.net>
wrote:

Hey,

We are upgrading from cassandra 2.1 to cassandra 2.2.

With cassandra 2.1 we would periodically repair all nodes, using the -pr
flag.

With cassandra 2.2, the same repair takes a very long time, as cassandra
does an anti compaction after the repair.  This anti compaction causes most
(all?) the sstables to be rewritten.  Is there a way to do full repairs
without continually anti compacting?  If we do a full repair on each node
with the -pr flag, will subsequent full repairs also force anti compacting
most (all?) sstables?

Thanks,

Sean

-- 
---------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

-- 
---------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: non incremental repairs with cassandra 2.2+

2016-10-19 Thread Alexander Dejanovski
Can you explain why you would want to run repair for new nodes?

Aren't you talking about bootstrap, which is not related to repair actually?

On Wed, Oct 19, 2016 at 18:57, Kant Kodali <k...@peernova.com> wrote:

> Thanks! How do I do an incremental repair when I add a new node?
>
> Sent from my iPhone
>
> On Oct 19, 2016, at 9:54 AM, Alexander Dejanovski <a...@thelastpickle.com>
> wrote:
>
> Hi Kant,
>
> subrange is a form of full repair, so it will just split the repair
> process in smaller yet sequential pieces of work (repair is started giving
> a start and end token). Overall, you should not expect improvements other
> than having less overstreaming and better chances of success if your
> cluster is dense.
>
> You can try to use incremental repair if you know what the caveats are and
> use a proper tool to orchestrate it, that would save you from repairing all
> 10TB each time.
> CASSANDRA-12580 might help too as Romain showed us :
> https://www.mail-archive.com/user@cassandra.apache.org/msg49344.html
>
> Cheers,
>
>
>
> On Wed, Oct 19, 2016 at 6:42 PM Kant Kodali <k...@peernova.com> wrote:
>
> Another question on a same note would be what would be the fastest way to
> do repairs of size 10TB cluster ? Full repairs are taking days. So among
> repair parallel or repair sub range which is faster in the case of say
> adding a new node to the cluster?
>
> Sent from my iPhone
>
> On Oct 19, 2016, at 9:30 AM, Sean Bridges <sean.brid...@globalrelay.net>
> wrote:
>
> Hey,
>
> We are upgrading from cassandra 2.1 to cassandra 2.2.
>
> With cassandra 2.1 we would periodically repair all nodes, using the -pr
> flag.
>
> With cassandra 2.2, the same repair takes a very long time, as cassandra
> does an anti compaction after the repair.  This anti compaction causes most
> (all?) the sstables to be rewritten.  Is there a way to do full repairs
> without continually anti compacting?  If we do a full repair on each node
> with the -pr flag, will subsequent full repairs also force anti compacting
> most (all?) sstables?
>
> Thanks,
>
> Sean
>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: non incremental repairs with cassandra 2.2+

2016-10-19 Thread Alexander Dejanovski
Hi Kant,

subrange is a form of full repair, so it will just split the repair process
into smaller yet sequential pieces of work (repair is started by giving a
start and end token). Overall, you should not expect improvements other than
having less overstreaming and better chances of success if your cluster is
dense.

You can try to use incremental repair if you know what the caveats are and
use a proper tool to orchestrate it; that would save you from repairing all
10TB each time.
CASSANDRA-12580 might help too, as Romain showed us :
https://www.mail-archive.com/user@cassandra.apache.org/msg49344.html

Cheers,



On Wed, Oct 19, 2016 at 6:42 PM Kant Kodali <k...@peernova.com> wrote:

Another question on a same note would be what would be the fastest way to
do repairs of size 10TB cluster ? Full repairs are taking days. So among
repair parallel or repair sub range which is faster in the case of say
adding a new node to the cluster?

Sent from my iPhone

On Oct 19, 2016, at 9:30 AM, Sean Bridges <sean.brid...@globalrelay.net>
wrote:

Hey,

We are upgrading from cassandra 2.1 to cassandra 2.2.

With cassandra 2.1 we would periodically repair all nodes, using the -pr
flag.

With cassandra 2.2, the same repair takes a very long time, as cassandra
does an anti compaction after the repair.  This anti compaction causes most
(all?) the sstables to be rewritten.  Is there a way to do full repairs
without continually anti compacting?  If we do a full repair on each node
with the -pr flag, will subsequent full repairs also force anti compacting
most (all?) sstables?

Thanks,

Sean

-- 
-----
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: non incremental repairs with cassandra 2.2+

2016-10-19 Thread Alexander Dejanovski
Hi Sean,

you should be able to do that by running subrange repairs, which is the
only type of repair that wouldn't trigger anticompaction AFAIK.
Beware that now you will have sstables marked as repaired and others marked
as unrepaired, which will never be compacted together.
You might want to flag all sstables as unrepaired before moving on, if you
do not intend to switch to incremental repair for now.
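
For the record, a subrange run and the unrepaired flagging look roughly like
this (a sketch: tokens and paths are placeholders, and sstablerepairedset must
be run with the node stopped):

nodetool repair -st <start_token> -et <end_token> <keyspace> <table>
sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db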

Cheers,

On Wed, Oct 19, 2016 at 6:31 PM Sean Bridges <sean.brid...@globalrelay.net>
wrote:

> Hey,
>
> We are upgrading from cassandra 2.1 to cassandra 2.2.
>
> With cassandra 2.1 we would periodically repair all nodes, using the -pr
> flag.
>
> With cassandra 2.2, the same repair takes a very long time, as cassandra
> does an anti compaction after the repair.  This anti compaction causes most
> (all?) the sstables to be rewritten.  Is there a way to do full repairs
> without continually anti compacting?  If we do a full repair on each node
> with the -pr flag, will subsequent full repairs also force anti compacting
> most (all?) sstables?
>
> Thanks,
>
> Sean
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: problem starting incremental repair using TheLastPicke Reaper

2016-10-19 Thread Alexander Dejanovski
Abhishek,

can you file an issue on our github repo so that we can further discuss
this ? https://github.com/thelastpickle/cassandra-reaper/issues

Thanks,

On Wed, Oct 19, 2016 at 1:20 PM Abhishek Aggarwal <
abhishek.aggarwa...@snapdeal.com> wrote:

> Hi Alex,
>
> That I already did, and it worked. But my question is: if the passed value of
> the incremental repair flag is different from the existing value, then it
> should allow creating a new repair_unit instead of fetching the repair_unit
> based on the cluster name / keyspace / column combination.
>
> Also, if I delete the repair_unit then, due to referential constraints, I
> need to delete repair_segment and repair_run as well, which will delete the
> run history corresponding to the repair_unit.
>
> Abhishek Aggarwal
>
> *Senior Software Engineer*
> M: +91 8861212073, 8588840304
> T: 0124 6600600 EXT: 12128
> ASF Center -A, ASF Center Udyog Vihar Phase IV,
>
> On Wed, Oct 19, 2016 at 4:44 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi Abhishek,
>
> This shows you have two repair units for the same keyspace/table with
> different incremental repair settings.
> Can you delete your prior repair run (the one with incremental repair set
> to false) and then create the new one with incremental repair set to true ?
>
> Let me know how that works,
>
>
> On Wed, Oct 19, 2016 at 10:45 AM Abhishek Aggarwal <
> abhishek.aggarwa...@snapdeal.com> wrote:
>
>
> Is there a way to start an incremental repair using Reaper? We completed a
> full repair successfully, and after that I tried to run an incremental run
> but am getting the below error.
>
>
> A repair run already exist for the same cluster/keyspace/table but with a
> different incremental repair value.Requested value: true | Existing value:
> false
>
>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: problem starting incremental repair using TheLastPicke Reaper

2016-10-19 Thread Alexander Dejanovski
Hi Abhishek,

This shows you have two repair units for the same keyspace/table with
different incremental repair settings.
Can you delete your prior repair run (the one with incremental repair set
to false) and then create the new one with incremental repair set to true ?

Let me know how that works,


On Wed, Oct 19, 2016 at 10:45 AM Abhishek Aggarwal <
abhishek.aggarwa...@snapdeal.com> wrote:

>
> Is there a way to start an incremental repair using Reaper? We completed a
> full repair successfully, and after that I tried to run an incremental run
> but am getting the below error.
>
>
> A repair run already exist for the same cluster/keyspace/table but with a
> different incremental repair value.Requested value: true | Existing value:
> false
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Alexander Dejanovski
Atul,

our fork has been tested on 2.1 and 3.0.x clusters.
I've just tested with a CCM 3.6 cluster and it worked with no issue.

With Reaper, if you set incremental to false, it'll perform a full subrange
repair with no anticompaction.
You'll see this message in the logs : INFO  [AntiEntropyStage:1] 2016-09-29
16:11:34,950 ActiveRepairService.java:378 - Not a global repair, will not
do anticompaction
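
So a simple way to confirm which mode actually ran is to grep for that message
(the log path below is the usual default, adjust to your install):

grep "will not do anticompaction" /var/log/cassandra/system.log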

If you set incremental to true, it'll perform an incremental repair, one
node at a time, with anticompaction (set Parallelism to Parallel
exclusively with inc repair).

Let me know how it goes.


On Thu, Sep 29, 2016 at 3:06 PM Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> Hi Alexander,
>
> There is a compatibility issue raised with spotify/cassandra-reaper for
> Cassandra version 3.x.
> Is it compatible with 3.6 in the thelastpickle/cassandra-reaper fork?
>
> There are some suggestions mentioned by *brstgt* which we can try on our
> side.
>
> On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha <atul.sar...@snapdeal.com>
> wrote:
>
>> Thanks Alexander.
>>
>> Will look into all these.
>>
>> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Atul,
>>>
>>> since you're using 3.6, by default you're running incremental repair,
>>> which doesn't like concurrency very much.
>>> Validation errors are not occurring on a partition or partition range
>>> base, but if you're trying to run both anticompaction and validation
>>> compaction on the same SSTable.
>>>
>>> Like advised to Robert yesterday, and if you want to keep on running
>>> incremental repair, I'd suggest the following :
>>>
>>>- run nodetool tpstats on all nodes in search for running/pending
>>>repair sessions
>>>- If you have some, and to be sure you will avoid conflicts, roll
>>>restart your cluster (all nodes)
>>>- Then, run "nodetool repair" on one node.
>>>- When repair has finished on this node (track messages in the log
>>>and nodetool tpstats), check if other nodes are running anticompactions
>>>- If so, wait until they are over
>>>- If not, move on to the other node
>>>
>>> You should be able to run concurrent incremental compactions on
>>> different tables if you wish to speed up the complete repair of the
>>> cluster, but do not try to repair the same table/full keyspace from two
>>> nodes at the same time.
>>>
>>> If you do not want to keep on using incremental repair, and fallback to
>>> classic full repair, I think the only way in 3.6 to avoid anticompaction
>>> will be to use subrange repair (Paulo mentioned that in 3.x full repair
>>> also triggers anticompaction).
>>>
>>> You have two options here : cassandra_range_repair (
>>> https://github.com/BrianGallew/cassandra_range_repair) and Spotify
>>> Reaper (https://github.com/spotify/cassandra-reaper)
>>>
>>> cassandra_range_repair might scream about subrange + incremental not
>>> being compatible (not sure here), but you can modify the repair_range()
>>> method by adding a --full switch to the command line used to run repair.
>>>
>>> We have a fork of Reaper that handles both full subrange repair and
>>> incremental repair here :
>>> https://github.com/thelastpickle/cassandra-reaper
>>> It comes with a tweaked version of the UI made by Stephan Podkowinski (
>>> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
>>> interactions to schedule, run and track repair - which adds fields to run
>>> incremental repair (accessible via ...:8080/webui/ in your browser).
>>>
>>> Cheers,
>>>
>>>
>>>
>>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <atul.sar...@snapdeal.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are not sure whether this issue is linked to that node or not. Our
>>>> application does frequent delete and insert.
>>>>
>>>> May be our approach is not correct for nodetool repair. Yes, we
>>>> generally fire repair on all boxes at same time. Till now, it was manual
>>>> with default configuration ( command: "nodetool repair").
>>>> Yes, we saw validation error but that is linked to already running
>>>> repair of  same partition on other box for same partition range. We saw
>>>> error validation failed with some ip as repair in already running for the
>>>> same SSTa

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Alexander Dejanovski
Atul,

since you're using 3.6, by default you're running incremental repair, which
doesn't like concurrency very much.
Validation errors do not occur on a per-partition or partition-range basis,
but when you try to run both anticompaction and validation compaction on the
same SSTable.

As advised to Robert yesterday, if you want to keep on running incremental
repair, I'd suggest the following :

   - run nodetool tpstats on all nodes in search for running/pending repair
   sessions
   - If you have some, and to be sure you will avoid conflicts, roll
   restart your cluster (all nodes)
   - Then, run "nodetool repair" on one node.
   - When repair has finished on this node (track messages in the log and
   nodetool tpstats), check if other nodes are running anticompactions
   - If so, wait until they are over
   - If not, move on to the other node

You should be able to run concurrent incremental repairs on different tables
if you wish to speed up the complete repair of the cluster, but do not try to
repair the same table/full keyspace from two nodes at the same time.
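
To tie the steps above together, here is a rough shell sketch (host names,
the keyspace name and the polling interval are placeholders, and ssh access
to every node is assumed) :

# repair one node at a time, and wait for anticompaction to finish everywhere
# before moving on to the next node
NODES="node1 node2 node3"
for host in $NODES; do
  ssh "$host" nodetool repair my_keyspace
  for peer in $NODES; do
    # anticompactions show up in compactionstats (see CASSANDRA-9098)
    while ssh "$peer" nodetool compactionstats | grep -qi anticompaction; do
      sleep 60
    done
  done
done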

If you do not want to keep on using incremental repair, and fallback to
classic full repair, I think the only way in 3.6 to avoid anticompaction
will be to use subrange repair (Paulo mentioned that in 3.x full repair
also triggers anticompaction).

You have two options here : cassandra_range_repair (
https://github.com/BrianGallew/cassandra_range_repair) and Spotify Reaper (
https://github.com/spotify/cassandra-reaper)

cassandra_range_repair might scream about subrange + incremental not being
compatible (not sure here), but you can modify the repair_range() method by
adding a --full switch to the command line used to run repair.

We have a fork of Reaper that handles both full subrange repair and
incremental repair here : https://github.com/thelastpickle/cassandra-reaper
It comes with a tweaked version of the UI made by Stephan Podkowinski (
https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
interactions to schedule, run and track repair - which adds fields to run
incremental repair (accessible via ...:8080/webui/ in your browser).

Cheers,



On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> Hi,
>
> We are not sure whether this issue is linked to that node or not. Our
> application does frequent delete and insert.
>
> Maybe our approach to nodetool repair is not correct. Yes, we generally
> fire repair on all boxes at the same time. Till now, it was manual with the
> default configuration (command: "nodetool repair").
> Yes, we saw a validation error, but that is linked to a repair of the same
> partition range already running on another box. We saw a "validation
> failed" error with some IP because a repair was already running for the
> same SSTable.
> Just a few days back, we had 2 DCs with 3 nodes each and the replication
> factor was also 3. It means all data is on each node.
>
> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Atul,
>>
>> could you be more specific on how you are running repair ? What's the
>> precise command line for that, does it run on several nodes at the same
>> time, etc...
>> What is your gc_grace_seconds ?
>> Do you see errors in your logs that would be linked to repairs
>> (Validation failure or failure to create a merkle tree)?
>>
>> You seem to mention a single node that went down but say the whole
>> cluster seem to have zombie data.
>> What is the connection you see between the node that went down and the
>> fact that deleted data comes back to life ?
>> What is your strategy for cyclic maintenance repair (schedule, command
>> line or tool, etc...) ?
>>
>> Thanks,
>>
>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <atul.sar...@snapdeal.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We have seen a weird behaviour in cassandra 3.6.
>>> Once, our node went down for more than 10 hrs. After that, we ran
>>> nodetool repair multiple times, but tombstones are not getting synced
>>> properly over the cluster. On a day-to-day basis, on expiry of every
>>> grace period, deleted records start surfacing again in Cassandra.
>>>
>>> It seems nodetool repair is not syncing tombstones across the cluster.
>>> FYI, we have 3 data centres now.
>>>
>>> We just want help on how to verify and debug this issue. Help will be
>>> appreciated.
>>>
>>>
>>> --
>>> Regards,
>>> Atul Saroha
>>>
>>> *Lead Software Engineer | CAMS*
>>>
>>> M: +91 8447784271
>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>> Udyog Vihar Phase IV,Gurg

Re: WARN Writing large partition for materialized views

2016-09-29 Thread Alexander Dejanovski
Hi Robert,

Materialized Views are regular C* tables underneath, so based on their PK
they can generate big partitions.
It is often advised to keep partition size under 100MB because larger
partitions are hard to read and compact. They usually put pressure on the
heap and lead to long GC pauses  + laggy compactions.
You could possibly OOM while trying to fully read a partition that is way
too big for your heap.

It is indeed a schema problem and you most likely have to bucket your MV in
order to split those partitions into smaller chunks. In the case of MV, you
possibly need to add a bucketing field to the table it relies on (if you
don't have one already), and add it to the MV partition key.
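
As a purely hypothetical CQL sketch (keyspace, table, column and bucket names
are all made up), fed to cqlsh :

cqlsh <<'CQL'
USE my_keyspace;

-- 'day' is the bucketing column that caps the size of each partition
CREATE TABLE events_by_user (
    user_id  uuid,
    day      date,
    event_id timeuuid,
    country  text,
    payload  text,
    PRIMARY KEY ((user_id, day), event_id)
);

-- the bucket is part of the MV partition key as well
CREATE MATERIALIZED VIEW events_by_country AS
    SELECT * FROM events_by_user
    WHERE user_id IS NOT NULL AND day IS NOT NULL
      AND event_id IS NOT NULL AND country IS NOT NULL
    PRIMARY KEY ((country, day), user_id, event_id);
CQL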

You should try to use cassandra-stress to test your bucket sizes :
https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsCStress.html
In your schema definition you can now specify the creation of a MV.
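
A minimal run against such a profile could then look like this (the yaml file
name and the numbers are placeholders for a stress profile containing the
schema above) :

# write through the profile that defines the table + MV, then check partition sizes
cassandra-stress user profile=./mv_bucket_test.yaml ops\(insert=1\) n=1000000 -rate threads=50
nodetool tablehistograms my_keyspace events_by_country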

Cheers,


On Wed, Sep 28, 2016 at 7:35 PM Robert Sicoie <robert.sic...@gmail.com>
wrote:

> Hi guys,
>
> I run a cluster with 5 nodes, cassandra version 3.0.5.
>
> I get this warning:
> 2016-09-28 17:22:18,480 BigTableWriter.java:171 - Writing large
> partition...
>
> for some materialized views. Some have values over 500MB. How does this
> affect performance? What can/should be done? I suppose it is a problem in
> the schema design.
>
> Thanks,
> Robert Sicoie
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Alexander Dejanovski
Hi Atul,

could you be more specific on how you are running repair ? What's the
precise command line for that, does it run on several nodes at the same
time, etc...
What is your gc_grace_seconds ?
Do you see errors in your logs that would be linked to repairs (Validation
failure or failure to create a merkle tree)?

You seem to mention a single node that went down but say the whole cluster
seem to have zombie data.
What is the connection you see between the node that went down and the fact
that deleted data comes back to life ?
What is your strategy for cyclic maintenance repair (schedule, command line
or tool, etc...) ?

Thanks,

On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> Hi,
>
> We have seen a weird behaviour in cassandra 3.6.
> Once, our node went down for more than 10 hrs. After that, we ran
> nodetool repair multiple times, but tombstones are not getting synced
> properly over the cluster. On a day-to-day basis, on expiry of every
> grace period, deleted records start surfacing again in Cassandra.
>
> It seems nodetool repair is not syncing tombstones across the cluster.
> FYI, we have 3 data centres now.
>
> We just want help on how to verify and debug this issue. Help will be
> appreciated.
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Alexander Dejanovski
Robert,

You can restart them in any order, that doesn't make a difference afaik.

Cheers

On Wed, Sep 28, 2016 at 17:10, Robert Sicoie <robert.sic...@gmail.com> wrote:

> Thanks Alexander,
>
> Yes, with tpstats I can see the hanging active repair(s) (output
> attached). For one there are 31 pending repairs. On others there are fewer
> pending repairs (min 12). Is there any recommendation for the restart order?
> The ones with fewer pending repairs first, perhaps?
>
> Thanks,
> Robert
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 5:35 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> They will show up in nodetool compactionstats :
>> https://issues.apache.org/jira/browse/CASSANDRA-9098
>>
>> Did you check nodetool tpstats to see if you didn't have any running
>> repair session ?
>> Just to make sure (and if you can actually do it), roll restart the
>> cluster and try again. Repair sessions can get sticky sometimes.
>>
>> On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie <robert.sic...@gmail.com>
>> wrote:
>>
>>> I am using nodetool compactionstats to check for pending compactions and
>>> it shows me 0 pending on all nodes, seconds before running nodetool repair.
>>> I am also monitoring PendingCompactions on jmx.
>>>
>>> Is there other way I can find out if is there any anticompaction running
>>> on any node?
>>>
>>> Thanks a lot,
>>> Robert
>>>
>>> Robert Sicoie
>>>
>>> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
>>>> Robert,
>>>>
>>>> you need to make sure you have no repair session currently running on
>>>> your cluster, and no anticompaction.
>>>> I'd recommend doing a rolling restart in order to stop all running
>>>> repair for sure, then start the process again, node by node, checking that
>>>> no anticompaction is running before moving from one node to the other.
>>>>
>>>> Please do not use the -pr switch as it is both useless (token ranges
>>>> are repaired only once with inc repair, whatever the replication factor)
>>>> and harmful as all anticompactions won't be executed (you'll still have
>>>> sstables marked as unrepaired even if the process has run entirely with no
>>>> error).
>>>>
>>>> Let us know how that goes.
>>>>
>>>> Cheers,
>>>>
>>>> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie <robert.sic...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Alexander,
>>>>>
>>>>> Now I started to run the repair with -pr arg and with keyspace and
>>>>> table args.
>>>>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>>>>> RepairRunnable.java:246 - Repair session
>>>>> 89af4d10-856f-11e6-b28f-df99132d7979 for range
>>>>> [(8323429577695061526,8326640819362122791],
>>>>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in /
>>>>> 10.45.113.88"
>>>>>
>>>>> for one of the tables. 10.45.113.88 is the ip of the machine I am
>>>>> running the nodetool on.
>>>>> I'm wondering if this is normal...
>>>>>
>>>>> Thanks,
>>>>> Robert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Robert Sicoie
>>>>>
>>>>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
>>>>> a...@thelastpickle.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> nodetool scrub won't help here, as what you're experiencing is most
>>>>>> likely that one SSTable is going through anticompaction, and then another
>>>>>> node is asking for a Merkle tree that involves it.
>>>>>> For understandable reasons, an SSTable cannot be anticompacted and
>>>>>> validation compacted at the same time.
>>>>>>
>>>>>> The solution here is to adjust the repair pressure on your cluster so
>>>>>> that anticompaction can end before you run repair on another node.
>>>>>> You may have a lot of anticompaction to do if you had high volumes of
>>>>>> unrepaired data, which can take a long time depending on several factors.
>>>>>>
>>>>>>

Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Alexander Dejanovski
They will show up in nodetool compactionstats :
https://issues.apache.org/jira/browse/CASSANDRA-9098

Did you check nodetool tpstats to see if you didn't have any running repair
session ?
Just to make sure (and if you can actually do it), roll restart the cluster
and try again. Repair sessions can get sticky sometimes.
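
For instance, a quick pass over the cluster before relaunching (the host list
is a placeholder) :

# look for anticompactions and repair/validation activity on every node
for host in node1 node2 node3 node4 node5; do
  echo "== $host =="
  ssh "$host" nodetool compactionstats | grep -i anticompaction
  ssh "$host" nodetool tpstats | grep -iE 'antientropy|validation|repair#'
done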

On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie <robert.sic...@gmail.com>
wrote:

> I am using nodetool compactionstats to check for pending compactions and
> it shows me 0 pending on all nodes, seconds before running nodetool repair.
> I am also monitoring PendingCompactions on jmx.
>
> Is there other way I can find out if is there any anticompaction running
> on any node?
>
> Thanks a lot,
> Robert
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Robert,
>>
>> you need to make sure you have no repair session currently running on
>> your cluster, and no anticompaction.
>> I'd recommend doing a rolling restart in order to stop all running repair
>> for sure, then start the process again, node by node, checking that no
>> anticompaction is running before moving from one node to the other.
>>
>> Please do not use the -pr switch as it is both useless (token ranges are
>> repaired only once with inc repair, whatever the replication factor) and
>> harmful as all anticompactions won't be executed (you'll still have
>> sstables marked as unrepaired even if the process has run entirely with no
>> error).
>>
>> Let us know how that goes.
>>
>> Cheers,
>>
>> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie <robert.sic...@gmail.com>
>> wrote:
>>
>>> Thanks Alexander,
>>>
>>> Now I started to run the repair with -pr arg and with keyspace and table
>>> args.
>>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>>> RepairRunnable.java:246 - Repair session
>>> 89af4d10-856f-11e6-b28f-df99132d7979 for range
>>> [(8323429577695061526,8326640819362122791],
>>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in /
>>> 10.45.113.88"
>>>
>>> for one of the tables. 10.45.113.88 is the ip of the machine I am
>>> running the nodetool on.
>>> I'm wondering if this is normal...
>>>
>>> Thanks,
>>> Robert
>>>
>>>
>>>
>>>
>>> Robert Sicoie
>>>
>>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> nodetool scrub won't help here, as what you're experiencing is most
>>>> likely that one SSTable is going through anticompaction, and then another
>>>> node is asking for a Merkle tree that involves it.
>>>> For understandable reasons, an SSTable cannot be anticompacted and
>>>> validation compacted at the same time.
>>>>
>>>> The solution here is to adjust the repair pressure on your cluster so
>>>> that anticompaction can end before you run repair on another node.
>>>> You may have a lot of anticompaction to do if you had high volumes of
>>>> unrepaired data, which can take a long time depending on several factors.
>>>>
>>>> You can tune your repair process to make sure no anticompaction is
>>>> running before launching a new session on another node or you can try my
>>>> Reaper fork that handles incremental repair :
>>>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>>> I may have to add a few checks in order to avoid all collisions between
>>>> anticompactions and new sessions, but it should be helpful if you struggle
>>>> with incremental repair.
>>>>
>>>> In any case, check if your nodes are still anticompacting before trying
>>>> to run a new repair session on a node.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <robert.sic...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>>>> I was running nodetool repair last days, one node at a time, when I
>>>>> first encountered this exception
>>>>>
>>>>> *ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
>>>>> CassandraDaemon.java:195 - Exception in thread
>>

Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Alexander Dejanovski
k.java:56)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ~[na:1.8.0_60]*
> * at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]*
> *Caused by: java.lang.InterruptedException: null*
> * at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
> ~[na:1.8.0_60]*
> * at
> org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * ... 6 common frames omitted*
>
>
> Now if I run nodetool repair I get the
>
> *java.lang.RuntimeException: Cannot start multiple repair sessions over
> the same sstables*
>
> exception.
> What do you suggest? Would nodetool scrub or sstablescrub help in this
> case, or would it just make it worse?
>
> Thanks,
>
> Robert
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: race condition for quorum consistency

2016-09-15 Thread Alexander Dejanovski
I haven't been very accurate in my first answer indeed, which was
misleading.
Apache Cassandra guarantees that if all queries are run at least at quorum,
a client writing successfully (as in the cluster acknowledged the write)
then reading its previous write will see the correct value, unless another
client updated it between the write and the read (which would be a race
condition). The same goes for two different clients if the first issues a
successful write and only after that the second reads the value.
Quorum provides a consistency guarantee if queries are fired in sequence.

Without diving into complex scenarios where it may work because of read
repair and the fact that everything is async, Ken's use case was : C1
writes, it is not successful yet, C2 and C3 read at the approx. same time.
Once again, in this case C2 and C3 could be reading a different value as
C1's mutation could be in pending state on some nodes. Considering we have
nodes A, B and C :

   - Node A has received the write from C1, nodes B and C have not
   - C2 reads from A and B, there's a digest mismatch which triggers a
   foreground read repair (background read repairs are triggered at CL ONE) >
   it gets the up to date value that was written by C1
   - C3 reads from B and C, there's no digest mismatch and the value is not
   up to date with A > it does not get the value written by C1


Cheers,


On Thu, Sep 15, 2016 at 12:10 AM Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Wed, Sep 14, 2016 at 3:49 PM, Nicolas Douillet <
> nicolas.douil...@gmail.com> wrote:
>
>>
>>- during read requests, Cassandra will ask one node for the data and the
>>others involved in the CL for a digest; if the digests do not match, it
>>will ask them for the entire data, handle the merge, and finally ask those
>>nodes for a background repair. Your write may have succeeded during this
>>time.
>>those nodes a background repair. Your write may have succeed during this
>>time.
>>
>>
> This is very good info, but as a minor correction, the repair here will
> happen in the foreground before the response is returned to the client.
> So, at least from a single client's perspective, you get monotonic reads.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: race condition for quorum consistency

2016-09-14 Thread Alexander Dejanovski
My understanding of the described scenario is that the write hasn't
succeeded when reads are fired, as B and C haven't processed the mutation
yet.

There would be 3 clients here and not 2 : C1 writes, C2 and C3 read.

So the race condition could still happen in this particular case.

On Wed, Sep 14, 2016 at 21:07, Work <jrother...@codojo.me> wrote:

> Hi Alex:
>
> Hmmm ... Assuming clock skew is eliminated And assuming nodes are up
> and available ... And assuming quorum writes and quorum reads and everyone
> waiting for success ( which is NOT The OP scenario), Two different clients
> will be guaranteed to see all successful writes, or be told that read
> failed.
>
> C1 writes at quorum to A,B
> C2 reads at quorum.
> So it tries to read from ALL nodes, A,B, C.
> If A,B respond --> success
> If A,C respond --> conflict
> If B, C respond --> conflict
> Because a quorum (2 nodes) responded, the coordinator will return the
> latest time stamp and may issue read repair depending on YAML settings.
>
> So where do you see only one client having this guarantee?
>
> Regards,
>
> James
>
> On Sep 14, 2016, at 4:00 AM, Alexander DEJANOVSKI <adejanov...@gmail.com>
> wrote:
>
> Hi,
>
> the analysis is valid, and strong consistency the Cassandra way means that
> one client writing at quorum, then reading at quorum will always see his
> previous write.
> Two different clients have no guarantee to see the same data when using
> quorum, as illustrated in your example.
>
> Only options here are to route requests to specific clients based on some
> id to guarantee the sequence of operations outside of Cassandra (the same
> client will always be responsible for a set of ids), or raise the CL to ALL
> at the expense of availability (you should not do that).
>
>
> Cheers,
>
> Alex
>
> On Wed, Sep 14, 2016 at 11:47, Qi Li <ken.l...@gmail.com> wrote:
>
>> hi all,
>>
>> We are using quorum consistency, and we *suspect* there may be a race
>> condition during the write. Let's say RF is 3, so a write will wait for at
>> least 2 nodes to ack. Suppose only 1 node has acked (node A) and the other
>> 2 nodes (B, C) are still waiting to update. Then two read requests come in:
>> one read gets its data from nodes B and C, so version 1 is returned;
>> the other read gets its data from nodes A and B, so the latest version 2
>> is returned.
>>
>> So clients are getting different data at the same time. Is this a valid
>> analysis? If so, are there any options we can set to deal with this issue?
>>
>> thanks
>> Ken
>>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: race condition for quorum consistency

2016-09-14 Thread Alexander DEJANOVSKI
Hi,

the analysis is valid, and strong consistency the Cassandra way means that
one client writing at quorum, then reading at quorum will always see his
previous write.
Two different clients have no guarantee to see the same data when using
quorum, as illustrated in your example.

Only options here are to route requests to specific clients based on some
id to guarantee the sequence of operations outside of Cassandra (the same
client will always be responsible for a set of ids), or raise the CL to ALL
at the expense of availability (you should not do that).


Cheers,

Alex

On Wed, Sep 14, 2016 at 11:47, Qi Li wrote:

> hi all,
>
> We are using quorum consistency, and we *suspect* there may be a race
> condition during the write. Let's say RF is 3, so a write will wait for at
> least 2 nodes to ack. Suppose only 1 node has acked (node A) and the other
> 2 nodes (B, C) are still waiting to update. Then two read requests come in:
> one read gets its data from nodes B and C, so version 1 is returned;
> the other read gets its data from nodes A and B, so the latest version 2
> is returned.
>
> So clients are getting different data at the same time. Is this a valid
> analysis? If so, are there any options we can set to deal with this issue?
>
> thanks
> Ken
>


Re: How to start using incremental repairs?

2016-09-12 Thread Alexander DEJANOVSKI
Hi Paulo,

don't you think it might be better to keep applying the migration procedure
whatever the version ?
Anticompaction is pretty expensive on big SSTables and if the cluster has a
lot of data, the first run might be very very long if the nodes are dense,
and especially with a high number of vnodes.
We've seen this on clusters that had just upgraded from 2.1 to 3.0, where
the first incremental repair was taking a ridiculous amount of time as
there were loads of anticompaction running.

Indeed, if you run an inc repair on all ranges on a node, it can skip
anticompaction by just marking SSTables as being repaired (which is fast),
but the rest of the nodes will still have to perform anticompaction as they
won't share all of its token ranges. Right ?

Cheers,

Alex

On Mon, Sep 12, 2016 at 13:56, Paulo Motta wrote:

> > Can you clarify me please if what you said here applies for the version
> 2.1.14.
>
> yes
>
> 2016-09-06 5:50 GMT-03:00 Jean Carlo :
>
>> Hi Paulo
>>
>> Can you clarify me please if what you said here
>>
>> 1. Migration procedure is no longer necessary after CASSANDRA-8004, and
>> since you never ran repair before this would not make any difference
>> anyway, so just run repair and by default (CASSANDRA-7250) this will
>> already be incremental.
>>
>> applies for the version 2.1.14. I ask because I see that the jira
>> CASSANDRA-8004 is resolved for the version 2.1.2 and we are considering to
>> migrate to repairs inc before go to the version 3.0.x
>>
>> Thhx :)
>>
>>
>> Regards
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Fri, Aug 26, 2016 at 9:04 PM, Stefano Ortolani 
>> wrote:
>>
>>> An extract of this conversation should definitely be posted somewhere.
>>> Read a lot but never learnt all these bits...
>>>
>>> On Fri, Aug 26, 2016 at 2:53 PM, Paulo Motta 
>>> wrote:
>>>
 > I must admit that I fail to understand currently how running repair
 with -pr could leave unrepaired data though, even when ran on all nodes in
 all DCs, and how that could be specific to incremental repair (and would
 appreciate if someone shared the explanation).

 Anti-compaction, which marks tables as repaired, is disabled for
 partial range repairs (which includes partitioner-range repair) to avoid
 the extra I/O cost of needing to run anti-compaction multiple times in a
 node to repair it completely. For example, there is an optimization which
 skips anti-compaction for sstables fully contained in the repaired range
 (only the repairedAt field is mutated), which is leveraged by full range
 repair, which would not work in many cases for partial range repairs,
 yielding higher I/O.

 2016-08-26 10:17 GMT-03:00 Stefano Ortolani :

> I see. Didn't think about it that way. Thanks for clarifying!
>
>
> On Fri, Aug 26, 2016 at 2:14 PM, Paulo Motta  > wrote:
>
>> > What is the underlying reason?
>>
>> Basically to minimize the amount of anti-compaction needed, since
>> with RF=3 you'd need to perform anti-compaction 3 times in a particular
>> node to get it fully repaired, while without it you can just repair the
>> full node's range in one run. Assuming you run repair frequent enough 
>> this
>> will not be a big deal, since you will skip already repaired data in the
>> next round so you will not have the problem of re-doing work as in 
>> non-inc
>> non-pr repair.
>>
>> 2016-08-26 7:57 GMT-03:00 Stefano Ortolani :
>>
>>> Hi Paulo, could you elaborate on 2?
>>> I didn't know incremental repairs were not compatible with -pr
>>> What is the underlying reason?
>>>
>>> Regards,
>>> Stefano
>>>
>>>
>>> On Fri, Aug 26, 2016 at 1:25 AM, Paulo Motta <
>>> pauloricard...@gmail.com> wrote:
>>>
 1. Migration procedure is no longer necessary after CASSANDRA-8004,
 and since you never ran repair before this would not make any 
 difference
 anyway, so just run repair and by default (CASSANDRA-7250) this will
 already be incremental.
 2. Incremental repair is not supported with -pr, -local or -st/-et
 options, so you should run incremental repair in all nodes in all DCs
 sequentially (you should be aware that this will probably generate 
 inter-DC
 traffic), no need to disable autocompaction or stopping nodes.

 2016-08-25 18:27 GMT-03:00 Aleksandr Ivanov :

> I’m new in Cassandra and trying to figure out how to _start_ using
> incremental repairs. I have seen article about “Migrating to 
> incremental
> repairs” but since I didn’t use repairs before at all and I use 
> 

Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Alexander DEJANOVSKI
Hi Siddharth,

yes, we are sure token ranges will never overlap (I think the start token
in describering output is excluded and the end token included).

You can get per host information in the Datastax Java driver using :

Set<TokenRange> rangesForKeyspace = cluster.getMetadata().getTokenRanges(keyspaceName, host);
Bye,

Alex

On Tue, Aug 30, 2016 at 10:04, Siddharth Verma wrote:

> Hi ,
> Can we be sure that, token ranges in nodetool describering will be non
> overlapping?
>
> Thanks
> Siddharth Verma
>


Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Alexander DEJANOVSKI
Hi Siddarth,

I would recommend running "nodetool describering keyspace_name" as its
output is much simpler to reason about :

Schema Version:9a091b4e-3712-3149-b187-d2b09250a19b

TokenRange:

TokenRange(start_token:1943978523300203561, end_token:2137919499801737315,
endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.7, 127.0.0.2, 127.0.0.5,
127.0.0.1], rpc_endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.7, 127.0.0.2,
127.0.0.5, 127.0.0.1], endpoint_details:[EndpointDetails(host:127.0.0.3,
datacenter:dc1, rack:r1), EndpointDetails(host:127.0.0.6, datacenter:dc2,
rack:r1), EndpointDetails(host:127.0.0.7, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.2, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.5, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.1, datacenter:dc1, rack:r1)])

TokenRange(start_token:6451470843510300950, end_token:7799236518897713874,
endpoints:[127.0.0.6, 127.0.0.4, 127.0.0.1, 127.0.0.3, 127.0.0.5,
127.0.0.2], rpc_endpoints:[127.0.0.6, 127.0.0.4, 127.0.0.1, 127.0.0.3,
127.0.0.5, 127.0.0.2], endpoint_details:[EndpointDetails(host:127.0.0.6,
datacenter:dc2, rack:r1), EndpointDetails(host:127.0.0.4, datacenter:dc2,
rack:r1), EndpointDetails(host:127.0.0.1, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.3, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.5, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.2, datacenter:dc1, rack:r1)])

TokenRange(start_token:-2494488779943368698,
end_token:-2344803022847488784, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.6,
127.0.0.5, 127.0.0.3, 127.0.0.2], rpc_endpoints:[127.0.0.1, 127.0.0.4,
127.0.0.6, 127.0.0.5, 127.0.0.3, 127.0.0.2],
endpoint_details:[EndpointDetails(host:127.0.0.1, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.4, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.6, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.5, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.3, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.2, datacenter:dc1, rack:r1)])

TokenRange(start_token:-3354341409828719744,
end_token:-3001704612215276412, endpoints:[127.0.0.7, 127.0.0.1, 127.0.0.4,
127.0.0.6, 127.0.0.3, 127.0.0.2], rpc_endpoints:[127.0.0.7, 127.0.0.1,
127.0.0.4, 127.0.0.6, 127.0.0.3, 127.0.0.2],
endpoint_details:[EndpointDetails(host:127.0.0.7, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.1, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.4, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.6, datacenter:dc2, rack:r1),
EndpointDetails(host:127.0.0.3, datacenter:dc1, rack:r1),
EndpointDetails(host:127.0.0.2, datacenter:dc1, rack:r1)])


It will give you the start and end token of each range (vnode) and the list
of the replicas for each (the first endpoint being the primary).

Hope this helps you figure out your token distribution.
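
If you only need the token bounds and the primary replica of each range, a
rough filter over that output (keyspace name is a placeholder) is enough :

# print start_token, end_token and the first (primary) endpoint of each range
nodetool describering my_keyspace \
  | grep -o 'start_token:[^,]*, end_token:[^,]*, endpoints:\[[^],]*'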

Alex

On Tue, Aug 30, 2016 at 09:11, Siddharth Verma wrote:

> Hi,
> I saw that in cassandra-driver-core,(3.1.0) Metadata.TokenMap has
> primaryToTokens which has the value for ALL the nodes.
> I tried to find (primary)range ownership for nodes in one DC.
> And executed the following in debug mode in IDE.
>
> TreeMap<Long, Host> primaryTokenMap = new TreeMap<>();
> for(Host host :
> main.cluster.getMetadata().tokenMap.primaryToTokens.keySet()){
> if(!host.getDatacenter().equals("dc2"))
> continue;
> for(Token token :
> main.cluster.getMetadata().tokenMap.primaryToTokens.get(host)){
> primaryTokenMap.put((Long) token.getValue(),host);
>
> }
> }
> primaryTokenMap //this printed the map in evaluate code fragment window
>
> dc2 has 3 nodes, RF is 3
> Sample entries :
> 244925668410340093 -> /10.0.3.79:9042
> 291047688656337660 -> /10.0.3.217:9042
> 317775761591844910 -> /10.0.3.135:9042
> 328177243900091789 -> /10.0.3.79:9042
> 329239043633655596 -> /10.0.3.135:9042
> 
>
> Can I safely assume the following token range -> host ownership?
> 244925668410340093 to 291047688656337660 -1 belongs to 10.0.3.79:9042
> 291047688656337660 to 317775761591844910 -1 belongs to 10.0.3.135:9042
> 317775761591844910 to 328177243900091789 -1 belongs to 10.0.3.135:9042
> And so on.
>
> Is the above assumption ABSOLUTELY correct?
> (Kindly suggest changes/errors, if any)
>
> Any help would be great.
> Thanks and Regards,
> Siddharth Verma
>


Re: New data center to an existing cassandra cluster

2016-08-27 Thread Alexander DEJANOVSKI
Reads at quorum in dc3 will involve dc1 and dc2 as they will require a
response from more than half the replicas throughout the cluster.

If you're using RF=3 in each DC, there are 9 replicas in total, so each
quorum read will need at least 5 responses (half of 9 rounded down, plus
one), which DC3 cannot provide on its own.

You could have trouble if DC3 held more than half the replicas, but I
guess/hope that is not the case; otherwise you're fine.

You would be in trouble though if you were using local_quorum on DC3 or ONE
on any DC.



On Sat, Aug 27, 2016 at 19:11, Surbhi Gupta wrote:

> Yes, it will have issues during the time the new nodes are building.
> So it is always advised to use LOCAL_QUORUM instead of QUORUM and
> LOCAL_ONE instead of ONE.
>
> On 27 August 2016 at 09:45, laxmikanth sadula 
> wrote:
>
>> Hi,
>>
>> I'm going to add a new data center DC3 to an existing Cassandra cluster
>> which already has 2 data centers, DC1 and DC2.
>>
>> The thing I'm worried about is tables in one keyspace which have QUORUM
>> reads and NOT LOCAL_QUORUM.
>> So while adding the new data center with auto_bootstrap:false and
>> 'nodetool rebuild', will queries to tables in this keyspace have any
>> issue?
>>
>> Thanks in advance.
>>
>> --
>> Regards,
>> Laxmikanth
>>
>
>


Re: How to start using incremental repairs?

2016-08-26 Thread Alexander DEJANOVSKI
After running some tests I can confirm that using -pr leaves unrepaired
SSTables, while without -pr all SSTables show up as repaired once the repair
has completed.

The purpose of -pr was to lighten the repair process by not repairing
ranges RF times, but just once. With incremental repair though, repaired
data is marked as such and will be skipped on the next session, making -pr
kinda useless.
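
To see whether SSTables actually got marked, the repairedAt field can be
checked with sstablemetadata (the data path, keyspace and table names are
placeholders) :

# "Repaired at: 0" means the SSTable is still marked as unrepaired
for f in /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db; do
  echo "$f"
  sstablemetadata "$f" | grep "Repaired at"
done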

I must admit that I fail to understand currently how running repair with
-pr could leave unrepaired data though, even when ran on all nodes in all
DCs, and how that could be specific to incremental repair (and would
appreciate if someone shared the explanation).

On a side note, I have a Spotify Reaper fork that handles incremental
repair, and embeds the UI of Stefan Podkowinski, tweaked to add incremental
repair inputs :
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui

Compile it with maven and run with : java -jar
target/cassandra-reaper-0.2.4-SNAPSHOT.jar server
resource/cassandra-reaper.yaml

Then go to http://127.0.0.1:8081/webui/
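
End to end, that is roughly (assuming git, a JDK and Maven are available, and
a standard Maven build for the branch above) :

git clone -b inc-repair-support-with-ui https://github.com/adejanovski/cassandra-reaper.git
cd cassandra-reaper
mvn package   # produces target/cassandra-reaper-0.2.4-SNAPSHOT.jar
java -jar target/cassandra-reaper-0.2.4-SNAPSHOT.jar server resource/cassandra-reaper.yaml
# then open http://127.0.0.1:8081/webui/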



On Fri, Aug 26, 2016 at 12:58, Stefano Ortolani wrote:

> Hi Paulo, could you elaborate on 2?
> I didn't know incremental repairs were not compatible with -pr
> What is the underlying reason?
>
> Regards,
> Stefano
>
>
> On Fri, Aug 26, 2016 at 1:25 AM, Paulo Motta 
> wrote:
>
>> 1. Migration procedure is no longer necessary after CASSANDRA-8004, and
>> since you never ran repair before this would not make any difference
>> anyway, so just run repair and by default (CASSANDRA-7250) this will
>> already be incremental.
>> 2. Incremental repair is not supported with -pr, -local or -st/-et
>> options, so you should run incremental repair in all nodes in all DCs
>> sequentially (you should be aware that this will probably generate inter-DC
>> traffic), no need to disable autocompaction or stopping nodes.
>>
>> 2016-08-25 18:27 GMT-03:00 Aleksandr Ivanov :
>>
>>> I’m new in Cassandra and trying to figure out how to _start_ using
>>> incremental repairs. I have seen article about “Migrating to incremental
>>> repairs” but since I didn’t use repairs before at all and I use Cassandra
>>> version v3.0.8, then maybe not all steps are needed which are mentioned in
>>> Datastax article.
>>> Should I start with full repair or I can start with executing “nodetool
>>> repair -pr  my_keyspace” on all nodes without autocompaction disabling and
>>> node stopping?
>>>
>>> I have 6 datacenters with 6 nodes in each DC. Is it enough to run
>>>  “nodetool repair -pr  my_keyspace” in one DC only or it should be executed
>>> on all nodes in _all_ DCs?
>>>
>>> I have tried to perform “nodetool repair -pr  my_keyspace” on all nodes
>>> in all datacenters sequentially but I still can see non repaired SSTables
>>> for my_keyspace   (Repaired at: 0). Is it expected behavior if during
>>> repair data in my_keyspace wasn’t modified (no writes, no reads)?
>>>
>>
>>
>


Re: How to start using incremental repairs?

2016-08-26 Thread Alexander DEJANOVSKI
There are 2 main reasons I see for still having unrepaired sstables after
running nodetool repair -pr :

1- new data is still flowing in your database after the repair sessions
were launched, and thus hasn't been repaired
2- some repair sessions failed and left unrepaired data on your nodes.
Incremental repair isn't fond of concurrency, as an SSTable cannot be
anticompacted and go through validation compaction at the same time. So if
an SSTable is being anticompacted and another node asks for a merkle tree
that involves it, it will fail with a message in the system.log saying that
an sstable cannot be involved in more than one repair session at a time
(search for validation failures in your cassandra log).
Best chance to have it succeed IMHO is to run inc repair one node at a time.
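
To spot case 2 (assuming the default log location), something like this on
each node will surface the failed sessions :

# look for the error messages left behind by conflicting repair sessions
grep -iE "Validation failed|Cannot start multiple repair sessions" /var/log/cassandra/system.log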

On Fri, Aug 26, 2016 at 08:02, Aleksandr Ivanov wrote:

> Thanks for the confirmation, Paulo. Then my understanding of the process
> was correct.
>
> I'm curious why I still see unrepaired sstables after performing repair
> -pr on all nodes in all datacenters...
>
> On Fri, Aug 26, 2016 at 3:25, Paulo Motta wrote:
>
>> 1. Migration procedure is no longer necessary after CASSANDRA-8004, and
>> since you never ran repair before this would not make any difference
>> anyway, so just run repair and by default (CASSANDRA-7250) this will
>> already be incremental.
>> 2. Incremental repair is not supported with -pr, -local or -st/-et
>> options, so you should run incremental repair in all nodes in all DCs
>> sequentially (you should be aware that this will probably generate inter-DC
>> traffic), no need to disable autocompaction or stopping nodes.
>>
>> 2016-08-25 18:27 GMT-03:00 Aleksandr Ivanov :
>>
>>> I’m new in Cassandra and trying to figure out how to _start_ using
>>> incremental repairs. I have seen article about “Migrating to incremental
>>> repairs” but since I didn’t use repairs before at all and I use Cassandra
>>> version v3.0.8, then maybe not all steps are needed which are mentioned in
>>> Datastax article.
>>> Should I start with full repair or I can start with executing “nodetool
>>> repair -pr  my_keyspace” on all nodes without autocompaction disabling and
>>> node stopping?
>>>
>>> I have 6 datacenters with 6 nodes in each DC. Is it enough to run
>>>  “nodetool repair -pr  my_keyspace” in one DC only or it should be executed
>>> on all nodes in _all_ DCs?
>>>
>>> I have tried to perform “nodetool repair -pr  my_keyspace” on all nodes
>>> in all datacenters sequentially but I still can see non repaired SSTables
>>> for my_keyspace   (Repaired at: 0). Is it expected behavior if during
>>> repair data in my_keyspace wasn’t modified (no writes, no reads)?
>>>
>>
>>

