About the data structure of partition index

2016-05-17 Thread Hiroyuki Yamada
Hi,

I am wondering how many primary keys are stored in one partition index.

As the following documents say,




I understand that each partition index has a list of primary keys and
the start position in the compression offset map.
So, I assume the logical data structure of a partition index would be
like the following:

| [pkey1-pkeyN] | offset to compression offset map |
(indexed by the first column, to retrieve by a partition key)

I am wondering if this is a correct understanding, and
how many primary keys are stored in the first column.

If it is not correct, would anyone give me the correct logical data structure?

Thanks,
Hiro


Re: Accessing Cassandra data from Spark Shell

2016-05-17 Thread Cassa L
Hi,
I followed the instructions to run the Spark shell with Spark 1.6, and it works
fine. However, I need to use Spark 1.5.2, and with that version it does not
work: I keep getting NoSuchMethodError exceptions. Is there any issue running
the Spark shell against Cassandra with an older version of Spark?


Regards,
LCassa
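
A NoSuchMethodError when mixing Spark and connector builds usually points to a
binary incompatibility between the two. As a sketch (exact versions assumed),
launching Spark 1.5.x with the matching 1.5.x connector line looks like:

$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0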

On Tue, May 10, 2016 at 6:48 PM, Mohammed Guller 
wrote:

> Yes, it is very simple to access Cassandra data using Spark shell.
>
>
>
> Step 1: Launch the spark-shell with the spark-cassandra-connector package
>
> $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0
>
>
>
> Step 2: Create a DataFrame pointing to your Cassandra table
>
> val dfCassTable = sqlContext.read
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> "your_column_family", "keyspace" -> "your_keyspace"))
>   .load()
>
>
>
> From this point onward, you have complete access to the DataFrame API. You
> can even register it as a temporary table, if you would prefer to use
> SQL/HiveQL.
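>
> For example, a minimal sketch of that route (temporary table name assumed):
>
> dfCassTable.registerTempTable("cass_table")
> sqlContext.sql("SELECT count(*) FROM cass_table").show()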
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> 
>
>
>
> From: Ben Slater [mailto:ben.sla...@instaclustr.com]
> Sent: Monday, May 9, 2016 9:28 PM
> To: user@cassandra.apache.org; user
> Subject: Re: Accessing Cassandra data from Spark Shell
>
>
>
> You can use SparkShell to access Cassandra via the Spark Cassandra
> connector. The getting started article on our support page will probably
> give you a good steer to get started even if you’re not using Instaclustr:
> https://support.instaclustr.com/hc/en-us/articles/213097877-Getting-Started-with-Instaclustr-Spark-Cassandra-
>
>
>
> Cheers
>
> Ben
>
>
>
> On Tue, 10 May 2016 at 14:08 Cassa L  wrote:
>
> Hi,
>
> Has anyone tried accessing Cassandra data using SparkShell? How do you do
> it? Can you use HiveContext for Cassandra data? I'm using community version
> of Cassandra-3.0
>
>
>
> Thanks,
>
> LCassa
>
> --
>
> 
>
> Ben Slater
>
> Chief Product Officer, Instaclustr
>
> +61 437 929 798
>


Re: Bloom filter memory usage disparity

2016-05-17 Thread Jeff Jirsa
Even with the same data, bloom filters are built per sstable. If compaction
behaves differently on 2 nodes than on the third, your bloom filter RAM usage
may be different.
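
A quick way to compare the per-table bloom filter footprint across nodes is
nodetool (keyspace/table names assumed; on 2.2 the command is cfstats, later
renamed tablestats):

nodetool cfstats my_keyspace.my_table | grep -i "bloom filter"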


From:  Kai Wang
Reply-To:  "user@cassandra.apache.org"
Date:  Tuesday, May 17, 2016 at 8:02 PM
To:  "user@cassandra.apache.org"
Subject:  Re: Bloom filter memory usage disparity

Alain,

Thanks for replying.

I am using C* 2.2.4. 
Yes the table is RF=3. 
I changed bloom_filter_fp_chance from 0.01 to 0.1 a couple of months ago.


On Tue, May 17, 2016 at 11:05 AM, Alain RODRIGUEZ  wrote:
Hi, we would need more information here (if you did not solve it yet). 

What is your Cassandra version?
Does this 3 node cluster use a Replication Factor of 3?
Did you change the bloom_filter_fp_chance recently?

> That table has about 16M keys and 140GB of data.

Is that the total value or per node? In any case, we need the data size for the 
3 nodes to understand.

It might have been a temporary situation, but in this case you would know by 
now.

C*heers,


2016-05-03 18:47 GMT+02:00 Kai Wang :
Hi,

I have a table on 3-node cluster. I notice bloom filter memory usage are very 
different on one of the node. For a given table, I checked 
CassandraMetricsRegistry$JmxGauge.[table]_BloomFilterOffHeapMemoryUsed.Value. 2 
of 3 nodes show 1.5GB while the other shows 2.5 GB.

What could be the reason?

That table is using LCS. 
bloom_filter_fp_chance=0.1
That table has about 16M keys and 140GB of data.

Thanks.





smime.p7s
Description: S/MIME cryptographic signature


Re: restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Ben Slater
It should definitely work if you use sstableloader to load all the files. I
imagine it is possible doing a straight restore (copy sstables) if you
assign the tokens from multiple source nodes to one target node  using the
initial_token parameter in cassandra.yaml.
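
For illustration, a sketch of the sstableloader route (target address and
snapshot path assumed; sstableloader reads the keyspace and table names from
the last two path components):

sstableloader -d 10.0.0.1 /backups/snapshots/my_keyspace/my_table/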

Cheers
Ben

On Wed, 18 May 2016 at 10:35 Luigi Tagliamonte  wrote:

> Hi everyone,
> I'm wondering if it is possible to restore all the snapshots of a cluster
> (10 nodes) onto a smaller cluster (3 nodes)? If yes, how would I do it?
>
> --
> Luigi
> ---
> “The only way to get smarter is by playing a smarter opponent.”
>
-- 

Ben Slater
Chief Product Officer, Instaclustr
+61 437 929 798


Re: restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Jeff Jirsa
http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated



From:  Luigi Tagliamonte
Reply-To:  "user@cassandra.apache.org"
Date:  Tuesday, May 17, 2016 at 5:35 PM
To:  "user@cassandra.apache.org"
Subject:  restore cassandra snapshots on a smaller cluster

Hi everyone,
I'm wondering if it is possible to restore all the snapshots of a cluster (10 
nodes) onto a smaller cluster (3 nodes)? If yes, how would I do it?

-- 
Luigi
---
“The only way to get smarter is by playing a smarter opponent.”



smime.p7s
Description: S/MIME cryptographic signature


restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Luigi Tagliamonte
Hi everyone,
I'm wondering if it is possible to restore all the snapshots of a cluster
(10 nodes) onto a smaller cluster (3 nodes)? If yes, how would I do it?

-- 
Luigi
---
“The only way to get smarter is by playing a smarter opponent.”


Re: Applying TTL Change quickly

2016-05-17 Thread Jeff Jirsa
Fastest way? Stop cassandra, use sstablemetadata to identify (and then delete)
any files whose maximum timestamp is more than 2 days old. Start cassandra.
Works better with some compaction strategies than others (you'll probably find
a few droppable sstables with either DTCS or STCS, but it's not perfect).

Cleanest way? One by one (starting with the oldest sstables first), use
forceUserDefinedCompaction on each sstable and let it purge out the droppable
garbage. This is what the tombstone sub-properties would do.
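
A sketch of the sstablemetadata check (data path assumed; the tool reports
timestamps in microseconds since the epoch, so compare against now minus two
days):

for f in /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db; do
  echo "== $f"
  sstablemetadata "$f" | grep -i "Maximum timestamp"
done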




From:  Anubhav Kale
Reply-To:  "user@cassandra.apache.org"
Date:  Tuesday, May 17, 2016 at 4:17 PM
To:  "user@cassandra.apache.org"
Subject:  Applying TTL Change quickly

Hello,

 

We use STCS and DTCS on our tables and recently made a TTL change (reduced from 
8 days to 2) on a table with large amounts of data. What is the best way to 
quickly purge old data ? I am playing with tombstone_compaction_interval at the 
moment, but would like some suggestions on what else can be done to reclaim the 
space as quick as possible.

 

Thanks !



smime.p7s
Description: S/MIME cryptographic signature


Applying TTL Change quickly

2016-05-17 Thread Anubhav Kale
Hello,

We use STCS and DTCS on our tables and recently made a TTL change (reduced from 
8 days to 2) on a table with large amounts of data. What is the best way to 
quickly purge old data ? I am playing with tombstone_compaction_interval at the 
moment, but would like some suggestions on what else can be done to reclaim the 
space as quick as possible.

Thanks !


Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
OK to make things even more confusing, the “Release” files in the Apache Repo 
say "Origin: Unofficial Cassandra Packages”!!

i.e. http://dl.bintray.com/apache/cassandra/dists/35x/:Release


> On May 17, 2016, at 12:11 PM, Drew Kutcharian  wrote:
> 
> BTW, the language on this page should probably change since it currently 
> sounds like the official repo is the DataStax one and Apache is only an 
> “alternative"
> 
> http://wiki.apache.org/cassandra/DebianPackaging
> 
> - Drew
> 
>> On May 17, 2016, at 11:35 AM, Drew Kutcharian  wrote:
>> 
>> Thanks Eric.
>> 
>> 
>>> On May 17, 2016, at 7:50 AM, Eric Evans  wrote:
>>> 
>>> On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian  wrote:
 
 What’s the difference between the two “Community” repositories Apache 
 (http://www.apache.org/dist/cassandra/debian) and DataStax 
 (http://debian.datastax.com/community/)?
>>> 
>>> Good question.  All I can tell you is that the Apache repository is
>>> the official one (the only official one).
>>> 
 If they are just mirrors, then it seems like the DataStax one is a bit 
 behind (version 3.0.6 is available on Apache but not on DataStax).
 
 I’ve been using the DataStax community repo and wanted to see if I still 
 should continue using it or switch to the Apache repo.
>>> 
>>> If it is your intention to run Apache Cassandra, from the Apache
>>> Cassandra project, then you should be using the Apache repo.
>>> 
>>> -- 
>>> Eric Evans
>>> eev...@apache.org
>> 
> 



Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
BTW, the language on this page should probably change since it currently sounds 
like the official repo is the DataStax one and Apache is only an “alternative"

http://wiki.apache.org/cassandra/DebianPackaging

- Drew

> On May 17, 2016, at 11:35 AM, Drew Kutcharian  wrote:
> 
> Thanks Eric.
> 
> 
>> On May 17, 2016, at 7:50 AM, Eric Evans  wrote:
>> 
>> On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian  wrote:
>>> 
>>> What’s the difference between the two “Community” repositories Apache 
>>> (http://www.apache.org/dist/cassandra/debian) and DataStax 
>>> (http://debian.datastax.com/community/)?
>> 
>> Good question.  All I can tell you is that the Apache repository is
>> the official one (the only official one).
>> 
>>> If they are just mirrors, then it seems like the DataStax one is a bit 
>>> behind (version 3.0.6 is available on Apache but not on DataStax).
>>> 
>>> I’ve been using the DataStax community repo and wanted to see if I still 
>>> should continue using it or switch to the Apache repo.
>> 
>> If it is your intention to run Apache Cassandra, from the Apache
>> Cassandra project, then you should be using the Apache repo.
>> 
>> -- 
>> Eric Evans
>> eev...@apache.org
> 



Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
Thanks Eric.


> On May 17, 2016, at 7:50 AM, Eric Evans  wrote:
> 
> On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian  wrote:
>> 
>> What’s the difference between the two “Community” repositories Apache 
>> (http://www.apache.org/dist/cassandra/debian) and DataStax 
>> (http://debian.datastax.com/community/)?
> 
> Good question.  All I can tell you is that the Apache repository is
> the official one (the only official one).
> 
>> If they are just mirrors, then it seems like the DataStax one is a bit 
>> behind (version 3.0.6 is available on Apache but not on DataStax).
>> 
>> I’ve been using the DataStax community repo and wanted to see if I still 
>> should continue using it or switch to the Apache repo.
> 
> If it is your intention to run Apache Cassandra, from the Apache
> Cassandra project, then you should be using the Apache repo.
> 
> -- 
> Eric Evans
> eev...@apache.org



Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-17 Thread Andres de la Peña
Hi Siddharth,

Lucene doesn't immediately remove deleted documents from disk. Instead, it
just marks them as deleted, and they are effectively removed during
segments merge. This is quite similar to how C* manages deletions with
tombstones and compactions.

Regards,

2016-05-17 17:30 GMT+01:00 Siddharth Verma :

> Hi Eduardo,
> Thanks for your reply. If it is fixed in 3.0.5.1, we will shift to it.
>
> One more question,
> If instead of truncating table, we remove some rows, then
> are the lucene documents and indexes for those rows deleted?
>



-- 
Andrés de la Peña

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd


Re: MigrationManager.java:164 - Migration task failed to complete

2016-05-17 Thread Alain RODRIGUEZ
There is not much context here, so I will give a standard answer too.

If you have a doubt regarding the data owned by a node, running repair
takes some resources but should never break anything. I mean, it is an
operation you can run as often as you want. So I would use it, just
in case.

If the repair finishes successfully, your data is now consistent.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-03 20:09 GMT+02:00 Zhang, Charles :

> I have seen a bunch of them in the log files of some newly joined nodes. I
> did a search in google and it seems increasing the countdown latch timeout
> can solve this problem. But I assume it only resolves the issue for future
> nodes when they join? For the existing nodes, does anything need to be done?
>


Re: Bloom filter memory usage disparity

2016-05-17 Thread Alain RODRIGUEZ
Hi, we would need more information here (if you did not solve it yet).

What is your Cassandra version?
Does this 3 node cluster use a Replication Factor of 3?
Did you change the bloom_filter_fp_chance recently?

> That table has about 16M keys and 140GB of data.

Is that the total value or per node? In any case, we need the data size for
the 3 nodes to understand.

It might have been a temporary situation, but in this case you would know
by now.

C*heers,


2016-05-03 18:47 GMT+02:00 Kai Wang :

> Hi,
>
> I have a table on 3-node cluster. I notice bloom filter memory usage are
> very different on one of the node. For a given table, I checked
> CassandraMetricsRegistry$JmxGauge.[table]_BloomFilterOffHeapMemoryUsed.Value.
> 2 of 3 nodes show 1.5GB while the other shows 2.5 GB.
>
> What could be the reason?
>
> That table is using LCS.
> bloom_filter_fp_chance=0.1
> That table has about 16M keys and 140GB of data.
>
> Thanks.
>


Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Eric Evans
On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian  wrote:
>
> What’s the difference between the two “Community” repositories Apache 
> (http://www.apache.org/dist/cassandra/debian) and DataStax 
> (http://debian.datastax.com/community/)?

Good question.  All I can tell you is that the Apache repository is
the official one (the only official one).

> If they are just mirrors, then it seems like the DataStax one is a bit behind 
> (version 3.0.6 is available on Apache but not on DataStax).
>
> I’ve been using the DataStax community repo and wanted to see if I still 
> should continue using it or switch to the Apache repo.

If it is your intention to run Apache Cassandra, from the Apache
Cassandra project, then you should be using the Apache repo.
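
For reference, a sketch of pointing apt at the Apache repo (release series
assumed; 35x tracks the 3.5 line, and the project's signing keys must be
imported first):

echo "deb http://www.apache.org/dist/cassandra/debian 35x main" | sudo tee /etc/apt/sources.list.d/cassandra.list
sudo apt-get update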

-- 
Eric Evans
eev...@apache.org


Re: SS Table File Names not containing GUIDs

2016-05-17 Thread Alain RODRIGUEZ
Hi,

> I am wondering if there is any reason as to why the SS Table format doesn't
> have a GUID


I don't know for sure, but what I can say is that GUIDs are often used to
solve the problem of unique incrementing identifiers in a distributed system.
SSTables are stored on a single node, so a simple incrementing counter works
there; this probably made it the straightforward choice. Plus, sstable names /
paths are long enough already; I prefer to see '241' in there rather than
'c0629566-4a15-4db2-bb97-ee6e083de32b'.

> Specifically, this causes some inconvenience when restoring snapshots.


This is true. That said, in 5 years of using Cassandra I have restored
snapshots maybe twice: to feed staging (empty, so no issue) and to test
recovery. So it is not that frequent.

> The problem is it is possible to overwrite new data with old files if the
> file names match. I can’t change the file names of snapshot-ed file to a
> huge number, because as soon as that file is copied over, C* will use that
> number in its get-next-number-gen logic potentially causing the same
> problem for the next snapshot-ed file.


What about using a lower value? Also, if your value is really greater than
the current one, the risk is low, since tables are compacted often enough.
There are many relatively easy and working workarounds here, I believe. I
don't remember how I solved this myself, though.

I would say I do not agree that we need to use GUIDs, but it is just my
opinion; if you feel this could be an improvement, search for an existing
ticket about it or file a new one.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-02 18:55 GMT+02:00 Anubhav Kale :

> Hello,
>
>
>
> I am wondering if there is any reason as to why the SS Table format
> doesn’t have a GUID. As far as I can tell, the incrementing number isn’t
> really used for any special purpose in code, and having a unique name for
> the file seems to be a better thing, in general.
>
>
>
> Specifically, this causes some inconvenience when restoring snapshots.
> Ideally, I would like to restore just the system* keyspaces and boot the
> node. Then, once the node is taking live traffic copy the SS Tables over
> and do a DSE restart at the end to load old data.
>
>
>
> The problem is it is possible to overwrite new data with old files if the
> file names match. I can’t change the file names of snapshot-ed file to a
> huge number, because as soon as that file is copied over, C* will use that
> number in its get-next-number-gen logic potentially causing the same
> problem for the next snapshot-ed file.
>
>
>
> How do people usually tackle this ? Is there some easy solution that I am
> not seeing ?
>
>
>
> Thanks !
>


Restoring Incremental Backups without using sstableloader

2016-05-17 Thread Ravi Teja A V
Hi everyone

I am currently working with Cassandra 3.5. I would like to know if it is
possible to restore backups without using sstableloader. I have been
referring to the following pages in the datastax documentation:
https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupSnapshotRestore.html
Thank you.

Yours sincerely
RAVI TEJA A V
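
For reference, the usual sstableloader-free approach is to copy the backup
sstables into the table's data directory and load them with nodetool refresh,
which needs no restart. A minimal sketch, with paths and names assumed:

cp /backups/my_keyspace/my_table/* /var/lib/cassandra/data/my_keyspace/my_table-*/
nodetool refresh my_keyspace my_table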


Re: Why simple replication strategy for system_auth ?

2016-05-17 Thread Jérôme Mainaud
Thank you for your answer.

What I still don't understand is why auth data is not managed in the same
way as schema metadata.
Both must be accessible to the node to do the job. Both are changed very
rarely.
In a way users are some kind of database objects.

I understand the choice for trace and repair history, not for
authentication.

I note that the 3.0 documentation suggests 3 to 5 nodes. That was my choice,
but a client told me I was wrong, pointing at the 2.1 documentation...
And it was difficult to explain to experienced classic DBAs that creating a
user and granting rights are so different from creating a table that the
metadata is stored in a different way.



-- 
Jérôme Mainaud
jer...@mainaud.com

2016-05-13 12:13 GMT+02:00 Sam Tunnicliffe :

> LocalStrategy means that data is not replicated in the usual way and
> remains local to each node. Where it is used, replication is either not
> required (for example in the case of secondary indexes and system.local) or
> happens out of band via some other method (as in the case of schema, or
> system.peers which is populated largely from gossip).
>
> There are several components in Cassandra which generate or persist
> "system" data for which a normal distribution makes sense. Auth data is
> one, tracing, repair history and materialized view status are others. The
> keyspaces for this data generally use SimpleStategy by default as it is
> guaranteed to work out of the box, regardless of topology.  The intent of
> the advice to configure system_auth with RF=N was to increase the
> likelihood that any read of auth data would be done locally, avoiding
> remote requests where possible. This is somewhat outdated though and not
> really necessary. In fact, the 3.x docs actually suggest "3 to 5 nodes per
> Data Center"[1]
>
> FTR, you can't specify LocalStrategy in a CREATE or ALTER KEYSPACE, for
> these reasons.
>
> [1]
> http://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/secureConfigNativeAuth.htm
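>
> For example, the usual adjustment per that guidance (datacenter name and
> replication factor assumed; run a repair of system_auth afterwards):
>
> cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"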
>
>
> On Fri, May 13, 2016 at 10:47 AM, Jérôme Mainaud 
> wrote:
>
>> Hello,
>>
>> Is there any good reason why system_auth strategy is SimpleStrategy by
>> default instead of LocalStrategy like system and system_schema ?
>>
>> Especially when documentation advice to set the replication factor to the
>> number of nodes in the cluster, which is both weird and inconvenient to
>> follow.
>>
>> Do you think that changing the strategy to LocalStrategy would work or
>> have undesirable side effects ?
>>
>> Thank you.
>>
>> --
>> Jérôme Mainaud
>> jer...@mainaud.com
>>
>
>


Re: Repair schedules for new clusters

2016-05-17 Thread Ben Slater
We’ve found with incremental repairs that more frequent repairs are
generally better. Our current standard for incremental repairs is once per
day. I imagine that the exact optimum frequency is dependent on the ratio
of reads to writes in your cluster.

Turning on incremental repairs from the get-go works OK if your data load
is incremental. If you do a big load before your first incremental repair,
then it's not much different from migrating to incremental repairs, so it's
worth following the migration procedure to avoid a big impact.
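
A sketch of the weekly schedule as a cron entry (time assumed; stagger the
start across nodes):

0 2 * * 0 nodetool repair -pr -par -inc >> /var/log/cassandra/repair.log 2>&1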

Cheers
Ben

On Tue, 17 May 2016 at 16:50 Ashic Mahtab  wrote:

> Hi All,
> My previous cassandra clusters had moderate loads, and I'd simply schedule
> full repairs at different times in the week (but on the same day). That
> seemed to work ok, but was redundant. In my current project, I'm going to
> need to care about repair times a lot more, and was wondering what would be
> the best way to go about it. I have a few questions around this:
>
> * This would be a brand new cluster, and as such, was wondering if I could
> simply turn on incremental repair from the get go.
> * I would then run nodetool repair -pr -par -inc once a week on every node
> at (roughly) the same time. I'd do this with a cron job /
> external scheduler.
> * If I were to replace a node, or one rejoins after being absent for
> longer than the grace period, I'd run a full repair on that node.
>
> Does this sound reasonable? Are there any pitfalls I should be aware of?
>
> Thanks,
> Ashic.
>
-- 

Ben Slater
Chief Product Officer, Instaclustr
+61 437 929 798


RE: Data platform support

2016-05-17 Thread Ashic Mahtab
If Spark workers are installed on the same nodes as Cassandra nodes, then they 
can take advantage of data locality, greatly reducing the amount of network IO 
in Spark jobs. If you use a separate Cloudera / Hortonworks / EMR cluster, you 
won't be able to benefit from this. Other than the locality issue, you can run 
Spark jobs from external clusters just fine. I've used both approaches, and 
for particular types of jobs, I've found a "custom" cluster with Spark 
Master(s) + n*[Spark Worker + Cassandra] to be very effective. 
-Ashic.
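
In the colocated layout, pointing the connector at the local nodes is a single
configuration flag (connector version and address assumed):

$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0 --conf spark.cassandra.connection.host=10.0.0.1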

Date: Tue, 10 May 2016 17:13:25 +0100
Subject: Re: Data platform support
From: ksrinivas...@gmail.com
To: user@cassandra.apache.org

I understand that spark supports hdfs and standalone modes. The recommendation 
from cassandra is that spark should be installed in standalone mode in the 
SMACK framework.
On 10 May 2016 at 16:24, Sruti S  wrote:
Not sure what is meant... Spark can access HDFS. Why is it in standalone mode? 
Please clarify.
On Tue, May 10, 2016 at 11:08 AM, Srini Sydney  wrote:
I have a clarification based on your answer -
spark is installed in standalone mode (not hdfs) in the SMACK framework. Our 
data lake is in hdfs. How do we overcome this ?

  - cheers sreeni

On 10 May 2016, at 08:16, vincent gromakowski  
wrote:

Maybe a SMACK stack would be a better option for using spark with Cassandra...
On 10 May 2016 at 8:45 AM, "Srini Sydney"  wrote:
Thanks a lot, Denise.
On 10 May 2016 at 02:42, Denise Rogers  wrote:
It really depends how close you want to stay to the most current versions of 
open source community products.

Cloudera has tended to build more products that require their distribution to 
not be as current with open source product versions.

Regards,
Denise

Sent from mi iPhone

> On May 9, 2016, at 8:21 PM, Srini Sydney  wrote:
>
> Hi guys
>
> We are thinking of using one of the 3 big data platforms, i.e. hortonworks, 
> mapr or cloudera. Will use hadoop, hive, zookeeper, and spark in these platforms.
>
> Which platform would be better suited for cassandra ?
>
> - sreeni

Repair schedules for new clusters

2016-05-17 Thread Ashic Mahtab
Hi All,

My previous cassandra clusters had moderate loads, and I'd simply schedule 
full repairs at different times in the week (but on the same day). That seemed 
to work ok, but was redundant. In my current project, I'm going to need to 
care about repair times a lot more, and was wondering what would be the best 
way to go about it. I have a few questions around this:

* This would be a brand new cluster, and as such, was wondering if I could 
simply turn on incremental repair from the get go.
* I would then run nodetool repair -pr -par -inc once a week on every node at 
(roughly) the same time. I'd do this with a cron job / external scheduler.
* If I were to replace a node, or one rejoins after being absent for longer 
than the grace period, I'd run a full repair on that node.

Does this sound reasonable? Are there any pitfalls I should be aware of?

Thanks,
Ashic.