Re: No deletes - is periodic repair needed? I think not...

2014-02-06 Thread Alain RODRIGUEZ
Hi,

In a distributed system such as Cassandra, things can happen (node down,
stop-the-world GC, hardware issues, ...) that desynchronize replicas. Isn't
repair still needed, at least once a week or once a month, to keep replicas
up to date? It is a strong and reliable process for keeping things in sync,
isn't it?

I know that read repair and hinted handoff are also there to handle this
kind of issue, but they can fail: I have seen a lot of errors in the logs
about hints not being delivered (some people even disable hints), and read
repair is often configured to trigger on only 10% of reads.



Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Sylvain Lebresne
On Tue, Jan 28, 2014 at 1:05 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 If you have only ttl columns, and you never update the column I would not
 think you need a repair.


Right, no deletes and no updates is Michael's case 1., for which I think we
all agree that 'periodic repair to avoid resurrected columns' is not required.



 Repair cures lost deletes. If all your writes have a ttl a lost write
 should not matter since the column was never written to the node and thus
 could never be resurrected on said node.


I'm sure we're all in agreement here, but for the record, this is only true
if you have no updates (overwrites) and/or if all writes have the *same*
TTL. In the general case, a column with a relatively short TTL is very close
to a delete, while a column with a long TTL is very close to one that has no
TTL. If the former column (with the short TTL) overwrites the latter one
(with the long TTL), and if one node misses the overwrite, that node could
resurrect the column with the longer TTL (until that column expires, that
is). Hence the separation of case 2. (fixed TTL, no repair needed) and case
2.a. (variable TTL, repair may be needed).
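
To make the scenario above concrete, here is a minimal Python sketch of it.
It is a toy model and not Cassandra code: the Cell class, the merge and read
helpers and the integer clock are all hypothetical, reconciliation is plain
last-write-wins by write time, and expired cells are assumed to be purged
right away (roughly the gc_grace = 0 setting discussed in this thread).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One written column value (hypothetical model, times in seconds)."""
    value: str
    write_time: int
    ttl: int

    def live_at(self, now: int) -> bool:
        # A cell is readable until write_time + ttl.
        return now < self.write_time + self.ttl


def merge(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins reconciliation: keep the most recently written cell."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.write_time >= b.write_time else b


def read(replicas: List[Optional[Cell]], now: int) -> Optional[str]:
    """Merge what each replica holds, ignoring cells that have expired.

    Assumption: an expired cell is already purged, so it no longer
    shadows older data held by another replica.
    """
    winner = None
    for cell in replicas:
        visible = cell if cell is not None and cell.live_at(now) else None
        winner = merge(winner, visible)
    return winner.value if winner is not None else None


# Short TTL overwrites long TTL, and the second replica missed the overwrite.
long_lived = Cell("long TTL", write_time=0, ttl=3600)
short_lived = Cell("short TTL", write_time=10, ttl=60)
replicas = [short_lived, long_lived]

before_expiry = read(replicas, now=30)   # the overwrite still wins
after_expiry = read(replicas, now=100)   # the old value comes back
```

In this model, before_expiry is "short TTL" and after_expiry is "long TTL":
once the short-lived overwrite expires, the stale replica's older value is
visible again until its own TTL runs out. Swapping the two TTLs (60 first,
3600 second) removes the effect, because the older cell is always gone
before the newer one expires.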

--
Sylvain






Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Laing, Michael
Thanks again Sylvain!

I have actually set up one of our application streams such that the same
key is only ever overwritten with a monotonically increasing TTL.

For example, a breaking news item might have an initial TTL of 60 seconds,
followed 45 seconds later by an update with a TTL of 3000 seconds, followed
600 seconds after that by an 'ignore me' update with a TTL of 30 days (our
maximum TTL) when the article is published.

My understanding is that this case fits the criteria and no 'periodic
repair' is needed.

Another thing I would point out, which is easy to miss or forget if you are
a newish user like me, is that TTLs are fine-grained: they are set per
column. So we are talking about 'fixed' or 'variable' per individual column,
not per table. In my case, that means TTLs can vary widely across a table,
but as long as I constrain them per key value to be fixed or monotonically
increasing, it fits the criteria.

Cheers,

Michael







Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Sylvain Lebresne


 I have actually set up one of our application streams such that the same
 key is only overwritten with a monotonically increasing ttl.

 For example, a breaking news item might have an initial ttl of 60 seconds,
 followed in 45 seconds by an update with a ttl of 3000 seconds, followed by
 an 'ignore me' update in 600 seconds with a ttl of 30 days (our maximum
 ttl) when the article is published.

 My understanding is that this case fits the criteria and no 'periodic
 repair' is needed.


That's correct. The real criterion for not needing repair, if you do no
deletes and only use TTLs, is that you only ever update with a monotonically
increasing (not necessarily strictly increasing) TTL. Always setting the
same TTL is just a special case of that, but I think it's the most commonly
used one, so I tend to simplify it to that case.
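
As an illustration of that criterion, here is a small Python sketch that
checks a write history for keys whose TTL ever decreases. It is only a
sketch: the function name, the (key, write_time, ttl) tuple layout and the
key names are hypothetical, and it looks at TTL values alone, exactly as the
criterion is stated above.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def keys_violating_monotonic_ttl(
    writes: Iterable[Tuple[str, int, int]]
) -> List[str]:
    """Return the keys whose TTL decreases at some point in write order.

    Each write is (key, write_time_seconds, ttl_seconds). Equal TTLs on
    consecutive writes are allowed (non-strict monotonicity).
    """
    by_key = defaultdict(list)
    for key, write_time, ttl in writes:
        by_key[key].append((write_time, ttl))

    violating = []
    for key, entries in by_key.items():
        entries.sort()  # order each key's writes by write time
        ttls = [ttl for _, ttl in entries]
        if any(later < earlier for earlier, later in zip(ttls, ttls[1:])):
            violating.append(key)
    return sorted(violating)


THIRTY_DAYS = 30 * 24 * 3600

history = [
    # The breaking-news pattern quoted above: 60 s, then 3000 s, then 30 days.
    ("article-1", 0, 60),
    ("article-1", 45, 3000),
    ("article-1", 645, THIRTY_DAYS),
    # A key that is later overwritten with a shorter TTL.
    ("article-2", 0, 3000),
    ("article-2", 10, 60),
]

violations = keys_violating_monotonic_ttl(history)  # ["article-2"]
```

Only article-2 is reported, so under the criterion above it is the only key
in this made-up history that could still call for periodic repair.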



 I guess another thing I would point out that is easy to miss or forget (if
 you are a newish user like me), is that ttl's are fine-grained, by column.
 So we are talking 'fixed' or 'variable' by individual column, not by table.
 Which means, in my case, that ttl's can vary widely across a table, but as
 long as I constrain them by key value to be fixed or monotonically
 increasing, it fits the criteria.


We're talking about a monotonically increasing TTL for a given primary key
if we're talking about CQL, and for a given column if we're talking about
Thrift. Not per table.

--
Sylvain





Re: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Sylvain Lebresne
By periodic repair, I'll assume you mean having to run repair every
gc_grace period to make sure no deleted entries resurrect. With that
assumption:


 1. Regular values, no deletes, no overwrites, write heavy, ttl's to manage
 size


Since 'repair within gc_grace' is about preventing values that have been
deleted from being resurrected, if you do no deletes and no overwrites,
you're at no risk of that (and don't need to 'repair within gc_grace').


 2. Regular values, no deletes, some overwrites, read heavy (10 to 1),
 ttl's to manage size


It depends a bit. In general, if you always set the exact same TTL on every
insert (implying you always set a TTL), then you have nothing to worry
about. If the TTL varies (or if you only set a TTL some of the time), then
you might still need some periodic repairs. That being said, if there are
no deletes but only TTLs, then the TTL effectively lengthens the period
within which you need to repair: instead of needing to repair within
gc_grace, you only need to repair every gc_grace + min(TTL) (where min(TTL)
is the smallest TTL you set on columns).

 3. Counter values, no deletes, update heavy, rotation/truncation to manage
 size


No deletes and no TTL implies that you're fine (as in, there is no need for
'repair within gc_grace').
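
The gc_grace + min(TTL) rule for case 2. above can be written down as a
small Python helper. This is just the arithmetic from the paragraph, under
its stated precondition (no deletes, and every write carries a TTL); the
function name and the example values are hypothetical.

```python
from typing import Iterable


def max_repair_interval_seconds(
    gc_grace_seconds: int, ttls_seconds: Iterable[int]
) -> int:
    """Longest repair period under the rule gc_grace + min(TTL).

    Only meaningful when there are no deletes and every write has a TTL.
    """
    ttls = list(ttls_seconds)
    if gc_grace_seconds < 0:
        raise ValueError("gc_grace_seconds must not be negative")
    if not ttls:
        raise ValueError("at least one TTL is required")
    if any(ttl <= 0 for ttl in ttls):
        raise ValueError("every write must carry a positive TTL")
    return gc_grace_seconds + min(ttls)


# Example values: a 10-day gc_grace and TTLs of 1 hour, 1 day and 30 days.
TEN_DAYS = 10 * 24 * 3600
interval = max_repair_interval_seconds(TEN_DAYS, [3600, 86400, 30 * 86400])
# interval == 864000 + 3600 == 867600 seconds
```

The smallest TTL dominates: a single short-lived write pattern on the table
pulls the interval back towards plain gc_grace.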

--
Sylvain


Re: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Laing, Michael
Thanks Sylvain,

Your assumption is correct!

So I think I actually have 4 classes:

1. Regular values, no deletes, no overwrites, write heavy, variable
ttl's to manage size
2. Regular values, no deletes, some overwrites, read heavy (10 to 1),
fixed ttl's to manage size
2.a. Regular values, no deletes, some overwrites, read heavy (10 to 1),
variable ttl's to manage size
3. Counter values, no deletes, update heavy, rotation/truncation to
manage size

Only 2.a. above requires me to do 'periodic repair'.
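
The classification above can be summarized as a small Python decision
function. It is only a sketch of the rules as discussed in this thread, not
an official guideline: the function name and the ttl_policy labels are
hypothetical, and the "non_decreasing" label reflects Sylvain's refinement
that a monotonically increasing TTL per key is as safe as a fixed one.

```python
VALID_TTL_POLICIES = {"none", "fixed", "non_decreasing", "variable"}


def needs_periodic_repair(
    has_deletes: bool, has_overwrites: bool, ttl_policy: str
) -> bool:
    """Decide whether 'repair within gc_grace' is needed for a table.

    ttl_policy describes how TTLs are set on writes to the same key:
      "none"           - no TTL is ever set
      "fixed"          - always the same TTL
      "non_decreasing" - the TTL never decreases on overwrite
      "variable"       - anything else (including TTL set only sometimes)
    """
    if ttl_policy not in VALID_TTL_POLICIES:
        raise ValueError(f"unknown ttl_policy: {ttl_policy!r}")
    if has_deletes:
        return True   # lost deletes can resurrect data
    if not has_overwrites:
        return False  # nothing can shadow, or be shadowed by, anything else
    # Overwrites without deletes: only a TTL that can shrink is a risk.
    return ttl_policy == "variable"


classes = {
    "1": needs_periodic_repair(False, False, "variable"),
    "2": needs_periodic_repair(False, True, "fixed"),
    "2.a": needs_periodic_repair(False, True, "variable"),
    "3": needs_periodic_repair(False, True, "none"),
}
```

With these inputs only class 2.a. comes out as needing periodic repair,
which matches the list above.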

What I will actually do is change my schema and applications slightly to
eliminate the need for overwrites on the only table I have in that category.

And I will set gc_grace_seconds to 0 for the tables in the updated schema
and drop 'periodic repair' from the schedule.

Cheers,

Michael





Re: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Edward Capriolo
If you have only TTL columns, and you never update a column, I would not
think you need a repair.

Repair cures lost deletes. If all your writes have a TTL, a lost write
should not matter, since the column was never written to the node and thus
could never be resurrected on said node.

Unless I am missing something.


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


RE: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Donald Smith
Last week I made a feature request to Apache Cassandra along these lines:
https://issues.apache.org/jira/browse/CASSANDRA-6611

Don



No deletes - is periodic repair needed? I think not...

2014-01-25 Thread Laing, Michael
I have a simple set of tables that can be grouped as follows:

1. Regular values, no deletes, no overwrites, write heavy, ttl's to manage
size

2. Regular values, no deletes, some overwrites, read heavy (10 to 1), ttl's
to manage size

3. Counter values, no deletes, update heavy, rotation/truncation to manage
size

It seems to me that I can set gc_grace_seconds to 0 on each set of tables
and that I do not need to do periodic repair on any of them.

Is this the case? If so it relieves an operational headache and eliminates
a lot of processing.
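
For reference, gc_grace_seconds is a per-table option, so the change being
considered is one ALTER TABLE statement per table. The Python sketch below
only builds that CQL text; the helper name and the keyspace and table names
are hypothetical, and running the statement (for example through cqlsh) is
left out.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or "_".
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def alter_gc_grace_cql(keyspace: str, table: str, gc_grace_seconds: int) -> str:
    """Build the CQL statement that sets gc_grace_seconds on one table."""
    for name in (keyspace, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain CQL identifier: {name!r}")
    if gc_grace_seconds < 0:
        raise ValueError("gc_grace_seconds must not be negative")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH gc_grace_seconds = {gc_grace_seconds};"
    )


# Hypothetical keyspace and table names, one statement per table.
statements = [
    alter_gc_grace_cql("news", table_name, 0)
    for table_name in ("events", "articles", "page_counters")
]
# statements[0] == "ALTER TABLE news.events WITH gc_grace_seconds = 0;"
```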

The only downside I can see is if (when) a node really gets wiped out -
then I might lose any hints it may be holding as a coordinator and maybe
some other stuff. This is a rare occurrence, but if it happened I guess I
would replace the node, repairing and cleaning it as needed, and run repair
-pr sequentially on all other nodes to be sure the cluster is in sync.

BTW I am using Cassandra 2.0.3 with local quorum reads and writes on a
2-DC, 12-node cluster.

Thanks,

Michael