are repairs in 2.0 more expensive than in 1.2

2014-10-15 Thread Sean Bridges
Hello,

We upgraded a cassandra cluster from 1.2.18 to 2.0.10, and it looks like
repair is significantly more expensive now.  Is this expected?

We schedule rolling repairs through the cluster.  With 1.2.18 a repair
would take 3 hours or so.  The first repair after the upgrade has been
going on for over a day now, and still hasn't finished.  The repair is
doing a lot more io as well.

We have 24 nodes in our cluster, about 400 gigs of sstables per node, and
are using vnodes.  All machines are using ssds.

Thanks,

Sean


Re: are repairs in 2.0 more expensive than in 1.2

2014-10-15 Thread Robert Coli
On Wed, Oct 15, 2014 at 4:54 PM, Sean Bridges wrote:

> We upgraded a cassandra cluster from 1.2.18 to 2.0.10, and it looks like
> repair is significantly more expensive now.  Is this expected?
>

It depends on what you mean by "expected." Operators usually don't expect
defaults with such dramatic impacts to change without them understanding
why, but there is a reason for it.

In 2.0 the default for repair was changed to be non-parallel. To get the
old behavior, you need to supply -par as an argument.
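As a minimal sketch of what supplying -par could look like in a scheduled rolling repair, the helper below builds the nodetool command line. Only the `-par` flag comes from the message above; the function name, the host address, and the keyspace name are hypothetical placeholders, and the command is printed rather than executed.

```python
def build_repair_command(host, keyspace=None, parallel=True):
    """Build a nodetool repair command line as a list of arguments.

    parallel=True adds -par, which asks for the old (parallel) behavior;
    parallel=False leaves the sequential default in place.
    """
    cmd = ["nodetool", "-h", host, "repair"]
    if parallel:
        cmd.append("-par")
    if keyspace is not None:
        cmd.append(keyspace)
    return cmd


if __name__ == "__main__":
    # 10.0.0.1 and my_keyspace are placeholders, not values from this thread.
    print(" ".join(build_repair_command("10.0.0.1", "my_keyspace")))
```

Returning a list rather than a string keeps the arguments safe to pass to something like `subprocess.run` without shell quoting.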

The context-free note in NEWS.txt for version 2.0.2, whose significance you
missed, says:

- Nodetool defaults to Sequential mode for repair operations

What this doesn't say is how almost certainly unreasonable this is as a
default, because this means that repair is predictably slower in direct
relationship to your replication factor, and the default for
gc_grace_seconds (the time box in which one must complete a repair) did not
change at the same time. The ticket where the change happens [1] does not
specify a rationale, so your guess is as good as mine as to the reasoning
which not only felt the change was necessary but reasonable.
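To make the gc_grace_seconds concern concrete, here is a rough sketch using the numbers from the original question (24 nodes, about 3 hours per repair on 1.2.18, more than a day on 2.0.10). It assumes those durations are per node, that nodes are repaired strictly one after another, and that gc_grace_seconds is left at its default of 864000 seconds (10 days); the 24-hour figure is only a lower bound, since that repair had not finished.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days


def rolling_repair_hours(node_count, hours_per_node):
    """Total wall-clock hours if nodes are repaired one after another."""
    return node_count * hours_per_node


def fits_in_gc_grace(total_hours, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if the whole repair cycle completes within gc_grace_seconds."""
    return total_hours * 3600 <= gc_grace_seconds


if __name__ == "__main__":
    cases = [("1.2.18, about 3 h per node", 3),
             ("2.0.10, at least 24 h per node", 24)]
    for label, hours_per_node in cases:
        total = rolling_repair_hours(24, hours_per_node)
        print(f"{label}: {total} h total, fits: {fits_in_gc_grace(total)}")
```

Under these assumptions the old cycle takes 72 hours and fits comfortably, while the new one needs at least 576 hours (24 days) and does not.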

Leaving aside the problem you've encountered ("upgraders notice that their
repairs (which already took forever) are suddenly WAY SLOWER"), this default
is also quite pathological for anyone operating with an RF over 3, which is a
valid, if very uncommon, configuration.

In summary, if, as an operator, you disagree that making repair slower by
default as a factor of replication factor is reasonable, I suggest filing a
JIRA and letting the project know. At least in that case there is a chance
they might explain the rationale for so blithely making a change that has
inevitable impact on operators... ?

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5950
http://twitter.com/rcolidba


Re: are repairs in 2.0 more expensive than in 1.2

2014-10-15 Thread Sean Bridges
Thanks Robert.  Does the switch from parallel to sequential explain why IO
increases?  We see significantly higher IO with 2.0.10.

The nodetool docs [1] hint at the reason for defaulting to sequential:

"This allows the dynamic snitch to maintain performance for your
application via the other replicas, because at least one replica in the
snapshot is not undergoing repair."

Sean

[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsRepair.html




Re: are repairs in 2.0 more expensive than in 1.2

2014-10-23 Thread Sean Bridges
We switched to parallel repairs, and now our repairs in 2.0 are behaving
like the repairs in 1.2.

The change from parallel to sequential is very dramatic.  For a small
cluster with 3 nodes, using cassandra 2.0.10,  a parallel repair takes 2
hours, and io throughput peaks at 6 mb/s.  Sequential repair takes 40
hours, with average io around 27 mb/s.  Should I file a jira?
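A back-of-the-envelope sketch of what these figures imply for total IO volume follows. It assumes "mb/s" means megabytes per second and that each rate held for the full duration; because the parallel figure is a peak, its volume is an upper bound, so the ratio printed at the end is a lower bound.

```python
def io_volume_mb(rate_mb_per_s, hours):
    """Data moved at a constant rate over a duration, in MB."""
    return rate_mb_per_s * hours * 3600


parallel_upper_bound = io_volume_mb(6, 2)    # peak rate, so an upper bound
sequential_estimate = io_volume_mb(27, 40)   # average rate

print(f"parallel   <= {parallel_upper_bound:,} MB")
print(f"sequential ~= {sequential_estimate:,} MB")
print(f"volume ratio >= {sequential_estimate / parallel_upper_bound:.0f}x")
```

Under these assumptions the sequential repair moves at least 90 times as much data as the parallel one, which is far more than the 20x difference in duration alone.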

Sean



Re: are repairs in 2.0 more expensive than in 1.2

2014-10-23 Thread Robert Coli
On Thu, Oct 23, 2014 at 9:33 AM, Sean Bridges wrote:

> The change from parallel to sequential is very dramatic.  For a small
> cluster with 3 nodes, using cassandra 2.0.10,  a parallel repair takes 2
> hours, and io throughput peaks at 6 mb/s.  Sequential repair takes 40
> hours, with average io around 27 mb/s.  Should I file a jira?
>

As you are an actual user actually encountering the problem I had only
conjectured about, you are the person best suited to file such a ticket on
the reasonableness of the -par default. :D

=Rob
http://twitter.com/rcolidba


Re: are repairs in 2.0 more expensive than in 1.2

2014-10-23 Thread Janne Jalkanen

On 23 Oct 2014, at 21:29, Robert Coli wrote:

> On Thu, Oct 23, 2014 at 9:33 AM, Sean Bridges wrote:
>
>> The change from parallel to sequential is very dramatic.  For a small
>> cluster with 3 nodes, using cassandra 2.0.10,  a parallel repair takes 2
>> hours, and io throughput peaks at 6 mb/s.  Sequential repair takes 40
>> hours, with average io around 27 mb/s.  Should I file a jira?
>
> As you are an actual user actually encountering the problem I had only
> conjectured about, you are the person best suited to file such a ticket on
> the reasonableness of the -par default. :D

Hm?  I’ve been banging my head against the exact same problem (cluster size
five nodes, RF=3, ~40GB/node) - parallel repair takes about 6 hrs whereas
serial takes some 48 hours or so. In addition, the compaction impact is roughly
the same - that is, there’s the same number of compactions triggered per
minute, but serial runs eight times more of them. There does not seem to be a
difference in node response latency between parallel and serial repair.
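The arithmetic behind these figures can be sketched as follows. Reading "slower in direct relationship to your replication factor" as a slowdown roughly equal to RF is an assumption made here for comparison, not something stated in the thread, and the per-minute compaction rate is a placeholder because the message gives no number.

```python
def slowdown(sequential_hours, parallel_hours):
    """How many times longer the sequential run took."""
    return sequential_hours / parallel_hours


def total_compactions(per_minute, hours):
    """Compactions over a run at a steady per-minute rate."""
    return per_minute * hours * 60


REPLICATION_FACTOR = 3
observed = slowdown(48, 6)

# The per-minute rate is not given in the message; 1.0 is a placeholder.
# It cancels out of the ratio, so any positive value gives the same result.
rate = 1.0
compaction_ratio = total_compactions(rate, 48) / total_compactions(rate, 6)

print(f"observed slowdown: {observed:.0f}x")
print(f"slowdown if proportional to RF: {REPLICATION_FACTOR}x")
print(f"compaction count ratio: {compaction_ratio:.0f}x")
```

With the same per-minute rate, an 8x longer run yields 8x the compactions, which matches the report; the observed slowdown is also well above what a simple factor of RF=3 would predict.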

NB: We do increase our compaction throughput during calmer times, and lower it
during busy times, and the serial repair takes enough time to hit the busy
period - that might also have an impact on the overall performance.
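A time-of-day throttle like the one described could be sketched as below. The `nodetool setcompactionthroughput` command takes a value in MB/s; the busy window (08:00 to 20:00) and the two throughput values are hypothetical, since the message does not give the actual schedule, and the command is built but not run.

```python
def compaction_throughput_for_hour(hour, busy_start=8, busy_end=20,
                                   busy_mb_s=16, calm_mb_s=64):
    """Pick a compaction throughput (MB/s) for the given hour of day."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # Throttle harder while the application is busy, relax otherwise.
    return busy_mb_s if busy_start <= hour < busy_end else calm_mb_s


def build_throttle_command(hour):
    """Build the nodetool call that applies the throughput for this hour."""
    value = compaction_throughput_for_hour(hour)
    return ["nodetool", "setcompactionthroughput", str(value)]


if __name__ == "__main__":
    for hour in (3, 12, 22):
        print(hour, " ".join(build_throttle_command(hour)))
```

A repair that runs for 48 hours necessarily crosses the throttled window at least twice, which is the interaction the note above points at.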

If I had known that this had so far been a theoretical problem, I would’ve 
spoken up earlier. Perhaps serial repair is not the best default.

/Janne



Re: are repairs in 2.0 more expensive than in 1.2

2014-10-23 Thread Robert Coli
On Thu, Oct 23, 2014 at 2:04 PM, Janne Jalkanen wrote:

>
> If I had known that this had so far been a theoretical problem, I would’ve
> spoken up earlier. Perhaps serial repair is not the best default.
>

Unfortunately you must not hang out in #cassandra on freenode, where I've
been ranting^Wcomplaining^Wtalking about this default since it was merged.
I did not file a JIRA ticket because my presumption was that any report
that did not describe an actual problem experienced in an actual production
deploy would be minimized or ignored. I was also not completely convinced
that my conjecture about how severe the behavior would be was correct,
which is a reasonable concern for a skeptic or critic to have... :D

Your perspective as a production operator having a negative experience of
this default in production is welcomed in JIRA.

If either you or Sean Bridges files a JIRA, please post its URL to the
thread for future reference. :D

=Rob
http://twitter.com/rcolidba


Re: are repairs in 2.0 more expensive than in 1.2

2014-10-24 Thread Sean Bridges
Janne,

I filed CASSANDRA-8177 [1] for this.  Maybe comment on the jira that you
are having the same problem.

Sean

[1]  https://issues.apache.org/jira/browse/CASSANDRA-8177



Re: are repairs in 2.0 more expensive than in 1.2

2014-10-24 Thread Janne Jalkanen

Commented and added a munin graph, if it helps. For the record, I’m happy with 
-par performance for now.

/Janne
