repair -pr does not return

2014-05-02 Thread Jan Kesten

Hello everyone,

I'm running a Cassandra cluster with 2.0.6 and 6 nodes. As far as I 
know, routine repairs are still mandatory for handling tombstones, even 
though I noticed that the cluster now does a snapshot repair by default.


My cluster has now been running for a while and has a load of about 
200 GB per node. Running nodetool repair -pr on one of the nodes seems 
to run forever: it has been running for 2 complete days and has not returned.


Any suggestions?

Thanks in advance,
Jan




Re: repair -pr does not return

2014-05-02 Thread Duncan Sands

Hi Jan,

On 02/05/14 09:29, Jan Kesten wrote:

> Hello together,
>
> I'm running a cassandra cluster with 2.0.6 and 6 nodes. As far as I know,
> routine repairs are still mandatory for handling tombstones - even I noticed
> that the cluster now does a snapshot-repair by default.
>
> Now my cluster is running a while and has a load of about 200g per node -
> running a nodetool repair -pr on one of the nodes seems to run forever, right
> now it's running for 2 complete days and does not return.


Is it actually doing something, or does it look like it got stuck? 2.0.7 has a 
fix for a problem where repair gets stuck.


Ciao, Duncan.










Re: repair -pr does not return

2014-05-02 Thread Jan Kesten

Hi Duncan,

> is it actually doing something or does it look like it got stuck?
> 2.0.7 has a fix for a getting stuck problem.


It starts by sending Merkle trees and streaming for some time (some 
hours, in fact) and then just seems to hang. So I'll try to update and 
see if that solves the issue. Thanks for the hint!


Cheers,
Jan




Re: repair -pr does not return

2014-05-02 Thread Artur Kronenberg

Hi,

To be honest, 2 days for 200 GB nodes doesn't sound too unreasonable to me 
(depending on your hardware, of course). We were running a ~20 GB cluster 
with regular hard drives (no SSDs) and our first repair ran for a day as well, 
if I recall correctly. We have since improved our hardware and got it down to 
a couple of hours (~5h for all nodes triggering a -pr repair).


As far as I know, you can use nodetool compactionstats and nodetool 
netstats to check for activity on your repairs. There is a chance 
that it is hanging, but it may also just be taking quite a long time.
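As a rough illustration of that check, here is a minimal Python sketch that polls both commands a few times and reports whether their output changed between polls. It assumes nodetool is on the PATH of the node being repaired. The function names, poll count, and interval are made up for this example. It deliberately does not parse the output, since the exact format differs between versions. Note that netstats also prints general statistics that can change with normal traffic, so a changing compactionstats output is the more telling sign that validation work is still going on.

```python
import subprocess
import time


def run_nodetool(subcommand):
    """Run `nodetool <subcommand>` locally and return its text output."""
    result = subprocess.run(
        ["nodetool", subcommand],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def watch_repair(polls=3, interval_s=60, runner=run_nodetool, sleep=time.sleep):
    """Poll compactionstats and netstats and report which outputs changed.

    Returns one (compaction_changed, netstats_changed) tuple per poll
    after the first. `runner` and `sleep` are injectable for testing.
    """
    changes = []
    previous = None
    for i in range(polls):
        snapshot = (runner("compactionstats"), runner("netstats"))
        if previous is not None:
            changes.append(
                (snapshot[0] != previous[0], snapshot[1] != previous[1])
            )
        previous = snapshot
        if i < polls - 1:
            sleep(interval_s)  # wait before taking the next snapshot
    return changes


if __name__ == "__main__":
    for n, (comp, net) in enumerate(watch_repair(), start=1):
        print("poll %d: compactionstats changed=%s, netstats changed=%s"
              % (n, comp, net))
```

If neither output changes over several polls spaced minutes apart, that is a hint (not proof) that the repair is stuck rather than slow.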


Cheers,

-- artur







Re: repair -pr does not return

2014-05-02 Thread Robert Coli
On Fri, May 2, 2014 at 12:29 AM, Jan Kesten j...@dg6obo.de wrote:

> I'm running a cassandra cluster with 2.0.6 and 6 nodes. As far as I know,
> routine repairs are still mandatory for handling tombstones - even I
> noticed that the cluster now does a snapshot-repair by default.
>
> Now my cluster is running a while and has a load of about 200g per node -
> running a nodetool repair -pr on one of the nodes seems to run forever,
> right now it's running for 2 complete days and does not return.


https://issues.apache.org/jira/browse/CASSANDRA-5220

The reports I am getting on this list and in #cassandra about the newly
rewritten repair in the 2.0.x line, with vnodes on a realistically sized data
set, are that it often does not work and, when it does work, it does not
finish in tractable time. As other posters have said, it is continually being
fixed and improved. If I were you, I would consider increasing
gc_grace_seconds to something like 34 days until repair starts working more
efficiently with vnodes.
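For reference, gc_grace_seconds is set per table and is expressed in seconds, so "34 days" has to be converted first. The following is only a sketch of that arithmetic in Python. The keyspace and table names are placeholders, not anything from this thread, and the generated statement would still have to be run through cqlsh for each table concerned.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gc_grace_seconds(days):
    """Convert a grace period in days to the seconds value Cassandra expects."""
    return days * SECONDS_PER_DAY


def alter_statement(keyspace, table, days):
    """Build a CQL statement that sets gc_grace_seconds on one table.

    The keyspace and table names are supplied by the caller; the ones
    used below are hypothetical.
    """
    return "ALTER TABLE {}.{} WITH gc_grace_seconds = {};".format(
        keyspace, table, gc_grace_seconds(days)
    )


if __name__ == "__main__":
    # 34 days * 86400 s/day = 2937600 s
    print(alter_statement("my_keyspace", "my_table", 34))
```

The point of the longer window is that a full repair only has to complete once within gc_grace_seconds, which leaves more slack when a single repair run takes days.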

https://issues.apache.org/jira/browse/CASSANDRA-5850

=Rob