Mutation dropped and Read-Repair performance issue

2020-12-19 Thread sunil pawar
Hi All,

We are seeing failures in the Read-Repair stage with Digest Mismatch errors,
at a rate of 300+ per day per node.
At the same time, nodes get overloaded for a couple of seconds at a time due
to long GC pauses (around 7-8 seconds). We are not running repair regularly
as a maintenance activity because a node goes down whenever we run repair on
the tables; after running the repair the node goes down due to long GC pauses
again.
We can run the repair with the --in-local-dc option for all tables except
one. Below is the configuration of the cluster:

   1. 15-node cluster.
   2. RF=3.
   3. Xmx and Xms set to 31GB.
   4. G1GC is in use.
   5. Version 3.11.2.
   6. Load on each node is roughly 500GB.
   7. One table carries most of the load compared to the other tables.

Please suggest any configuration-level changes we can make to avoid the above
problems. Also, is getting too many digest mismatch messages a sign that a
node is doing more read and write operations than it would without them, and
could that be the cause of the node getting overloaded at that particular
moment?
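
A quick way to quantify the dropped messages and GC pauses per node is with
the standard nodetool counters (a minimal sketch; ks.big_table is a
placeholder for the heaviest table):

  nodetool tpstats                  # "Dropped Messages" section: MUTATION, READ, READ_REPAIR counts
  nodetool gcstats                  # GC pause statistics since the last time gcstats was run
  nodetool tablestats ks.big_table  # latencies and sstable count for the heaviest table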

-- 
Thanks,
S.R.


Re: repair performance

2017-03-20 Thread daemeon reiydelle
I would zero in on network throughput, especially inter-rack trunks.
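
A simple way to sanity-check inter-rack throughput between two nodes is a
short iperf3 run (assuming iperf3 is installed on both hosts; host names are
placeholders):

  # on a node in rack 1
  iperf3 -s
  # on a node in rack 2: 30-second test, reported in Mbits/s
  iperf3 -c node-rack1 -t 30 -f m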


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198



Re: repair performance

2017-03-20 Thread Roland Otta
good point! i did not (so far). i will do that - especially because i often see
all compaction threads in use during repair (according to compactionstats).

thank you also for your link recommendations. i will go through them.
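
A rough way to watch that while a repair is running (a sketch; assumes a
10-second refresh is acceptable):

  watch -n 10 nodetool compactionstats     # pending tasks, incl. validation compactions built for the repair
  nodetool tpstats | grep -i compaction    # CompactionExecutor active/pending counts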



Re: repair performance

2017-03-18 Thread Thakrar, Jayesh
You changed compaction_throughput_mb_per_sec, but did you also increase 
concurrent_compactors?
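
For reference, the compaction throttle can be checked and changed at runtime,
while concurrent_compactors lives in cassandra.yaml (the yaml path below is an
assumption; on this version changing it typically means a rolling restart):

  nodetool getcompactionthroughput      # current throttle in MB/s
  nodetool setcompactionthroughput 0    # 0 = unthrottled, e.g. for the duration of a test
  grep -E 'concurrent_compactors|compaction_throughput_mb_per_sec' /etc/cassandra/cassandra.yaml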

In reference to the reaper and some other info I received on the user forum
in response to my question on "nodetool repair", here are some useful
links/slides:

https://www.datastax.com/dev/blog/repair-in-cassandra

https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/

http://www.slideshare.net/DataStax/real-world-tales-of-repair-alexander-dejanovski-the-last-pickle-cassandra-summit-2016

http://www.slideshare.net/DataStax/real-world-repairs-vinay-chella-netflix-cassandra-summit-2016


Re: repair performance

2017-03-17 Thread Roland Otta
i did not know that so far.

thank you for the hint. i will definitely give it a try



Re: repair performance

2017-03-17 Thread benjamin roth
The fork from thelastpickle is (compatible with 3.0+). I'd recommend giving it
a try over pure nodetool.



Re: repair performance

2017-03-17 Thread Roland Otta
... maybe i should just try increasing the job threads with --job-threads

shame on me
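
For reference, that would look something like the following (the keyspace name
is a placeholder; 4 is the maximum nodetool accepts for job threads):

  nodetool repair -local -j 4 my_keyspace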



Re: repair performance

2017-03-17 Thread Roland Otta
forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default.
it also prints out incremental: true when starting a repair:
INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting 
repair command #7, repairing keyspace xxx with repair options (parallelism: 
parallel, primary range: false, incremental: true, job threads: 1, 
ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as i could 
figure out it's not compatible with 3.0+
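
For completeness, on 3.0.x a full (non-incremental) repair can still be forced
per run if needed (keyspace name is a placeholder):

  nodetool repair -full -local my_keyspace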




Re: repair performance

2017-03-17 Thread benjamin roth
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages,
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated,
failsafe

The time repairs actually take may vary a lot depending on how much data has to
be streamed or how inconsistent your cluster is.

50 Mbit/s is really a bit low! The actual performance depends on many factors,
like your CPU, RAM, HDD/SSD, concurrency settings, and the load on the "old
nodes" of the cluster.
This is quite an individual problem that you have to track down case by case.
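
As a rough illustration of the "sliced" idea with plain nodetool (token values
and keyspace name are placeholders; tools like reaper automate exactly this):

  # repair only the ranges this node is primary replica for
  nodetool repair -pr my_keyspace
  # or repair one explicit token sub-range at a time
  nodetool repair -st -9223372036854775808 -et -4611686018427387904 my_keyspace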



repair performance

2017-03-17 Thread Roland Otta
hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built to get familiar with
cassandra and its possibilities.

while getting familiar with that topic we noticed that repairs in
our cluster take a long time. to give an idea of our current setup, here
are some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB ram
2 ssds (raid0), 900 GB each, for data
1 separate hdd for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)


i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.
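
(For reference, presumably via something like the following, where 0 means
unthrottled and the get commands just verify the current values:)

  nodetool setcompactionthroughput 0
  nodetool setstreamthroughput 0
  nodetool getcompactionthroughput
  nodetool getstreamthroughput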

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via nodetool repair -local on a node, the repair takes
longer than a day. is this normal, or could we normally expect a faster
repair?

i also noticed that initializing new nodes in the datacenter was
really slow (approx 50 mbit/s). here too i expected much better
performance - could those 2 problems be somehow related?

br//
roland

Re: nodetool status inconsistencies, repair performance and system keyspace compactions

2013-04-05 Thread aaron morton
Monitor the repair using nodetool compactionstats to see the merkle trees being
created, and nodetool netstats to see data streaming.

Also look in the logs for messages from AntiEntropyService.java; they will
tell you how long the node waited for each replica to get back to it.
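
A minimal sketch of that monitoring loop (the log path is an assumption;
adjust it to your install):

  nodetool compactionstats    # validation compactions = merkle tree calculation
  nodetool netstats           # active streaming sessions between replicas
  grep AntiEntropyService /var/log/cassandra/system.log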

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: nodetool status inconsistencies, repair performance and system keyspace compactions

2013-04-04 Thread Ondřej Černoš
Hi,

most has been resolved - the "failed to uncompress" error was really a
bug in cassandra (see
https://issues.apache.org/jira/browse/CASSANDRA-5391) and the difference
in load reporting is a change between 1.2.1 (which reports 100%
for the 3 replicas/3 nodes/2 DCs setup I have) and 1.2.3, which reports the
fraction. Is this correct?

Anyway, the nodetool repair still takes ages to finish, considering
only megabytes of unchanging data are involved in my test:

[root@host:/etc/puppet] nodetool repair ks
[2013-04-04 13:26:46,618] Starting repair command #1, repairing 1536
ranges for keyspace ks
[2013-04-04 13:47:17,007] Repair session
88ebc700-9d1a-11e2-a0a1-05b94e1385c7 for range
(-2270395505556181001,-2268004533044804266] finished
...
[2013-04-04 13:47:17,063] Repair session
65d31180-9d1d-11e2-a0a1-05b94e1385c7 for range
(1069254279177813908,1070290707448386360] finished
[2013-04-04 13:47:17,063] Repair command #1 finished

This is the status before the repair (by the way, after the datacenter
has been bootstrapped from the remote one):

[root@host:/etc/puppet] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.1%  06ff8328-32a3-4196-a31f-1e0f608d0638  1d
UN  xxx.xxx.xxx.xxx  5.73 MB  256     15.3%  7a96bf16-e268-433a-9912-a0cf1668184e  1d
UN  xxx.xxx.xxx.xxx  5.72 MB  256     17.5%  67a68a2a-12a8-459d-9d18-221426646e84  1d
Datacenter: na-dev
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.4%  eb86aaae-ef0d-40aa-9b74-2b9704c77c0a  cmp02
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.0%  cd24af74-7f6a-4eaa-814f-62474b4e4df1  cmp01
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.7%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  cmp05

Why does it take 20 minutes to finish? Fortunately the large number of
compactions I reported in the previous email was not triggered this time.

And is there documentation where I could find the exact semantics of
repair when vnodes are used (and what -pr means in such a setup), and
when it is run in a multi-datacenter setup? I still don't quite get it.
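
For what it's worth, the usual reading is that -pr only repairs the ranges a
node is the primary replica for, so covering the whole ring means running it
on every node in every DC:

  # run on each node in turn, in both datacenters
  nodetool repair -pr ks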

regards,
Ondřej Černoš



Re: nodetool status inconsistencies, repair performance and system keyspace compactions

2013-03-27 Thread aaron morton
> During one of my tests - see this thread in this mailing list:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
That thread has been updated, check the bug ondrej created.

> How will this perform in production with much bigger data if repair
> takes 25 minutes on 7MB and 11k compactions were triggered by the
> repair run?
Seems a little odd.
See what happens the next time you run repair.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com



nodetool status inconsistencies, repair performance and system keyspace compactions

2013-03-26 Thread Ondřej Černoš
Hi all,

I have 2 DCs, 3 nodes each, RF:3, I use local quorum for both reads and writes.

Currently I test various operational qualities of the setup.

During one of my tests - see this thread in this mailing list:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
- I ran into this situation:

- all nodes have all data and agree on it:

[user@host1-dc1:~] nodetool status

Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d

- only one node disagrees:

[user@host1-dc2:~] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10

I tried rebuilding the node from scratch and repairing it, with no results.
Still the same Owns stats.

The cluster is built on cassandra 1.2.3 and uses vnodes.


On a related note: the data size, as you can see, is really small.
The cluster was created by setting up the us-east datacenter,
populating it with the dataset, then building the na-prod datacenter
and running nodetool rebuild us-east. When I tried to run nodetool
repair, it took 25 minutes to finish on this small dataset. Is this
ok?
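
For context, the sequence described above is roughly the standard way to add a
DC (the auto_bootstrap note is the usual recommendation, not something stated
in this thread):

  # on each new na-prod node, started with auto_bootstrap: false
  nodetool rebuild us-east    # stream the data from the existing DC
  nodetool repair ks          # the run that took ~25 minutes here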

One other thing I noticed is the amount of compactions on the system keyspace:

/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db

This is just after running the repair. Is this ok, considering that the
dataset is 7MB and that during the repair no operations were running
against the database - no reads, no writes, nothing?

How will this perform in production with much bigger data if repair
takes 25 minutes on 7MB and 11k compactions were triggered by the
repair run?

regards,

Ondrej Cernos