Mutation dropped and Read-Repair performance issue
Hi All,

We are seeing failures in the ReadRepairStage with "Digest Mismatch" errors, at a rate of 300+ per day per node. At the same time, nodes become overloaded for a few seconds at a time because of long GC pauses (around 7-8 seconds). We are not running repair regularly as a maintenance activity because a node goes down whenever we run repair on the tables; after running repair the node goes down again due to long GC pauses. For all tables except one we are able to run repair with the --in-local-dc option.

Below is the configuration of the cluster:
1. 15-node cluster
2. RF=3
3. Xmx and Xms set to 31GB
4. G1GC algorithm in use
5. Version 3.11.2
6. Load on each node roughly around 500GB
7. One table carries far more data than the other tables

Please suggest any configuration-level changes we can make to avoid the above problems. Also, is a high number of digest mismatch messages a sign that a node is doing more read and write work than it otherwise would, and could that be the cause of the node getting overloaded at those particular moments?

--
Thanks,
S.R.
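For reference, a first step would be to confirm the dropped messages, GC pauses and per-table latencies from the node itself. This is only a sketch; the keyspace/table name and log path are placeholders for your environment:

# Dropped message counts (look at the READ_REPAIR and MUTATION rows)
nodetool tpstats

# GC statistics since the last time this command was run
nodetool gcstats

# Long pauses reported by Cassandra itself
grep GCInspector /var/log/cassandra/system.log | tail -20

# Read/write latencies for the largest table (name is a placeholder)
nodetool tablestats my_keyspace.my_large_table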
Re: repair performance
I would zero in on network throughput, especially the inter-rack trunks.

Sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198
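If network throughput is the suspect, a quick sanity check is to measure the raw link between two nodes and compare it with what Cassandra itself reports while streaming. The host name below is a placeholder and iperf3 has to be installed separately; this is only a sketch:

# On one node, start an iperf3 server
iperf3 -s

# On another node, measure throughput to it
iperf3 -c node1.example.com

# Cassandra's own view of active streams during repair/bootstrap
nodetool netstats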
Re: repair performance
Good point! I did not (so far); I will do that - especially because I often see all compaction threads in use during repair (according to compactionstats).

Thank you also for the link recommendations, I will go through them.

On Sat, 2017-03-18 at 16:54, Thakrar, Jayesh wrote:
> You changed compaction_throughput_mb_per_sec, but did you also increase concurrent_compactors?
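For reference, compaction throughput can be changed at runtime with nodetool, whereas concurrent_compactors is normally set in cassandra.yaml and picked up on restart. The values below are placeholders, not recommendations:

# Check and raise the compaction throughput cap (0 = unthrottled)
nodetool getcompactionthroughput
nodetool setcompactionthroughput 0

# concurrent_compactors is read from cassandra.yaml at startup, e.g.
#   concurrent_compactors: 8
# then restart the node and watch the effect during repair:
nodetool compactionstats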
Re: repair performance
You changed compaction_throughput_mb_per_sec, but did you also increase concurrent_compactors?

In reference to reaper, and some other info I received on the user forum in response to my question about "nodetool repair", here are some useful links/slides:

https://www.datastax.com/dev/blog/repair-in-cassandra
https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/
http://www.slideshare.net/DataStax/real-world-tales-of-repair-alexander-dejanovski-the-last-pickle-cassandra-summit-2016
http://www.slideshare.net/DataStax/real-world-repairs-vinay-chella-netflix-cassandra-summit-2016
Re: repair performance
I did not realize that so far. Thank you for the hint, I will definitely give it a try.

On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote:
> The fork from thelastpickle is. I'd recommend giving it a try over pure nodetool.
Re: repair performance
The fork from thelastpickle is. I'd recommend giving it a try over pure nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta:
> 3.0.7 is also the reason why we are not using reaper ... as far as i could figure out it's not compatible with 3.0+
Re: repair performance
... maybe I should just try increasing the job threads with --job-threads. Shame on me.
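As a sketch (the keyspace name and thread count below are placeholders; the number of job threads is capped at a small value in these versions), the option is passed to nodetool repair like this:

# DC-local repair with more parallel job threads
nodetool repair -local --job-threads 4 my_keyspace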
Re: repair performance
Forgot to mention the version we are using: we are on 3.0.7, so I guess we should have incremental repairs by default. It also prints incremental: true when starting a repair:

INFO [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting repair command #7, repairing keyspace xxx with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as I could figure out it's not compatible with 3.0+.
Re: repair performance
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, whatever.)
- You can use incremental repairs to speed up regular repairs.
- You can use "reaper" to schedule repairs and run them sliced, automated, and failsafe.

The time repairs take can vary a lot depending on how much data has to be streamed and how inconsistent your cluster is.

50 mbit/s is really a bit low! The actual performance depends on many factors like your CPU, RAM, HD/SSD, concurrency settings, and the load of the "old nodes" of the cluster. This is quite an individual problem you have to track down individually.
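A rough illustration of what a "sliced" repair looks like when done by hand; the token range and keyspace below are placeholders, and tools like reaper compute and schedule the slices for you:

# Repair only a sub-range of the token ring instead of everything at once
nodetool repair -st -9223372036854775808 -et -4611686018427387904 my_keyspace

# ... then repeat with the next start/end tokens until the full ring is covered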
repair performance
Hello,

We are quite inexperienced with Cassandra at the moment and are playing around with a new cluster we built to get familiar with Cassandra and its possibilities.

While getting familiar with the topic we noticed that repairs in our cluster take a long time. To give an idea of our current setup, here are some numbers:

Our cluster currently consists of 4 nodes (replication factor 3). These nodes are all on dedicated physical hardware in our own datacenter. All of the nodes have:

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (RAID 0), 900 GB each, for data
1 separate HDD for OS + commitlogs

Current dataset:
approx. 530 GB per node
21 tables (the biggest one has more than 200 GB / node)

I already tried setting the compaction throughput and streaming throughput to unlimited for testing purposes, but that did not change anything. When checking system resources I cannot see any bottleneck (CPUs are pretty idle and we have no iowaits).

When issuing a repair via nodetool repair -local on a node, the repair takes longer than a day. Is this normal, or could we normally expect a faster repair?

I also noticed that initializing new nodes in the datacenter was really slow (approx. 50 mbit/s). Here too I expected much better performance - could those two problems be related?

br//
roland
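For what it's worth, "unlimited" here usually means setting the caps to 0 at runtime. A minimal sketch of what that looks like:

# 0 disables throttling for compaction and for streaming between nodes
nodetool setcompactionthroughput 0
nodetool setstreamthroughput 0

# Confirm the current settings
nodetool getcompactionthroughput
nodetool getstreamthroughput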
Re: nodetool status inconsistencies, repair performance and system keyspace compactions
Monitor the repair using nodetool compactionstats to see the Merkle trees being created, and nodetool netstats to see data streaming. Also look in the logs for messages from AntiEntropyService.java; they will tell you how long the node waited for each replica to get back to it.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
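A small sketch of that monitoring loop; the log path is an assumption for a typical package install:

# Validation compactions (Merkle tree building) show up here
watch -n 10 nodetool compactionstats

# Streaming sessions between replicas
watch -n 10 nodetool netstats

# How long each replica took to respond during the repair session
grep AntiEntropyService /var/log/cassandra/system.log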
Re: nodetool status inconsistencies, repair performance and system keyspace compactions
Hi,

Most has been resolved - the "failed to uncompress" error was really a bug in Cassandra (see https://issues.apache.org/jira/browse/CASSANDRA-5391), and the problem with the different load reporting is a change between 1.2.1 (which reports 100% for the 3 replicas / 3 nodes / 2 DCs setup I have) and 1.2.3, which reports the fraction. Is this correct?

Anyway, nodetool repair still takes ages to finish, considering only megabytes of unchanging data are involved in my test:

[root@host:/etc/puppet] nodetool repair ks
[2013-04-04 13:26:46,618] Starting repair command #1, repairing 1536 ranges for keyspace ks
[2013-04-04 13:47:17,007] Repair session 88ebc700-9d1a-11e2-a0a1-05b94e1385c7 for range (-2270395505556181001,-2268004533044804266] finished
...
[2013-04-04 13:47:17,063] Repair session 65d31180-9d1d-11e2-a0a1-05b94e1385c7 for range (1069254279177813908,1070290707448386360] finished
[2013-04-04 13:47:17,063] Repair command #1 finished

This is the status before the repair (by the way, after the datacenter has been bootstrapped from the remote one):

[root@host:/etc/puppet] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.1%  06ff8328-32a3-4196-a31f-1e0f608d0638  1d
UN  xxx.xxx.xxx.xxx  5.73 MB  256     15.3%  7a96bf16-e268-433a-9912-a0cf1668184e  1d
UN  xxx.xxx.xxx.xxx  5.72 MB  256     17.5%  67a68a2a-12a8-459d-9d18-221426646e84  1d
Datacenter: na-dev
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.4%  eb86aaae-ef0d-40aa-9b74-2b9704c77c0a  cmp02
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.0%  cd24af74-7f6a-4eaa-814f-62474b4e4df1  cmp01
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.7%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  cmp05

Why does it take 20 minutes to finish? Fortunately the big number of compactions I reported in the previous email was not triggered.

And is there documentation where I could find the exact semantics of repair when vnodes are used (and what -pr means in such a setup) and when run in a multi-datacenter setup? I still don't quite get it.

regards,
Ondřej Černoš
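On the -pr question, the usual rule of thumb (a sketch, not a substitute for the docs): -pr restricts a run to the token ranges for which that node is the primary replica, so it only covers the whole ring if it is run on every node in every datacenter.

# Run on every node of the cluster, one after another
nodetool repair -pr ks

# Without -pr, a single run repairs all ranges the node replicates
nodetool repair ks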
Re: nodetool status inconsistencies, repair performance and system keyspace compactions
> During one of my tests - see this thread in this mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html

That thread has been updated, check the bug Ondrej created.

> How will this perform in production with much bigger data if repair takes 25 minutes on 7MB and 11k compactions were triggered by the repair run?

Seems a little odd. See what happens the next time you run repair.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
nodetool status inconsistencies, repair performance and system keyspace compactions
Hi all,

I have 2 DCs, 3 nodes each, RF:3, and I use LOCAL_QUORUM for both reads and writes. Currently I am testing various operational qualities of the setup.

During one of my tests - see this thread in this mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html - I ran into this situation:

- all nodes have all data and agree on it:

[user@host1-dc1:~] nodetool status
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d

- only one node disagrees:

[user@host1-dc2:~] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10

I tried to rebuild the node from scratch and to repair the node, with no results - still the same Owns stats. The cluster is built from Cassandra 1.2.3 and uses vnodes.

On a related note: the data size, as you can see, is really small. The cluster was created by setting up the us-east datacenter, populating it with the dataset, then building the na-prod datacenter and running nodetool rebuild us-east. When I tried to run nodetool repair it took 25 minutes to finish, on this small dataset. Is this ok?

One other thing I noticed is the amount of compactions on the system keyspace:

/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db

This is just after running the repair. Is this ok, considering the dataset is 7MB and during the repair no operations were running against the database, neither read nor write, nothing? How will this perform in production with much bigger data if repair takes 25 minutes on 7MB and 11k compactions were triggered by the repair run?

regards,
Ondrej Cernos
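For reference, one way to gauge that compaction churn: nodetool compactionhistory only exists in later releases (2.0 and onwards, if I remember correctly), so on 1.2 counting the files on disk is the fallback. The grep filter and data path below are placeholders:

# Newer versions: completed compactions, filtered to the system keyspace
nodetool compactionhistory | grep -c system

# Older versions: count SSTable components left on disk for the table
ls /var/lib/cassandra/data/system/schema_columnfamilies/ | wc -l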