Re: nodetool repair keeping an empty cluster busy
Hi Rahul,

thanks for replying. Could you please be a bit more specific, though? E.g. what exactly is being compacted: there is/was no data at all in the cluster save for a few hundred kB in the system CF (see the nodetool status output below). And how can those few hundred kB of data generate GB of network traffic?

Cheers,
Sven

On Wed, Dec 11, 2013 at 7:56 PM, Rahul Menon ra...@apigee.com wrote:

Sven,

so basically when you run a repair you are essentially telling your cluster to run a validation compaction, which generates a Merkle tree on each of the nodes. These trees are then exchanged and compared to identify inconsistencies, and the differing ranges are streamed between replicas. That streaming is the network traffic you are seeing.

Rahul

On Wed, Dec 11, 2013 at 11:02 AM, Sven Stark sven.st...@m-square.com.au wrote:

Corollary: what is getting shipped over the wire? The ganglia screenshot shows the network traffic on all three hosts on which I ran the nodetool repair.

[image: Inline image 1]

As a reminder:

UN  10.1.2.11  107.47 KB  256  32.9%  1f800723-10e4-4dcd-841f-73709a81d432  rack1
UN  10.1.2.10  127.67 KB  256  32.4%  bd6b2059-e9dc-4b01-95ab-d7c4fc0ec639  rack1
UN  10.1.2.12  107.62 KB  256  34.7%  5258f178-b20e-408f-a7bf-b6da2903e026  rack1

Much appreciated.
Sven

On Wed, Dec 11, 2013 at 3:56 PM, Sven Stark sven.st...@m-square.com.au wrote:

Howdy!

Not a matter of life or death, just curious. I've just stood up a three-node cluster (v1.2.8) on three c3.2xlarge boxes in AWS. Silly me forgot the correct replication factor for one of the needed keyspaces. So I changed it via cli and ran a nodetool repair. Well... there is no data at all in the keyspace yet, only the definition, and nodetool repair ran for about 20 minutes using 2 of the 8 CPUs fully. Any hints what nodetool repair is doing on an empty cluster that makes the host spin so hard?
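Rahul's description of the Merkle-tree exchange can be illustrated with a toy sketch. This is plain Python with invented range names, not Cassandra's actual implementation (which lives in org.apache.cassandra.utils.MerkleTree): each replica hashes its token ranges into a tree, roots are compared first, and only differing leaf ranges would be streamed.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash a chunk of range content."""
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a Merkle tree bottom-up from per-range content hashes.
    Each level pairs and re-hashes the level below; levels[-1][0] is the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def diff_ranges(tree_a, tree_b):
    """Return indices of leaf ranges whose hashes differ between two replicas.
    If the roots already match, nothing needs streaming - which is why
    perfectly consistent replicas produce no repair traffic."""
    if tree_a[-1][0] == tree_b[-1][0]:
        return []
    return [i for i, (a, b) in enumerate(zip(tree_a[0], tree_b[0])) if a != b]

# Two replicas hashing four token ranges; only range 2 is out of sync.
replica1 = build_tree([h(b"r0"), h(b"r1"), h(b"r2"), h(b"r3")])
replica2 = build_tree([h(b"r0"), h(b"r1"), h(b"r2-stale"), h(b"r3")])
print(diff_ranges(replica1, replica2))  # -> [2]
```

Note that building and exchanging the trees is work the nodes do regardless of how much data the ranges actually hold, and with 256 vnodes per node repair runs this per token range. That is a plausible reason an essentially empty 1.2 cluster still burns CPU and shuffles traffic during repair.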
Cheers,
Sven

==

Tasks: 125 total, 1 running, 124 sleeping, 0 stopped, 0 zombie
Cpu(s): 22.7%us, 1.0%sy, 2.9%ni, 73.0%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Mem:  15339196k total, 7474360k used, 7864836k free, 251904k buffers
Swap:        0k total,       0k used,       0k free, 798324k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10840 cassandr  20   0 8354m 4.1g  19m S  218 28.0 35:25.73 jsvc
16675 kafka     20   0 3987m 192m  12m S    2  1.3  0:47.89 java
20328 root      20   0 5613m 569m  16m S    2  3.8  1:35.13 jsvc
 5969 exhibito  20   0 6423m 116m  12m S    1  0.8  0:25.87 java
14436 tomcat7   20   0 3701m 167m  11m S    1  1.1  0:25.80 java
 6278 exhibito  20   0 6487m 119m 9984 S    0  0.8  0:22.63 java
17713 storm     20   0 6033m 159m  11m S    0  1.1  0:10.99 java
18769 storm     20   0 5773m 156m  11m S    0  1.0  0:10.71 java

root@xxx-01:~# nodetool -h `hostname` status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID                               Rack
UN  10.1.2.11  107.47 KB  256     32.9%  1f800723-10e4-4dcd-841f-73709a81d432  rack1
UN  10.1.2.10  127.67 KB  256     32.4%  bd6b2059-e9dc-4b01-95ab-d7c4fc0ec639  rack1
UN  10.1.2.12  107.62 KB  256     34.7%  5258f178-b20e-408f-a7bf-b6da2903e026  rack1

root@xxx-01:~# nodetool -h `hostname` compactionstats
pending tasks: 1
compaction type  keyspace  column family  completed  total  unit  progress
Active compaction remaining time : n/a

root@xxx-01:~# nodetool -h `hostname` netstats
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name  Active  Pending  Completed
Commands   n/a     0        57155
Responses  n/a     0        14573
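For completeness, the sequence Sven describes (change the replication factor, then repair) would look roughly like this on Cassandra 1.2. The keyspace name and factor are placeholders; scoping the repair to the affected keyspace and to each node's primary ranges keeps nodetool from validating everything:

```shell
# Inside cassandra-cli (started with: cassandra-cli -h localhost),
# raise the replication factor of the hypothetical keyspace "my_keyspace":
#   update keyspace my_keyspace with strategy_options = {replication_factor:3};

# Then repair only that keyspace, restricted to this node's primary
# token ranges (-pr), and run it on every node in turn so no range
# is validated more than once:
nodetool -h localhost repair -pr my_keyspace
```

A bare `nodetool repair` with no keyspace argument repairs all keyspaces on the node, which is more validation work than a single changed keyspace needs.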
Re: Opscenter 3.2.2 (?) jmx auth issues
Hi Nick,

thanks for getting back. Much appreciated.

Cheers,
Sven

On Sat, Oct 19, 2013 at 3:58 AM, Nick Bailey n...@datastax.com wrote:

Sven,

I've verified there is an issue with JMX authentication in the 3.2.2 release. Thanks for the bug report! Sorry it's giving you issues. The bug should be fixed in the next release of OpsCenter.

Nick

On Wed, Oct 16, 2013 at 8:07 PM, Sven Stark sven.st...@m-square.com.au wrote:

Hi guys,

we have secured C* JMX with username/password. We upgraded our OpsCenter from 3.0.2 to 3.2.2 last week and noticed that the agents could no longer connect:

ERROR [jmx-metrics-4] 2013-10-17 00:45:54,437 Error getting general metrics
java.lang.SecurityException: Authentication failed! Credentials required
        at com.sun.jmx.remote.security.JMXPluggableAuthenticator.authenticationFailure(JMXPluggableAuthenticator.java:193)
        at com.sun.jmx.remote.security.JMXPluggableAuthenticator.authenticate(JMXPluggableAuthenticator.java:145)

even though the credentials were set correctly in /etc/opscenter/clusters/foo-cluster.conf:

[jmx]
username = secret
password = verysecret
port = 20001

Checks with other JMX-based tools (nodetool, jmxtrans) confirm that the JMX setup is correct. Downgrading OpsCenter to 3.0.2 immediately resolved the issue. Could anybody confirm whether that's a known bug?

Cheers,
Sven
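Since Nick confirmed the fault is on the OpsCenter side, a quick sanity check of the JMX credentials themselves (the kind of cross-check Sven did with other JMX tools) is to hand them to nodetool, which talks to the same authenticated JMX endpoint. Host, port, and credentials below are the placeholder values from the conf snippet, not real ones:

```shell
# nodetool speaks JMX, so a successful call exercises the same
# authentication path the OpsCenter agent failed on.
# -p is the JMX port; -u/-pw pass the JMX credentials.
nodetool -h localhost -p 20001 -u secret -pw verysecret status
```

If this succeeds while the agent still logs `Authentication failed!`, the credentials and JMX setup are fine and the problem lies with the agent, as turned out to be the case here.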