Re: Cassandra 2.0.8 MemoryMeter goes crazy
On Mon, Jun 16, 2014 at 11:03 AM, horschi wrote:

> About running mixed versions:
> I thought running mixed versions is ok. Running repair with mixed versions
> is not though. Right?

Running with split major versions for longer than it takes to do a rolling
restart is not supported.

=Rob
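A quick way to confirm that a rolling upgrade has actually finished is to compare the release version every node reports, either with 'nodetool version' on each host or over JMX via the StorageService MBean. The sketch below is only an illustration: the host names and the default JMX port 7199 are assumptions for this example, not part of the thread.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: ask each node's StorageService MBean for its ReleaseVersion.
// Host names are placeholders; 7199 is Cassandra's default JMX port.
public class ReleaseVersionCheck {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"node1", "node2", "node3"}; // replace with your nodes
        for (String host : hosts) {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                Object version = mbs.getAttribute(
                        new ObjectName("org.apache.cassandra.db:type=StorageService"),
                        "ReleaseVersion");
                System.out.println(host + ": " + version);
            } finally {
                connector.close();
            }
        }
    }
}

If the versions still differ once the rolling restart is done, the cluster is in the unsupported split-version state described above.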
Re: Cassandra 2.0.8 MemoryMeter goes crazy
Hi Robert,

sorry, I am using our own internal terminology :-) The entire cluster was
upgraded. All 3 nodes of that cluster are on 2.0.8 now.

About the issue: to me it looks like there is something wrong in the Memtable
class, some very special edge case on CFs that are updated rarely. I can't say
if it is new to 2.0 or if it already existed in 1.2.

About running mixed versions: I thought running mixed versions is OK. Running
repair with mixed versions is not, though. Right?

kind regards,
Christian

On Mon, Jun 16, 2014 at 7:50 PM, Robert Coli wrote:

> On Sat, Jun 14, 2014 at 1:02 PM, horschi wrote:
>
>> this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
>> All 3 nodes were upgraded. SSTables are upgraded.
>
> One of your *clusters* or one of your *systems*?
>
> Running with split major versions is not supported.
>
> =Rob
Re: Cassandra 2.0.8 MemoryMeter goes crazy
On Sat, Jun 14, 2014 at 1:02 PM, horschi wrote:

> this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
> All 3 nodes were upgraded. SStables are upgraded.

One of your *clusters* or one of your *systems*?

Running with split major versions is not supported.

=Rob
Re: Cassandra 2.0.8 MemoryMeter goes crazy
Hi again,

before people start replying here: I just reported a Jira ticket:
https://issues.apache.org/jira/browse/CASSANDRA-7401

I think Memtable.maybeUpdateLiveRatio() needs some love.

kind regards,
Christian

On Sat, Jun 14, 2014 at 10:02 PM, horschi wrote:

> Hi everyone,
>
> this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
> All 3 nodes were upgraded. SSTables are upgraded.
>
> Unfortunately we are now experiencing that Cassandra starts to hang every
> 10 hours or so.
>
> We can see the MemoryMeter being very active every time it is hanging,
> both in tpstats and in the system.log:
>
> INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481)
> CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0
> (just-counted was 64.0). calculation took 0ms for 0 cells
>
> This line is logged hundreds of times per second (!) when Cassandra is
> down, and the CPU is 100% busy.
>
> Interestingly, this is only logged for this particular column family. This
> CF is used as a queue, which only contains a few entries (data files are
> about 4 kB, only ~100 keys, usually 1-2 active, 98-99 tombstones).
>
> Table: ResponsePortal
> SSTable count: 1
> Space used (live), bytes: 4863
> Space used (total), bytes: 4863
> SSTable Compression Ratio: 0.9545454545454546
> Number of keys (estimate): 128
> Memtable cell count: 0
> Memtable data size, bytes: 0
> Memtable switch count: 1
> Local read count: 0
> Local read latency: 0.000 ms
> Local write count: 5
> Local write latency: 0.000 ms
> Pending tasks: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used, bytes: 176
> Compacted partition minimum bytes: 43
> Compacted partition maximum bytes: 50
> Compacted partition mean bytes: 50
> Average live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
>
> Table: ResponsePortal
> SSTable count: 1
> Space used (live), bytes: 4765
> Space used (total), bytes: 5777
> SSTable Compression Ratio: 0.75
> Number of keys (estimate): 128
> Memtable cell count: 0
> Memtable data size, bytes: 0
> Memtable switch count: 12
> Local read count: 0
> Local read latency: 0.000 ms
> Local write count: 1096
> Local write latency: 0.000 ms
> Pending tasks: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used, bytes: 16
> Compacted partition minimum bytes: 43
> Compacted partition maximum bytes: 50
> Compacted partition mean bytes: 50
> Average live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
>
> Has anyone ever seen this, or has an idea what could be wrong? It seems
> that 2.0 cannot handle this column family as well as 1.2 could.
>
> Any hints on what could be wrong are greatly appreciated :-)
>
> Cheers,
> Christian
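For anyone following along, here is a rough illustration of the kind of edge case suspected above. This is not the real Memtable code and all names are invented; it only shows how a "re-measure once the write count has doubled" style of trigger can fire forever on a memtable holding zero cells, which would match the "calculation took 0ms for 0 cells" lines flooding the log.

import java.util.concurrent.atomic.AtomicLong;

// Illustration only: not the actual Cassandra Memtable code, all names invented.
public class LiveRatioSketch {
    private final AtomicLong operations = new AtomicLong(); // writes seen by this memtable
    private long operationsAtLastMeasure = 0;
    private double liveRatio = 64.0;

    // Broken trigger: with zero writes since the last measurement,
    // "0 >= 2 * 0" is still true, so the meter keeps re-running and logging
    // "calculation took 0ms for 0 cells" in a tight loop.
    boolean shouldRemeasureBroken() {
        return operations.get() >= 2 * operationsAtLastMeasure;
    }

    // Guarded trigger: never re-measure an empty or unchanged memtable.
    boolean shouldRemeasureGuarded() {
        long ops = operations.get();
        return ops > 0 && ops >= 2 * operationsAtLastMeasure;
    }

    void measure() {
        operationsAtLastMeasure = operations.get();
        System.out.printf("liveRatio is %.1f (just-counted was %.1f)."
                + " calculation took 0ms for %d cells%n",
                liveRatio, liveRatio, operationsAtLastMeasure);
    }

    public static void main(String[] args) {
        LiveRatioSketch memtable = new LiveRatioSketch(); // an idle, queue-like CF
        System.out.println("broken trigger fires: " + memtable.shouldRemeasureBroken());   // true
        System.out.println("guarded trigger fires: " + memtable.shouldRemeasureGuarded()); // false
        if (memtable.shouldRemeasureBroken())
            memtable.measure();
    }
}

Whether the real cause is exactly this trigger or something else inside maybeUpdateLiveRatio() is for the ticket to decide; the sketch is only meant to show why a rarely updated, queue-like CF with 0 cells is the case to look at.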
Re: Multi-DC Environment Question
Hello again,

Back to this after a while... As far as I can tell, whenever DC2 is
unavailable there is one node in DC1 that acts as a coordinator. When DC2
becomes available again, that one node sends the hints to only one node in
DC2, which then sends any replicas to the other nodes in its local DC (DC2).
This ensures efficient cross-DC bandwidth usage. I was watching "system.hints"
on all nodes during this test and this is the conclusion I came to.

Two things:

1. If the above is correct, does the same apply when performing anti-entropy
repair (without specifying a particular DC)? I'm just hoping the answer is
going to be YES, otherwise the VPN in our case is not going to be very happy,
and we would prefer not to saturate it whenever running nodetool repair.
Worst case, I suppose we could put a traffic limiter on the firewalls, but I
would appreciate your input if you know more on this.

2. As I described earlier, in order to test this I was watching the
"system.hints" CF to monitor any hints. I was looking to add a Nagios check
for this purpose, so I was looking into the JMX console. I noticed that when
a node stores hints, the attribute "MemtableColumnsCount" of
"MBean org.apache.cassandra.db:type=ColumnFamilies,keyspace=system,columnfamily=hints"
goes up (although I would expect it to be MemtableRowCount or something?).
This attribute retains its value until the other node becomes available and
ready to receive the hints. I was looking for another attribute somewhere to
monitor the active hints. I checked:
"MBean org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=PendingTasks",
"MBean org.apache.cassandra.metrics:type=Storage,name=TotalHints",
"MBean org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress",
"MBean org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintedHandoff,name=ActiveTasks"
and even
"MBean org.apache.cassandra.metrics:type=HintedHandOffManager,name=Hints_not_stored-/10.2.1.100"
(this one never goes back to zero). None of them increased while hints were
being sent (or at least I didn't catch it because it was too fast or
whatever?). Does anyone know what all these attributes represent? It looks
like there are more specific hint attributes on a per-CF basis, but I was
looking for a more generic one to begin with.

Any help would be much appreciated.

Thanks in advance,

Vasilis

On Wed, Jun 4, 2014 at 1:42 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> Hello Matt,
>
> nodetool status:
>
> Datacenter: MAN
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Owns (effective)  Host ID                               Token                 Rack
> UN  10.2.1.103  89.34 KB   99.2%             b7f8bc93-bf39-475c-a251-8fbe2c7f7239  -9211685935328163899  RAC1
> UN  10.2.1.102  86.32 KB   0.7%              1f8937e1-9ecb-4e59-896e-6d6ac42dc16d  -3511707179720619260  RAC1
> Datacenter: DER
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Owns (effective)  Host ID                               Token                 Rack
> UN  10.2.1.101  75.43 KB   0.2%              e71c7ee7-d852-4819-81c0-e993ca87dd5c  -1277931707251349874  RAC1
> UN  10.2.1.100  104.53 KB  99.8%             7333b664-ce2d-40cf-986f-d4b4d4023726  -9204412570946850701  RAC1
>
> I do not know why the cluster is not balanced at the moment, but it holds
> almost no data. I will populate it soon and see how that goes. The output
> of 'nodetool ring' just lists all the tokens assigned to each individual
> node, and as you can imagine it would be pointless to paste it here. I just
> did 'nodetool ring | awk ... | unique | wc -l' and it works out to be 1024
> as expected (4 nodes x 256 tokens each).
>
> Still have not got the answers to the other questions though...
>
> Thanks,
>
> Vasilis
>
> On Wed, Jun 4, 2014 at 12:28 AM, Matthew Allen wrote:
>
>> Thanks Vasileios. I think I need to make a call as to whether to switch
>> to vnodes or stick with tokens for my multi-DC cluster.
>>
>> Would you be able to show a nodetool ring/status from your cluster to see
>> what the token assignment looks like?
>>
>> Thanks
>>
>> Matt
>>
>> On Wed, Jun 4, 2014 at 8:31 AM, Vasileios Vlachos <
>> vasileiosvlac...@gmail.com> wrote:
>>
>>> I should have said that earlier really... I am using 1.2.16 and vnodes
>>> are enabled.
>>>
>>> Thanks,
>>>
>>> Vasilis
>>>
>>> --
>>> Kind Regards,
>>>
>>> Vasileios Vlachos
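In case it helps anyone trying the same thing, this is roughly how those attributes can be polled from a small standalone check. The MBean names are the ones discussed above; treating the Storage metrics as counters that expose a "Count" attribute, and the default JMX port 7199, are assumptions for this sketch, not a verified Nagios plugin.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of a hint check against a single node.
public class HintCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Hint columns currently sitting in the local system.hints memtable.
            Object stored = mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.db:type=ColumnFamilies,"
                            + "keyspace=system,columnfamily=hints"),
                    "MemtableColumnsCount");

            // Hints counted by the Storage metrics (assumed to expose "Count").
            Object total = mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.metrics:type=Storage,name=TotalHints"),
                    "Count");
            Object inProgress = mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress"),
                    "Count");

            System.out.println("MemtableColumnsCount (system.hints): " + stored);
            System.out.println("TotalHints: " + total);
            System.out.println("TotalHintsInProgress: " + inProgress);
        } finally {
            connector.close();
        }
    }
}

For a Nagios check it probably makes sense to alert on MemtableColumnsCount staying above zero, since as noted above it keeps its value until the target node is ready to receive the hints, whereas the delivery itself may be too fast to catch on the other counters.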