Re: cascading failures due to memory
No. Upgraded to 0.8 and monitor the systems more. We schedule a repair every 24 hrs via cron and so far no problems.

On Jun 15, 2011 5:44 PM, "AJ" wrote:
> Sasha,
>
> Did you ever nail down the cause of this problem?
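A minimal sketch of how the staggered daily repair mentioned in this thread might be laid out in cron. The host names `node1` to `node4` and the helper name `staggered_repair_crontab` are hypothetical, and it assumes `nodetool` is on the cron user's PATH; it only generates crontab lines and does not run anything.

```python
def staggered_repair_crontab(hosts, minute=0):
    """Return one crontab line per host, spreading daily repairs evenly over 24 hours."""
    if not hosts:
        return []
    step = 24 / len(hosts)  # hours between the start of consecutive repairs
    lines = []
    for i, host in enumerate(hosts):
        hour = int(i * step) % 24
        # Hypothetical invocation: one repair per host per day at the given hour.
        lines.append(f"{minute} {hour} * * * nodetool -h {host} repair")
    return lines


if __name__ == "__main__":
    # Hypothetical host names for a 4-node cluster.
    for line in staggered_repair_crontab(["node1", "node2", "node3", "node4"]):
        print(line)
```

With four hosts the repairs start six hours apart (hours 0, 6, 12 and 18), so no two nodes begin a repair at the same time. With more than 24 hosts some start hours would coincide.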
Re: cascading failures due to memory
Sasha,

Did you ever nail down the cause of this problem?

On 5/31/2011 4:01 AM, Sasha Dolgy wrote:
> hi everyone,
>
> the current nodes i have deployed (4) have all been working fine, with
> not a lot of data ... more reads than writes at the moment. as i had
> monitoring disabled, when one node's OS killed the cassandra process
> due to out of memory problems ... that was fine. 24 hours later,
> another node, 24 hours later, another node ...until finally, all 4
> nodes no longer had cassandra running.
>
> When all nodes are started fresh, CPU utilization is at about 21% on
> each box. after 24 hours, this goes up to 32% and then 51% 24 hours
> later.
>
> originally I had thought that this may be a result of 'nodetool
> repair' not being run consistently ... after adding a cronjob to run
> every 24 hours (staggered between nodes) the problem of the increasing
> memory utilization does not resolve.
>
> i've read the operations page and also the
> http://wiki.apache.org/cassandra/MemtableThresholds page. i am
> running defaults and 0.7.6-02 ...
>
> what are the best places to start in terms of finding why this is
> happening? CF design / usage? 'nodetool cfstats' gives me some good
> info ... and i've already implemented some changes to one CF based on
> how it had ballooned (too many rows versus not enough columns)
>
> suggestions appreciated
Re: cascading failures due to memory
look for GCInspector

On Wed, Jun 1, 2011 at 2:30 PM, Sasha Dolgy wrote:
> is there a specific string I should be looking for in the logs that
> isn't super obvious to me at the moment...
>
> On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis wrote:
>> The place to start is with the statistics Cassandra logs after each GC.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
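A minimal sketch for pulling the GCInspector entries out of a Cassandra log, following the suggestion above. It assumes only that the relevant lines contain the string `GCInspector`; the function name `gc_inspector_lines` is hypothetical, and the sketch makes no assumption about the rest of the line format.

```python
def gc_inspector_lines(lines):
    """Yield only the log lines that mention GCInspector, without trailing newlines."""
    for line in lines:
        if "GCInspector" in line:
            yield line.rstrip("\n")
```

It accepts any iterable of strings, so an open log file can be passed directly, for example `for line in gc_inspector_lines(open(path)): print(line)`, where `path` is wherever the node writes its system log. Comparing the matching lines from a freshly started node with those from a node that has been up for a day or two is one way to see whether heap usage after each GC keeps growing.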
Re: cascading failures due to memory
and is there anything specific that could be causing the issue between Java SE 1.6.0_24 and 1.6.0_25? All nodes are _24

up to 64% memory usage today

-sd

On Wed, Jun 1, 2011 at 9:30 PM, Sasha Dolgy wrote:
> is there a specific string I should be looking for in the logs that
> isn't super obvious to me at the moment...
>
> On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis wrote:
>> The place to start is with the statistics Cassandra logs after each GC.

--
Sasha Dolgy
sasha.do...@gmail.com
Re: cascading failures due to memory
is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment...

On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis wrote:
> The place to start is with the statistics Cassandra logs after each GC.

--
Sasha Dolgy
sasha.do...@gmail.com
Re: cascading failures due to memory
The place to start is with the statistics Cassandra logs after each GC.

On Tue, May 31, 2011 at 5:01 AM, Sasha Dolgy wrote:
> hi everyone,
>
> the current nodes i have deployed (4) have all been working fine, with
> not a lot of data ... more reads than writes at the moment. as i had
> monitoring disabled, when one node's OS killed the cassandra process
> due to out of memory problems ... that was fine. 24 hours later,
> another node, 24 hours later, another node ...until finally, all 4
> nodes no longer had cassandra running.
>
> When all nodes are started fresh, CPU utilization is at about 21% on
> each box. after 24 hours, this goes up to 32% and then 51% 24 hours
> later.
>
> originally I had thought that this may be a result of 'nodetool
> repair' not being run consistently ... after adding a cronjob to run
> every 24 hours (staggered between nodes) the problem of the increasing
> memory utilization does not resolve.
>
> i've read the operations page and also the
> http://wiki.apache.org/cassandra/MemtableThresholds page. i am
> running defaults and 0.7.6-02 ...
>
> what are the best places to start in terms of finding why this is
> happening? CF design / usage? 'nodetool cfstats' gives me some good
> info ... and i've already implemented some changes to one CF based on
> how it had ballooned (too many rows versus not enough columns)
>
> suggestions appreciated
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com