Re: Cassandra crashes when using offheap_objects for memtable_allocation_type

2020-06-02 Thread Reid Pinchback
I’d also take a look at the O/S level.  You might be queued up on flushing of 
dirty pages, which would also throttle your ability to write mempages.  Once 
the I/O gets throttled badly, I’ve seen it push back into what you see in C*. 
To Aaron’s point, you want a balance in memory between C* and O/S buffer cache, 
because to write to disk you pass through buffer cache first.
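
A rough way to eyeball that at the O/S level (just a sketch, assuming a typical 
Linux box with sysstat installed):

  # dirty-page backlog and the kernel thresholds that gate writeback
  grep -E 'Dirty|Writeback' /proc/meminfo
  sysctl vm.dirty_ratio vm.dirty_background_ratio
  # per-device saturation; sustained high %util and long awaits suggest throttled flushes
  iostat -x 5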

From: Aaron Ploetz 
Reply-To: "user@cassandra.apache.org" 
Date: Tuesday, June 2, 2020 at 9:38 AM
To: "user@cassandra.apache.org" 
Subject: Re: Cassandra crashes when using offheap_objects for 
memtable_allocation_type

I would try running it with memtable_offheap_space_in_mb at the default for 
sure, but definitely lower than 8GB.  With 32GB of RAM, you're already 
allocating half of that for your heap, and then halving the remainder for off 
heap memtables.  What's left may not be enough for the OS, etc.  Giving some of 
that back will allow more to be used for page cache, which always helps.

"JVM heap size: 16GB, CMS, 1GB newgen"

For CMS GC with a 16GB heap, 1GB is way too small for new gen.  You're going to 
want that to be at least 40% of the max heap size.  Some folks here even 
advocate for setting Xmn as high as 50% of Xmx.

If you want to stick with CMS GC, take a look at 
https://issues.apache.org/jira/browse/CASSANDRA-8150.
  There's plenty of good info in there on CMS GC tuning.  Make sure to read 
through the whole ticket, so that you understand what each setting does.  You 
can't just pick-and-choose.

Regards,

Aaron


On Tue, Jun 2, 2020 at 1:31 AM onmstester onmstester 
 wrote:
I just changed these properties to increase flushed file size (decrease number 
of compactions):

  *   memtable_allocation_type from heap_buffers to offheap_objects
  *   memtable_offheap_space_in_mb: from default (2048) to 8192
Using default values for the other memtable/compaction/commitlog configurations.

After a few hours, some nodes stopped applying any mutations (dropped mutations 
increased) and pending flushes also increased. The nodes were still up and 
running, but only a single CPU core was at 100% usage (the other cores were at 
0%). Other nodes in the cluster marked the node as DN. I could not access port 
7199 and could not even get a thread dump with jstack -F.

Restarting the Cassandra service fixes the problem, but after a while some 
other node would go DN.

Am I missing some configuration?  What should I change in the Cassandra default 
configuration to maximize write throughput for a single node/cluster in a 
write-heavy scenario for this data model?
The data model is a single table:
  create table test(
  partition_key text,
  clustering_key text,
  rows set,
  primary key ((partition_key, clustering_key))
  );


vCPU: 12
Memory: 32GB
Node data size: 2TB
Apache cassandra 3.11.2
JVM heap size: 16GB, CMS, 1GB newgen


Sent using Zoho Mail





Re: Cassandra crashes when using offheap_objects for memtable_allocation_type

2020-06-02 Thread Aaron Ploetz
primary key ((partition_key, clustering_key))

Also, this primary key definition does not define a partitioning key and a
clustering key.  It defines a *composite* partition key.

If you want it to define both a partition key and a clustering key, get rid
of one set of parens.

primary key (partition_key, clustering_key)
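
A rough illustration of the practical difference (sketch only; keyspace "ks" and 
the column types are made up, since the types were stripped from the original mail):

  # with PRIMARY KEY ((partition_key, clustering_key)) both columns form the
  # partition key, so reads have to pin both with equality:
  cqlsh -e "SELECT * FROM ks.test WHERE partition_key='a' AND clustering_key='b';"
  # with PRIMARY KEY (partition_key, clustering_key) you can slice within a partition:
  cqlsh -e "SELECT * FROM ks.test WHERE partition_key='a' AND clustering_key >= 'b';"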


On Tue, Jun 2, 2020 at 1:31 AM onmstester onmstester
 wrote:

> I just changed these properties to increase flushed file size (decrease
> number of compactions):
>
>- memtable_allocation_type from heap_buffers to offheap_objects
>- memtable_offheap_space_in_mb: from default (2048) to 8192
>
> Using default value for other memtable/compaction/commitlog configurations
> .
>
> After a few hours some of nodes stopped to do any mutations (dropped
> mutaion increased) and also pending flushes increased, they were just up
> and running and there was only a single CPU core with 100% usage(other
> cores was 0%). other nodes on the cluster determines the node as DN. Could
> not access 7199 and also could not create thread dump even with jstack -F.
>
> Restarting Cassandra service fixes the problem but after a while some
> other node would be DN.
>
> Am i missing some configurations?  What should i change in cassandra
> default configuration to maximize write throughput in single node/cluster
> in write-heavy scenario for the data model:
> Data mode is a single table:
>   create table test(
>   text partition_key,
>   text clustering_key,
>   set rows,
>   primary key ((partition_key, clustering_key))
>
>
> vCPU: 12
> Memory: 32GB
> Node data size: 2TB
> Apache cassandra 3.11.2
> JVM heap size: 16GB, CMS, 1GB newgen
>
> Sent using Zoho Mail 
>
>
>
>


Re: Cassandra crashes when using offheap_objects for memtable_allocation_type

2020-06-02 Thread Aaron Ploetz
I would try running it with memtable_offheap_space_in_mb at the default for
sure, but definitely lower than 8GB.  With 32GB of RAM, you're already
allocating half of that for your heap, and then halving the remainder for
off heap memtables.  What's left may not be enough for the OS, etc.  Giving
some of that back will allow more to be used for page cache, which always
helps.
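
As a sketch (path assumes a stock package install), I'd check what is currently 
set and let the off-heap space fall back to its default:

  grep -nE 'memtable_allocation_type|memtable_offheap_space_in_mb' /etc/cassandra/cassandra.yaml
  # e.g. keep offheap_objects but drop the explicit 8192 override:
  #   memtable_allocation_type: offheap_objects
  #   # memtable_offheap_space_in_mb: 8192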

"JVM heap size: 16GB, CMS, 1GB newgen"

For CMS GC with a 16GB heap, 1GB is way too small for new gen.  You're
going to want that to be at least 40% of the max heap size.  Some folks
here even advocate for setting Xmn as high as 50% of Xmx.
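
As a sketch (assuming the stock cassandra-env.sh and your 16GB heap), that would 
look something like:

  # cassandra-env.sh: keep the 16G heap, give new gen roughly 40% of it
  MAX_HEAP_SIZE="16G"
  HEAP_NEWSIZE="6400M"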

If you want to stick with CMS GC, take a look at
https://issues.apache.org/jira/browse/CASSANDRA-8150.  There's plenty of
good info in there on CMS GC tuning.  Make sure to read through the whole
ticket, so that you understand what each setting does.  You can't just
pick-and-choose.

Regards,

Aaron


On Tue, Jun 2, 2020 at 1:31 AM onmstester onmstester
 wrote:

> I just changed these properties to increase flushed file size (decrease
> number of compactions):
>
>- memtable_allocation_type from heap_buffers to offheap_objects
>- memtable_offheap_space_in_mb: from default (2048) to 8192
>
> Using default value for other memtable/compaction/commitlog configurations
> .
>
> After a few hours some of nodes stopped to do any mutations (dropped
> mutaion increased) and also pending flushes increased, they were just up
> and running and there was only a single CPU core with 100% usage(other
> cores was 0%). other nodes on the cluster determines the node as DN. Could
> not access 7199 and also could not create thread dump even with jstack -F.
>
> Restarting Cassandra service fixes the problem but after a while some
> other node would be DN.
>
> Am i missing some configurations?  What should i change in cassandra
> default configuration to maximize write throughput in single node/cluster
> in write-heavy scenario for the data model:
> Data mode is a single table:
>   create table test(
>   text partition_key,
>   text clustering_key,
>   set rows,
>   primary key ((partition_key, clustering_key))
>
>
> vCPU: 12
> Memory: 32GB
> Node data size: 2TB
> Apache cassandra 3.11.2
> JVM heap size: 16GB, CMS, 1GB newgen
>
> Sent using Zoho Mail 
>
>
>
>


Re: Cassandra crashes after loading data with sstableloader

2018-07-29 Thread Jeff Jirsa
What’s the cardinality of hash? 

Do they have the same schema? If so you may be able to take a snapshot and 
hardlink it in / refresh instead of sstableloader. Alternatively you could drop 
the index from the destination keyspace and add it back in after the load 
finishes.
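
A sketch of the snapshot/refresh route (paths assume the default 
/var/lib/cassandra layout, <id> stands for the table's directory suffix, and the 
destination keyspace already defines the same table):

  nodetool snapshot -t clone_ks1 keyspace1
  # hard-link the snapshot sstables into the matching table dir under keyspace2
  ln /var/lib/cassandra/data/keyspace1/message-<id>/snapshots/clone_ks1/* \
     /var/lib/cassandra/data/keyspace2/message-<id>/
  nodetool refresh keyspace2 message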

How big are the sstables? How big is your heap? Are you already serving 
traffic? 

-- 
Jeff Jirsa


> On Jul 29, 2018, at 3:43 PM, Rahul Singh  wrote:
> 
> What does “hash” Data look like?
> 
> Rahul
>> On Jul 24, 2018, 11:30 AM -0400, Arpan Khandelwal , 
>> wrote:
>> I need to clone data from one keyspace to another keyspace.
>> We do it by taking snapshot of keyspace1 and restoring in keyspace2 using 
>> sstableloader.
>> 
>> Suppose we have following table with index on hash column. Table has around 
>> 10M rows.
>> -
>> CREATE TABLE message (
>>  id uuid,
>>  messageid uuid,
>>  parentid uuid,
>>  label text,
>>  properties map,
>>  text1 text,
>>  text2 text,
>>  text3 text,
>>  category text,
>>  hash text,
>>  info map,
>>  creationtimestamp bigint,
>>  lastupdatedtimestamp bigint,
>>  PRIMARY KEY ( (id) )
>>  );
>> 
>> CREATE  INDEX  ON message ( hash );
>> -
>> Cassandra crashes when i load data using sstableloader. Load is happening 
>> correctly but seems that cassandra crashes when its trying to build index on 
>> table with huge data.
>> 
>> I have two questions.
>> 1. Is there any better way to clone keyspace?
>> 2. How can i optimize sstableloader to load data and not crash cassandra 
>> while building index.
>> 
>> Thanks
>> Arpan


Re: Cassandra crashes after loading data with sstableloader

2018-07-29 Thread Rahul Singh
What does “hash” Data look like?

Rahul
On Jul 24, 2018, 11:30 AM -0400, Arpan Khandelwal , wrote:
> I need to clone data from one keyspace to another keyspace.
> We do it by taking snapshot of keyspace1 and restoring in keyspace2 using 
> sstableloader.
>
> Suppose we have following table with index on hash column. Table has around 
> 10M rows.
> -
> CREATE TABLE message (
>  id     uuid,
>  messageid     uuid,
>  parentid     uuid,
>  label     text,
>  properties     map,
>  text1     text,
>  text2     text,
>  text3     text,
>  category     text,
>  hash     text,
>  info     map,
>  creationtimestamp     bigint,
>  lastupdatedtimestamp     bigint,
>  PRIMARY KEY ( (id) )
>  );
>
> CREATE  INDEX  ON message ( hash );
> -
> Cassandra crashes when i load data using sstableloader. Load is happening 
> correctly but seems that cassandra crashes when its trying to build index on 
> table with huge data.
>
> I have two questions.
> 1. Is there any better way to clone keyspace?
> 2. How can i optimize sstableloader to load data and not crash cassandra 
> while building index.
>
> Thanks
> Arpan


Re: Cassandra crashes....

2017-08-22 Thread Thakrar, Jayesh
Yep, similar symptoms - but no, there's no OOM killer
Also, if you look in the gc log around the time of failure, the heap memory was 
much below the 16 GB limit.

And if I look at the 2nd last GC log before the crash, here’s what we see.
And you will notice that cleaning up the 4 GB Eden (along with the other 
cleanup = full GC) took 3.48 seconds.

Hence I reduced the New space to a tiny amount (5% = 800 MB).
With this setting, there have been no crashes so far and all STW GC pauses have 
been under 200 ms.
We borrowed this "approach" from doing something similar with HBase (where we 
had a lot of read/write requests too).
As part of the tuning, we have also reduced the tenure from default of 15 to 2.
This pushes all medium/long-living objects into old-gen very early in the 
lifecycle.
Having them around in Eden/survivor space would just result in them doing 
hop-scotch until they are done with 15 iterations.
And each of those copy from survivor 0 to survivor 1 is a STW operation, so 
having smaller new space and having shorter tenure seems to be helping so far.
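
For reference, the flags behind that look roughly like this (a sketch of what we 
append in cassandra-env.sh; exact numbers depend on your heap):

  # shrink new gen to ~5% of the 16G heap and promote survivors after 2 copies
  JVM_OPTS="$JVM_OPTS -Xmn800M"
  JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"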

  region size 8192K, 544 young (4456448K), 5 survivors (40960K)
Metaspace   used 43707K, capacity 45602K, committed 45696K, reserved 
1089536K
  class spaceused 5872K, capacity 6217K, committed 6272K, reserved 1048576K
2017-08-22T07:11:46.594+: 96808.824: [Full GC (System.gc())  
8196M->1670M(16G), 3.4842221 secs]
   [Eden: 4312.0M(4760.0M)->0.0B(4800.0M) Survivors: 40.0M->0.0B Heap: 
8196.3M(16.0G)->1670.6M(16.0G)], [Metaspace: 43707K->43488K(1089536K)]
Heap after GC invocations=20378 (full 3):
garbage-first heap   total 16777216K, used 1710741K [0x0003c000, 
0x0007c000, 0x0007c000)
  region size 8192K, 0 young (0K), 0 survivors (0K)
Metaspace   used 43488K, capacity 45128K, committed 45696K, reserved 
1089536K
  class spaceused 5798K, capacity 6098K, committed 6272K, reserved 1048576K
}
[Times: user=5.48 sys=0.54, real=3.48 secs]


From: kurt greaves <k...@instaclustr.com>
Date: Tuesday, August 22, 2017 at 5:40 PM
To: User <user@cassandra.apache.org>
Subject: Re: Cassandra crashes

sounds like Cassandra is being killed by the oom killer. can you check dmesg to 
see if this is the case? sounds a bit absurd with 256g of memory but could be a 
config problem.


Re: Cassandra crashes....

2017-08-22 Thread Thakrar, Jayesh
So the reason for the large number of prepared statements is the nature of the 
application.
One of the periodic jobs does lookups with a partial key (key prefix, not 
filtered queries) for thousands of rows.
Hence the large number of prepared statements.

Almost all of the queries, once executed, are not needed anymore, so it's ok for 
the older prepared statements to be purged.

All the same, I will do some analysis on the prepared statements table.

Thanks for the tip/pointer!

On 8/22/17, 5:17 PM, "Alain Rastoul"  wrote:

On 08/22/2017 05:39 PM, Thakrar, Jayesh wrote:
> Surbhi and Fay,
>
> I agree we have plenty of RAM to spare.
>

Hi

At the very beginning of system.log there is a
INFO  [CompactionExecutor:487] 2017-08-21 23:21:01,684 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB
who comes from BufferPool exhaustion (several messages)
 From the source
file_cache_size_in_mb
 (Default: Smaller of 1/4 heap or 512) Total memory to use for 
SSTable-reading buffers.

So here in your configuration it is 512M, may be you should set it to a 
higher value in your cassandra.yaml (1/4 => 4G) ?
(also see https://issues.apache.org/jira/browse/CASSANDRA-11681, the 
default value may not be accurate)

Another strange thing is the number of prepared statements which also 
gives errors: lot of messages like
WARN  [ScheduledTasks:1] 2017-08-22 07:09:25,009 QueryProcessor.java:105 
- 1 prepared statements discarded in the last minute because cache limit 
reached (64 MB)
...
on startup you see:
INFO  [main] 2017-08-22 12:50:13,787 QueryProcessor.java:162 - Preloaded 
13357 prepared statements

13K different prepared statements sounds a lot...
an issue about that seems to be fixed in 3.11 
https://issues.apache.org/jira/browse/CASSANDRA-13641
May be youc should truncate your system.prepared_statements and restart 
your node


HTH


-- 
best,
Alain




-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


Re: Cassandra crashes....

2017-08-22 Thread kurt greaves
Sounds like Cassandra is being killed by the OOM killer. Can you check
dmesg to see if this is the case? It sounds a bit absurd with 256g of memory,
but it could be a config problem.
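
Something along these lines should show it if the kernel stepped in (sketch; 
journalctl assumes a systemd box):

  # the kernel logs an "Out of memory: Kill process ..." line when it kills the JVM
  dmesg -T | grep -iE 'out of memory|oom-killer'
  journalctl -k | grep -i 'killed process'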


Re: Cassandra crashes....

2017-08-22 Thread Alain Rastoul

On 08/22/2017 05:39 PM, Thakrar, Jayesh wrote:

Surbhi and Fay,

I agree we have plenty of RAM to spare.



Hi

At the very beginning of system.log there is a
INFO  [CompactionExecutor:487] 2017-08-21 23:21:01,684 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

which comes from BufferPool exhaustion (several messages)
From the source
file_cache_size_in_mb
(Default: Smaller of 1/4 heap or 512) Total memory to use for 
SSTable-reading buffers.


So here in your configuration it is 512M; maybe you should set it to a 
higher value in your cassandra.yaml (1/4 of the heap => 4G)?
(also see https://issues.apache.org/jira/browse/CASSANDRA-11681, the 
default value may not be accurate)


Another strange thing is the number of prepared statements, which also 
gives errors: lots of messages like
WARN  [ScheduledTasks:1] 2017-08-22 07:09:25,009 QueryProcessor.java:105 
- 1 prepared statements discarded in the last minute because cache limit 
reached (64 MB)

...
on startup you see:
INFO  [main] 2017-08-22 12:50:13,787 QueryProcessor.java:162 - Preloaded 
13357 prepared statements


13K different prepared statements sounds like a lot...
An issue about that seems to be fixed in 3.11: 
https://issues.apache.org/jira/browse/CASSANDRA-13641
Maybe you should truncate your system.prepared_statements and restart 
your node.
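
For example (a sketch; check how many entries you actually have first, and the 
restart assumes Cassandra runs as a systemd service):

  cqlsh -e "SELECT count(*) FROM system.prepared_statements;"
  cqlsh -e "TRUNCATE system.prepared_statements;"
  sudo systemctl restart cassandra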



HTH


--
best,
Alain

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra crashes....

2017-08-22 Thread Thakrar, Jayesh
We are using TWCS compaction.

Here's one sample table

CREATE TABLE ae.raw_logs_by_user (
dtm_id bigint,
company_id int,
source text,
status_id int,
log_date bigint,
uuid_least bigint,
uuid_most bigint,
profile_system_id int,
parent_message_id int,
parent_template_id int,
record text,
PRIMARY KEY (dtm_id, company_id, source, status_id, log_date, uuid_least, 
uuid_most, profile_system_id)
) WITH CLUSTERING ORDER BY (company_id ASC, source ASC, status_id ASC, log_date 
DESC, uuid_least ASC, uuid_most ASC, profile_system_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 
'max_threshold': '4', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';



From: "Fay Hou [Storage Service] ­" <fay...@coupang.com>
Date: Tuesday, August 22, 2017 at 10:52 AM
To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>, Surbhi Gupta 
<surbhi.gupt...@gmail.com>
Subject: Re: Cassandra crashes

what kind compaction? LCS ?

On Aug 22, 2017 8:39 AM, "Thakrar, Jayesh" 
<jthak...@conversantmedia.com> wrote:

Surbhi and Fay,



I agree we have plenty of RAM to spare.

However, our data load and compaction churn is so high (partially thanks to 
SSDs!), its causing too much GC pressure.

And as you know the Edenspace and survivor space cleanup is a STW - hence 
larger heap will increase the gc pauses.



As for "what happens" during the crash - nothing.

It seems that the daemon just dies silently.



If you are interested, attached are the Cassandra system.log and the detailed 
gc log files.



system.log = Cassandra log (see line 424 - it’s the last line before the crash)



cassandra-gc.log.8.currrent = last gc log at the time of crash

Cassandra-gc.log.0 = gc log after startup



If you want to compare the "gc pauses", grep the gc files for the word "stopped"

(e.g. grep stopped cassandra-gc.log.*)



Thanks for the quick replies!



Jayesh





From: Surbhi Gupta <surbhi.gupt...@gmail.com>
Date: Tuesday, August 22, 2017 at 10:19 AM
To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>, 
"user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra crashes



16GB heap is too small for G1GC . Try at least 32GB of heap size

On Tue, Aug 22, 2017 at 7:58 AM Fay Hou [Storage Service] ­ 
<fay...@coupang.com> wrote:

What errors do you see?

16gb of 256 GB . Heap is too small. I would give heap at least 160gb.

On Aug 22, 2017 7:42 AM, "Thakrar, Jayesh" 
<jthak...@conversantmedia.com> wrote:

Hi All,

We are somewhat new users to Cassandra 3.10 on Linux and wanted to ping the 
user group for their experiences.

Our usage profile is batch jobs that load millions of rows to Cassandra every 
hour.

And there are similar period batch jobs that read millions of rows and do some 
processing, outputting the result to HDFS (no issues with HDFS).

We often see Cassandra daemons crash.

Key points of our environment are:

Pretty good servers: 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB SSD 
drive

Compaction: TWCS compaction with 7 day windows as the data retention period is 
limited - about 120 days.

JDK: Java 1.8.0.71 and G1 GC

Heap Size: 16 GB

Large SSTables: 50 GB to 300+ GB

We see the daemons crash after some back-to-back long GCs (1.5 to 3.5 seconds).

Note that we had set the target for GC pauses to be 200 ms.

We have been somewhat able to tame the crashes by updating the TWCS compaction 
properties to have min/max compaction sstables = 4 and by drastically reducing 
the size of the New/Eden space (to 5% of heap space = 800 MB).

It's been about 12 hours and our stop-the-world gc pauses are under 90 ms.

Since the servers have more than sufficient resources, we are not seeing any 
noticeable performance impact.

Is this kind of tuning normal/expected?

Thanks,

Jayesh


Re: Cassandra crashes....

2017-08-22 Thread Fay Hou [Storage Service] ­
what kind compaction? LCS ?

On Aug 22, 2017 8:39 AM, "Thakrar, Jayesh" <jthak...@conversantmedia.com>
wrote:

Surbhi and Fay,



I agree we have plenty of RAM to spare.

However, our data load and compaction churn is so high (partially thanks to
SSDs!), its causing too much GC pressure.

And as you know the Edenspace and survivor space cleanup is a STW - hence
larger heap will increase the gc pauses.



As for "what happens" during the crash - nothing.

It seems that the daemon just dies silently.



If you are interested, attached are the Cassandra system.log and the
detailed gc log files.



system.log = Cassandra log (see line 424 - it’s the last line before the
crash)



cassandra-gc.log.8.currrent = last gc log at the time of crash

Cassandra-gc.log.0 = gc log after startup



If you want compare the "gc pauses" grep the gc files for the word "stopped"

(e.g. grep stopped cassandra-gc.log.*)



Thanks for the quick replies!



Jayesh





*From: *Surbhi Gupta <surbhi.gupt...@gmail.com>
*Date: *Tuesday, August 22, 2017 at 10:19 AM
*To: *"Thakrar, Jayesh" <jthak...@conversantmedia.com>, "
user@cassandra.apache.org" <user@cassandra.apache.org>
*Subject: *Re: Cassandra crashes



16GB heap is too small for G1GC . Try at least 32GB of heap size

On Tue, Aug 22, 2017 at 7:58 AM Fay Hou [Storage Service] ­ <
fay...@coupang.com> wrote:

What errors do you see?

16gb of 256 GB . Heap is too small. I would give heap at least 160gb.

On Aug 22, 2017 7:42 AM, "Thakrar, Jayesh" <jthak...@conversantmedia.com>
wrote:

Hi All,

We are somewhat new users to Cassandra 3.10 on Linux and wanted to ping the
user group for their experiences.

Our usage profile is batch jobs that load millions of rows to Cassandra
every hour.

And there are similar period batch jobs that read millions of rows and do
some processing, outputting the result to HDFS (no issues with HDFS).

We often see Cassandra daemons crash.

Key points of our environment are:

*Pretty good servers:* 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB
SSD drive

*Compaction:* TWCS compaction with 7 day windows as the data retention
period is limited - about 120 days.

*JDK: *Java 1.8.0.71 and G1 GC

*Heap Size:* 16 GB

*Large SSTables:* 50 GB to 300+ GB

We see the daemons crash after some back-to-back long GCs (1.5 to 3.5
seconds).

Note that we had set the target for GC pauses to be 200 ms.

We have been somewhat able to tame the crashes by updating the TWCS
compaction properties to have min/max compaction sstables = 4 and by
drastically reducing the size of the New/Eden space (to 5% of heap space =
800 MB).

It's been about 12 hours and our stop-the-world gc pauses are under 90 ms.

Since the servers have more than sufficient resources, we are not seeing
any noticeable performance impact.

Is this kind of tuning normal/expected?

Thanks,

Jayesh


Re: Cassandra crashes....

2017-08-22 Thread Surbhi Gupta
16GB heap is too small for G1GC . Try at least 32GB of heap size
On Tue, Aug 22, 2017 at 7:58 AM Fay Hou [Storage Service] ­ <
fay...@coupang.com> wrote:

> What errors do you see?
> 16gb of 256 GB . Heap is too small. I would give heap at least 160gb.
>
>
> On Aug 22, 2017 7:42 AM, "Thakrar, Jayesh" 
> wrote:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Hi All,
>
>
>
>
>
> We are somewhat new users to Cassandra 3.10 on Linux and wanted to ping
> the user group for their experiences.
>
>
>
>
>
> Our usage profile is  batch jobs that load millions of rows to Cassandra
> every hour.
>
>
> And there are similar period batch jobs that read millions of rows and do
> some processing, outputting the result to HDFS (no issues with HDFS).
>
>
>
>
>
> We often seen Cassandra daemons crash.
>
>
> Key points of our environment are:
>
>
> *Pretty good servers:* 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB
> SSD drive
>
>
> *Compaction:* TWCS compaction with 7 day windows as the data retention
> period is limited - about 120 days.
>
>
> *JDK: *Java 1.8.0.71 and G1 GC
>
>
>
> *Heap Size:* 16 GB
>
>
> *Large SSTables:* 50 GB to 300+ GB
>
>
>
>
>
>
>
> We see the daemons crash after some back-to-back long GCs (1.5 to 3.5
> seconds).
>
>
> Note that we had set the target for GC pauses to be 200 ms
>
>
>
>
>
> We have been somewhat able to tame the crashes by updating the TWCS
> compaction properties
>
>
>
> to have min/max compaction sstables = 4 and by drastically reducing the
> size of the New/Eden space (to 5% of heap space = 800 MB).
>
>
> Its been about 12 hours and our stop-the-world gc pauses are under 90 ms.
>
>
> Since the servers have more than sufficient resources, we are not seeing
> any noticeable performance impact.
>
>
>
>
>
> Is this kind of tuning normal/expected?
>
>
>
>
>
> Thanks,
>
>
> Jayesh
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: Cassandra crashes....

2017-08-22 Thread Fay Hou [Storage Service] ­
What errors do you see?
16gb of 256 GB . Heap is too small. I would give heap at least 160gb.

On Aug 22, 2017 7:42 AM, "Thakrar, Jayesh" 
wrote:

Hi All,



We are somewhat new users to Cassandra 3.10 on Linux and wanted to ping the
user group for their experiences.



Our usage profile is  batch jobs that load millions of rows to Cassandra
every hour.

And there are similar period batch jobs that read millions of rows and do
some processing, outputting the result to HDFS (no issues with HDFS).



We often seen Cassandra daemons crash.

Key points of our environment are:

*Pretty good servers:* 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB
SSD drive

*Compaction:* TWCS compaction with 7 day windows as the data retention
period is limited - about 120 days.

*JDK: *Java 1.8.0.71 and G1 GC

*Heap Size:* 16 GB

*Large SSTables:* 50 GB to 300+ GB

We see the daemons crash after some back-to-back long GCs (1.5 to 3.5
seconds).

Note that we had set the target for GC pauses to be 200 ms



We have been somewhat able to tame the crashes by updating the TWCS
compaction properties

to have min/max compaction sstables = 4 and by drastically reducing the
size of the New/Eden space (to 5% of heap space = 800 MB).

Its been about 12 hours and our stop-the-world gc pauses are under 90 ms.

Since the servers have more than sufficient resources, we are not seeing
any noticeable performance impact.



Is this kind of tuning normal/expected?



Thanks,

Jayesh


Re: Cassandra crashes....

2017-08-22 Thread Jeff Jirsa
You typically don't want to set the eden space when you're using G1

-- 
Jeff Jirsa


> On Aug 22, 2017, at 7:42 AM, Thakrar, Jayesh  
> wrote:
> 
> Hi All,
>  
> We are somewhat new users to Cassandra 3.10 on Linux and wanted to ping the 
> user group for their experiences.
>  
> Our usage profile is  batch jobs that load millions of rows to Cassandra 
> every hour.
> And there are similar period batch jobs that read millions of rows and do 
> some processing, outputting the result to HDFS (no issues with HDFS).
>  
> We often seen Cassandra daemons crash.
> Key points of our environment are:
> Pretty good servers: 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB SSD 
> drive
> Compaction: TWCS compaction with 7 day windows as the data retention period 
> is limited - about 120 days.
> JDK: Java 1.8.0.71 and G1 GC
> Heap Size: 16 GB
> Large SSTables: 50 GB to 300+ GB
> 
> We see the daemons crash after some back-to-back long GCs (1.5 to 3.5 
> seconds).
> Note that we had set the target for GC pauses to be 200 ms
>  
> We have been somewhat able to tame the crashes by updating the TWCS 
> compaction properties
> to have min/max compaction sstables = 4 and by drastically reducing the size 
> of the New/Eden space (to 5% of heap space = 800 MB).
> Its been about 12 hours and our stop-the-world gc pauses are under 90 ms.
> Since the servers have more than sufficient resources, we are not seeing any 
> noticeable performance impact.
>  
> Is this kind of tuning normal/expected?
>  
> Thanks,
> Jayesh
>  


Re: Cassandra crashes daily; nothing on the log

2015-06-08 Thread Bryan Holladay
It could be the linux kernel killing Cassandra b/c of memory usage. When
this happens, nothing is logged in Cassandra. Check the system
logs: /var/log/messages  Look for a message saying Out of Memory... kill
process...

On Mon, Jun 8, 2015 at 1:37 PM, Paulo Motta pauloricard...@gmail.com
wrote:

 try checking your system logs (generally /var/log/syslog) to check if the
 cassandra process was killed by the OS oom-killer

 2015-06-06 15:39 GMT-03:00 Brian Sam-Bodden bsbod...@integrallis.com:

 Berk,
1 GB is not enough to run C*, the minimum memory we use on Digital
 Ocean is 4GB.

 Cheers,
 Brian
 http://integrallis.com

 On Sat, Jun 6, 2015 at 10:50 AM, graffit...@yahoo.com wrote:

 Hi all,

 I've installed Cassandra on a test server hosted on Digital Ocean. The
 server has 1GB RAM, and is running a single docker container alongside C*.
 Somehow, every night, the Cassandra instance crashes. The annoying part is
 that I cannot see anything wrong with the log files, so I can't tell what's
 going on.

 The log files are here:
 http://pastebin.com/Zquu5wvd

 Do you have any idea what's going on? Can you suggest some ways I can
 try to troubleshoot this?

 Thanks!
  Berk




 --
 Cheers,
 Brian
 http://www.integrallis.com





Re: Cassandra crashes daily; nothing on the log

2015-06-08 Thread Paulo Motta
try checking your system logs (generally /var/log/syslog) to check if the
cassandra process was killed by the OS oom-killer

2015-06-06 15:39 GMT-03:00 Brian Sam-Bodden bsbod...@integrallis.com:

 Berk,
1 GB is not enough to run C*, the minimum memory we use on Digital
 Ocean is 4GB.

 Cheers,
 Brian
 http://integrallis.com

 On Sat, Jun 6, 2015 at 10:50 AM, graffit...@yahoo.com wrote:

 Hi all,

 I've installed Cassandra on a test server hosted on Digital Ocean. The
 server has 1GB RAM, and is running a single docker container alongside C*.
 Somehow, every night, the Cassandra instance crashes. The annoying part is
 that I cannot see anything wrong with the log files, so I can't tell what's
 going on.

 The log files are here:
 http://pastebin.com/Zquu5wvd

 Do you have any idea what's going on? Can you suggest some ways I can try
 to troubleshoot this?

 Thanks!
  Berk




 --
 Cheers,
 Brian
 http://www.integrallis.com



Re: Cassandra crashes daily; nothing on the log

2015-06-06 Thread Brian Sam-Bodden
Berk,
   1 GB is not enough to run C*, the minimum memory we use on Digital Ocean
is 4GB.

Cheers,
Brian
http://integrallis.com

On Sat, Jun 6, 2015 at 10:50 AM, graffit...@yahoo.com wrote:

 Hi all,

 I've installed Cassandra on a test server hosted on Digital Ocean. The
 server has 1GB RAM, and is running a single docker container alongside C*.
 Somehow, every night, the Cassandra instance crashes. The annoying part is
 that I cannot see anything wrong with the log files, so I can't tell what's
 going on.

 The log files are here:
 http://pastebin.com/Zquu5wvd

 Do you have any idea what's going on? Can you suggest some ways I can try
 to troubleshoot this?

 Thanks!
  Berk




-- 
Cheers,
Brian
http://www.integrallis.com


Re: Cassandra crashes

2013-09-09 Thread John Sanda
Check your file limits -
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html
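
A quick way to check what the running daemon actually got (sketch; assumes a 
single Cassandra JVM on the box):

  # limits applied to the live process (look at "Max open files" / "Max processes")
  cat /proc/$(pgrep -f CassandraDaemon)/limits
  # and the current shell/user defaults
  ulimit -a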

On Friday, September 6, 2013, Jan Algermissen wrote:


 On 06.09.2013, at 13:12, Alex Major al3...@gmail.com
 wrote:

  Have you changed the appropriate config settings so that Cassandra will
 run with only 2GB RAM? You shouldn't find the nodes go down.
 
  Check out this blog post
 http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/,
  it outlines the configuration settings needed to run Cassandra on 64MB
 RAM and might give you some insights.

 Yes, I have my fingers on the knobs and have also seen the article you
 mention - very helpful indeed. As well as the replies so far. Thanks very
 much.

 However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my
 data import :-(

 Now, while it would be easy to scale out and up a bit until the default
 config of C* is sufficient, I really like to dive deep and try to
 understand why the thing is still going down, IOW, which of my config
 settings is so darn wrong that in most cases kill -9 remains the only way
 to shutdown the Java process in the end.


 The problem seems to be the heap size (set to MAX_HEAP_SIZE=640M   and
 HEAP_NEWSIZE=120M ) in combination with some cassandra activity that
 demands too much heap, right?

 So how do I find out what activity this is and how do I sufficiently
 reduce that activity.

 What bugs me in general is that AFAIU C* is so eager at giving massive
 write speed, that it sort of forgets to protect itself from client demand.
 I would very much like to understand why and how that happens.  I mean: no
 matter how many clients are flooding the database, it should not die due to
 out of memory situations, regardless of any configuration specifics, or?


 tl;dr

 Currently my client side (with java-driver) after a while reports more and
 more timeouts and then the following exception:

 com.datastax.driver.core.ex
 ceptions.DriverInternalError: An unexpected error occured server side:
 java.lang.OutOfMemoryError: unable
 to create new native thread ;

 On the server side, my cluster remains more or less in this condition:

 DN  x 71,33 MB   256 34,1%
  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
 UN  x  189,38 MB  256 32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f
  rack1
 UN  x198,49 MB  256 33,9%
  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1

 The host that is down (it is the seed host, if that matters) still shows
 the running java process, but I cannot shut down cassandra or connect with
 nodetool, hence kill -9 to the rescue.

 In that host, I still see a load of around 1.

 jstack -F lists 892 threads, all blocked, except for 5 inactive ones.


 The system.log after a few seconds of import shows the following exception:

 java.lang.AssertionError: incorrect row data size 771030 written to
 /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db;
 correct is 771200
 at
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
 at
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
 at
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
 at
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)


 And then, after about 2 minutes there are out of memory errors:

  ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java
 (line 192) Exception in thread Thread[CompactionExecutor
 :5,1,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:693)
 at
 org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.init(ParallelCompactionIterable.java:296)
 at
 org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
 at
 

Re: Cassandra crashes

2013-09-09 Thread Jan Algermissen
Hi John,


On 10.09.2013, at 01:06, John Sanda john.sa...@gmail.com wrote:

 Check your file limits - 
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html

Did that already - without success.

Meanwhile I upgraded servers and I am getting closer.

I assume by now that heavy writes of rows with considerable size (as in: more 
than a couple of numbers) require a certain amount of RAM due to the C* 
architecture.

IOW, my throughput limit is how fast I can get it to disk, but the minimal 
memory I need for that cannot be tuned down; it depends on the size of the 
stuff written to C* (due to C* doing its memtable magic to save it using 
sequential IO).

It is an interesting trade off. (if I get it right by now :-)

Jan

 
 On Friday, September 6, 2013, Jan Algermissen wrote:
 
 On 06.09.2013, at 13:12, Alex Major al3...@gmail.com wrote:
 
  Have you changed the appropriate config settings so that Cassandra will run 
  with only 2GB RAM? You shouldn't find the nodes go down.
 
  Check out this blog post 
  http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
   , it outlines the configuration settings needed to run Cassandra on 64MB 
  RAM and might give you some insights.
 
 Yes, I have my fingers on the knobs and have also seen the article you 
 mention - very helpful indeed. As well as the replies so far. Thanks very 
 much.
 
 However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my 
 data import :-(
 
 Now, while it would be easy to scale out and up a bit until the default 
 config of C* is sufficient, I really like to dive deep and try to understand 
 why the thing is still going down, IOW, which of my config settings is so 
 darn wrong that in most cases kill -9 remains the only way to shutdown the 
 Java process in the end.
 
 
 The problem seems to be the heap size (set to MAX_HEAP_SIZE=640M   and 
 HEAP_NEWSIZE=120M ) in combination with some cassandra activity that 
 demands too much heap, right?
 
 So how do I find out what activity this is and how do I sufficiently reduce 
 that activity.
 
 What bugs me in general is that AFAIU C* is so eager at giving massive write 
 speed, that it sort of forgets to protect itself from client demand. I would 
 very much like to understand why and how that happens.  I mean: no matter how 
 many clients are flooding the database, it should not die due to out of 
 memory situations, regardless of any configuration specifics, or?
 
 
 tl;dr
 
 Currently my client side (with java-driver) after a while reports more and 
 more timeouts and then the following exception:
 
 com.datastax.driver.core.ex
 ceptions.DriverInternalError: An unexpected error occured server side: 
 java.lang.OutOfMemoryError: unable
 to create new native thread ;
 
 On the server side, my cluster remains more or less in this condition:
 
 DN  x 71,33 MB   256 34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  
 rack1
 UN  x  189,38 MB  256 32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  
 rack1
 UN  x198,49 MB  256 33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  
 rack1
 
 The host that is down (it is the seed host, if that matters) still shows the 
 running java process, but I cannot shut down cassandra or connect with 
 nodetool, hence kill -9 to the rescue.
 
 In that host, I still see a load of around 1.
 
 jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
 
 
 The system.log after a few seconds of import shows the following exception:
 
 java.lang.AssertionError: incorrect row data size 771030 written to 
 /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; 
 correct is 771200
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 
 
 And then, 

Re: Cassandra crashes - solved

2013-09-08 Thread Jan Algermissen

On 06.09.2013, at 17:07, Jan Algermissen jan.algermis...@nordsc.com wrote:

 
 On 06.09.2013, at 13:12, Alex Major al3...@gmail.com wrote:
 
 Have you changed the appropriate config settings so that Cassandra will run 
 with only 2GB RAM? You shouldn't find the nodes go down.
 
 Check out this blog post 
 http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
  , it outlines the configuration settings needed to run Cassandra on 64MB 
 RAM and might give you some insights.
 
 Yes, I have my fingers on the knobs and have also seen the article you 
 mention - very helpful indeed. As well as the replies so far. Thanks very 
 much.
 
 However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my 
 data import :-(

The problem for me was

  in_memory_compaction_limit_in_mb: 1

it seems that my rather large rows (70 cols each) in combination with the 
slower two-pass compaction process mentioned in the comment of the config 
switch caused the java.lang.AssertionError: incorrect row data size exceptions.

After turning in_memory_compaction_limit_in_mb back to 64, all I am getting are 
write timeouts.

AFAIU that is fine, because now C* is stable and all I have is a capacity 
problem, solvable with more nodes or more RAM (maybe, depending on whether IO 
is an issue).

Jan



 
 Now, while it would be easy to scale out and up a bit until the default 
 config of C* is sufficient, I really like to dive deep and try to understand 
 why the thing is still going down, IOW, which of my config settings is so 
 darn wrong that in most cases kill -9 remains the only way to shutdown the 
 Java process in the end.
 
 
 The problem seems to be the heap size (set to MAX_HEAP_SIZE=640M   and 
 HEAP_NEWSIZE=120M ) in combination with some cassandra activity that 
 demands too much heap, right?
 
 So how do I find out what activity this is and how do I sufficiently reduce 
 that activity.
 
 What bugs me in general is that AFAIU C* is so eager at giving massive write 
 speed, that it sort of forgets to protect itself from client demand. I would 
 very much like to understand why and how that happens.  I mean: no matter how 
 many clients are flooding the database, it should not die due to out of 
 memory situations, regardless of any configuration specifics, or?
 
 
 tl;dr
 
 Currently my client side (with java-driver) after a while reports more and 
 more timeouts and then the following exception:
 
 com.datastax.driver.core.ex
 ceptions.DriverInternalError: An unexpected error occured server side: 
 java.lang.OutOfMemoryError: unable 
 to create new native thread ;
 
 On the server side, my cluster remains more or less in this condition:
 
 DN  x 71,33 MB   256 34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  
 rack1
 UN  x  189,38 MB  256 32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  
 rack1
 UN  x198,49 MB  256 33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  
 rack1
 
 The host that is down (it is the seed host, if that matters) still shows the 
 running java process, but I cannot shut down cassandra or connect with 
 nodetool, hence kill -9 to the rescue.
 
 In that host, I still see a load of around 1.
 
 jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
 
 
 The system.log after a few seconds of import shows the following exception:
 
 java.lang.AssertionError: incorrect row data size 771030 written to 
 /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; 
 correct is 771200
at 
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
at 
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at 
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at 
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
 
 
 And then, after about 2 minutes there are out of memory errors:
 
 ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java 
 (line 192) Exception in thread Thread[CompactionExecutor
 :5,1,main]
 java.lang.OutOfMemoryError: unable to create new 

Re: Cassandra crashes

2013-09-06 Thread Alex Major
Have you changed the appropriate config settings so that Cassandra will run
with only 2GB RAM? You shouldn't find the nodes go down.

Check out this blog post
http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/,
it outlines the configuration settings needed to run Cassandra on 64MB
RAM and might give you some insights.
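
The short version, as a sketch (values here are only illustrative; the post above 
goes much further):

  # cassandra-env.sh: pin the heap well below the available RAM instead of
  # relying on the auto-sizing
  MAX_HEAP_SIZE="512M"
  HEAP_NEWSIZE="100M"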


On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
jan.algermis...@nordsc.com wrote:

 Hi,

 I have set up C* in a very limited environment: 3 VMs at digitalocean with
 2GB RAM and 40GB SSDs, so my expectations about overall performance are low.

 Keyspace uses replication level of 2.

 I am loading 1.5 Mio rows (each 60 columns of a mix of numbers and small
 texts, 300.000 wide rows effektively) in a quite 'agressive' way, using
 java-driver and async update statements.

 After a while of importing data, I start seeing timeouts reported by the
 driver:

 com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
 timeout during write query at consistency ONE (1 replica were required but
 only 0 acknowledged the write

 and then later, host-unavailability exceptions:

 com.datastax.driver.core.exceptions.UnavailableException: Not enough
 replica available for query at consistency ONE (1 required but only 0
 alive).

 Looking at the 3 hosts, I see two C*s went down - which explains that I
 still see some writes succeeding (that must be the one host left,
 satisfying the consitency level ONE).


 The logs tell me AFAIU that the servers shutdown due to reaching the heap
 size limit.

 I am irritated by the fact that the instances (it seems) shut themselves
 down instead of limiting their amount of work. I understand that I need to
 tweak the configuration and likely get more RAM, but still, I would
 actually be satisfied with reduced service (and likely more timeouts in the
 client).  Right now it looks as if I would have to slow down the client
 'artificially'  to prevent the loss of hosts - does that make sense?

 Can anyone explain whether this is intended behavior, meaning I'll just
 have to accept the self-shutdown of the hosts? Or alternatively, what data
 I should collect to investigate the cause further?

 Jan








Re: Cassandra crashes

2013-09-06 Thread Jan Algermissen

On 06.09.2013, at 13:12, Alex Major al3...@gmail.com wrote:

 Have you changed the appropriate config settings so that Cassandra will run 
 with only 2GB RAM? You shouldn't find the nodes go down.
 
 Check out this blog post 
 http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
  , it outlines the configuration settings needed to run Cassandra on 64MB RAM 
 and might give you some insights.

Yes, I have my fingers on the knobs and have also seen the article you mention 
- very helpful indeed. As well as the replies so far. Thanks very much.

However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data 
import :-(

Now, while it would be easy to scale out and up a bit until the default config 
of C* is sufficient, I really like to dive deep and try to understand why the 
thing is still going down, IOW, which of my config settings is so darn wrong 
that in most cases kill -9 remains the only way to shutdown the Java process in 
the end.


The problem seems to be the heap size (set to MAX_HEAP_SIZE=640M   and 
HEAP_NEWSIZE=120M ) in combination with some cassandra activity that demands 
too much heap, right?

So how do I find out what activity this is and how do I sufficiently reduce 
that activity.

What bugs me in general is that AFAIU C* is so eager at giving massive write 
speed, that it sort of forgets to protect itself from client demand. I would 
very much like to understand why and how that happens.  I mean: no matter how 
many clients are flooding the database, it should not die due to out of memory 
situations, regardless of any configuration specifics, or?


tl;dr

Currently my client side (with java-driver) after a while reports more and more 
timeouts and then the following exception:

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error 
occured server side: java.lang.OutOfMemoryError: unable to create new native 
thread ;

On the server side, my cluster remains more or less in this condition:

DN  x 71,33 MB   256 34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  
rack1
UN  x  189,38 MB  256 32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
UN  x198,49 MB  256 33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  
rack1

The host that is down (it is the seed host, if that matters) still shows the 
running java process, but I cannot shut down cassandra or connect with 
nodetool, hence kill -9 to the rescue.

In that host, I still see a load of around 1.

jstack -F lists 892 threads, all blocked, except for 5 inactive ones.


The system.log after a few seconds of import shows the following exception:

java.lang.AssertionError: incorrect row data size 771030 written to 
/var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; 
correct is 771200
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)


And then, after about 2 minutes there are out of memory errors:

 ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java 
(line 192) Exception in thread Thread[CompactionExecutor
:5,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:693)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.init(ParallelCompactionIterable.java:296)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
 

Re: Cassandra crashes after reboot

2010-07-26 Thread Peter Schuller
 I'm sorry for the lack of information
 I'm using 0.6.3.
 The move was moving the data dir and the commitlog dir
 But i now removed them and let the system bootstrap from the ring.
 i know i'm lacking in information here.. but i thought i needed to be
 mentioned overhere this could happen.

Do you happen to still have the original data (the commit log) on the
old system?

-- 
/ Peter Schuller


Re: Cassandra crashes after reboot

2010-07-25 Thread Peter Schuller
 I've moved my cassandra to another machine, started it up again, but got
 this error

Which version of Cassandra exactly? (So that one can look at matching
source code)

Also, were you running the exact same version of Cassandra on both
servers (i.e., both the source and the destination)?

Was the source node completely turned off before you began copying
files to the destination node? (Though even if this were not the
case checksumming should prevent this particular problem from
happening, at least based on current trunk's CommitLog.)

The closest thing I found by quick googling, in terms of reported
cassandra bugs, was this:

   https://issues.apache.org/jira/browse/CASSANDRA-370

But presumably you're using a newer version than 0.4?

-- 
/ Peter Schuller


Re: Cassandra crashes after reboot

2010-07-25 Thread Pieter Maes
 Hi,

I'm sorry for the lack of information
I'm using 0.6.3.
The move was moving the data dir and the commitlog dir
But i now removed them and let the system bootstrap from the ring.
I know I'm lacking in information here, but I thought it needed to be
mentioned over here that this could happen.

Pieter Maes

Op 26/07/10 00:21, Peter Schuller schreef:
 I've moved my cassandra to another machine, started it up again, but got
 this error
 Which version of Cassandra exactly? (So that one can look at matching
 source code)

 Also, were you running the exact same version of Cassandra on both
 servers (i.e., both the source and the destination)?

 Was the source node completely turned off before you began copying
 files to the destination node? (Though even if this were not the
 case checksumming should prevent this particular problem from
 happening, at least based on current trunk's CommitLog.)

 The closest thing I found by quick googling, in terms of reported
 cassandra bugs, was this:

https://issues.apache.org/jira/browse/CASSANDRA-370

 But presumably you're using a newer version than 0.4?