[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110798#comment-15110798 ]

Jack Krupansky commented on CASSANDRA-10937:
--------------------------------------------

A few more questions:

1. When nodes do crash, what happens when you restart them? Do they immediately 
crash again, or do they run for many hours?
2. Is it just a single node crashing, or do all the nodes fail around the same 
time, like falling dominoes?

Just to be clear, the fact that the cluster seemed fine for 48 hours does not 
tell us much; it might have been near the edge of failing for quite some time, 
with the precise pattern of load at that moment simply being the straw that 
broke the camel's back. That's why it's important to know what happened after 
you restarted and resumed the test following the crash at 48 hours.

If it really was a resource leak, then reducing the heap would make the failure 
occur sooner. Determine the minimal heap size needed to run the test at all: 
set it low enough that the test won't run even for a minute, then increase the 
heap until it does run, then decrease it by less than you increased it - a 
binary search for the exact heap size needed for the test to run for even a 
few minutes or an hour. At least then you would have an easy-to-reproduce test 
case. If you can tune the heap so that the test runs successfully for, say, 
10 minutes before reliably hitting the OOM, then you can see how much you need 
to reduce the load (by throttling the app) to be able to run without hitting OOM.
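
As a rough sketch of that binary search (my illustration only, not something 
from the attached tests): the run_load_test.sh wrapper below is hypothetical - 
assume it restarts the node with the given max heap (e.g. by setting 
MAX_HEAP_SIZE before startup), drives the load client for the given number of 
minutes, and exits non-zero if the node hit an OOM in that window.

    import subprocess

    def run_test(heap_mb, minutes):
        # Hypothetical wrapper (not part of Cassandra): restart the node with
        # a max heap of <heap_mb> MB, run the load client for <minutes>, and
        # return a non-zero exit code if the node hit an OOM.
        result = subprocess.run(["./run_load_test.sh", str(heap_mb), str(minutes)])
        return result.returncode == 0

    def find_minimal_heap_mb(lo_mb=512, hi_mb=8192, window_minutes=10):
        # Binary search for the smallest heap that survives the test window.
        # Assumes lo_mb is small enough to fail and hi_mb large enough to pass.
        while hi_mb - lo_mb > 256:
            mid_mb = (lo_mb + hi_mb) // 2
            if run_test(mid_mb, window_minutes):
                hi_mb = mid_mb   # survived: try a smaller heap
            else:
                lo_mb = mid_mb   # OOMed: need a larger heap
        return hi_mb

    if __name__ == "__main__":
        print("Smallest heap (MB) surviving the window:", find_minimal_heap_mb())

Once the threshold is bracketed like that, you can run right at the edge and 
get the quick, repeatable OOM described above.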

I'm not saying that there is absolutely no chance of a resource leak, just 
that there are still a lot of open questions about usage to answer before we 
can leap to that conclusion. Ultimately, we do have to have a reliable repro 
test case before anything can be done.

In any case, at least at this stage it seems clear that you probably do need a 
much larger cluster (more nodes, with less load on each node). Yes, it's 
unfortunate that Cassandra won't give you a nice clean message saying so, but 
that ultimate requirement remains unchanged - pending answers to all of the 
open questions.


> OOM on multiple nodes on write load (v. 3.0.0), problem also present on 
> DSE-4.8.3, but there it survives more time
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10937
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra : 3.0.0
> Installed from the plain archive, with no OS-specific installer involved.
> Java:
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> OS :
> Linux version 2.6.32-431.el6.x86_64 
> (mockbu...@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
> Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013
> We have:
> 8 guests (Linux OS as above) on 2 (VMware-managed) physical hosts. Each 
> physical host runs 4 guests.
> Physical host parameters(shared by all 4 guests):
> Model: HP ProLiant DL380 Gen9
> Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> 46 logical processors.
> Hyperthreading - enabled
> Each guest assigned to have:
> 1 disk, 300 GB, for seq. log (NOT SSD)
> 1 disk, 4 TB, for data (NOT SSD)
> 11 CPU cores
> Disks are local, not shared.
> Memory on each host - 24 GB total.
> 8 GB (or 6 GB, tested both) - Cassandra heap
> (lshw and cpuinfo attached in file test2.rar)
>            Reporter: Peter Kovgan
>            Priority: Critical
>         Attachments: cassandra-to-jack-krupansky.docx, gc-stat.txt, 
> more-logs.rar, some-heap-stats.rar, test2.rar, test3.rar, test4.rar, 
> test5.rar, test_2.1.rar, test_2.1_logs_older.rar, 
> test_2.1_restart_attempt_log.rar
>
>
> 8 cassandra nodes.
> The load test started with 4 clients (different, non-identical machines), each 
> running 1000 threads.
> Each thread is assigned, round-robin, one of 4 different inserts. 
> Consistency -> ONE.
> I attach the full CQL schema of the tables and the insert query.
> Replication factor - 2:
> create keyspace OBLREPOSITORY_NY with replication = 
> {'class':'NetworkTopologyStrategy','NY':2};
> Initial throughput is:
> 215,000 inserts/sec
> or
> ~54 MB/sec, considering a single insert size a bit larger than 256 bytes.
> Data:
> all fields (5-6) are short strings, except one, which is a BLOB of 256 bytes.
> After about 2-3 hours of work, I was forced to increase the timeout from 2000 
> to 5000 ms, because some requests failed due to the short timeout.
> Later on (after approx. 12 hours of work) OOM happened on multiple nodes.
> (logs of all failed nodes are attached)
> I also attach the Java load client and instructions on how to set it up and 
> use it (test2.rar).
> Update:
> Later on, the test was repeated with a lower load (100,000 messages/sec), a 
> more relaxed CPU (25% idle), and only 2 test clients, but the test failed anyway.
> Update:
> DSE-4.8.3 also failed with OOM (3 nodes out of 8), but there it survived 48 
> hours, not 10-12.
> Attachments:
> test2.rar - contains most of the material
> more-logs.rar - contains additional node logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
