[ 
https://issues.apache.org/jira/browse/CASSANDRA-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635158#comment-14635158
 ] 

Łukasz Mrożkiewicz commented on CASSANDRA-9798:
-----------------------------------------------

Thanks for the support, Benedict.
How could those threads be consuming all the CPU when top shows more than 
90% idle?

> Cassandra seems to have deadlocks during flush operations
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9798
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9798
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 4x HP Gen9 dl 360 servers
> 2x8 cpu each (Intel(R) Xeon E5-2667 v3 @ 3.20GHz)
> 6x900GB 10kRPM disk for data
> 1x900GB 10kRPM disk for commitlog
> 64GB ram
> ETH: 10Gb/s
> Red Hat Enterprise Linux Server release 6.6 (Santiago) 2.6.32-504.el6.x86_64
> java build 1.8.0_45-b14 (openjdk) (tested on oracle java 8 too)
>            Reporter: Łukasz Mrożkiewicz
>             Fix For: 2.1.x
>
>         Attachments: cassandra.2.1.8.log, cassandra.log, cassandra.yaml, 
> cassandra.yaml, gc.log.0.current, stack.txt, topHbn1.txt
>
>
> Hi,
> We noticed a problem with dropped mutations in MutationStage. Usually, on one 
> random node, the following situation occurs:
> MutationStage: "active" is full, "pending" is increasing, "completed" is 
> stalled.
> MemtableFlushWriter: "active" is 6, "pending" is 25, "completed" is stalled.
> MemtablePostFlush: "active" is 1, "pending" is 29, "completed" is stalled.
> After some time (30s-10min) the pending mutations are dropped and everything 
> works again.
> When it happens:
> 1. CPU idle is ~95%
> 2. no long GC pauses or increased activity
> 3. memory usage is 3.5GB out of 8GB
> 4. only writes are processed by Cassandra
> 5. the problems appeared when LOAD > 400GB/node
> 6. Cassandra 2.1.6
> There is a gap in the logs:
> {code}
> INFO  08:47:01 Timed out replaying hints to /192.168.100.83; aborting (0 delivered)
> INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap
> INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:31 Enqueuing flush of table1: 60462632 (3%) on-heap, 0 (0%) off-heap
> INFO  08:47:31 Enqueuing flush of table2: 76973746 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:31 Enqueuing flush of table1: 84290135 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:32 Enqueuing flush of table3: 56926652 (3%) on-heap, 0 (0%) off-heap
> INFO  08:47:32 Enqueuing flush of table1: 85124218 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:58 CompactionManager                 2        39
> INFO  08:47:58 Writing Memtable-table2@1767938721(13843064 serialized bytes, 162359 ops, 4%/0% of on/off-heap limit)
> INFO  08:47:58 Writing Memtable-hints@1433125911(478703 serialized bytes, 424 ops, 0%/0% of on/off-heap limit)
> INFO  08:47:58 Writing Memtable-table2@1318583275(11783615 serialized bytes, 137378 ops, 4%/0% of on/off-heap limit)
> INFO  08:47:58 Enqueuing flush of compactions_in_progress: 969 (0%) on-heap, 0 (0%) off-heap
> INFO  08:47:58 Writing Memtable-table1@541175113(17221327 serialized bytes, 180792 ops, 4%/0% of on/off-heap limit)
> INFO  08:47:58 Writing Memtable-table1@1361154669(27138519 serialized bytes, 273472 ops, 6%/0% of on/off-heap limit)
> INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms
> {code}
> Use case:
> 100% writes at 100Mb/s; a couple of CFs with ~10 columns each; max cell size 
> 100B.
> Both CMS and G1GC were tested, with no difference.
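
The gap mentioned in the quoted log excerpt can be located mechanically. The following is only a minimal sketch, assuming each log entry starts with a level and an HH:MM:SS timestamp as in the excerpt; the function name `find_gaps` and the 20-second threshold are illustrative choices and not part of Cassandra.

```python
import re
from datetime import datetime

# Matches entries such as "INFO  08:47:01 Enqueuing flush of ...",
# with or without a leading "> " quote marker. Wrapped continuation
# lines (e.g. "> off-heap") do not match and are skipped.
LINE_RE = re.compile(r"^(?:>\s*)?[A-Z]+\s+(\d{2}:\d{2}:\d{2})\s")


def find_gaps(lines, threshold_seconds=20):
    """Return (previous, current, gap_seconds) for each pair of consecutive
    timestamped entries that are more than threshold_seconds apart."""
    gaps = []
    prev = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        stamp = match.group(1)
        if prev is not None:
            delta = (
                datetime.strptime(stamp, "%H:%M:%S")
                - datetime.strptime(prev, "%H:%M:%S")
            ).total_seconds()
            if delta < 0:
                # The log carries no date, so assume a single midnight rollover.
                delta += 24 * 60 * 60
            if delta > threshold_seconds:
                gaps.append((prev, stamp, int(delta)))
        prev = stamp
    return gaps


if __name__ == "__main__":
    # Shortened copies of entries from the excerpt above.
    sample = [
        "INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap",
        "INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap",
        "INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap",
        "INFO  08:47:58 CompactionManager                 2        39",
        "INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms",
    ]
    for start, end, seconds in find_gaps(sample):
        print(f"{start} -> {end}: {seconds}s with no log output")
```

Run against the excerpt, this flags the silent intervals 08:47:01 to 08:47:30 and 08:47:33 to 08:47:58, the second of which ends shortly before the dropped MUTATION messages are reported.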



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
