[ https://issues.apache.org/jira/browse/CASSANDRA-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283834#comment-14283834 ]
Marcus Eriksson commented on CASSANDRA-8641:
--------------------------------------------

and the flushing is likely due to being memory constrained after repairs

> Repair causes a large number of tiny SSTables
> ---------------------------------------------
>
>                 Key: CASSANDRA-8641
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8641
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04
>            Reporter: Flavien Charlon
>             Fix For: 2.1.3
>
>
> I have a 3-node cluster with RF = 3, quad core and 32 GB of RAM. I am
> running 2.1.2 with all the default settings. I'm seeing some strange
> behavior during incremental repair (under write load).
> Taking the example of one particular column family, before running an
> incremental repair, I have about 13 SSTables. After finishing the incremental
> repair, I have over 114000 SSTables.
> {noformat}
> Table: customers
> SSTable count: 114688
> Space used (live): 97203707290
> Space used (total): 99175455072
> Space used by snapshots (total): 0
> SSTable Compression Ratio: 0.28281112416526505
> Memtable cell count: 0
> Memtable data size: 0
> Memtable switch count: 1069
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 11548705
> Local write latency: 0.030 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.00000
> Bloom filter space used: 144145152
> Compacted partition minimum bytes: 311
> Compacted partition maximum bytes: 1996099046
> Compacted partition mean bytes: 3419
> Average live cells per slice (last five minutes): 0.0
> Maximum live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
> {noformat}
> Looking at the logs during the repair, it seems Cassandra is struggling to
> compact minuscule SSTables (often just a few kilobytes):
> {noformat}
> INFO [CompactionExecutor:337] 2015-01-17 01:44:27,011
> CompactionTask.java:251 - Compacted 32 sstables to
> [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-228341,].
> 8,332 bytes to 6,547 (~78% of original) in 80,476ms = 0.000078MB/s. 32
> total partitions merged to 32. Partition merge counts were {1:32, }
> INFO [CompactionExecutor:337] 2015-01-17 01:45:35,519
> CompactionTask.java:251 - Compacted 32 sstables to
> [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229348,].
> 8,384 bytes to 6,563 (~78% of original) in 6,880ms = 0.000910MB/s. 32
> total partitions merged to 32. Partition merge counts were {1:32, }
> INFO [CompactionExecutor:339] 2015-01-17 01:47:46,475
> CompactionTask.java:251 - Compacted 32 sstables to
> [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229351,].
> 8,423 bytes to 6,401 (~75% of original) in 10,416ms = 0.000586MB/s. 32
> total partitions merged to 32. Partition merge counts were {1:32, }
> {noformat}
>
> Here is an excerpt of the system logs showing the abnormal flushing:
> {noformat}
> INFO [AntiEntropyStage:1] 2015-01-17 15:28:43,807 ColumnFamilyStore.java:840
> - Enqueuing flush of customers: 634484 (0%) on-heap, 2599489 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:06,823 ColumnFamilyStore.java:840
> - Enqueuing flush of levels: 129504 (0%) on-heap, 222168 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:07,940 ColumnFamilyStore.java:840
> - Enqueuing flush of chain: 4508 (0%) on-heap, 6880 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:08,124 ColumnFamilyStore.java:840
> - Enqueuing flush of invoices: 1469772 (0%) on-heap, 2542675 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:09,471 ColumnFamilyStore.java:840
> - Enqueuing flush of customers: 809844 (0%) on-heap, 3364728 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:24,368 ColumnFamilyStore.java:840
> - Enqueuing flush of levels: 28212 (0%) on-heap, 44220 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:24,822 ColumnFamilyStore.java:840
> - Enqueuing flush of chain: 860 (0%) on-heap, 1130 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:24,985 ColumnFamilyStore.java:840
> - Enqueuing flush of invoices: 334480 (0%) on-heap, 568959 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:27,375 ColumnFamilyStore.java:840
> - Enqueuing flush of customers: 221568 (0%) on-heap, 929962 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:35,755 ColumnFamilyStore.java:840
> - Enqueuing flush of invoices: 7916 (0%) on-heap, 11080 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:36,239 ColumnFamilyStore.java:840
> - Enqueuing flush of customers: 9968 (0%) on-heap, 33041 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:37,935 ColumnFamilyStore.java:840
> - Enqueuing flush of invoices: 42108 (0%) on-heap, 69494 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:41,182 ColumnFamilyStore.java:840
> - Enqueuing flush of customers: 40936 (0%) on-heap, 159099 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:49,573 ColumnFamilyStore.java:840
> - Enqueuing flush of levels: 17236 (0%) on-heap, 27048 (0%) off-heap
> INFO [AntiEntropyStage:1] 2015-01-17 15:29:50,440 ColumnFamilyStore.java:840
> - Enqueuing flush of chain: 548 (0%) on-heap, 630 (0%) off-heap
> {noformat}
> At the end of the repair, the cluster has become unusable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
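The compaction log excerpt in the issue shows how little work each compaction was doing. The Python sketch below recomputes the logged "~N% of original" and MB/s figures from the byte counts and timings. Its formula is an assumption inferred from the numbers themselves, not taken from the Cassandra source: throughput is taken as output bytes divided by 2^20, divided by elapsed seconds, and the percentage as the output/input ratio truncated to a whole percent. The names `CompactionLine`, `LINES`, `throughput_mib_per_s` and `percent_of_original` are hypothetical helpers for illustration only.

```python
# Recompute the derived figures in the three "Compacted 32 sstables" log lines.
# Assumption (inferred from the logged values, not checked against the
# Cassandra 2.1 source): throughput = output bytes / 2**20 / elapsed seconds,
# and "~N% of original" = output/input ratio truncated to a whole percent.
from typing import NamedTuple


class CompactionLine(NamedTuple):
    """Hypothetical container for the numbers in one compaction log line."""
    input_bytes: int
    output_bytes: int
    elapsed_ms: int


# Values copied from the CompactionExecutor log excerpt.
LINES = [
    CompactionLine(8_332, 6_547, 80_476),
    CompactionLine(8_384, 6_563, 6_880),
    CompactionLine(8_423, 6_401, 10_416),
]


def throughput_mib_per_s(line: CompactionLine) -> float:
    """Output size in MiB divided by elapsed wall-clock seconds."""
    return line.output_bytes / (1024 * 1024) / (line.elapsed_ms / 1000)


def percent_of_original(line: CompactionLine) -> int:
    """Output/input size ratio as a whole percent, truncated toward zero."""
    return int(line.output_bytes / line.input_bytes * 100)


if __name__ == "__main__":
    for line in LINES:
        print(f"{line.input_bytes:,} -> {line.output_bytes:,} bytes: "
              f"~{percent_of_original(line)}% of original, "
              f"{throughput_mib_per_s(line):.6f} MB/s")
```

With these inputs the sketch reproduces the logged ~78%, ~78% and ~75% and the 0.000078, 0.000910 and 0.000586 MB/s figures, which is consistent with the assumed formula. Even the fastest of the three compactions wrote well under 1 KB of output per second, because each one spent seconds to minutes merging 32 SSTables of a few hundred bytes each.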