Moving to the user list.

Aaron
On 20 Apr 2011, at 21:25, Shotaro Kamio wrote:

> Hi,
>
> I found that our cluster repeats compacting a single file forever
> (Cassandra 0.7.5). We are wondering whether the compaction logic is
> wrong, and I'd like your comments.
>
> Situation:
> - After trying to repair a column family, our cluster's disk usage is
>   quite high and Cassandra cannot compact all sstables at once. I
>   think it ends up repeatedly compacting a single file (see the
>   attached log below).
> - Our data has no deletes, so compacting a single file doesn't free
>   any disk space.
>
> We are approaching a full disk. I believe the repair operation left a
> lot of duplicate data on disk, which needs compaction, but most nodes
> are stuck compacting a single file. The only thing we can do is
> restart the nodes.
>
> My question is why the compaction doesn't stop.
>
> I looked at the logic in CompactionManager.java:
> -----------------
> String compactionFileLocation =
>     table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
> // If the compaction file path is null that means we have no
> // space left for this compaction.
> // try again w/o the largest one.
> List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
> while (compactionFileLocation == null && smallerSSTables.size() > 1)
> {
>     logger.warn("insufficient space to compact all requested files "
>                 + StringUtils.join(smallerSSTables, ", "));
>     smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
>     compactionFileLocation =
>         table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
> }
> if (compactionFileLocation == null)
> {
>     logger.error("insufficient space to compact even the two smallest files, aborting");
>     return 0;
> }
> -----------------
>
> The while condition is: smallerSSTables.size() > 1
> Shouldn't this be "smallerSSTables.size() > 2"?
>
> In my understanding, compacting a single file frees disk space only
> when the sstable contains many tombstones, and only if those
> tombstones are actually removed during the compaction. If Cassandra
> knows an sstable has tombstones that can be removed, it's worth
> compacting it on its own; otherwise it might free space if you are
> lucky, and in the worst case it leads to an infinite loop, as in our
> case.
>
> What do you think of this code change?
>
>
> Best regards,
> Shotaro
>
>
> * Cassandra compaction log
> -------------------------
> WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-3020-Data.db'),
> SSTableReader(path='foobar-f-3034-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833
> CompactionManager.java (line 482) Compacted to
> foobar-tmp-f-3035-Data.db. 260,646,760,319 to 260,646,760,319 (~100%
> of original) bytes for 6,893,896 keys. Time: 9,855,385ms.
>
> WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-3020-Data.db'),
> SSTableReader(path='foobar-f-3035-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193
> CompactionManager.java (line 482) Compacted to
> foobar-tmp-f-3036-Data.db. 260,646,760,319 to 260,646,760,319 (~100%
> of original) bytes for 6,893,896 keys. Time: 9,809,882ms.
>
> WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-3020-Data.db'),
> SSTableReader(path='foobar-f-3036-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903
> CompactionManager.java (line 482) Compacted to
> foobar-tmp-f-3037-Data.db. 260,646,760,319 to 260,646,760,319 (~100%
> of original) bytes for 6,893,896 keys. Time: 10,087,424ms.
> -------------------------
> You can see that the compacted size is always the same: it keeps
> compacting the same single sstable.
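
To illustrate the loop behaviour being discussed, here is a minimal, self-contained sketch (not the real Cassandra code: `pickCompactable` and the use of plain sizes instead of `SSTableReader` objects are simplifications for demonstration). It models the fallback loop with the proposed `> 2` condition, so the candidate set never shrinks to a single sstable and the case from the log above aborts instead of looping:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CompactionLoopSketch {
    // Each long stands in for an sstable's on-disk size in bytes.
    // Returns the sstables to compact, or null if there is no room
    // even for the two smallest ones.
    static List<Long> pickCompactable(List<Long> sstables, long freeSpace) {
        List<Long> smaller = new ArrayList<>(sstables);
        // Proposed change: loop only while MORE than two files remain
        // ("> 2" rather than "> 1"), so we never fall back to
        // compacting a lone sstable, which frees no space when there
        // are no droppable tombstones.
        while (expectedSize(smaller) > freeSpace && smaller.size() > 2) {
            // Drop the largest candidate and retry with the rest.
            smaller.remove(smaller.indexOf(Collections.max(smaller)));
        }
        if (expectedSize(smaller) > freeSpace)
            return null; // insufficient space even for the two smallest
        return smaller;
    }

    // Crude stand-in for getExpectedCompactedFileSize: assume the
    // compacted result needs roughly the sum of the inputs.
    static long expectedSize(List<Long> sstables) {
        long sum = 0;
        for (long s : sstables) sum += s;
        return sum;
    }

    public static void main(String[] args) {
        // Two ~260 GB sstables but only 300 GB free: with "> 1" the
        // old loop would drop to one file and retry forever; with
        // "> 2" it gives up instead.
        List<Long> result = pickCompactable(
                List.of(260_000_000_000L, 260_000_000_000L),
                300_000_000_000L);
        System.out.println(result == null
                ? "aborted"
                : "compacting " + result.size() + " sstables");
        // prints "aborted"
    }
}
```

With three 100-byte sstables and 250 bytes free, the same sketch drops the largest and compacts the remaining two, which is the behaviour both conditions agree on; the two conditions only diverge once the candidate set would shrink to a single file.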