Moving to the user list.
Aaron
On 20 Apr 2011, at 21:25, Shotaro Kamio wrote:
> Hi,
>
> I found that our cluster repeats compacting a single file forever
> (Cassandra 0.7.5). We are wondering if the compaction logic is wrong.
> I'd like to hear your comments.
>
> Situation:
> - After trying to repair a column family, our cluster's disk usage is
> quite high. Cassandra cannot compact all sstables at once; I think it
> ends up repeatedly compacting a single file (see the attached log below).
> - Our data has no deletes, so compacting a single file does not free
> any disk space.
>
> We are approaching full disk. I believe the repair operation wrote a
> lot of duplicate data, which is why compaction is needed. However,
> most of the nodes are stuck compacting a single file. The only thing
> we can do is restart them.
>
> My question is why the compaction doesn't stop.
>
> I looked at the logic in CompactionManager.java:
> -----------------
> String compactionFileLocation =
>     table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
> // If the compaction file path is null that means we have no space
> // left for this compaction. Try again w/o the largest one.
> List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
> while (compactionFileLocation == null && smallerSSTables.size() > 1)
> {
>     logger.warn("insufficient space to compact all requested files "
>                 + StringUtils.join(smallerSSTables, ", "));
>     smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
>     compactionFileLocation =
>         table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
> }
> if (compactionFileLocation == null)
> {
>     logger.error("insufficient space to compact even the two smallest files, aborting");
>     return 0;
> }
> -----------------
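>
> To see why this loops, here is a toy simulation of that retry logic
> (not Cassandra code; sizes are made-up numbers, and a plain free-space
> check stands in for what getDataFileLocation does):
> -----------------
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.Collections;
> import java.util.List;
>
> public class LoopDemo
> {
>     public static void main(String[] args)
>     {
>         long free = 300L; // pretend free disk space, in GB
>         List<Long> sstables = new ArrayList<Long>(Arrays.asList(260L, 260L));
>         // Mirror the retry loop: drop the largest candidate while the
>         // projected output does not fit in the free space.
>         while (sum(sstables) > free && sstables.size() > 1)
>             sstables.remove(Collections.max(sstables));
>         System.out.println(sstables); // prints [260]: one file left
>     }
>
>     private static long sum(List<Long> xs)
>     {
>         long s = 0;
>         for (long x : xs)
>             s += x;
>         return s;
>     }
> }
> -----------------
> With our two ~260GB sstables, the loop exits holding a single sstable,
> which then fits and gets compacted by itself, producing an identical file.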
>
> The while condition is: smallerSSTables.size() > 1
> Should this be smallerSSTables.size() > 2? The error message in the
> code above already says "even the two smallest files", which suggests
> the loop was meant to stop once two sstables remain.
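>
> In other words, the change I have in mind is just (untested, only to
> show the intent):
> -----------------
> // Stop once two sstables remain; if even those two don't fit, fall
> // through to the error below instead of compacting one file alone.
> while (compactionFileLocation == null && smallerSSTables.size() > 2)
> -----------------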
>
> In my understanding, compacting a single file frees disk space only
> when the sstable holds a lot of tombstones and only if those tombstones
> are actually removed by the compaction. If Cassandra knows the sstable
> has droppable tombstones, it is worth compacting; otherwise it frees
> space only if you are lucky. In the worst case it leads to an infinite
> loop, as in our case.
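>
> If it helps, the kind of guard I'm imagining looks like this (the
> helper name is hypothetical, not a real 0.7.5 API; it only shows the
> idea):
> -----------------
> if (smallerSSTables.size() == 1)
> {
>     SSTableReader only = smallerSSTables.get(0);
>     // Recompacting a lone sstable only pays off if it can drop data,
>     // e.g. gcable tombstones (hypothetical check).
>     if (!mayHaveDroppableTombstones(only))
>     {
>         logger.warn("skipping compaction of single sstable " + only
>                     + "; nothing to reclaim");
>         return 0;
>     }
> }
> -----------------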
>
> What do you think of this code change?
>
>
> Best regards,
> Shotaro
>
>
> * Cassandra compaction log
> -------------------------
> WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3034-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3035-Data.db. 260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys. Time: 9,855,385ms.
>
> WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3035-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3036-Data.db. 260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys. Time: 9,809,882ms.
>
> WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3036-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3037-Data.db. 260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys. Time: 10,087,424ms.
> -------------------------
> You can see that the compacted size is always the same: it keeps
> compacting the same single sstable.