[ https://issues.apache.org/jira/browse/CASSANDRA-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868309#comment-13868309 ]
Chris Burroughs commented on CASSANDRA-6568:
--------------------------------------------

CASSANDRA-6503 looks promising, but I'm not sure it's the whole story. The sstable with id 402383 (the oldest one, and the one that could not be user compacted) was created by a cleanup:

{noformat}
 INFO [CompactionExecutor:88] 2013-11-25 19:46:56,706 CompactionManager.java (line 677) Cleaned up to /data/sstables/data/urlapi_v2/cf/ks-cf-tmp-ic-402383-Data.db.  1,500,391,202 to 1,481,333,401 (~98% of original) bytes for 4,988,394 keys.  Time: 1,108,381ms.
{noformat}

So while 402383 was *sent* by repair, it was created locally. I could be totally off base, but I don't think repair creates temporary sstables on the nodes that are being streamed *from*.


> sstables incorrectly getting marked as "not live"
> -------------------------------------------------
>
>                 Key: CASSANDRA-6568
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6568
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.12 with several 1.2.13 patches
>            Reporter: Chris Burroughs
>
> {noformat}
> -rw-rw-r-- 14 cassandra cassandra 1.4G Nov 25 19:46 /data/sstables/data/ks/cf/ks-cf-ic-402383-Data.db
> -rw-rw-r-- 14 cassandra cassandra  13G Nov 26 00:04 /data/sstables/data/ks/cf/ks-cf-ic-402430-Data.db
> -rw-rw-r-- 14 cassandra cassandra  13G Nov 26 05:03 /data/sstables/data/ks/cf/ks-cf-ic-405231-Data.db
> -rw-rw-r-- 31 cassandra cassandra  21G Nov 26 08:38 /data/sstables/data/ks/cf/ks-cf-ic-405232-Data.db
> -rw-rw-r--  2 cassandra cassandra 2.6G Dec  3 13:44 /data/sstables/data/ks/cf/ks-cf-ic-434662-Data.db
> -rw-rw-r-- 14 cassandra cassandra 1.5G Dec  5 09:05 /data/sstables/data/ks/cf/ks-cf-ic-438698-Data.db
> -rw-rw-r--  2 cassandra cassandra 3.1G Dec  6 12:10 /data/sstables/data/ks/cf/ks-cf-ic-440983-Data.db
> -rw-rw-r--  2 cassandra cassandra  96M Dec  8 01:52 /data/sstables/data/ks/cf/ks-cf-ic-444041-Data.db
> -rw-rw-r--  2 cassandra cassandra 3.3G Dec  9 16:37 /data/sstables/data/ks/cf/ks-cf-ic-451116-Data.db
> -rw-rw-r--  2 cassandra cassandra 876M Dec 10 11:23 /data/sstables/data/ks/cf/ks-cf-ic-453552-Data.db
> -rw-rw-r--  2 cassandra cassandra 891M Dec 11 03:21 /data/sstables/data/ks/cf/ks-cf-ic-454518-Data.db
> -rw-rw-r--  2 cassandra cassandra 102M Dec 11 12:27 /data/sstables/data/ks/cf/ks-cf-ic-455429-Data.db
> -rw-rw-r--  2 cassandra cassandra 906M Dec 11 23:54 /data/sstables/data/ks/cf/ks-cf-ic-455533-Data.db
> -rw-rw-r--  1 cassandra cassandra 214M Dec 12 05:02 /data/sstables/data/ks/cf/ks-cf-ic-456426-Data.db
> -rw-rw-r--  1 cassandra cassandra 203M Dec 12 10:49 /data/sstables/data/ks/cf/ks-cf-ic-456879-Data.db
> -rw-rw-r--  1 cassandra cassandra  49M Dec 12 12:03 /data/sstables/data/ks/cf/ks-cf-ic-456963-Data.db
> -rw-rw-r-- 18 cassandra cassandra  20G Dec 25 01:09 /data/sstables/data/ks/cf/ks-cf-ic-507770-Data.db
> -rw-rw-r--  3 cassandra cassandra  12G Jan  8 04:22 /data/sstables/data/ks/cf/ks-cf-ic-567100-Data.db
> -rw-rw-r--  3 cassandra cassandra 957M Jan  8 22:51 /data/sstables/data/ks/cf/ks-cf-ic-569015-Data.db
> -rw-rw-r--  2 cassandra cassandra 923M Jan  9 17:04 /data/sstables/data/ks/cf/ks-cf-ic-571303-Data.db
> -rw-rw-r--  1 cassandra cassandra 821M Jan 10 08:20 /data/sstables/data/ks/cf/ks-cf-ic-574642-Data.db
> -rw-rw-r--  1 cassandra cassandra  18M Jan 10 08:48 /data/sstables/data/ks/cf/ks-cf-ic-574723-Data.db
> {noformat}
> I tried to do a user defined compaction on sstables from November and got "it is not an active sstable".
> Live sstable count from jmx was about 7, while on disk there were over 20. Live vs total size showed a ~50 GiB difference.
> Forcing a gc from jconsole had no effect. However, restarting the node resulted in live sstables/bytes *increasing* to match what was on disk. User compaction could now compact the November sstables. This cluster was last restarted in mid December.
> I'm not sure what effect "not live" had on other operations of the cluster. From the logs it seems the files were at least sent at some point as part of repair, but I don't know whether they were being used for read requests. Because the problem that got me looking in the first place was poor performance, I suspect they were used for reads (and the reads were slow because so many sstables were being read). Based on their age, I presume they were at the least being excluded from compaction.
> I'm not aware of any isLive() or getRefCount() to programmatically confirm which nodes have this problem. In this cluster almost all columns have a 14 day TTL; based on the number of nodes with November sstables, this appears to be occurring on a significant fraction of the nodes.
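
For context on "user defined compaction" above: it is exposed through the forceUserDefinedCompaction operation on the CompactionManager MBean, and "it is not an active sstable" is the message logged when a named file is not in the table's live set. A minimal sketch of invoking it over JMX, assuming the 1.2-era two-argument signature and placeholder host/keyspace/file names:

{noformat}
// Sketch only: invokes forceUserDefinedCompaction over JMX.
// Host, port, keyspace ("ks"), and the sstable file name are placeholders.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceUserCompaction {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName cm = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // 1.2-era signature: forceUserDefinedCompaction(ksname, comma-separated Data.db names)
            mbsc.invoke(cm, "forceUserDefinedCompaction",
                    new Object[] { "ks", "ks-cf-ic-402383-Data.db" },
                    new String[] { "java.lang.String", "java.lang.String" });
        }
    }
}
{noformat}

The failure for the November files is consistent with them having dropped out of the live set while still present on disk.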
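On the lack of an isLive()/getRefCount() hook: one rough way to flag affected nodes is to compare the per-table LiveSSTableCount attribute (with LiveDiskSpaceUsed vs TotalDiskSpaceUsed for the byte gap) against the -Data.db files on disk. A minimal sketch, assuming the 1.2-era org.apache.cassandra.db:type=ColumnFamilies MBean naming and placeholder keyspace/table/path names:

{noformat}
// Sketch only: compares the LiveSSTableCount JMX attribute against the
// number of -Data.db files on disk. MBean/attribute names follow the
// 1.2-era ColumnFamilyStoreMBean; keyspace, table, and path are placeholders.
import java.io.File;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LiveSSTableCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName cf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,keyspace=ks,columnfamily=cf");
            int live = (Integer) mbsc.getAttribute(cf, "LiveSSTableCount");
            long liveBytes = (Long) mbsc.getAttribute(cf, "LiveDiskSpaceUsed");
            long totalBytes = (Long) mbsc.getAttribute(cf, "TotalDiskSpaceUsed");

            // Count final (non -tmp-) data files in the table's directory.
            int onDisk = 0;
            File[] files = new File("/data/sstables/data/ks/cf").listFiles();
            if (files != null) {
                for (File f : files) {
                    String n = f.getName();
                    if (n.endsWith("-Data.db") && !n.contains("-tmp-")) onDisk++;
                }
            }
            System.out.printf("live=%d onDisk=%d liveBytes=%d totalBytes=%d%n",
                    live, onDisk, liveBytes, totalBytes);
            if (live < onDisk) {
                System.out.println("live < on-disk: possible \"not live\" sstables here");
            }
        }
    }
}
{noformat}

A live count persistently below the on-disk count (ignoring -tmp- files from in-flight compactions) would match the ~7 vs 20+ discrepancy described above.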