[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419006#comment-13419006 ]

Sylvain Lebresne commented on CASSANDRA-3855:

Agreed that it is wrong, but I think more than the first line is wrong. I think the method should be:
{noformat}
public boolean hasIrrelevantData(int gcBefore)
{
    if (deletionInfo().isLive())
        return false;

    // Do we have gcable deletion infos?
    if (!deletionInfo().purge(gcBefore).equals(deletionInfo()))
        return true;

    // Do we have columns that are either deleted by the container or gcable tombstones?
    for (IColumn column : columns)
        if (deletionInfo().isDeleted(column) || column.hasIrrelevantData(gcBefore))
            return true;

    return false;
}
{noformat}

RemoveDeleted dominates compaction time for large sstable counts

Key: CASSANDRA-3855
URL: https://issues.apache.org/jira/browse/CASSANDRA-3855
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.0
Reporter: Stu Hood
Assignee: Yuki Morishita
Labels: compaction, deletes, leveled
Attachments: with-cleaning-java.hprof.txt

With very large numbers of sstables (2000+ generated by a `bin/stress -n 100,000,000` run with LeveledCompactionStrategy), PrecompactedRow.removeDeletedAndOldShards dominates compaction runtime, such that commenting it out takes compaction throughput from 200KB/s to 12MB/s. Stack attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419177#comment-13419177 ]

Jonathan Ellis commented on CASSANDRA-3855:

We definitely don't want the "if row is live, nothing to do here" behavior; otherwise we'll never purge column-level tombstones without a full row deletion.
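The point about live rows can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in rather than Cassandra's actual code: `Row`, `Column`, and both method names are invented for illustration, deletion times are plain integers, and the shadowing of columns by a row-level deletion is left out. It only shows that returning early for a live row hides a column tombstone that is already older than `gcBefore`.

```java
import java.util.Arrays;
import java.util.List;

final class LiveRowTombstoneSketch {
    /** Hypothetical stand-in for a column; deletedAt is null for live data. */
    static final class Column {
        final String name;
        final Integer deletedAt; // local deletion time, null if not a tombstone

        Column(String name, Integer deletedAt) {
            this.name = name;
            this.deletedAt = deletedAt;
        }

        // A column tombstone is purgeable once it is older than gcBefore.
        boolean hasIrrelevantData(int gcBefore) {
            return deletedAt != null && deletedAt < gcBefore;
        }
    }

    /** Hypothetical stand-in for a row; rowDeletedAt is null when the row is live. */
    static final class Row {
        final Integer rowDeletedAt;
        final List<Column> columns;

        Row(Integer rowDeletedAt, List<Column> columns) {
            this.rowDeletedAt = rowDeletedAt;
            this.columns = columns;
        }

        boolean isLive() {
            return rowDeletedAt == null;
        }
    }

    // Variant that returns early for a live row and never looks at the columns.
    static boolean withLiveShortCircuit(Row row, int gcBefore) {
        if (row.isLive())
            return false;
        return withoutLiveShortCircuit(row, gcBefore);
    }

    // Variant without the early return: checks the row deletion, then every column.
    static boolean withoutLiveShortCircuit(Row row, int gcBefore) {
        if (!row.isLive() && row.rowDeletedAt < gcBefore)
            return true;
        for (Column column : row.columns)
            if (column.hasIrrelevantData(gcBefore))
                return true;
        return false;
    }

    public static void main(String[] args) {
        // Live row holding one live column and one old column tombstone.
        Row row = new Row(null, Arrays.asList(new Column("a", null), new Column("b", 100)));
        int gcBefore = 200;
        System.out.println(withLiveShortCircuit(row, gcBefore));    // prints false
        System.out.println(withoutLiveShortCircuit(row, gcBefore)); // prints true
    }
}
```

With the early return, the tombstone on column "b" is never reported, so in this model a row that is never deleted as a whole would keep its column tombstones indefinitely.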
[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419183#comment-13419183 ]

Jonathan Ellis commented on CASSANDRA-3855:

+1 for proposed method w/ first 2 lines removed
[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419677#comment-13419677 ]

Hudson commented on CASSANDRA-3855:

Integrated in Cassandra #1734 (See [https://builds.apache.org/job/Cassandra/1734/])
fix incorrect hasIrrelevantData result for live CF; patch by yukim, reviewed by jbellis/slebresne for CASSANDRA-3855 (Revision d74103735126658d64cb92a16f4bb40f63d5e2e8)

Result = ABORTED
yukim :
Files :
* src/java/org/apache/cassandra/db/AbstractColumnContainer.java

RemoveDeleted dominates compaction time for large sstable counts

Key: CASSANDRA-3855
URL: https://issues.apache.org/jira/browse/CASSANDRA-3855
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.0
Reporter: Stu Hood
Assignee: Yuki Morishita
Labels: compaction, deletes, leveled
Fix For: 1.2
Attachments: 3855.txt, with-cleaning-java.hprof.txt
[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418759#comment-13418759 ]

Yuki Morishita commented on CASSANDRA-3855:

I ran a CPU profile on trunk and 1.1 with LCS and about 1000 sstables.
On the 1.1 branch, there is no indication of removeDeletedAndOldShards dominating.
On trunk, however, I noticed that CompactionController#shouldPurge seems to be called unnecessarily inside removeDeletedAndOldShards, where shouldPurge is supposed to be called only when the CF has tombstones.
So I looked at the code, and I'm not sure this line (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java#L201) is correct. If the CF is live, returning false for hasIrrelevantData seems right.
Sylvain, what do you think?
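The cost argument in the comment above can be sketched as a guard pattern: a cheap per-row predicate decides whether the expensive check runs at all. The classes below are hypothetical and are not the real CompactionController or PrecompactedRow. In particular, modelling shouldPurge as one unit of work per overlapping sstable is an assumption made for illustration, chosen because the report ties the slowdown to the sstable count.

```java
final class PurgeGuardSketch {
    /** Hypothetical controller; its shouldPurge cost is assumed to grow with the sstable count. */
    static final class Controller {
        final int overlappingSSTables;
        long sstableChecks = 0;

        Controller(int overlappingSSTables) {
            this.overlappingSSTables = overlappingSSTables;
        }

        // Stand-in for "does any overlapping sstable contain this key?"
        boolean shouldPurge(String key) {
            for (int i = 0; i < overlappingSSTables; i++)
                sstableChecks++;
            return true;
        }
    }

    // Run the expensive check only when the cheap predicate reports something to purge.
    static void compactRow(String key, boolean hasIrrelevantData, Controller controller) {
        if (hasIrrelevantData)
            controller.shouldPurge(key);
    }

    // Total sstable checks performed when every row reports the same predicate value.
    static long checksFor(int rows, boolean hasIrrelevantData, int sstables) {
        Controller controller = new Controller(sstables);
        for (int i = 0; i < rows; i++)
            compactRow("key" + i, hasIrrelevantData, controller);
        return controller.sstableChecks;
    }

    public static void main(String[] args) {
        // Predicate wrongly true for live rows: every row pays for all 1000 sstables.
        System.out.println(checksFor(1000, true, 1000));  // prints 1000000
        // Predicate correctly false for rows with nothing to purge: no checks at all.
        System.out.println(checksFor(1000, false, 1000)); // prints 0
    }
}
```

Under these assumptions, a predicate that wrongly reports irrelevant data for live rows makes every row pay the full per-sstable cost, which matches the kind of behavior the profile suggested.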
[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418775#comment-13418775 ]

Jonathan Ellis commented on CASSANDRA-3855:

That's definitely wrong... I think it should be {{if (info != LIVE) return false}}
[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts
[ https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295818#comment-13295818 ]

Sylvain Lebresne commented on CASSANDRA-3855:

To be precise, I did try a quick test back in the day to see if I could reproduce this, but I wasn't really able to reproduce anything similar to the attached hprof log. I didn't wait for it to reach 100,000,000 keys, though.