[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-07-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419006#comment-13419006
 ] 

Sylvain Lebresne commented on CASSANDRA-3855:
-

Agreed that it is wrong, but I think that it's more than the first line that is 
wrong. I think that method should be:
{noformat}
public boolean hasIrrelevantData(int gcBefore)
{
    if (deletionInfo().isLive())
        return false;

    // Do we have gcable deletion infos?
    if (!deletionInfo().purge(gcBefore).equals(deletionInfo()))
        return true;

    // Do we have columns that are either deleted by the container or gcable tombstones?
    for (IColumn column : columns)
        if (deletionInfo().isDeleted(column) || column.hasIrrelevantData(gcBefore))
            return true;

    return false;
}
{noformat}

 RemoveDeleted dominates compaction time for large sstable counts
 

 Key: CASSANDRA-3855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3855
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
Reporter: Stu Hood
Assignee: Yuki Morishita
  Labels: compaction, deletes, leveled
 Attachments: with-cleaning-java.hprof.txt


 With very large numbers of sstables (2000+ generated by a `bin/stress -n 
 100,000,000` run with LeveledCompactionStrategy), 
 PrecompactedRow.removeDeletedAndOldShards dominates compaction runtime, such 
 that commenting it out takes compaction throughput from 200KB/s to 12MB/s.
 Stack attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-07-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419177#comment-13419177
 ] 

Jonathan Ellis commented on CASSANDRA-3855:
---

We definitely don't want "if row is live, nothing to do here" behavior; 
otherwise we'll never purge column-level tombstones without a full row deletion.
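
To illustrate the point above, here is a minimal sketch of why an early return for live rows hides column-level tombstones. The types `Row` and `Column` and both method names are hypothetical stand-ins, not Cassandra's classes, and the model is deliberately simplified: it only tracks deletion times and ignores columns shadowed by a row-level deletion.

```java
import java.util.ArrayList;
import java.util.List;

public class LiveRowSketch {
    /** Hypothetical stand-in for a column; a non-null deletion time marks a tombstone. */
    static final class Column {
        final String name;
        final Integer localDeletionTime; // null means the column is live

        Column(String name, Integer localDeletionTime) {
            this.name = name;
            this.localDeletionTime = localDeletionTime;
        }

        /** True if this column is a tombstone old enough to be garbage collected. */
        boolean hasIrrelevantData(int gcBefore) {
            return localDeletionTime != null && localDeletionTime < gcBefore;
        }
    }

    /** Hypothetical stand-in for a row; a null deletion time means no row-level deletion. */
    static final class Row {
        final Integer rowDeletionTime;
        final List<Column> columns = new ArrayList<>();

        Row(Integer rowDeletionTime) {
            this.rowDeletionTime = rowDeletionTime;
        }

        boolean isLive() {
            return rowDeletionTime == null;
        }
    }

    /** Checks the row-level deletion first, then every column, with no early return. */
    static boolean hasIrrelevantData(Row row, int gcBefore) {
        if (!row.isLive() && row.rowDeletionTime < gcBefore)
            return true;
        for (Column column : row.columns)
            if (column.hasIrrelevantData(gcBefore))
                return true;
        return false;
    }

    /** Same check, but bails out as soon as the row itself is live. */
    static boolean hasIrrelevantDataWithEarlyReturn(Row row, int gcBefore) {
        if (row.isLive())
            return false;
        return hasIrrelevantData(row, gcBefore);
    }

    public static void main(String[] args) {
        Row row = new Row(null);               // no row-level deletion
        row.columns.add(new Column("c1", 50)); // column tombstone written at time 50
        int gcBefore = 100;

        System.out.println("with early return:    " + hasIrrelevantDataWithEarlyReturn(row, gcBefore)); // false
        System.out.println("without early return: " + hasIrrelevantData(row, gcBefore));                // true
    }
}
```

In this model the early-return variant reports nothing to purge for a live row even though one of its columns is a gcable tombstone, which is the behavior the comment warns against.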





[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-07-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419183#comment-13419183
 ] 

Jonathan Ellis commented on CASSANDRA-3855:
---

+1 for proposed method w/ first 2 lines removed





[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419677#comment-13419677
 ] 

Hudson commented on CASSANDRA-3855:
---

Integrated in Cassandra #1734 (See 
[https://builds.apache.org/job/Cassandra/1734/])
fix incorrect hasIrrelevantData result for live CF; patch by yukim, 
reviewed by jbellis/slebresne for CASSANDRA-3855 (Revision 
d74103735126658d64cb92a16f4bb40f63d5e2e8)

 Result = ABORTED
yukim : 
Files : 
* src/java/org/apache/cassandra/db/AbstractColumnContainer.java


 RemoveDeleted dominates compaction time for large sstable counts
 

 Key: CASSANDRA-3855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3855
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
Reporter: Stu Hood
Assignee: Yuki Morishita
  Labels: compaction, deletes, leveled
 Fix For: 1.2

 Attachments: 3855.txt, with-cleaning-java.hprof.txt






[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-07-19 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418759#comment-13418759
 ] 

Yuki Morishita commented on CASSANDRA-3855:
---

I ran a CPU profile on trunk and 1.1 with LCS and about 1000 sstables. On the 
1.1 branch, there is no indication that removeDeletedAndOldShards dominates. On 
trunk, however, I noticed that CompactionController#shouldPurge seems to be 
called unnecessarily inside removeDeletedAndOldShards, where shouldPurge is 
supposed to be called only when the CF has tombstones. So I looked at the code, 
and I'm not sure this line 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java#L201) 
is correct. If the CF is live, returning false for hasIrrelevantData seems right. 
Sylvain, what do you think?
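
As a rough illustration of the gating described above, the sketch below counts how often an expensive purge check runs when it is guarded by a cheap "does this row have anything to purge" flag versus when it runs for every row. `Controller`, `compactGated`, and `compactUngated` are hypothetical names, and the assumption that shouldPurge is the costly step is mine, not something this thread measures.

```java
public class PurgeGateSketch {
    /** Hypothetical controller that counts calls to its (assumed expensive) check. */
    static final class Controller {
        int shouldPurgeCalls = 0;

        boolean shouldPurge(int key) {
            shouldPurgeCalls++;
            return true; // placeholder result; a real check would do actual work here
        }
    }

    /** Calls shouldPurge only for rows flagged as having irrelevant data. */
    static int compactGated(boolean[] rowHasIrrelevantData, Controller controller) {
        int purged = 0;
        for (int key = 0; key < rowHasIrrelevantData.length; key++)
            if (rowHasIrrelevantData[key] && controller.shouldPurge(key))
                purged++;
        return purged;
    }

    /** Calls shouldPurge for every row, whether or not it has anything to purge. */
    static int compactUngated(boolean[] rowHasIrrelevantData, Controller controller) {
        int purged = 0;
        for (int key = 0; key < rowHasIrrelevantData.length; key++) {
            boolean purge = controller.shouldPurge(key);
            if (rowHasIrrelevantData[key] && purge)
                purged++;
        }
        return purged;
    }

    public static void main(String[] args) {
        // 1000 rows, of which every 100th carries tombstones.
        boolean[] rows = new boolean[1000];
        for (int i = 0; i < rows.length; i += 100)
            rows[i] = true;

        Controller gated = new Controller();
        Controller ungated = new Controller();
        compactGated(rows, gated);
        compactUngated(rows, ungated);

        System.out.println("gated calls:   " + gated.shouldPurgeCalls);   // 10
        System.out.println("ungated calls: " + ungated.shouldPurgeCalls); // 1000
    }
}
```

Both variants purge the same rows; the difference is only in how many times the check is invoked, which is why a wrong answer from the cheap guard can show up as a throughput problem rather than a correctness one.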





[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-07-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418775#comment-13418775
 ] 

Jonathan Ellis commented on CASSANDRA-3855:
---

That's definitely wrong... I think it should be 
{{if (info != LIVE) return false}}





[jira] [Commented] (CASSANDRA-3855) RemoveDeleted dominates compaction time for large sstable counts

2012-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295818#comment-13295818
 ] 

Sylvain Lebresne commented on CASSANDRA-3855:
-

To be precise, I tried a quick test back in the day to see if I could 
reproduce this, but wasn't really able to reproduce anything similar to the 
attached hprof log. I didn't wait for it to reach 100,000,000 keys, though.
