[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433739#comment-17433739 ] Brandon Williams commented on CASSANDRA-14160: -- I believe it is still valid, [~subkanthi] > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Josh Snyder >Priority: Normal > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433379#comment-17433379 ] Kanthi Subramanian commented on CASSANDRA-14160: [~e.dimitrova], [~brandon.williams] is this issue still valid, would be a good experience to work on SSTables. > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Josh Snyder >Priority: Normal > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391755#comment-17391755 ] Ekaterina Dimitrova commented on CASSANDRA-14160: - I just had a quick check in ASF Slack with [~josnyder], he confirmed he won't be able to work on this and he will be happy if someone else take over on this work. I will un-assign him based on this conversation. > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder >Priority: Normal > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380169#comment-17380169 ] Ekaterina Dimitrova commented on CASSANDRA-14160: - Hi [~josnyder], [~marcuse], [~jjirsa], I am trying to look at tickets which need a reviewer and I saw this one. Was there any further discussion? > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder >Priority: Normal > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441124#comment-16441124 ] Jeff Jirsa commented on CASSANDRA-14160: cc [~cnlwsu] as well (we've seen some meaningful perf improvements with NEVER_PURGE_TOMBSTONES=true, tagging Chris in case he remembers if the benefit was in the purge evaluator or getFullyExpiredSSTables() ) > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder >Priority: Major > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440885#comment-16440885 ] Marcus Eriksson commented on CASSANDRA-14160: - hey [~josnyder] the code looks good to me, but, as [~jjirsa] mentioned above, a unit test that makes sure the sstables are returned in the correct order should be added. You also mentioned a that you had not yet benchmarked the change, have you done that? If not, that would also be nice. > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder >Priority: Major > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434939#comment-16434939 ] Jeff Jirsa commented on CASSANDRA-14160: Re-pushed [here|https://github.com/jeffjirsa/cassandra/commits/14160] , tests running [here|https://circleci.com/gh/jeffjirsa/cassandra/tree/14160] (unit tests + dtests) > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder >Priority: Major > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392018#comment-16392018 ] Jeff Jirsa commented on CASSANDRA-14160: [~krummas] can you review? > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder >Priority: Major > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325397#comment-16325397 ] Josh Snyder commented on CASSANDRA-14160: - I believe I've now fixed the issues. Take a look at: https://github.com/hashbrowncipher/cassandra/tree/cassandra-14160 https://circleci.com/gh/hashbrowncipher/cassandra/2 > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324826#comment-16324826 ] Jeff Jirsa commented on CASSANDRA-14160: Hey [~josnyder], the code looks sane, but it also doesn't compile (or rather, the existing unit tests for the {{OverlapIterator}} no longer compile). Any chance you can address that? > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14160) maxPurgeableTimestamp should traverse tables in order of minTimestamp
[ https://issues.apache.org/jira/browse/CASSANDRA-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324804#comment-16324804 ] Jeff Jirsa commented on CASSANDRA-14160: Pushed the patch to one of my github branches, running CI [here|https://circleci.com/gh/jeffjirsa/cassandra/tree/cassandra-14160] > maxPurgeableTimestamp should traverse tables in order of minTimestamp > - > > Key: CASSANDRA-14160 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14160 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Josh Snyder >Assignee: Josh Snyder > Labels: performance > Fix For: 4.x > > > In maxPurgeableTimestamp, we iterate over the bloom filters of each > overlapping SSTable. Of the bloom filter hits, we take the SSTable with the > lowest minTimestamp. If we kept the SSTables in sorted order of minTimestamp, > then we could short-circuit the operation at the first bloom filter hit, > reducing cache pressure (or worse, I/O) and CPU time. > I've written (but not yet benchmarked) [some > code|https://github.com/hashbrowncipher/cassandra/commit/29859a4a2e617f6775be49448858bc59fdafab44] > to demonstrate this possibility. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org