Re: Cassandra 2017 Wrapup
Thanks Jeff for the very comprehensive list of actions taken this year. Can't wait to put my hands on 4.0 once it's released.

On Fri, Dec 22, 2017 at 10:20 PM, Jeff Jirsa wrote:
> Happy holidays all,
> [snip]
Cassandra 2017 Wrapup
Happy holidays all,

I imagine most people are about to disappear to celebrate holidays, so I wanted to try to summarize the state of Cassandra dev for 2017, as I see it. Standard disclaimers apply (this is my personal opinion, not that of my employer, and not officially endorsed by the Apache Cassandra PMC or the ASF).

Some quick stats about Cassandra development efforts in 2017 (using imperfect git log | awk/sed counting, only looking at trunk, buyer beware, it's probably off by a few):

- The first commit of 2017 was Ben Manes transforming the on-heap cache to Caffeine ( https://github.com/apache/cassandra/commit/c607d76413be81a0e125c5780e068d7ab7594612 )
- Alex Petrov removed the most code (~7500 lines, according to github)
- Benjamin Lerer added the most code (~8000 lines, according to github)
- We put to bed the tick/tock release cycle, but still cut 14 different releases across 5 different branches
- We had a total of 136 different contributors, with 48 of those contributors contributing more than one patch during the year
- We had a total of 47 different reviewers
- There were 661 non-merge commits to trunk
- There were 56 non-merge commits to docs/
- We end the year with roughly 173 pending changes for 4.0
- We resolved (either fixed or disqualified) 781 issues in JIRA
- I count something like 273 email threads to dev@, and 903 email threads to user@
- The project added Stefan Podkowinski, Joel Knighton, Ariel Weisberg, Alex Petrov, Blake Eggleston, and Philip Thompson as committers
- The project added Josh McKenzie, Marcus Eriksson, and Jon Haddad to the Apache Cassandra PMC

At NGCC (which Eric and Gary managed to organize with the help of Instaclustr sponsoring, an achievement in itself), we had people talk about:

- Two different talks (from Apple and FB/Instagram). I'm struggling to describe these in simple terms; they both sort of involve using hints and changing some of the consistency concepts to help deal with latency / durability / availability, especially in cross-DC workloads. Grouping these together isn't really fair, but no one-email summary is going to be fair to either of these talks. If you missed NGCC, I guess you get to wait for the JIRAs / patches.
- A new storage engine (FB/Instagram) using RocksDB.
- Some notes on using CDC at scale (and some proposed changes to make it easier) from Uber ( https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf )
- Michael Shuler (Datastax / Cassandra PMC / release master / etc) spent some time talking about testing and CI.

Some other big'ish development efforts worth mentioning (from personal memory, perhaps the worst possible way to create such a list):

- We spent a fair amount of time talking about testing. Francois @ Instagram led the way in codifying a new set of principles around testing and quality ( https://lists.apache.org/thread.html/0854341ae3ab41ceed2ae8a03f2486cf2325e4fca6fd800bf4297dd4@%3Cdev.cassandra.apache.org%3E / https://issues.apache.org/jira/browse/CASSANDRA-13497 ).
- We've also spent some time making tests work in CircleCI, which should make life much easier for occasional contributors - no need to figure out how to run tests in ASF Jenkins.
- The internode messaging rewrite to use async/netty is probably the single largest change that comes to mind. It went in earlier this year, and should make it easier to have HUGE clusters. All of you running thousand-instance clusters will probably benefit from this patch (I know you're out there, I've talked to you in IRC) - will be in 4.0 ( https://issues.apache.org/jira/browse/CASSANDRA-8457 )
- We have a company working on making Cassandra happy with proprietary flash storage and PPC64LE (IBM's recent patches, https://developer.ibm.com/linuxonpower/2017/03/31/using-capi-improve-performance-apache-cassandra-work-progress-update/ )
- We have a new commitlog mode added for the first time in quite some time - the GroupCommitLog will be in 4.0 ( https://issues.apache.org/jira/browse/CASSANDRA-13530 )
- Michael Kjellman spent some time porting dtests from nose to pytest, and from python 2.7 to python 3, removing dependencies on dead projects like pycassa and the old thrift-cql library. Still needs to be reviewed ( https://issues.apache.org/jira/browse/CASSANDRA-14134 )
- Robert Stupp spent some time porting to java9 - again, still needs to be reviewed ( https://issues.apache.org/jira/browse/CASSANDRA-9608 )

Overall, the state of the project appears to be strong. We're seeing active contributions driven primarily by users (like you), the 8099/3.0 engine is looking pretty good here in December, and the code base is stabilizing towards a product all of us should be happy to run in production. Despite some irrationally skeptical sky-is-falling threads near the end of 2016, I feel confident in saying it was a pretty good year for Cassandra, and as the project continues to move forward, I'm looking forward to seeing 4.0 launch in 2018 (hopefully with a real user conference!)

- Jeff
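For the curious, the contributor stats above come from rough git-log munging. A minimal sketch of that style of counting in Python, over a hypothetical sample of author names rather than the real Cassandra log:

```python
# Sketch of the kind of counting used for the stats above, over the
# output of: git log --no-merges --format=%an trunk
# The author names below are a hypothetical sample, NOT real data.
from collections import Counter

git_log_authors = """\
Alex Petrov
Benjamin Lerer
Alex Petrov
Jeff Jirsa
Benjamin Lerer
Benjamin Lerer"""

per_author = Counter(git_log_authors.splitlines())
contributors = len(per_author)                              # unique authors
multi_patch = sum(1 for n in per_author.values() if n > 1)  # more than one patch
print(contributors, multi_patch)  # 3 2
```

The same pipeline against a real checkout would feed `git log` output in via `subprocess` or a shell pipe; as the email says, counting this way is imperfect (author aliases, merge attribution), so expect the numbers to be off by a few.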
Re: CASSANDRA-8527
Hi folks, thanks for the feedback so far.

@Jeff, there are two distinct cases here:

1. Range tombstones created on a partial primary key (one that doesn't include the last clustering column of the PK, for example): a single tombstone can shadow many rows.
2. Range tombstones created on the full PK: a single tombstone can shadow a single row only.

In the first case, the range tombstones are (almost) correctly counted after merge, since that happens really early in the code. We cannot know how many shadowed rows/cells were read because they get merged early with the tombstones. In the second case (full PK delete), there is no tombstone counted and we only get a count of the live rows after merge.

I'll illustrate this with the example below:

CREATE TABLE users.test (id int, clust1 text, clust2 text, val1 text, val2 text, PRIMARY KEY(id, clust1, clust2));

insert into users.test(id, clust1, clust2, val1, val2) values(1,'c1','cc1','v1','v2');
insert into users.test(id, clust1, clust2, val1, val2) values(1,'c1','cc2','v1','v2');
insert into users.test(id, clust1, clust2, val1, val2) values(1,'c1','cc3','v1','v2');
insert into users.test(id, clust1, clust2, val1, val2) values(1,'c2','cc1','v1','v2');
insert into users.test(id, clust1, clust2, val1, val2) values(1,'c2','cc2','v1','v2');
insert into users.test(id, clust1, clust2, val1, val2) values(1,'c2','cc3','v1','v2');

cqlsh> select * from users.test;

 id | clust1 | clust2 | val1 | val2
----+--------+--------+------+------
  1 |     c1 |    cc1 |   v1 |   v2
  1 |     c1 |    cc2 |   v1 |   v2
  1 |     c1 |    cc3 |   v1 |   v2
  1 |     c2 |    cc1 |   v1 |   v2
  1 |     c2 |    cc2 |   v1 |   v2
  1 |     c2 |    cc3 |   v1 |   v2

Tracing session: 4c9804c0-e73a-11e7-931f-517010c60cf9

 activity | timestamp | source | source_elapsed | client
----------+-----------+--------+----------------+--------
 ...
 Read 6 live rows, 0 deleted rows and 0 tombstone cells [ReadStage-1] | 2017-12-22 18:12:55.954000 | 127.0.0.1 | 5567 | 127.0.0.2

Then I issue a range tombstone on id = 1 and clust1 = 'c1':

cqlsh> delete from users.test where id = 1 and clust1 = 'c1';
cqlsh> select * from users.test;

 id | clust1 | clust2 | val1 | val2
----+--------+--------+------+------
  1 |     c2 |    cc1 |   v1 |   v2
  1 |     c2 |    cc2 |   v1 |   v2
  1 |     c2 |    cc3 |   v1 |   v2

Tracing session: 8597e320-e73b-11e7-931f-517010c60cf9

 ...
 Read 3 live rows, 0 deleted rows and 2 tombstone cells [ReadStage-1] | 2017-12-22 18:14:08.855000 | 127.0.0.1 | 2878 | 127.0.0.2

Each range tombstone apparently counts for 2 tombstone cells (probably due to the start bound marker and the end bound marker?).

Then if I just delete a single row:

cqlsh> delete from users.test where id = 1 and clust1 = 'c2' and clust2 = 'cc1';
cqlsh> select * from users.test;

 id | clust1 | clust2 | val1 | val2
----+--------+--------+------+------
  1 |     c2 |    cc2 |   v1 |   v2
  1 |     c2 |    cc3 |   v1 |   v2

Tracing session: b43d9170-e73b-11e7-931f-517010c60cf9

 ...
 Read 2 live rows, 1 deleted rows and 2 tombstone cells [ReadStage-1] | 2017-12-22 18:15:27.117000 | 127.0.0.1 | 4487 | 127.0.0.2

My patch is applied here, so the deleted row appears in the new counter, but the tombstone cell count is unchanged (I haven't touched the way they are counted). It seems like only full-PK tombstones are currently impossible to notice. Are they even considered range tombstones? I've failed to identify them as such while debugging the code. We still cannot know how many tombstones or non-live cells we're really reading from disk due to early merging: what we get in the ReadCommand class is expectedly pretty different from what Cassandra reads from disk/memory.

@Kurt: I'd be in favor of not adding a new setting in the yaml file.
I think it's better for folks to realize early that they're reading a lot of deleted rows. Makes sense to do this only in 4.x.

@DuyHai: what would you consider an unwise use of
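To make the two deletion cases above concrete, here is a small hypothetical Python model (not Cassandra code — the real merge happens deep in the read path) of how a partial-PK range tombstone shadows many rows at merge time while a full-PK tombstone shadows exactly one:

```python
# Toy model of read-time merging, NOT actual Cassandra internals.
# Rows are keyed by their clustering columns (clust1, clust2); a partial-PK
# delete shadows every row whose key starts with the given prefix, while a
# full-PK delete shadows exactly one row.

rows = {
    ('c1', 'cc1'): ('v1', 'v2'), ('c1', 'cc2'): ('v1', 'v2'),
    ('c1', 'cc3'): ('v1', 'v2'), ('c2', 'cc1'): ('v1', 'v2'),
    ('c2', 'cc2'): ('v1', 'v2'), ('c2', 'cc3'): ('v1', 'v2'),
}

def merge(rows, range_deletes, row_deletes):
    """Return (live_rows, shadowed_count) after applying tombstones."""
    shadowed = {k for k in rows
                if any(k[:len(p)] == p for p in range_deletes)
                or k in row_deletes}
    live = {k: v for k, v in rows.items() if k not in shadowed}
    return live, len(shadowed)

# Case 1: range tombstone on a partial PK (clust1 = 'c1') shadows 3 rows.
live, shadowed = merge(rows, range_deletes=[('c1',)], row_deletes=set())
print(len(live), shadowed)  # 3 3  -> one tombstone, three shadowed rows

# Case 2: full-PK tombstone ('c2', 'cc1') shadows exactly 1 row.
live, shadowed = merge(live, range_deletes=[], row_deletes={('c2', 'cc1')})
print(len(live), shadowed)  # 2 1  -> one tombstone, one shadowed row
```

The point of the example is the asymmetry the email describes: in case 1 the shadowed rows disappear inside the merge (so only the tombstone itself can be counted), whereas in case 2 nothing is counted as a tombstone at all unless a counter like the one in the patch is added.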