Re: Cassandra 2017 Wrapup

2017-12-22 Thread DuyHai Doan
Thanks Jeff for the very comprehensive list of actions taken this year.
Can't wait to get my hands on 4.0 once it's released.




Cassandra 2017 Wrapup

2017-12-22 Thread Jeff Jirsa
Happy holidays all,

I imagine most people are about to disappear to celebrate holidays, so I
wanted to try to summarize the state of Cassandra dev for 2017, as I see
it. Standard disclaimers apply (this is my personal opinion, not that of my
employer, not officially endorsed by the Apache Cassandra PMC, or the ASF).

Some quick stats about Cassandra development efforts in 2017 (using
imperfect git log | awk/sed counting, only looking at trunk, buyer beware,
it's probably off by a few):

The first commit of 2017 was: Ben Manes, transforming the on-heap cache to
Caffeine (
https://github.com/apache/cassandra/commit/c607d76413be81a0e125c5780e068d7ab7594612
)
Alex Petrov removed the most code (~7500 lines, according to github)
Benjamin Lerer added the most code (~8000 lines, according to github)
We put to bed the tick/tock release cycle, but still cut 14 different
releases across 5 different branches.
We had a total of 136 different contributors, with 48 of those contributors
contributing more than one patch during the year.
We had a total of 47 different reviewers
There were 661 non-merge commits to trunk
There were 56 non-merge commits to docs/
We end the year with roughly 173 pending changes for 4.0
We resolved (either fixed or disqualified) 781 issues in JIRA
I count something like 273 email threads to dev@, and 903 email threads to
user@
The project added Stefan Podkowinski, Joel Knighton, Ariel Weisberg, Alex
Petrov, Blake Eggleston, and Philip Thompson as committers.
The project added Josh McKenzie, Marcus Eriksson and Jon Haddad to the
Apache Cassandra PMC

At NGCC (which Eric and Gary managed to organize with Instaclustr's
sponsorship, an achievement in itself), we had people talk about:
- Two different talks (from Apple and FB/Instagram). I'm struggling to
describe these in simple terms; they both sorta involve using hints and
changing some of the consistency concepts to help deal with latency /
durability / availability, especially in cross-DC workloads. Grouping these
together isn't really fair, but no one-email summary is going to be fair to
either of these talks. If you missed NGCC, I guess you get to wait for the
JIRAs / patches.
- A new storage engine (FB/Instagram) using RocksDB
- Some notes on using CDC at scale (and some proposed changes to make it
easier) from Uber (
https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf )
- Michael Shuler (DataStax / Cassandra PMC / release master / etc.) spent
some time talking about testing and CI.

Some other big'ish development efforts worth mentioning (from personal
memory, perhaps the worst possible way to create such a list):
- We spent a fair amount of time talking about testing. Francois @
Instagram led the way in codifying a new set of principles around testing
and quality (
https://lists.apache.org/thread.html/0854341ae3ab41ceed2ae8a03f2486cf2325e4fca6fd800bf4297dd4@%3Cdev.cassandra.apache.org%3E
/ https://issues.apache.org/jira/browse/CASSANDRA-13497 ).
- We've also spent some time making tests work in CircleCI, which should
make life much easier for occasional contributors - no need to figure out
how to run tests in ASF Jenkins.
- The internode messaging rewrite to use async/Netty is probably the single
largest effort that comes to mind. It went in earlier this year, and should make
it easier to have HUGE clusters. All of you running thousand instance
clusters will probably benefit from this patch (I know you're out there,
I've talked to you in IRC) - will be in 4.0 (
https://issues.apache.org/jira/browse/CASSANDRA-8457 )
- We have a company working on making Cassandra happy with proprietary
flash storage and PPC64LE (IBM's recent patches,
https://developer.ibm.com/linuxonpower/2017/03/31/using-capi-improve-performance-apache-cassandra-work-progress-update/
)
- We added a new commitlog mode for the first time in quite some time -
the GroupCommitLog, which will be in 4.0 (
https://issues.apache.org/jira/browse/CASSANDRA-13530 )
- Michael Kjellman spent some time porting dtests from nose to pytest, and
from python 2.7 to python 3, removing dependencies on dead projects like
pycassa and the old thrift-cql library. Still needs to be reviewed (
https://issues.apache.org/jira/browse/CASSANDRA-14134 )
- Robert Stupp spent some time porting to Java 9 - again, this still needs to be
reviewed ( https://issues.apache.org/jira/browse/CASSANDRA-9608 )

Overall, the state of the project appears to be strong. We're seeing active
contributions driven primarily by users (like you), the 8099/3.0 engine is
looking pretty good here in December, and the code base is stabilizing
towards a product all of us should be happy to run in production. Despite
some irrationally skeptical sky-is-falling threads near the end of 2016, I
feel confident in saying it was a pretty good year for Cassandra, and as
the project continues to move forward, I'm looking forward to seeing 4.0
launch in 2018 (hopefully with a real user conference!)

- Jeff


Re: CASSANDRA-8527

2017-12-22 Thread Alexander Dejanovski
Hi folks,

thanks for the feedback so far.

@Jeff, there are two distinct cases here:

   1. Range tombstones created on a partial primary key (one that doesn't
   include the last column of the PK, for example): a single tombstone can
   shadow many rows
   2. Range tombstones created on the full PK: a single tombstone can
   shadow a single row only

In the first case, the range tombstones are (almost) correctly counted
after merge, since merging happens really early in the code. We cannot know
how many shadowed rows/cells were read, because they get merged with the
tombstones early on.

In the second case (full PK delete), no tombstone is counted and we
only get a count of the live rows after merge.

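In CQL terms, the two shapes look like this (a quick sketch, using the
users.test table defined in the example that follows):

-- case 1: partial PK delete (clust2 omitted), one range tombstone shadowing many rows
delete from users.test where id = 1 and clust1 = 'c1';
-- case 2: full PK delete, one tombstone shadowing a single row
delete from users.test where id = 1 and clust1 = 'c2' and clust2 = 'cc1';
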
I'll illustrate this with the example below:

CREATE TABLE users.test (id int, clust1 text, clust2 text, val1 text, val2
text, PRIMARY KEY(id, clust1, clust2));
insert into users.test(id , clust1, clust2 , val1 , val2)
values(1,'c1','cc1', 'v1','v2');
insert into users.test(id , clust1, clust2 , val1 , val2)
values(1,'c1','cc2', 'v1','v2');
insert into users.test(id , clust1, clust2 , val1 , val2)
values(1,'c1','cc3', 'v1','v2');
insert into users.test(id , clust1, clust2 , val1 , val2)
values(1,'c2','cc1', 'v1','v2');
insert into users.test(id , clust1, clust2 , val1 , val2)
values(1,'c2','cc2', 'v1','v2');
insert into users.test(id , clust1, clust2 , val1 , val2)
values(1,'c2','cc3', 'v1','v2');

cqlsh> select * from users.test;

 id | clust1 | clust2 | val1 | val2
----+--------+--------+------+------
  1 |     c1 |    cc1 |   v1 |   v2
  1 |     c1 |    cc2 |   v1 |   v2
  1 |     c1 |    cc3 |   v1 |   v2
  1 |     c2 |    cc1 |   v1 |   v2
  1 |     c2 |    cc2 |   v1 |   v2
  1 |     c2 |    cc3 |   v1 |   v2

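The tracing output below requires request tracing to be enabled in cqlsh
first, e.g.:

cqlsh> tracing on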

Tracing session: 4c9804c0-e73a-11e7-931f-517010c60cf9

 activity                                                              | timestamp                  | source    | source_elapsed | client
-----------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
 ...
 Read 6 live rows, 0 deleted rows and 0 tombstone cells [ReadStage-1]  | 2017-12-22 18:12:55.954000 | 127.0.0.1 |           5567 | 127.0.0.2

Then I issue a range tombstone on id = 1 and clust1 = 'c1':

cqlsh> delete from users.test where id = 1 and clust1 = 'c1';
cqlsh> select * from users.test;

 id | clust1 | clust2 | val1 | val2
----+--------+--------+------+------
  1 |     c2 |    cc1 |   v1 |   v2
  1 |     c2 |    cc2 |   v1 |   v2
  1 |     c2 |    cc3 |   v1 |   v2

Tracing session: 8597e320-e73b-11e7-931f-517010c60cf9

 activity                                                              | timestamp                  | source    | source_elapsed | client
-----------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
 ...
 Read 3 live rows, 0 deleted rows and 2 tombstone cells [ReadStage-1]  | 2017-12-22 18:14:08.855000 | 127.0.0.1 |           2878 | 127.0.0.2

Each range tombstone apparently counts for 2 tombstone cells (probably due
to the start bound marker and the end bound marker?).

Then if I just delete a single row:

cqlsh> delete from users.test where id = 1 and clust1 = 'c2' and clust2 =
'cc1';
cqlsh> select * from users.test;

 id | clust1 | clust2 | val1 | val2
----+--------+--------+------+------
  1 |     c2 |    cc2 |   v1 |   v2
  1 |     c2 |    cc3 |   v1 |   v2

Tracing session: b43d9170-e73b-11e7-931f-517010c60cf9

 activity                                                              | timestamp                  | source    | source_elapsed | client
-----------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
 ...
 Read 2 live rows, 1 deleted rows and 2 tombstone cells [ReadStage-1]  | 2017-12-22 18:15:27.117000 | 127.0.0.1 |           4487 | 127.0.0.2
 ...

My patch is applied here so the deleted row appears in the new counter, but
the tombstone cell count is unchanged (I haven't touched the way they are
counted).

It seems like the full-PK tombstones are the only ones that are currently
impossible to notice. Are they even considered range tombstones? I've failed
to identify them as such while debugging the code.

We still cannot know how many tombstones or non-live cells we're really
reading from disk, due to early merging. What we get in the ReadCommand
class is, as expected, pretty different from what Cassandra reads from
disk/memory.

@Kurt: I'd be in favor of not adding a new setting to the yaml file. I
think it's better for folks to realize early that they're reading a lot of
deleted rows. It makes sense to do this only in 4.x.

@DuyHai: what would you consider a non wise use of