Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
i’m not trying to get into a fight here jeremiah. and this will be my last 
reply on this as i’ve made my opinion pretty clear. but ask yourself: would you 
run c* in idea debugger and then do performance testing? no. because it’s a 
DEBUGger.

> On Mar 18, 2018, at 11:43 AM, J. D. Jordan  wrote:
> 
> If there are some log messages you think should be improved to make them more 
> useful please do so.  Saying things are “crap” is not productive.
> 
> I have seen having the extra information from the debug.log be very helpful 
> in debugging production issues after the fact on operational clusters many 
> times.
> 
> Also if you think there are things logged at DEBUG, since it was cleaned up, 
> that are not useful, then please improve them or change their logging 
> level.
> 
> You are also free to change the logging level on clusters you run if you 
> don’t want the extra information.
> 
> And again we are only talking about versions where DEBUG has been cleaned up. 
> When running 2.1 or earlier, yes there is a ton of stuff at DEBUG and you 
> would not want that on by default, even asynchronously.
> 
> It is up to reviewers and committers to understand the impact of and rules 
> around the use of different log levels. Said reviewers and committers should 
> teach new contributors those rules during reviews if they are violated.
> 
> -Jeremiah
> 
>> On Mar 18, 2018, at 2:31 PM, Michael Kjellman  wrote:
>> 
>> what really baffles me with this entire thing is as a project we don’t 
>> even log things like partition keys along with the tombstone overwhelming or 
>> batch too large log messages.. this would immediately be helpful to thousands 
>> and thousands of people... yet somehow we think it’s okay to log tons of 
>> crap at debug to users’ drives that will shorten their ssds and objectively 
>> reduce the performance of the actual database due to logging overhead for 
>> some possible day in the future when we might need them to debug a problem 
>> really we should have figured out and reproduced ourselves in the first 
>> place.
>> 
>>> On Mar 18, 2018, at 11:24 AM, Michael Kjellman  wrote:
>>> 
>>> it’s too easy to make a regression there. and does anyone even have a 
>>> splunk (or equivalent) infrastructure to actually keep debug logs around 
>>> for a long enough retention period to even have them be helpful?
>>> 
>>> again: this is something engineers for the project want. it’s not in the 
>>> best interest for our users. 
>>> 
>>> 
>>>> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis  wrote:
>>>> 
>>>> That really depends on whether you're judicious in deciding what to log at
>>>> debug, doesn't it?
>>>> 
>>>> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman 
>>>> wrote:
>>>> 
>>>>> +1. this is how it works.
>>>>> 
>>>>> your computer doesn’t run at debug logging by default. your phone 
>>>>> doesn’t
>>>>> either. neither does your smart tv. your database can’t be running at 
>>>>> debug
>>>>> just because it makes our lives as engineers easier.
>>>>> 
>>>>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>>>>> a...@thelastpickle.com> wrote:
>>>>>> 
>>>>>> It's a tiny bit unusual to turn on debug logging for all users by default
>>>>>> though, and there should be occasions to turn it on when facing issues
>>>>> that
>>>>>> you want to debug (if they can be easily reproduced).
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jonathan Ellis
>>>> co-founder, http://www.datastax.com
>>>> @spyced
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
what really baffles me with this entire thing is as a project we don’t even log 
things like partition keys along with the tombstone overwhelming or batch too 
large log messages.. this would immediately be helpful to thousands and 
thousands of people... yet somehow we think it’s okay to log tons of crap at 
debug to users’ drives that will shorten their ssds and objectively reduce the 
performance of the actual database due to logging overhead for some possible 
day in the future when we might need them to debug a problem really we should 
have figured out and reproduced ourselves in the first place.

> On Mar 18, 2018, at 11:24 AM, Michael Kjellman  wrote:
> 
> it’s too easy to make a regression there. and does anyone even have a splunk 
> (or equivalent) infrastructure to actually keep debug logs around for a long 
> enough retention period to even have them be helpful?
> 
> again: this is something engineers for the project want. it’s not in the best 
> interest for our users. 
> 
> 
>> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis  wrote:
>> 
>> That really depends on whether you're judicious in deciding what to log at
>> debug, doesn't it?
>> 
>> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman 
>> wrote:
>> 
>>> +1. this is how it works.
>>> 
>>> your computer doesn’t run at debug logging by default. your phone doesn’t
>>> either. neither does your smart tv. your database can’t be running at debug
>>> just because it makes our lives as engineers easier.
>>> 
>>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>> 
>>>> It's a tiny bit unusual to turn on debug logging for all users by default
>>>> though, and there should be occasions to turn it on when facing issues
>>> that
>>>> you want to debug (if they can be easily reproduced).
>>> 
>> 
>> 
>> 
>> -- 
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
> 
> 


Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
it’s too easy to make a regression there. and does anyone even have a splunk 
(or equivalent) infrastructure to actually keep debug logs around for a long 
enough retention period to even have them be helpful?

again: this is something engineers for the project want. it’s not in the best 
interest for our users. 


> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis  wrote:
> 
> That really depends on whether you're judicious in deciding what to log at
> debug, doesn't it?
> 
> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman 
> wrote:
> 
>> +1. this is how it works.
>> 
>> your computer doesn’t run at debug logging by default. your phone doesn’t
>> either. neither does your smart tv. your database can’t be running at debug
>> just because it makes our lives as engineers easier.
>> 
>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>> 
>>> It's a tiny bit unusual to turn on debug logging for all users by default
>>> though, and there should be occasions to turn it on when facing issues
>> that
>>> you want to debug (if they can be easily reproduced).
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced




Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Michael Kjellman
+1. this is how it works.

your computer doesn’t run at debug logging by default. your phone doesn’t 
either. neither does your smart tv. your database can’t be running at debug 
just because it makes our lives as engineers easier. 

> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski  
> wrote:
> 
> It's a tiny bit unusual to turn on debug logging for all users by default
> though, and there should be occasions to turn it on when facing issues that
> you want to debug (if they can be easily reproduced).


Re: Debug logging enabled by default since 2.2

2018-03-17 Thread Michael Kjellman
ive never understood this change. and it’s been explained to me multiple times.

DEBUG shouldn’t run by default in prod. and it certainly shouldn’t be enabled 
by default for users.

but hey, what do i know! just my 2 cents. 

> On Mar 17, 2018, at 10:55 AM, J. D. Jordan  wrote:
> 
> We went through an exercise of setting things up so that DEBUG logging was 
> asynchronous and would give people a “production” debug log. 
> https://issues.apache.org/jira/browse/CASSANDRA-10241
> If there are some things going out at DEBUG that cause performance issues 
> then most likely those should be moved to TRACE so that debug logging can 
> stay enabled for all the useful information found there.
> 
> -Jeremiah
> 
>> On Mar 17, 2018, at 1:49 PM, Alexander Dejanovski  
>> wrote:
>> 
>> Hi folks,
>> 
>> we've been upgrading clusters from 2.0 to 2.2 recently and we've noticed
>> that debug logging was causing serious performance issues in some cases,
>> specifically because of its use in the query pager.
>> 
>> I've opened a ticket with some benchmarks and flame graphs :
>> https://issues.apache.org/jira/browse/CASSANDRA-14318
>> 
>> The problem should be less serious in the read path with Cassandra 3.0 and
>> above as the query pager code has been reworked and doesn't log at debug
>> level.
>> I think that debug logging shouldn't be turned on by default though, since
>> we see it doesn't come for free and that it lowers read performance in 2.2.
>> 
>> Was there any specific reason why it was enabled by default in 2.2 ?
>> 
>> Is anyone opposed to disabling debug logging by default in all branches ?
>> 
>> -- 
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>> 
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
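
The cost being debated above can be sketched in a few lines. This is a minimal illustration, not Cassandra code: it uses `java.util.logging` as a stand-in for the SLF4J/logback stack Cassandra actually uses, and `describePage`, `eager`, `guarded` and `lazy` are hypothetical names. With the level at INFO, only the eager call site pays for building the message. With the level at FINE (the rough equivalent of DEBUG), every call site pays, which is the overhead reported for the 2.2 query pager.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugLogCostSketch {
    // Counts how often the (hypothetical) expensive message was built.
    static int messagesBuilt = 0;

    // Stand-in for an expensive toString() on a hot path.
    static String describePage(int rows) {
        messagesBuilt++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("row").append(i).append(',');
        }
        return sb.toString();
    }

    // Eager: the message is built before the logger checks its level.
    static void eager(Logger log, int rows) {
        log.fine("fetched page: " + describePage(rows));
    }

    // Guarded: the message is built only if FINE is enabled.
    static void guarded(Logger log, int rows) {
        if (log.isLoggable(Level.FINE)) {
            log.fine("fetched page: " + describePage(rows));
        }
    }

    // Lazy: the supplier runs only if FINE is enabled.
    static void lazy(Logger log, int rows) {
        log.fine(() -> "fetched page: " + describePage(rows));
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch.pager");
        log.setUseParentHandlers(false); // keep the console quiet

        for (Level level : new Level[] { Level.INFO, Level.FINE }) {
            log.setLevel(level);
            messagesBuilt = 0;
            eager(log, 100);
            guarded(log, 100);
            lazy(log, 100);
            System.out.println(level + ": messages built = " + messagesBuilt);
        }
    }
}
```

Guards and suppliers only help while the level is disabled. Once debug is on by default, the remaining options are the ones raised in the thread: move hot-path messages to TRACE, or turn the level down on the cluster.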




Re: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018

2018-02-27 Thread Michael Kjellman
2.2: yes
2.1: no.

i don't think it's worth the effort to get it working on 2.1 at this point -- 
and i hope we've fully moved on from 2.1 by August 31, 2018 ;)

> On Feb 27, 2018, at 5:35 PM, kurt greaves  wrote:
> 
> Not that much gets committed to 2.1 and 2.2 anymore, but is this also true
> for those branches?
> 
> On 27 February 2018 at 22:58, Michael Kjellman  wrote:
> 
>> FYI: we're already fully on circleci 2.0 for the 3.0, 3.11, and trunk
>> branches so no action required for us here!
>> 
>> best,
>> kjellman
>> 
>> Begin forwarded message:
>> 
>> From: The CircleCI Team <no-re...@circleci.com>
>> Subject: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018
>> Date: February 27, 2018 at 2:44:01 PM PST
>> To: mkjell...@internalcircle.com
>> Reply-To: <no-re...@circleci.com>
>> 
>> 
>> Dear customer,
>> 
>> We wanted to let you know that we are planning on sunsetting CircleCI 1.0
>> on August 31st, 2018. Our goal as a company for 2018 is to invest in
>> delivering more features and better performance on CircleCI 2.0, which
>> unlocks faster builds and greater control. For more information, you can
>> read our blog post<http://go.circleci.com/Mk0H040mZ0a12006G0UM052> on
>> sunsetting CircleCI 1.0.
>> 
>> You’ll need to update all of your config files to the CircleCI 2.0 syntax
>> in order to migrate your projects to CircleCI 2.0 over the next 6 months.
>> 
>> If all of your projects are already on 2.0, congratulations! No action is
>> necessary. We are sending this announcement to all active users to make
>> sure you have all of the information you need. Take a look at your builds
>> dashboard<http://go.circleci.com/ra060H200M7GmkZ1U004020> to see if your
>> projects are still building on CircleCI 1.0:
>> 
>> [https://www2.circleci.com/rs/485-ZMH-626/images/CircleCI%20Version%20Number.png]
>> 
>> 
>> These resources will help you migrate your projects from 1.0 to 2.0:
>> 
>> Config.yml translator.<http://go.circleci.com/ZMkaU018240m720GHZ0> Note: This
>> will generate a baseline config.yml file that you can adjust to fit your needs.
>> 1.0 to 2.0 migration documentation.<http://go.circleci.com/X2G0U9M0k000H0am4021Z80>
>> Language-specific 2.0 tutorials.<http://go.circleci.com/Q00H0mM0Z92G40k1200aaU0>
>> 
>> We will be sending you email reminders periodically with additional
>> resources and links as they become available to help with your migration
>> plan. We will also be updating this page<http://go.circleci.com/ga0makG04201MH02Z000b0U>
>> with information relevant to sunsetting CircleCI 1.0. If you need additional
>> migration assistance, open a support request<http://go.circleci.com/R00Gam1M420020U0ZbH0ck0>
>> and our support team will be in touch.
>> 
>> 
>> Cheers,
>> 
>> The CircleCI Team
>> 
>> 
>> 
>> 
>> 



Re: Timeout unit tests in trunk

2018-02-27 Thread Michael Kjellman
i've seen it timeout a lot too. if you think breaking it up will fix it that 
definitely sounds like a good approach!

> On Feb 27, 2018, at 2:57 PM, Dikang Gu  wrote:
> 
> I took a look at cql3.ViewTest; it seems too big and times out very
> often. Any objections if I split it into two or more tests?
> 
> On Tue, Feb 27, 2018 at 1:32 PM, Michael Kjellman 
> wrote:
> 
>> well, turns out we already have a jira tracking the MV tests being broken
>> on trunk. they are legit broken :) thanks jaso
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-14194
>> 
>> not sure about the batch test timeout there though.. did you debug it at
>> all by chance?
>> 
>> 
>> On Feb 27, 2018, at 1:27 PM, Michael Kjellman <kjell...@apple.com> wrote:
>> 
>> hey dikang: just chatted a little bit about this. proposal: let's add the
>> equivalent of @resource_intensive to unit tests too.. and the first one is
>> to stop from running the MV unit tests in the free circleci containers.
>> thoughts?
>> 
>> also, might want to bug your management to see if you can get some paid
>> circleci resources. it's game changing!
>> 
>> best,
>> kjellman
>> 
>> On Feb 27, 2018, at 12:12 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.INVALID> wrote:
>> 
>> Some tests might require additional resources to spin up the required
>> components. 2 CPU / 4GB might not be sufficient. You may need to bump up
>> the resources to 8CPU / 16GB.
>> Dinesh
>> 
>>  On Tuesday, February 27, 2018, 11:24:34 AM PST, Dikang Gu
>> <dikan...@gmail.com> wrote:
>> 
>> Looks like there are a few flaky/timeout unit tests in trunk, wondering is
>> there anyone looking at them already?
>> 
>> testBuildRange - org.apache.cassandra.db.view.ViewBuilderTaskTest
>> testUnloggedPartitionsPerBatch -
>> org.apache.cassandra.metrics.BatchMetricsTest
>> testViewBuilderResume - org.apache.cassandra.cql3.ViewTest
>> 
>> https://circleci.com/gh/DikangGu/cassandra/20
>> 
>> --
>> Dikang
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Dikang





Fwd: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018

2018-02-27 Thread Michael Kjellman
FYI: we're already fully on circleci 2.0 for the 3.0, 3.11, and trunk branches 
so no action required for us here!

best,
kjellman

Begin forwarded message:

From: The CircleCI Team <no-re...@circleci.com>
Subject: Action Required: We are sunsetting CircleCI 1.0 on August 31, 2018
Date: February 27, 2018 at 2:44:01 PM PST
To: mkjell...@internalcircle.com
Reply-To: <no-re...@circleci.com>


Dear customer,

We wanted to let you know that we are planning on sunsetting CircleCI 1.0 on 
August 31st, 2018. Our goal as a company for 2018 is to invest in delivering 
more features and better performance on CircleCI 2.0, which unlocks faster 
builds and greater control. For more information, you can read our blog 
post on sunsetting CircleCI 1.0.

You’ll need to update all of your config files to the CircleCI 2.0 syntax in 
order to migrate your projects to CircleCI 2.0 over the next 6 months.

If all of your projects are already on 2.0, congratulations! No action is 
necessary. We are sending this announcement to all active users to make sure 
you have all of the information you need. Take a look at your builds 
dashboard to see if your 
projects are still building on CircleCI 1.0:

[https://www2.circleci.com/rs/485-ZMH-626/images/CircleCI%20Version%20Number.png]


These resources will help you migrate your projects from 1.0 to 2.0:

Config.yml translator. Note: This will generate a baseline config.yml file 
that you can adjust to fit your needs.
1.0 to 2.0 migration documentation.
Language-specific 2.0 tutorials.

We will be sending you email reminders periodically with additional resources 
and links as they become available to help with your migration plan. We will 
also be updating this page with 
information relevant to sunsetting CircleCI 1.0. If you need additional 
migration assistance, open a support 
request and our support team 
will be in touch.


Cheers,

The CircleCI Team






Re: Timeout unit tests in trunk

2018-02-27 Thread Michael Kjellman
well, turns out we already have a jira tracking the MV tests being broken on 
trunk. they are legit broken :) thanks jaso

https://issues.apache.org/jira/browse/CASSANDRA-14194

not sure about the batch test timeout there though.. did you debug it at all by 
chance?


On Feb 27, 2018, at 1:27 PM, Michael Kjellman <kjell...@apple.com> wrote:

hey dikang: just chatted a little bit about this. proposal: let's add the 
equivalent of @resource_intensive to unit tests too.. and the first one is to 
stop from running the MV unit tests in the free circleci containers. thoughts?

also, might want to bug your management to see if you can get some paid 
circleci resources. it's game changing!

best,
kjellman

On Feb 27, 2018, at 12:12 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.INVALID> wrote:

Some tests might require additional resources to spin up the required 
components. 2 CPU / 4GB might not be sufficient. You may need to bump up the 
resources to 8CPU / 16GB.
Dinesh

  On Tuesday, February 27, 2018, 11:24:34 AM PST, Dikang Gu 
<dikan...@gmail.com> wrote:

Looks like there are a few flaky/timeout unit tests in trunk, wondering is
there anyone looking at them already?

testBuildRange - org.apache.cassandra.db.view.ViewBuilderTaskTest
testUnloggedPartitionsPerBatch -
org.apache.cassandra.metrics.BatchMetricsTest
testViewBuilderResume - org.apache.cassandra.cql3.ViewTest

https://circleci.com/gh/DikangGu/cassandra/20

--
Dikang






Re: Timeout unit tests in trunk

2018-02-27 Thread Michael Kjellman
hey dikang: just chatted a little bit about this. proposal: let's add the 
equivalent of @resource_intensive to unit tests too.. and the first one is to 
stop from running the MV unit tests in the free circleci containers. thoughts?

also, might want to bug your management to see if you can get some paid 
circleci resources. it's game changing!

best,
kjellman
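
One way the proposal above could look for unit tests, as a rough sketch only: `ResourceIntensive`, `HeavyViewTest`, `LightTest` and `selectTests` are hypothetical names, and a real version would hook into the project's JUnit/ant setup rather than filter classes by hand. The idea is a marker annotation that a runner checks when it knows it is on a small (for example free-tier) container.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResourceIntensiveSketch {
    // Hypothetical marker, analogous to the @resource_intensive decorator mentioned above.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ResourceIntensive {}

    // Placeholder test classes; real tests would contain @Test methods.
    @ResourceIntensive
    static class HeavyViewTest {}

    static class LightTest {}

    // Returns the classes to run; marked classes are dropped on constrained hosts.
    static List<Class<?>> selectTests(List<Class<?>> candidates, boolean constrained) {
        List<Class<?>> selected = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            boolean heavy = candidate.isAnnotationPresent(ResourceIntensive.class);
            if (!(constrained && heavy)) {
                selected.add(candidate);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed switch: -Dsketch.constrained=true on small containers.
        boolean constrained = Boolean.getBoolean("sketch.constrained");
        List<Class<?>> all = Arrays.asList(HeavyViewTest.class, LightTest.class);
        for (Class<?> c : selectTests(all, constrained)) {
            System.out.println("would run: " + c.getSimpleName());
        }
    }
}
```

Skipping tests this way hides breakage on constrained hosts, so the marked tests would still need to run somewhere with enough CPU and memory.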

> On Feb 27, 2018, at 12:12 PM, Dinesh Joshi  
> wrote:
> 
> Some tests might require additional resources to spin up the required 
> components. 2 CPU / 4GB might not be sufficient. You may need to bump up the 
> resources to 8CPU / 16GB.
> Dinesh 
> 
>On Tuesday, February 27, 2018, 11:24:34 AM PST, Dikang Gu 
>  wrote:  
> 
> Looks like there are a few flaky/timeout unit tests in trunk, wondering is
> there anyone looking at them already?
> 
> testBuildRange - org.apache.cassandra.db.view.ViewBuilderTaskTest
> testUnloggedPartitionsPerBatch -
> org.apache.cassandra.metrics.BatchMetricsTest
> testViewBuilderResume - org.apache.cassandra.cql3.ViewTest
> 
> https://circleci.com/gh/DikangGu/cassandra/20
> 
> -- 
> Dikang





Re: Why isn't there a separate JVM per table?

2018-02-22 Thread Michael Kjellman
it's an interesting idea. i'd wonder how much overhead you'd end up with from 
message parsing, and whether it would negate any potential GC wins. rick branson had played 
around a bunch with running storage nodes and doubling down on the old "fat 
client" model. if you had 1 tables (yes, barely works but we don't 
explicitly prevent it) you can't really run that many jvm processes on a single 
box.

> On Feb 22, 2018, at 12:39 PM, Carl Mueller  
> wrote:
> 
> GC pauses may have been improved in newer releases, since we are in 2.1.x,
> but I was wondering why cassandra uses one jvm for all tables and
> keyspaces, intermingling the heap for on-JVM objects.
> 
> ... so why doesn't cassandra spin off a jvm per table so each jvm can be
> tuned per table and gc tuned and gc impacts not impact other tables? It
> would probably increase the number of endpoints if we avoid having an
> overarching query router.





Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Michael Kjellman
Please do send them! There was a *lot* of really hard great work by a lot of 
people over the past year to significantly improve the documentation in tree.

http://cassandra.apache.org/doc/latest/
https://github.com/apache/cassandra/tree/trunk/doc

I still didn't see a reply from you re: my request for your jira information so 
i'm unable to follow what issues you're referring to as you haven't linked to 
any in your emails either. If you still see holes in the new and improved 
documentation above, _please_ do create tickets to track that so we can improve 
that asap! a fresh set of eyes on areas not covered is obviously welcomed; 
especially those with overlap with the links you're referring to in your email 
obviously.

best,
kjellman

On Feb 21, 2018, at 4:13 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> wrote:



Jeff,



I already addressed everything you said.  Boy! Would I like to bring up the out 
of date articles on the web that trip people up and the lousy documentation on 
the Apache website but I can’t because a lot of folks don’t know me or why I’m 
saying these things.



I will be making another post that I hope clarifies what’s going on with me.  
After that I will either be a freakishly valuable asset to this community or I 
will be a freakishly valuable asset to another community.



You sure have a funny way of reining in people that are used to helping out.  
You sure misjudged me.  Wow.



Kenneth Brotman



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Wednesday, February 21, 2018 3:12 PM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!





On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

Hi Akash,

I get the part about outside work which is why in replying to Jeff Jirsa I was 
suggesting the big companies could justify taking it on easy enough and you 
know actually pay the people who would be working at it so those people could 
have a life.

The part I don't get is the aversion to usability.  Isn't that what you think 
about when you are coding?  "Am I making this thing I'm building easy to use?"  
If you were programming for me, we would be constantly talking about what we 
are building and how we can make things easier for users.  If I had to fight 
with a developer, architect or engineer about usability all the time, they 
would be gone and quick.  How do you approach programming if you aren't trying 
to make things easy?





There's no aversion to usability, you're assuming things that just aren't true. 
Nobody's against usability, we've just prioritized other things HIGHER. We make 
those decisions in part by looking at open JIRAs and determining what's asked 
for the most, what members of the community have contributed, and then balance 
that against what we ourselves care about. You're making a statement that it 
should be the top priority for the next release, with no JIRA, no history of 
contributing (and indeed, no real clear sign that you even understand the full 
extent of the database), no sign that you're willing to do the work yourself, 
and making a ton of assumptions about the level of effort and ROI.



I would love for Cassandra to be easier to use, I'm sure everyone does. There's 
a dozen features I'd love to add if I had infinite budget and infinite 
manpower. But what you're asking for is A LOT of effort and / or A LOT of 
money, and you're assuming someone's going to step up and foot the bill, but 
there's no real reason to believe that's the case.



In the mean time, everyone's spending hours replying to this thread that is 0% 
actionable. We would all have been objectively better off had everyone ignored 
this thread and just spent 10 minutes writing some section of the docs. So the 
next time I get the urge to reply, I'm just going to do that instead.










Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Michael Kjellman
kenneth: could you please send your jira information? i'm unable to even find 
an account on http://issues.apache.org with your name despite multiple 
attempts. thanks!

best,
kjellman

> On Feb 21, 2018, at 2:20 PM, Kenneth Brotman  
> wrote:
> 
> Jon,
> 
> Very sorry that you don't see the value of the time I'm taking for this.  I 
> don't have demands; I do have a stern warning and I'm right Jon.  Please be 
> very careful not to mischaracterize my words Jon.
> 
> You suggest I put things in JIRA's, then seem to suggest that I'd be lucky if 
> anyone looked at it and did anything. That's what I figured too.  
> 
> I don't appreciate the hostility.  You will understand more fully in the next 
> post where I'm coming from.  Try to keep the conversation civilized.  I'm 
> trying or at least so you understand I think what I'm doing is saving your 
> gig and mine.  I really like a lot of people in this group.
> 
> I've come to a preliminary assessment on things.  Soon the cloud will clear 
> or I'll be gone.  Don't worry.  I'm a very peaceful person and like you I am 
> driven by real important projects that I feel compelled to work on for the 
> good of others.  I don't have time for people to hand hold a database and I 
> can't get stuck with my projects on the wrong stuff.  
> 
> Kenneth Brotman
> 
> 
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: u...@cassandra.apache.org
> Cc: dev@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
> 
> Ken,
> 
> Maybe it’s not clear how open source projects work, so let me try to explain. 
>  There’s a bunch of us who either get paid by someone or volunteer on our 
> free time.  The folks that get paid, (yay!) usually take direction on what 
> the priorities are, and work on projects that directly affect our jobs.  That 
> means that someone needs to care enough about the features you want to work 
> on them, if you’re not going to do it yourself. 
> 
> Now as others have said already, please put your list of demands in JIRA, if 
> someone is interested, they will work on it.  You may need to contribute a 
> little more than you’ve done already, be prepared to get involved if you 
> actually want to see something get done.  Perhaps learning a little more 
> about Cassandra’s internals and the people involved will reveal some of the 
> design decisions and priorities of the project.  
> 
> Third, you seem to be a little obsessed with market share.  While market 
> share is fun to talk about, *most* of us that are working on and contributing 
> to Cassandra do so because it does actually solve a problem we have, and 
> solves it reasonably well.  If some magic open source DB appears out of 
> nowhere and does everything you want Cassandra to, and is bug free, keeps your 
> data consistent, automatically does backups, comes with really nice cert 
> management, ad hoc querying, amazing materialized views that are perfect, no 
> caveats to secondary indexes, and somehow still gives you linear scalability 
> without any mental overhead whatsoever then sure, people might start using 
> it.  And that’s actually OK, because if that happens we’ll all be incredibly 
> pumped out of our minds because we won’t have to work as hard.  If on the 
> slim chance that doesn’t manifest, those of us that use Cassandra and are 
> part of the community will keep working on the things we care about, 
> iterating, and improving things.  Maybe someone will even take a look at your 
> JIRA issues.  
> 
> Further filling the mailing list with your grievances will likely not help 
> you progress towards your goal of a Cassandra that’s easier to use, so I 
> encourage you to try to be a little more productive and try to help rather 
> than just complain, which is not constructive.  I did a quick search for your 
> name on the mailing list, and I’ve seen very little from you, so to 
> everyone who’s been around for a while and trying to help you it looks like 
> you’re just some random dude asking for people to work for free on the things 
> you’re asking for, without offering anything back in return.
> 
> Jon
> 
> 
>> On Feb 21, 2018, at 11:56 AM, Kenneth Brotman  
>> wrote:
>> 
>> Josh,
>> 
>> To say nothing is indifference.  If you care about your community, sometimes 
>> don't you have to bring up a subject even though you know it's also 
>> temporarily adding some discomfort?  
>> 
>> As to opening a JIRA, I've got a very specific topic to try in mind now.  An 
>> easy one I'll work on and then announce.  Someone else will have to do the 
>> coding.  A year from now I would probably just knock it out to make sure 
>> it's as easy as I expect it to be but to be honest, as I've been saying, I'm 
>> not set up to do that right now.  I've barely looked at any Cassandra code; 
>> for one; everyone on this list probably codes more than I do, secondly; and 

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread Michael Kjellman
the things you are asking for are unfortunately not tiny effort. as you don’t 
seem to have the time to contribute code the best way you can personally create 
change would be (again) to file individual jiras for each enhancement or 
feature request.

highlight key ones you filed via the mailing list that you’d personally like to 
see prioritized - and advocate to have resources allocated towards implementing 
and ultimately get those scheduled for a release over other ones.

best,
kjellman

> On Feb 18, 2018, at 11:07 PM, Kenneth Brotman  
> wrote:
> 
> Hi Michael, actually I do very much like the database.  thanks for the 
> thoughts... a few comments:
> 
> 1) Lots of big companies like, let's see, Apple is a big one, probably could 
> easily justify contributing resources to finish up the basic development of 
> Cassandra. 
> 2) There are lots of big companies using Cassandra.  Each could contribute a 
> tiny effort and everyone would benefit greatly.
> 3) A focused effort by a small group of talented people like there are in 
> this group could knock it out easily.
> 4) Not everyone is a Cassandra coder.  It's not for me to do Michael.
> 5) I'm an individual.  I am not working at a big company at the moment 
> Michael.  
> 
> Best,
> Kenneth Brotman
> 
> 
> -----Original Message-----
> From: Michael Kjellman [mailto:kjell...@apple.com] 
> Sent: Sunday, February 18, 2018 10:18 PM
> To: dev@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
> 
> hi ken, sorry you don’t like the database. some thoughts:
> 
> 1) please file actionable jiras for places you feel need to be improved in 
> the database... this is the best way to make and encourage the change you’re 
> looking for. it seems you have quite a few ideas from your post that could be 
> broken down into individual actionable jiras.
> 2) please don’t cross post between mailing lists.
> 3) pull requests are always welcomed!
> 
> best,
> kjellman
> 
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman  
>> wrote:
>> 
>> Cassandra feels like an unfinished program to me.  The problem is not 
>> that it's open source or cutting edge.  It's an open source cutting 
>> edge program that lacks some of its basic functionality.  We are all 
>> stuck addressing fundamental mechanical tasks for Cassandra because 
>> the basic code that would do that part has not been contributed yet.
>> 
>> Ease of use issues need to be given much more attention.  For an 
>> administrator, the ease of use of Cassandra is very poor.
>> 
>> Furthermore, currently Cassandra is an idiot.  We have to do 
>> everything for Cassandra. Contrast that with the fact that we are in 
>> the dawn of artificial intelligence.
>> 
>> Software exists to automate tasks for humans, not mechanize humans to 
>> administer tasks for a database.  I'm an engineering type.  My job is 
>> to apply science and technology to solve real world problems.  And 
>> that's where I need an organization's I.T. talent to focus; not in 
>> crank starting an unfinished database.
>> 
>> For example, I should be able to go to any node, replace the 
>> Cassandra.yaml file and have a prompt on the display ask me if I want 
>> to update all the yaml files across the cluster.  I shouldn't have to 
>> manually modify yaml files on each node or have to create a script for 
>> some third party automation tool to do it.
>> 
>> I should not have to turn off service, clear directories, restart 
>> service in coordination with the other nodes.  It's already a computer 
>> system.  It can do those things on its own.
>> 
>> How about read repair.  First there is something wrong with the name.  
>> Maybe it should be called Consistency Repair.  An administrator 
>> shouldn't have to do anything.  It should be a behavior of Cassandra 
>> that is programmed in. It should consider the GC setting of each node, 
>> calculate how often it has to run repair, when it should run it so all 
>> the nodes aren't trying at the same time and when other circumstances 
>> indicate it should also run it.
>> 
>> Certificate management should be automated.
>> 
>> Cluster wide management should be a big theme in any next major release.
>> What is a major release?  How many major releases could a program have 
>> before all the coding for basic stuff like installation, configuration 
>> and maintenance is included!
>> 
>> Finish the basic coding of Cassandra, make it easy to use for 
>> administrators, make it smart, add cluster wide management.  Keep 
>> Cassand

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Michael Kjellman
hi ken, sorry you don’t like the database. some thoughts:

1) please file actionable jiras for places you feel need to be improved in the 
database... this is the best way to make and encourage the change you’re 
looking for. it seems you have quite a few ideas from your post that could be 
broken down into individual actionable jiras.
2) please don’t cross post between mailing lists.
3) pull requests are always welcomed!

best,
kjellman

> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman  
> wrote:
> 
> Cassandra feels like an unfinished program to me.  The problem is not that
> it's open source or cutting edge.  It's an open source cutting edge program
> that lacks some of its basic functionality.  We are all stuck addressing
> fundamental mechanical tasks for Cassandra because the basic code that would
> do that part has not been contributed yet.
> 
> Ease of use issues need to be given much more attention.  For an
> administrator, the ease of use of Cassandra is very poor.  
> 
> Furthermore, currently Cassandra is an idiot.  We have to do everything for
> Cassandra. Contrast that with the fact that we are in the dawn of artificial
> intelligence.
> 
> Software exists to automate tasks for humans, not mechanize humans to
> administer tasks for a database.  I'm an engineering type.  My job is to
> apply science and technology to solve real world problems.  And that's where
> I need an organization's I.T. talent to focus; not in crank starting an
> unfinished database.
> 
> For example, I should be able to go to any node, replace the Cassandra.yaml
> file and have a prompt on the display ask me if I want to update all the
> yaml files across the cluster.  I shouldn't have to manually modify yaml
> files on each node or have to create a script for some third party
> automation tool to do it.  
> 
> I should not have to turn off service, clear directories, restart service in
> coordination with the other nodes.  It's already a computer system.  It can
> do those things on its own.
> 
> How about read repair.  First there is something wrong with the name.  Maybe
> it should be called Consistency Repair.  An administrator shouldn't have to
> do anything.  It should be a behavior of Cassandra that is programmed in. It
> should consider the GC setting of each node, calculate how often it has to
> run repair, when it should run it so all the nodes aren't trying at the same
> time and when other circumstances indicate it should also run it.
> 
> Certificate management should be automated.
> 
> Cluster wide management should be a big theme in any next major release.
> What is a major release?  How many major releases could a program have
> before all the coding for basic stuff like installation, configuration and
> maintenance is included!
> 
> Finish the basic coding of Cassandra, make it easy to use for
> administrators, make it smart, add cluster wide management.  Keep Cassandra
> competitive or it will soon be the old Model T we all remember fondly.
> 
> I ask the Committee to compile a list of all such items, make a plan, and
> commit to including the completed and tested code as part of major release
> 5.0.  I further ask that release 4.0 not be delayed and then there be an
> unusually short skip to version 5.0. 
> 
> Kenneth Brotman
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
No worries.. it looks like we didn't update the circleci config in tree for the 
cassandra-3.0 and cassandra-3.11 branches. We should do that -- i'll take that 
as an action item on me... for now though you can grab the same one from trunk:

CircleCI Configuration: 
https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml

FYI: kjellman/cassandra-test:0.4.3 was built from the Dockerfile as 
committed here: 
https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile

best,
kjellman

On Feb 14, 2018, at 10:24 AM, Michael Shuler <mich...@pbandjelly.org> wrote:

I apologize for being naive. The test runs do not come from the in-tree
circle.yml file and I need to do something different?

On 02/14/2018 12:21 PM, Michael Kjellman wrote:
please use the latest circleci config that is in trunk. looking at the config 
you used in your run you’re using the old circleci 1.0 based config.

On Feb 14, 2018, at 10:16 AM, Michael Shuler <mich...@pbandjelly.org> wrote:

So far, I have had very unpredictable test runs from CircleCI and ASF
Jenkins. Commit 890f319 resulted in 4 completely different test failures
for me today in CircleCI.

https://circleci.com/gh/mshuler/cassandra/150

I trust results from the static ASF slaves even less.

`ant test-all -Dtest.name=CommitLogSegmentBackpressureTest` passed for
me locally.

¯\_(ツ)_/¯

I do not see any permissions issues in the output of this repeating test
failure from our internal Jenkins, but it appears that an instance of
Cassandra is left running from some other test, perhaps, and this test
cannot bind.

I'm fine with cutting a new release set from 890f319 and chalking this
up to test flakiness.

--
Kind regards,
Michael

On 02/14/2018 11:50 AM, Michael Kjellman wrote:
the tests are writing something to java.io.tmpdir which for whatever reason 
isn’t writable. the same exact tests work locally and in the cassandra-test 
docker image.

given everyone can partake in circleci runs and asf jenkins i think we should 
send out those links and base the vote off those runs.

On Feb 14, 2018, at 9:48 AM, Michael Shuler  wrote:

This is an internal Jenkins instance that's not reachable on the internet.

What's the permissions issue? The test runs on this internal instance
are exactly the way cassci used to run - launch a scratch machine, check
out, build, run tests.

I'll see if I can repro locally.

Michael

On 02/14/2018 11:44 AM, Michael Kjellman wrote:
i looked at this a few weeks back. this is asf jenkins right? if so, it’s a 
permissions issue on the build executors

On Feb 14, 2018, at 9:40 AM, Michael Shuler  wrote:

Thanks for the feedback on 3.0.16 tentative release.

Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
(in both standard and -compression suites) in CI for me. This test has
failed 20 times in the last 20 runs. Test output attached.

Do we wish to fix this before the next cut, since we're here? :)

--
Kind regards,
Michael

On 02/14/2018 07:30 AM, Jason Brown wrote:
I think we can attempt another build and vote now.

On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown  wrote:

CASSANDRA-14219 is committed and tests look clean
(https://circleci.com/workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).

On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams 
wrote:

I change my vote to -1 binding as well.

On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown 
wrote:

-1, binding. Unit tests are broken:
https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50

Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to update
some logging messages, which broke ViewComplexTest. The errors look like
this:

junit.framework.AssertionFailedError: Expected error message to contain
'Cannot drop column a on base table with materialized views', but got
'Cannot drop column a on base table table_21 with materialized views.'
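
The failure above comes down to a substring check that stopped matching once the table name was added in the middle of the message. A minimal Python sketch of that mismatch follows; the real assertion lives in Cassandra's Java unit tests, and the helper names here (contains, contains_in_order) are hypothetical and used only for illustration.

```python
# Illustrative only: mirrors the two strings from the failure report above.
EXPECTED = "Cannot drop column a on base table with materialized views"
ACTUAL = "Cannot drop column a on base table table_21 with materialized views."


def contains(expected, actual):
    """Plain substring check, the same kind of check that failed."""
    return expected in actual


def contains_in_order(fragments, actual):
    """Hypothetical looser check: every fragment must appear, in order."""
    position = 0
    for fragment in fragments:
        index = actual.find(fragment, position)
        if index < 0:
            return False
        position = index + len(fragment)
    return True


# The inserted table name splits the expected text, so the strict check fails.
print(contains(EXPECTED, ACTUAL))  # False

# Matching the stable parts of the message separately still succeeds.
print(contains_in_order(
    ["Cannot drop column a on base table", "with materialized views"],
    ACTUAL,
))  # True
```

Whether the tests should be loosened this way or simply updated to the new message text is a judgment call; the thread only says that the follow-up commits updated the tests.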

Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
which fixed a few of the errors, but there are four outstanding failures. I
created CASSANDRA-14219 last week, and assigned it to Dave, but he might
have missed the notification. Dinesh Joshi has a patch that I will
review ASAP.

Michael, is there a link of where you ran the tests? If so, can you
include it in the future [VOTE] emails?

Thanks,

-Jason



On Tue, Feb 13, 2018 at 11:03 AM, Jon Haddad  wrote:

+1

On Feb 13, 2018, at 10:52 AM, Josh McKenzie 
wrote:

+1

On Feb 13, 2018 9:20 AM, "Marcus Eriksson" 
wrote:

+1

On Tue, Feb 13, 2018 at 1:29 PM, Aleksey Yeshchenko <alek...@apple.com>
wrote:

+1

—
AY

On 12 February 2018 at 20:31:23, Michael Shuler (mich...@pbandjelly.org)
wrote:

I propose the following artifacts for release as 3.0.16.

sha1: 91e83c72de109521074b14a8eeae1309c3b1f215
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.16-tentative
Artifacts:
https://repository.apache.org/content/

Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
please use the latest circleci config that is in trunk. looking at the config 
you used in your run you’re using the old circleci 1.0 based config. 

> On Feb 14, 2018, at 10:16 AM, Michael Shuler  wrote:
> 
> So far, I have had very unpredictable test runs from CircleCI and ASF
> Jenkins. Commit 890f319 resulted in 4 completely different test failures
> for me today in CircleCI.
> 
> https://circleci.com/gh/mshuler/cassandra/150
> 
> I trust results from the static ASF slaves even less.
> 
> `ant test-all -Dtest.name=CommitLogSegmentBackpressureTest` passed for
> me locally.
> 
>  ¯\_(ツ)_/¯
> 
> I do not see any permissions issues in the output of this repeating test
> failure from our internal Jenkins, but it appears that an instance of
> Cassandra is left running from some other test, perhaps, and this test
> cannot bind.
> 
> I'm fine with cutting a new release set from 890f319 and chalking this
> up to test flakiness.
> 
> -- 
> Kind regards,
> Michael
> 
>> On 02/14/2018 11:50 AM, Michael Kjellman wrote:
>> the tests are writing something to java.io.tmpdir which for whatever reason 
>> isn’t writable. the same exact tests work locally and in the cassandra-test 
>> docker image.
>> 
>> given everyone can partake in circleci runs and asf jenkins i think we 
>> should send out those links and base the vote off those runs.
>> 
>>> On Feb 14, 2018, at 9:48 AM, Michael Shuler  wrote:
>>> 
>>> This is an internal Jenkins instance that's not reachable on the internet.
>>> 
>>> What's the permissions issue? The test runs on this internal instance
>>> are exactly the way cassci used to run - launch a scratch machine, check
>>> out, build, run tests.
>>> 
>>> I'll see if I can repro locally.
>>> 
>>> Michael
>>> 
>>>> On 02/14/2018 11:44 AM, Michael Kjellman wrote:
>>>> i looked at this a few weeks back. this is asf jenkins right? if so, it’s 
>>>> a permissions issue on the build executors 
>>>> 
>>>>> On Feb 14, 2018, at 9:40 AM, Michael Shuler  
>>>>> wrote:
>>>>> 
>>>>> Thanks for the feedback on 3.0.16 tentative release.
>>>>> 
>>>>> Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
>>>>> (in both standard and -compression suites) in CI for me. This test has
>>>>> failed 20 times in the last 20 runs. Test output attached.
>>>>> 
>>>>> Do we wish to fix this before the next cut, since we're here? :)
>>>>> 
>>>>> -- 
>>>>> Kind regards,
>>>>> Michael
>>>>> 
>>>>>> On 02/14/2018 07:30 AM, Jason Brown wrote:
>>>>>> I think we can attempt another build and vote now.
>>>>>> 
>>>>>>> On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> CASSANDRA-14219 is committed and tests look clean
>>>>>>> (https://circleci.com/workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).
>>>>>>> 
>>>>>>> On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I change my vote to -1 binding as well.
>>>>>>>> 
>>>>>>>> On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> -1, binding. Unit tests are broken:
>>>>>>>>> https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50
>>>>>>>>> 
>>>>>>>>> Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to 
>>>>>>>>> update
>>>>>>>>> some logging messages, which broke ViewComplexTest. The errors look like
>>>>>>>>> this:
>>>>>>>>> 
>>>>>>>>> junit.framework.AssertionFailedError: Expected error message to 
>>>>>>>>> contain
>>>>>>>>> 'Cannot drop column a on base table with materialized views', but got
>>>>>>>>> 'Cannot drop column a on base table table_21 with materialized views.'
>>>>>>>>> 
>>>>>>>>> Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
>>>>>>>>> w

Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
the tests are writing something to java.io.tmpdir which for whatever reason 
isn’t writable. the same exact tests work locally and in the cassandra-test 
docker image.

given everyone can partake in circleci runs and asf jenkins i think we should 
send out those links and base the vote off those runs.

> On Feb 14, 2018, at 9:48 AM, Michael Shuler  wrote:
> 
> This is an internal Jenkins instance that's not reachable on the internet.
> 
> What's the permissions issue? The test runs on this internal instance
> are exactly the way cassci used to run - launch a scratch machine, check
> out, build, run tests.
> 
> I'll see if I can repro locally.
> 
> Michael
> 
>> On 02/14/2018 11:44 AM, Michael Kjellman wrote:
>> i looked at this a few weeks back. this is asf jenkins right? if so, it’s a 
>> permissions issue on the build executors 
>> 
>>> On Feb 14, 2018, at 9:40 AM, Michael Shuler  wrote:
>>> 
>>> Thanks for the feedback on 3.0.16 tentative release.
>>> 
>>> Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
>>> (in both standard and -compression suites) in CI for me. This test has
>>> failed 20 times in the last 20 runs. Test output attached.
>>> 
>>> Do we wish to fix this before the next cut, since we're here? :)
>>> 
>>> -- 
>>> Kind regards,
>>> Michael
>>> 
>>>> On 02/14/2018 07:30 AM, Jason Brown wrote:
>>>> I think we can attempt another build and vote now.
>>>> 
>>>>> On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown  wrote:
>>>>> 
>>>>> CASSANDRA-14219 is committed and tests look clean
>>>>> (https://circleci.com/workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).
>>>>> 
>>>>> On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams 
>>>>> wrote:
>>>>> 
>>>>>> I change my vote to -1 binding as well.
>>>>>> 
>>>>>> On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown 
>>>>>> wrote:
>>>>>> 
>>>>>>> -1, binding. Unit tests are broken:
>>>>>>> https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50
>>>>>>> 
>>>>>>> Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to update
>>>>>>> some logging messages, which broke ViewComplexTest. The errors look like
>>>>>>> this:
>>>>>>> 
>>>>>>> junit.framework.AssertionFailedError: Expected error message to contain
>>>>>>> 'Cannot drop column a on base table with materialized views', but got
>>>>>>> 'Cannot drop column a on base table table_21 with materialized views.'
>>>>>>> 
>>>>>>> Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
>>>>>>> which
>>>>>>> fixed a few of the errors, but there are four outstanding failures. I
>>>>>>> created CASSANDRA-14219 last week, and assigned it to Dave, but he might
>>>>>>> have missed the notification. Dinesh Joshi has a patch that I will
>>>>>> review
>>>>>>> ASAP.
>>>>>>> 
>>>>>>> Michael, is there a link of where you ran the tests? If so, can you
>>>>>> include
>>>>>>> it in the future [VOTE] emails?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> -Jason
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Feb 13, 2018 at 11:03 AM, Jon Haddad  
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>>> On Feb 13, 2018, at 10:52 AM, Josh McKenzie 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> On Feb 13, 2018 9:20 AM, "Marcus Eriksson" 
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> +1
>>>>>>>>>> 
>>>>>>>>>> On Tue, Feb 13, 2018 at 1:29 PM, Aleksey Yeshchenko <
>>>>>>> alek...@apple.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 

Re: [VOTE FAILED] Release Apache Cassandra 3.0.16

2018-02-14 Thread Michael Kjellman
i looked at this a few weeks back. this is asf jenkins right? if so, it’s a 
permissions issue on the build executors 

> On Feb 14, 2018, at 9:40 AM, Michael Shuler  wrote:
> 
> Thanks for the feedback on 3.0.16 tentative release.
> 
> Commit 890f319 (current cassandra-3.0 branch HEAD) fails only one test
> (in both standard and -compression suites) in CI for me. This test has
> failed 20 times in the last 20 runs. Test output attached.
> 
> Do we wish to fix this before the next cut, since we're here? :)
> 
> -- 
> Kind regards,
> Michael
> 
>> On 02/14/2018 07:30 AM, Jason Brown wrote:
>> I think we can attempt another build and vote now.
>> 
>>> On Tue, Feb 13, 2018 at 3:44 PM, Jason Brown  wrote:
>>> 
>>> CASSANDRA-14219 is committed and tests look clean
>>> (https://circleci.com/workflow-run/d0a2622a-e74f-4c46-b0ad-a84ca063736f).
>>> 
>>> On Tue, Feb 13, 2018 at 1:47 PM, Brandon Williams 
>>> wrote:
>>> 
>>>> I change my vote to -1 binding as well.
>>>> 
>>>> On Tue, Feb 13, 2018 at 3:43 PM, Jason Brown 
>>>> wrote:
>>>> 
>>>>> -1, binding. Unit tests are broken:
>>>>> https://circleci.com/gh/jasobrown/cassandra/451#tests/containers/50
>>>>> 
>>>>> Dave ninja-committed 7df36056b12a13b60097b7a9a4f8155a1d02ff62 to update
>>>>> some logging messages, which broke ViewComplexTest. The errors look like
>>>>> this:
>>>>> 
>>>>> junit.framework.AssertionFailedError: Expected error message to contain
>>>>> 'Cannot drop column a on base table with materialized views', but got
>>>>> 'Cannot drop column a on base table table_21 with materialized views.'
>>>>> 
>>>>> Dave has a followup commit, 40148a178bd9b74b731591aa46b4158efb16b742,
>>>>> which fixed a few of the errors, but there are four outstanding failures. I
>>>>> created CASSANDRA-14219 last week, and assigned it to Dave, but he might
>>>>> have missed the notification. Dinesh Joshi has a patch that I will
>>>>> review ASAP.
>>>>> 
>>>>> Michael, is there a link of where you ran the tests? If so, can you
>>>>> include it in the future [VOTE] emails?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> -Jason
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Tue, Feb 13, 2018 at 11:03 AM, Jon Haddad  wrote:
>>>>>> 
>>>>>> +1
>>>>>> 
>>>>>>> On Feb 13, 2018, at 10:52 AM, Josh McKenzie wrote:
>>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>> On Feb 13, 2018 9:20 AM, "Marcus Eriksson" wrote:
>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> On Tue, Feb 13, 2018 at 1:29 PM, Aleksey Yeshchenko <alek...@apple.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> —
>>>>>>>>> AY
>>>>>>>>> 
>>>>>>>>> On 12 February 2018 at 20:31:23, Michael Shuler (mich...@pbandjelly.org)
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I propose the following artifacts for release as 3.0.16.
>>>>>>>>> 
>>>>>>>>> sha1: 91e83c72de109521074b14a8eeae1309c3b1f215
>>>>>>>>> Git:
>>>>>>>>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.16-tentative
>>>>>>>>> Artifacts:
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1154/org/apache/cassandra/apache-cassandra/3.0.16/
>>>>>>>>> Staging repository:
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1154/
>>>>>>>>> 
>>>>>>>>> Debian and RPM packages are available here:
>>>>>>>>> http://people.apache.org/~mshuler
>>>>>>>>> 
>>>>>>>>> *** This release addresses an important fix for CASSANDRA-14092 ***
>>>>>>>>> "Max ttl of 20 years will overflow localDeletionTime"
>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-14092
>>>>>>>>> 
>>>>>>>>> The vote will be open for 72 hours (longer if needed).
>>>>>>>>> 
>>>>>>>>> [1]: (CHANGES.txt) https://goo.gl/rLj59Z
>>>>>>>>> [2]: (NEWS.txt) https://goo.gl/EkrT4G
> 
> <3.0-890f319-testall-failure.txt>


Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Michael Kjellman
why are people inserting data with a 15+ year TTL? sorta curious about the 
actual use case for that.

> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> 
> The assertion was working fine until yesterday 03:14 UTC.
> 
> The long term solution would be to work with a long instead of an int. The
> serialized form seems to be a variable-int already, so that should be
> fine.
> 
> If you change the assertion to 15 years, then applications might fail, as
> they might be setting a 15+ year ttl.
> 
> regards,
> Christian
> 
> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
> wrote:
> 
>> Thanks for raising this. Agreed this is bad, when I filed
>> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>> overflows (as it is with 2.1), but that doesn't seem to be the case on
>> 3.0+
>> 
>> I propose adding the assertion back so writes will fail, and reduce
>> the max TTL to something like 15 years for the time being while we
>> figure a long term solution.
>> 
>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
>>> If you aren’t getting an error, then I agree, that is very bad.  Looking
>>> at the 3.0 code it looks like the assertion checking for overflow was
>>> dropped somewhere along the way, I had only been looking into 2.1 where you
>>> get an assertion error that fails the query.
>>> 
>>> -Jeremiah
>>> 
>>>> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra wrote:
>>>> 
>>>> Hi Jeremiah,
>>>> Validation is on the TTL value, not on (system_time + TTL). You can test
>>>> it with the example below. The insert is successful, the overflow happens
>>>> silently, and data is lost:
>>>> 
>>>> create table test(name text primary key,age int);
>>>> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
>>>> select * from test where name='test_20yrs';
>>>> 
>>>>  name | age
>>>> ------+-----
>>>> 
>>>> (0 rows)
>>>> 
>>>> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
>>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>>> message="ttl is too large. requested (630720001) maximum (630720000)"
>>>> 
>>>> Thanks
>>>> Anuj
>>>> 
>>>> On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan
>>>> <jeremiah.jor...@gmail.com> wrote:
>>>> 
>>>> Where is the dataloss?  Does the INSERT operation return successfully
>>>> to the client in this case?  From reading the linked issues it sounds like
>>>> you get an error client side.
>>>> 
>>>> -Jeremiah
>>>> 
>>>>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> For all those people who use MAX TTL=20 years for inserting/updating
>>>>> data in production,
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-14092 can silently
>>>>> cause irrecoverable Data Loss. This seems like a certain TOP MOST
>>>>> BLOCKER to me. I think the category of the JIRA must be raised to
>>>>> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and
>>>>> no one seems to be actively working on it. Just like any other critical
>>>>> vulnerability, this vulnerability demands immediate attention from some
>>>>> very experienced folks to bring out an Urgent Fast Track Patch for all
>>>>> currently Supported Cassandra versions 2.1, 2.2 and 3.x. As per my
>>>>> understanding of the JIRA comments, the changes may not be that trivial
>>>>> for older releases. So, community support on the patch is very much
>>>>> appreciated.
>>>>> 
>>>>> Thanks
>>>>> Anuj
 
>> 
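
The arithmetic behind this thread can be checked with a short Python sketch. It assumes, as the discussion implies, that localDeletionTime is the write time plus the TTL, stored in epoch seconds as a signed 32-bit int, and that the maximum TTL is 20 years counted as 365-day years. The function names are hypothetical and do not correspond to Cassandra's code.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1               # largest value a signed 32-bit int can hold
MAX_TTL = 20 * 365 * 24 * 60 * 60   # 20 years in seconds (630720000)


def local_deletion_time(now_seconds, ttl_seconds):
    """Expiration point in epoch seconds: write time plus TTL (assumed model)."""
    return now_seconds + ttl_seconds


def overflows_int32(now_seconds, ttl_seconds):
    """True when the expiration point no longer fits in a signed 32-bit int."""
    return local_deletion_time(now_seconds, ttl_seconds) > INT32_MAX


# First instant at which a write using the maximum TTL no longer fits.
first_bad = datetime.fromtimestamp(INT32_MAX - MAX_TTL + 1, tz=timezone.utc)
print(first_bad.isoformat())  # 2018-01-24T03:14:08+00:00
```

That instant lines up with the remark that the check "was working fine until yesterday 03:14 UTC" in a message dated 2018-01-25. It also shows why validating only the TTL value is not enough: the sum depends on the current time as well.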



Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-10 Thread Michael Kjellman
another thought is to have the 
sufficient_system_resources_for_resource_intensive_tests fixture dynamically 
figure out the number of threads to run stress with. seems reasonable we should 
significantly lower our concurrency dynamically when we are resource 
constrained. 
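
One way the idea above could look, as a rough Python sketch: pick the stress thread count from whatever is scarcer, CPU or memory. The function name and every constant below (threads per CPU, memory per thread, the upper cap) are hypothetical placeholders, not values taken from the dtests or from cassandra-stress.

```python
def stress_threads(cpu_count, free_mb, max_threads=300,
                   threads_per_cpu=8, mb_per_thread=32, min_threads=1):
    """Hypothetical sizing rule: cap concurrency by CPU and by free memory."""
    by_cpu = cpu_count * threads_per_cpu
    by_memory = free_mb // mb_per_thread
    return max(min_threads, min(max_threads, by_cpu, by_memory))


# A 2 CPU / 4096 MB container like the "medium" size mentioned in this thread.
print(stress_threads(cpu_count=2, free_mb=4096))    # 16

# A hypothetical larger machine for comparison.
print(stress_threads(cpu_count=8, free_mb=16384))   # 64
```

The result would then be handed to stress when a test pre-fills the cluster, for example through cassandra-stress's -rate threads=N option, assuming the dtest helper that launches stress lets the caller pass that through.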

> On Jan 10, 2018, at 1:53 PM, Michael Kjellman  
> wrote:
> 
> i had done some limited testing on the medium size and didn't see behavior 
> quite as bad as you were seeing... :\
> 
> i added a test fixture 
> (sufficient_system_resources_for_resource_intensive_tests) that currently just 
> does a very basic free-memory check and deselects tests annotated with the 
> @pytest.mark.resource_intensive annotation if the current system doesn't have 
> enough resources.
> 
> my short/medium term thinking was that we could expand on this and 
> dynamically skip tests for whatever physical resource constraints we're 
> working with -- with the ultimate goal to dynamically run as many tests 
> reliably as possible given what we have.
> 
> Any chance you'd mind changing your circleci config to set CCM_MAX_HEAP_SIZE 
> under resource_constrained_env_vars to 769MB and kicking off another run to 
> get us a baseline? I see a ton of the failures were from tests that run 
> stress to pre-fill the cluster for the test.. do you know if we have a way to 
> control the heap settings of stress when it's invoked via ccm.node as we do 
> in the dtests?
> 
> On Jan 10, 2018, at 1:04 PM, Stefan Podkowinski <s...@apache.org> wrote:
> 
> I was giving this another try today to see how long it would take to
> finish on an OSS account. But I've canceled the job after some hours as
> tests started to fail almost constantly.
> 
> https://circleci.com/gh/spodkowinski/cassandra/176
> 
> Looks like the 2CPU/4096MB (medium) limit for each container isn't
> really adequate for dtests. Yours seem to be running on xlarge.
> 
> 
> On 10.01.18 21:05, Michael Kjellman wrote:
> plan of action is to continue running everything on asf jenkins.
> 
> in addition, all developers (just like today) will be free to run the unit 
> tests and as many of the dtests as possible against their local test branches 
> in circleci. circleci offers a free OSS account with 4 containers. while it 
> will be slow, it will run. additionally anyone who wants more speed is 
> obviously free to upgrade their account.
> 
> does that plan resolve any concerns you have?
> 
> On Jan 10, 2018, at 12:01 PM, Josh McKenzie  wrote:
> 
> 1) have *all* our tests run on *every* commit
> Have we discussed the cost / funding aspect of this? I know we as a project
> have run into infra-donation cost issues in the past with differentiating
> between ASF as a whole and cassandra as a project, so not sure how that'd
> work in terms of sponsors funding circleci containers just for this
> project's use, for instance.
> 
> This is a huge improvement in runtime (understatement of the day award...)
> so great work on that front.
> 
> 
> 
> On Tue, Jan 9, 2018 at 11:04 PM, Nate McCall  wrote:
> 
> Making these tests more accessible and reliable is super huge. There
> are a lot of folks in our community who are not well versed with
> python (myself included). I wholly support *any* efforts we can make
> for the dtest process to be easy.
> 
> Thanks a bunch for taking this on. I think it will pay off quickly.
> 
> On Wed, Jan 10, 2018 at 4:55 PM, Michael Kjellman 
> wrote:
> hi!
> 
> a few of us have been continuously iterating on the dtest-on-pytest
> branch now since the 2nd and we’ve run the dtests close to 600 times in ci.
> ariel has been working his way thru a formal review (three cheers for
> ariel!)
> flaky tests are a real thing and despite a few dozen totally green test
> runs, the vast majority of runs are still reliably hitting roughly 1-3 test
> failures. in a world where we can now run the dtests in 20 minutes instead
> of 13 hours it’s now at least possible to keep finding these flaky tests
> and fixing them one by one...
> i haven’t gotten a huge amount of feedback overall and i really want to
> hear it! ultimately this work is driven by the desire to 1) have *all* our
> tests run on *every* commit; 2) be able to trust the results; 3) make our
> testing story so amazing that even the most casual weekend warrior who
> wants to work on the project can (and will want to!) use it.
> i’m *not* a python guy (although luckily i know and work with many who
> are). thankfully i’ve been able to defer to them for much of this largely
> python based effort. i’m sure there are a few more people working on the
> project who do consider themselves python experts and i’d especially

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-10 Thread Michael Kjellman
i had done some limited testing on the medium size and didn't see behavior 
quite as bad as you were seeing... :\

i added a test fixture 
(sufficient_system_resources_for_resource_intensive_tests) that currently just 
does a very basic free-memory check and deselects tests annotated with the 
@pytest.mark.resource_intensive annotation if the current system doesn't have 
enough resources.
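
A minimal sketch of that kind of check, not the actual dtest code: it uses a pytest collection hook in conftest.py rather than a fixture, reads free memory through os.sysconf (which only works on some platforms, mainly Linux), and the 8192 MB threshold plus the helper names free_memory_mb and split_by_resources are assumptions made for illustration.

```python
import os

# Hypothetical threshold; the thread does not say how much memory is required.
REQUIRED_FREE_MB = 8192


def free_memory_mb():
    """Best-effort free physical memory in MB, or None if it cannot be read."""
    try:
        pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size // (1024 * 1024)


def split_by_resources(items, free_mb, required_mb=REQUIRED_FREE_MB):
    """Return (selected, deselected) test items for the given free memory."""
    if free_mb is None or free_mb >= required_mb:
        return list(items), []
    selected, deselected = [], []
    for item in items:
        if item.get_closest_marker("resource_intensive") is not None:
            deselected.append(item)
        else:
            selected.append(item)
    return selected, deselected


def pytest_collection_modifyitems(config, items):
    """pytest hook: drop resource_intensive tests when memory is short."""
    selected, deselected = split_by_resources(items, free_memory_mb())
    if deselected:
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Keeping the decision in a plain function (split_by_resources) makes it easy to extend to other constraints later, such as CPU count or disk space, which is the direction described in the next paragraph.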

my short/medium term thinking was that we could expand on this and dynamically 
skip tests for whatever physical resource constraints we're working with -- 
with the ultimate goal to dynamically run as many tests reliably as possible 
given what we have.

Any chance you'd mind changing your circleci config to set CCM_MAX_HEAP_SIZE 
under resource_constrained_env_vars to 769MB and kicking off another run to get 
us a baseline? I see a ton of the failures were from tests that run stress to 
pre-fill the cluster for the test.. do you know if we have a way to control the 
heap settings of stress when it's invoked via ccm.node as we do in the dtests?

On Jan 10, 2018, at 1:04 PM, Stefan Podkowinski 
<s...@apache.org> wrote:

I was giving this another try today to see how long it would take to
finish on a oss account. But I've canceled the job after some hours as
tests started to fail almost constantly.

https://circleci.com/gh/spodkowinski/cassandra/176

Looks like the 2CPU/4096MB (medium) limit for each container isn't
really adequate for dtests. Yours seem to be running on xlarge.


On 10.01.18 21:05, Michael Kjellman wrote:
plan of action is to continue running everything on asf jenkins.

in addition, all developers (just like today) will be free to run the unit 
tests and as many of the dtests as possible against their local test branches 
in circleci. circleci offers a free OSS account with 4 containers. while it 
will be slow, it will run. additionally anyone who wants more speed is 
obviously free to upgrade their account.

does that plan resolve any concerns you have?

On Jan 10, 2018, at 12:01 PM, Josh McKenzie  wrote:

1) have *all* our tests run on *every* commit
Have we discussed the cost / funding aspect of this? I know we as a project
have run into infra-donation cost issues in the past with differentiating
between ASF as a whole and cassandra as a project, so not sure how that'd
work in terms of sponsors funding circleci containers just for this
project's use, for instance.

This is a huge improvement in runtime (understatement of the day award...)
so great work on that front.



On Tue, Jan 9, 2018 at 11:04 PM, Nate McCall  wrote:

Making these tests more accessible and reliable is super huge. There
are a lot of folks in our community who are not well versed with
python (myself included). I wholly support *any* efforts we can make
for the dtest process to be easy.

Thanks a bunch for taking this on. I think it will pay off quickly.

On Wed, Jan 10, 2018 at 4:55 PM, Michael Kjellman 
wrote:
hi!

a few of us have been continuously iterating on the dtest-on-pytest
branch now since the 2nd and we’ve run the dtests close to 600 times in ci.
ariel has been working his way thru a formal review (three cheers for
ariel!)
flaky tests are a real thing and despite a few dozen totally green test
runs, the vast majority of runs are still reliably hitting roughly 1-3 test
failures. in a world where we can now run the dtests in 20 minutes instead
of 13 hours it’s now at least possible to keep finding these flaky tests
and fixing them one by one...
i haven’t gotten a huge amount of feedback overall and i really want to
hear it! ultimately this work is driven by the desire to 1) have *all* our
tests run on *every* commit; 2) be able to trust the results; 3) make our
testing story so amazing that even the most casual weekend warrior who
wants to work on the project can (and will want to!) use it.
i’m *not* a python guy (although lucky i know and work with many who
are). thankfully i’ve been able to defer to them for much of this largely
python based effort i’m sure there are a few more people working on the
project who do consider themselves python experts and i’d especially
appreciate your feedback!
finally, a lot of my effort was focused around improving the end users
experience (getting bootstrapped, running the tests, improving the
debugability story, etc). i’d really appreciate it if people could try
running the pytest branch and following the install instructions to figure
out what could be improved on. any existing behavior i’ve inadvertently now
removed that’s going to make someone’s life miserable? 😅
thanks! looking forward to hearing any and all feedback from the
community!
best,
kjellman



On Jan 3, 2018, at 8:08 AM, Michael Kjellman
<mkjell...@internalcircle.com> wrote:
no, i’m not. i just figured i should target python 3.6 if i was doing
this work in the first place. the current Ubuntu LTS was p

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-10 Thread Michael Kjellman
plan of action is to continue running everything on asf jenkins.

in addition, all developers (just like today) will be free to run the unit 
tests and as many of the dtests as possible against their local test branches 
in circleci. circleci offers a free OSS account with 4 containers. while it 
will be slow, it will run. additionally anyone who wants more speed is 
obviously free to upgrade their account.

does that plan resolve any concerns you have?

On Jan 10, 2018, at 12:01 PM, Josh McKenzie  wrote:

>> 
>> 1) have *all* our tests run on *every* commit
> 
> Have we discussed the cost / funding aspect of this? I know we as a project
> have run into infra-donation cost issues in the past with differentiating
> between ASF as a whole and cassandra as a project, so not sure how that'd
> work in terms of sponsors funding circleci containers just for this
> project's use, for instance.
> 
> This is a huge improvement in runtime (understatement of the day award...)
> so great work on that front.
> 
> 
> 
>> On Tue, Jan 9, 2018 at 11:04 PM, Nate McCall  wrote:
>> 
>> Making these tests more accessible and reliable is super huge. There
>> are a lot of folks in our community who are not well versed with
>> python (myself included). I wholly support *any* efforts we can make
>> for the dtest process to be easy.
>> 
>> Thanks a bunch for taking this on. I think it will pay off quickly.
>> 
>> On Wed, Jan 10, 2018 at 4:55 PM, Michael Kjellman 
>> wrote:
>>> hi!
>>> 
>>> a few of us have been continuously iterating on the dtest-on-pytest
>> branch now since the 2nd and we’ve run the dtests close to 600 times in ci.
>> ariel has been working his way thru a formal review (three cheers for
>> ariel!)
>>> 
>>> flaky tests are a real thing and despite a few dozen totally green test
>> runs, the vast majority of runs are still reliably hitting roughly 1-3 test
>> failures. in a world where we can now run the dtests in 20 minutes instead
>> of 13 hours it’s now at least possible to keep finding these flaky tests
>> and fixing them one by one...
>>> 
>>> i haven’t gotten a huge amount of feedback overall and i really want to
>> hear it! ultimately this work is driven by the desire to 1) have *all* our
>> tests run on *every* commit; 2) be able to trust the results; 3) make our
>> testing story so amazing that even the most casual weekend warrior who
>> wants to work on the project can (and will want to!) use it.
>>> 
>>> i’m *not* a python guy (although lucky i know and work with many who
>> are). thankfully i’ve been able to defer to them for much of this largely
>> python based effort i’m sure there are a few more people working on the
>> project who do consider themselves python experts and i’d especially
>> appreciate your feedback!
>>> 
>>> finally, a lot of my effort was focused around improving the end users
>> experience (getting bootstrapped, running the tests, improving the
>> debugability story, etc). i’d really appreciate it if people could try
>> running the pytest branch and following the install instructions to figure
>> out what could be improved on. any existing behavior i’ve inadvertently now
>> removed that’s going to make someone’s life miserable? 😅
>>> 
>>> thanks! looking forward to hearing any and all feedback from the
>> community!
>>> 
>>> best,
>>> kjellman
>>> 
>>> 
>>> 
>>> On Jan 3, 2018, at 8:08 AM, Michael Kjellman
>>> <mkjell...@internalcircle.com> wrote:
>>> 
>>> no, i’m not. i just figured i should target python 3.6 if i was doing
>> this work in the first place. the current Ubuntu LTS was pulling in a
>> pretty old version. any concerns with using 3.6?
>>> 
>>> On Jan 3, 2018, at 1:51 AM, Stefan Podkowinski <s...@apache.org> wrote:
>>> 
>>> The latest updates to your branch fixed the logging issue, thanks! Tests
>>> now seem to execute fine locally using pytest.
>>> 
>>> I was looking at the dockerfile and noticed that you explicitly use
>>> python 3.6 there. Are you aware of any issues with older python3
>>> versions, e.g. 3.5? Do I have to use 3.6 as well locally and do we have
>>> to do the same for jenkins?
>>> 
>>> 
>>> On 02.01.2018 22:42, Michael Kjellman wrote:
>>> I reproduced the NOTSET log issue locally... got a fix.. i'll push a
>> commit up in a moment.
>>> 
>>> On Jan 2, 2018, at 11:24 AM, Michael 

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-09 Thread Michael Kjellman
hi!

a few of us have been continuously iterating on the dtest-on-pytest branch now 
since the 2nd and we’ve run the dtests close to 600 times in ci. ariel has been 
working his way thru a formal review (three cheers for ariel!)

flaky tests are a real thing and despite a few dozen totally green test runs, 
the vast majority of runs are still reliably hitting roughly 1-3 test failures. 
in a world where we can now run the dtests in 20 minutes instead of 13 hours 
it’s now at least possible to keep finding these flaky tests and fixing them 
one by one...

i haven’t gotten a huge amount of feedback overall and i really want to hear 
it! ultimately this work is driven by the desire to 1) have *all* our tests run 
on *every* commit; 2) be able to trust the results; 3) make our testing story 
so amazing that even the most casual weekend warrior who wants to work on the 
project can (and will want to!) use it.

i’m *not* a python guy (although luckily i know and work with many who are). 
thankfully i’ve been able to defer to them for much of this largely python 
based effort. i’m sure there are a few more people working on the project 
who do consider themselves python experts and i’d especially appreciate your 
feedback!

finally, a lot of my effort was focused around improving the end user 
experience (getting bootstrapped, running the tests, improving the debugability 
story, etc). i’d really appreciate it if people could try running the pytest 
branch and following the install instructions to figure out what could be 
improved on. any existing behavior i’ve inadvertently now removed that’s going 
to make someone’s life miserable? 😅

thanks! looking forward to hearing any and all feedback from the community!

best,
kjellman



On Jan 3, 2018, at 8:08 AM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

no, i’m not. i just figured i should target python 3.6 if i was doing this work 
in the first place. the current Ubuntu LTS was pulling in a pretty old version. 
any concerns with using 3.6?

On Jan 3, 2018, at 1:51 AM, Stefan Podkowinski 
<s...@apache.org> wrote:

The latest updates to your branch fixed the logging issue, thanks! Tests
now seem to execute fine locally using pytest.

I was looking at the dockerfile and noticed that you explicitly use
python 3.6 there. Are you aware of any issues with older python3
versions, e.g. 3.5? Do I have to use 3.6 as well locally and do we have
to do the same for jenkins?


On 02.01.2018 22:42, Michael Kjellman wrote:
I reproduced the NOTSET log issue locally... got a fix.. i'll push a commit up 
in a moment.

On Jan 2, 2018, at 11:24 AM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

Comments Inline: Thanks for giving this a go!!

On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski 
<s...@apache.org> wrote:

I was giving this a try today with some mixed results. First of all,
running pytest locally would fail with an "ccmlib.common.ArgumentError:
Unknown log level NOTSET" error for each test. Although I created a new
virtualenv for that as described in the readme (thanks for updating!)
and use both of your dtest and cassandra branches. But I haven't patched
ccm as described in the ticket, maybe that's why? Can you publish a
patched ccm branch to gh?

99% sure this is an issue parsing the logging level passed to pytest to the 
python logger... could you paste the exact command you're using to invoke 
pytest? should be a small change - i'm sure i just missed an invocation case.


The updated circle.yml is now using docker, which seems to be a good
idea to reduce clutter in the yaml file and gives us more control over
the test environment. Can you add the Dockerfile to the .circleci
directory as well? I couldn't find it when I was trying to solve the
pytest error mentioned above.

This is already tracked in a separate repo: 
https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile

Next thing I did was to push your trunk_circle branch to my gh repo to
start a circleCI run. Finishing all dtests in 15 minutes sounds
exciting, but requires a paid tier plan to get that kind of
parallelization. Looks like the dtests have even been deliberately
disabled for non-paid accounts, so I couldn't test this any further.

the plan of action (i already mentioned this in previous emails) is to 
get dtests working for the free circleci oss accounts as well. part of this 
work (already included in this pytest effort) is to have fixtures that look at 
the system resources and dynamically include as many tests as possible.


Running dtests from the pytest branch on builds.apache.org did not work
either. At least the run_dtests.py arguments will need to be updated in
cassandra-builds. We currently only use a single cassandra-dtest.sh
script for all builds. Maybe we should create a new job template that
would use an updated script w

Re: Question on submitting a patch

2018-01-05 Thread Michael Kjellman
great question! i personally always just leave everything blank to effectively 
just get the stupid state changed. reason number 3000 why I personally hate 
JIRA.

> On Jan 5, 2018, at 12:44 PM, Tyagi, Preetika  wrote:
> 
> Hi all,
> 
> When I click on "Submit Patch" option, it pops up a new screen where it asks 
> for a bunch of details including Fix Version(s). Does the patch need to be 
> synced up with the latest repo or I can just choose which version I worked 
> with (which may not necessarily be the latest and hence one would be need to 
> fetch that specific repo version in order to compile source with the patch)?
> 
> Also, there is no option to upload the patch file on this screen. Can someone 
> point out where to actually upload the patch? I haven't done before so might 
> be asking dumb questions! :)
> 
> Thanks,
> Preetika
> 
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-03 Thread Michael Kjellman
no, i’m not. i just figured i should target python 3.6 if i was doing this work 
in the first place. the current Ubuntu LTS was pulling in a pretty old version. 
any concerns with using 3.6?

> On Jan 3, 2018, at 1:51 AM, Stefan Podkowinski  wrote:
> 
> The latest updates to your branch fixed the logging issue, thanks! Tests
> now seem to execute fine locally using pytest.
> 
> I was looking at the dockerfile and noticed that you explicitly use
> python 3.6 there. Are you aware of any issues with older python3
> versions, e.g. 3.5? Do I have to use 3.6 as well locally and do we have
> to do the same for jenkins?
> 
> 
>> On 02.01.2018 22:42, Michael Kjellman wrote:
>> I reproduced the NOTSET log issue locally... got a fix.. i'll push a commit 
>> up in a moment.
>> 
>>> On Jan 2, 2018, at 11:24 AM, Michael Kjellman 
>>>  wrote:
>>> 
>>> Comments Inline: Thanks for giving this a go!!
>>> 
>>>> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski  wrote:
>>>> 
>>>> I was giving this a try today with some mixed results. First of all,
>>>> running pytest locally would fail with an "ccmlib.common.ArgumentError:
>>>> Unknown log level NOTSET" error for each test. Although I created a new
>>>> virtualenv for that as described in the readme (thanks for updating!)
>>>> and use both of your dtest and cassandra branches. But I haven't patched
>>>> ccm as described in the ticket, maybe that's why? Can you publish a
>>>> patched ccm branch to gh?
>>> 
>>> 99% sure this is an issue parsing the logging level passed to pytest to the 
>>> python logger... could you paste the exact command you're using to invoke 
>>> pytest? should be a small change - i'm sure i just missed an invocation case.
>>> 
>>>> 
>>>> The updated circle.yml is now using docker, which seems to be a good
>>>> idea to reduce clutter in the yaml file and gives us more control over
>>>> the test environment. Can you add the Dockerfile to the .circleci
>>>> directory as well? I couldn't find it when I was trying to solve the
>>>> pytest error mentioned above.
>>> 
>>> This is already tracked in a separate repo: 
>>> https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
>>>> 
>>>> Next thing I did was to push your trunk_circle branch to my gh repo to
>>>> start a circleCI run. Finishing all dtests in 15 minutes sounds
>>>> exciting, but requires a paid tier plan to get that kind of
>>>> parallelization. Looks like the dtests have even been deliberately
>>>> disabled for non-paid accounts, so I couldn't test this any further.
>>> 
>>> the plan of action (i already mentioned this in previous emails) is 
>>> to get dtests working for the free circleci oss accounts as well. part of 
>>> this work (already included in this pytest effort) is to have fixtures that 
>>> look at the system resources and dynamically include tests as possible.
>>> 
>>>> 
>>>> Running dtests from the pytest branch on builds.apache.org did not work
>>>> either. At least the run_dtests.py arguments will need to be updated in
>>>> cassandra-builds. We currently only use a single cassandra-dtest.sh
>>>> script for all builds. Maybe we should create a new job template that
>>>> would use an updated script with the wip-pytest dtest branch, to make
>>>> this work and testable in parallel.
>>> 
>>> yes, i didn't touch cassandra-builds yet.. focused on getting circleci and 
>>> local runs working first... once we're happy with that and stable we can 
>>> make the changes to jenkins configs pretty easily...
>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 21.12.2017 11:13, Michael Kjellman wrote:
>>>>> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 
>>>>> which includes tons of details (and a patch available for review) with my 
>>>>> efforts to migrate dtests from nosetest to pytest (which ultimately ended 
>>>>> up also including porting the code from python 2.7 to python 3).
>>>>> 
>>>>> I'd love if people could pitch in in any way to help get this reviewed 
>>>>> and committed so we can reduce the natural drift that will occur with a 
>>>>> huge patch like this against the changes going into master. I apologize 
>>>>> for send

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-02 Thread Michael Kjellman
I reproduced the NOTSET log issue locally... got a fix.. i'll push a commit up 
in a moment.

> On Jan 2, 2018, at 11:24 AM, Michael Kjellman  
> wrote:
> 
> Comments Inline: Thanks for giving this a go!!
> 
>> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski  wrote:
>> 
>> I was giving this a try today with some mixed results. First of all,
>> running pytest locally would fail with an "ccmlib.common.ArgumentError:
>> Unknown log level NOTSET" error for each test. Although I created a new
>> virtualenv for that as described in the readme (thanks for updating!)
>> and use both of your dtest and cassandra branches. But I haven't patched
>> ccm as described in the ticket, maybe that's why? Can you publish a
>> patched ccm branch to gh?
> 
> 99% sure this is an issue parsing the logging level passed to pytest to the 
> python logger... could you paste the exact command you're using to invoke 
> pytest? should be a small change - i'm sure i just missed an invocation case.
> 
>> 
>> The updated circle.yml is now using docker, which seems to be a good
>> idea to reduce clutter in the yaml file and gives us more control over
>> the test environment. Can you add the Dockerfile to the .circleci
>> directory as well? I couldn't find it when I was trying to solve the
>> pytest error mentioned above.
> 
> This is already tracked in a separate repo: 
> https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
>> 
>> Next thing I did was to push your trunk_circle branch to my gh repo to
>> start a circleCI run. Finishing all dtests in 15 minutes sounds
>> exciting, but requires a paid tier plan to get that kind of
>> parallelization. Looks like the dtests have even been deliberately
>> disabled for non-paid accounts, so I couldn't test this any further.
> 
> the plan of action (i already mentioned this in previous emails) is 
> to get dtests working for the free circleci oss accounts as well. part of 
> this work (already included in this pytest effort) is to have fixtures that 
> look at the system resources and dynamically include tests as possible.
> 
>> 
>> Running dtests from the pytest branch on builds.apache.org did not work
>> either. At least the run_dtests.py arguments will need to be updated in
>> cassandra-builds. We currently only use a single cassandra-dtest.sh
>> script for all builds. Maybe we should create a new job template that
>> would use an updated script with the wip-pytest dtest branch, to make
>> this work and testable in parallel.
> 
> yes, i didn't touch cassandra-builds yet.. focused on getting circleci and 
> local runs working first... once we're happy with that and stable we can make 
> the changes to jenkins configs pretty easily...
> 
>> 
>> 
>> 
>> On 21.12.2017 11:13, Michael Kjellman wrote:
>>> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 which 
>>> includes tons of details (and a patch available for review) with my efforts 
>>> to migrate dtests from nosetest to pytest (which ultimately ended up also 
>>> including porting the code from python 2.7 to python 3).
>>> 
>>> I'd love if people could pitch in in any way to help get this reviewed and 
>>> committed so we can reduce the natural drift that will occur with a huge 
>>> patch like this against the changes going into master. I apologize for 
>>> sending this so close to the holidays, but I really have been working 
>>> non-stop trying to get things into a completed and stable state.
>>> 
>>> The latest CircleCI runs I did took roughly 15 minutes to run all the 
>>> dtests with only 6 failures remaining (when run with vnodes) and 12 
>>> failures remaining (when run without vnodes). For comparison the last ASF 
>>> Jenkins Dtest job to successfully complete took nearly 10 hours (9:51) and 
>>> we had 36 test failures. Of note, while I was working on this and trying to 
>>> determine a baseline for the existing tests I found that the ASF Jenkins 
>>> jobs were incorrectly configured due to a typo. The no-vnodes job is 
>>> actually running with vnodes (meaning the no-vnodes job is identical to the 
>>> with-vnodes ASF Jenkins job). There are some bootstrap tests that will 100% 
>>> reliably hang both nosetest and pytest on test cleanup, however this test 
>>> only runs in the no-vnodes configuration. I've debugged and fixed a lot of 
>>> these cases across many test cases over the past few weeks and I no longer 
>>> know of any tests that can hang CI.

Re: [Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2018-01-02 Thread Michael Kjellman
Comments Inline: Thanks for giving this a go!!

> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski  wrote:
> 
> I was giving this a try today with some mixed results. First of all,
> running pytest locally would fail with an "ccmlib.common.ArgumentError:
> Unknown log level NOTSET" error for each test. Although I created a new
> virtualenv for that as described in the readme (thanks for updating!)
> and use both of your dtest and cassandra branches. But I haven't patched
> ccm as described in the ticket, maybe that's why? Can you publish a
> patched ccm branch to gh?

99% sure this is an issue parsing the logging level passed to pytest to the 
python logger... could you paste the exact command you're using to invoke 
pytest? should be a small change - i'm sure i just missed an invocation case.
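
For illustration only, a sketch of the kind of mapping that avoids handing an unset python log level to the cluster tool. The set of accepted level names and the function name are assumptions made for this example; they are not taken from ccm's source or from the fix that was pushed:

```python
import logging

# Assumption: level names the cluster tool accepts (hypothetical list).
ASSUMED_CCM_LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR")

# Python spells some levels differently from log4j/logback style names.
ALIASES = {"WARNING": "WARN", "CRITICAL": "ERROR"}


def to_ccm_level(py_level, default="INFO"):
    """Translate a numeric python logging level into an accepted level name.

    Anything that does not map cleanly (NOTSET, custom numeric levels)
    falls back to `default` instead of being passed through verbatim.
    """
    name = logging.getLevelName(py_level)
    name = ALIASES.get(name, name)
    return name if name in ASSUMED_CCM_LEVELS else default
```

With this, a logger left at logging.NOTSET yields the default rather than the literal string "NOTSET".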

> 
> The updated circle.yml is now using docker, which seems to be a good
> idea to reduce clutter in the yaml file and gives us more control over
> the test environment. Can you add the Dockerfile to the .circleci
> directory as well? I couldn't find it when I was trying to solve the
> pytest error mentioned above.

This is already tracked in a separate repo: 
https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
> 
> Next thing I did was to push your trunk_circle branch to my gh repo to
> start a circleCI run. Finishing all dtests in 15 minutes sounds
> exciting, but requires a paid tier plan to get that kind of
> parallelization. Looks like the dtests have even been deliberately
> disabled for non-paid accounts, so I couldn't test this any further.

the plan of action (i already mentioned this in previous emails) is to 
get dtests working for the free circleci oss accounts as well. part of this 
work (already included in this pytest effort) is to have fixtures that look at 
the system resources and dynamically include as many tests as possible.

> 
> Running dtests from the pytest branch on builds.apache.org did not work
> either. At least the run_dtests.py arguments will need to be updated in
> cassandra-builds. We currently only use a single cassandra-dtest.sh
> script for all builds. Maybe we should create a new job template that
> would use an updated script with the wip-pytest dtest branch, to make
> this work and testable in parallel.

yes, i didn't touch cassandra-builds yet.. focused on getting circleci and 
local runs working first... once we're happy with that and stable we can make 
the changes to jenkins configs pretty easily...

> 
> 
> 
> On 21.12.2017 11:13, Michael Kjellman wrote:
>> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 which 
>> includes tons of details (and a patch available for review) with my efforts 
>> to migrate dtests from nosetest to pytest (which ultimately ended up also 
>> including porting the code from python 2.7 to python 3).
>> 
>> I'd love if people could pitch in in any way to help get this reviewed and 
>> committed so we can reduce the natural drift that will occur with a huge 
>> patch like this against the changes going into master. I apologize for 
>> sending this so close to the holidays, but I really have been working 
>> non-stop trying to get things into a completed and stable state.
>> 
>> The latest CircleCI runs I did took roughly 15 minutes to run all the dtests 
>> with only 6 failures remaining (when run with vnodes) and 12 failures 
>> remaining (when run without vnodes). For comparison the last ASF Jenkins 
>> Dtest job to successfully complete took nearly 10 hours (9:51) and we had 36 
>> test failures. Of note, while I was working on this and trying to determine 
>> a baseline for the existing tests I found that the ASF Jenkins jobs were 
>> incorrectly configured due to a typo. The no-vnodes job is actually running 
>> with vnodes (meaning the no-vnodes job is identical to the with-vnodes ASF 
>> Jenkins job). There are some bootstrap tests that will 100% reliably hang 
>> both nosetest and pytest on test cleanup, however this test only runs in the 
>> no-vnodes configuration. I've debugged and fixed a lot of these cases across 
>> many test cases over the past few weeks and I no longer know of any tests 
>> that can hang CI.
>> 
>> Thanks and I'm optimistic about making testing great for the project and 
>> most importantly for the OSS C* community!
>> 
>> best,
>> kjellman
>> 
>> Some highlights that I quickly thought of (in no particular order): {also 
>> included in the JIRA}
>> -Migrate dtests from executing using the nosetest framework to pytest
>> -Port the entire code base from Python 2.7 to Python 3.6
>> -Update run_dtests.py to wo

Re: Test patch to Cassandra.3.0.15 using dtests

2017-12-21 Thread Michael Kjellman
hi sergey:

took much longer than i hoped but i have a patch up for review to hopefully 
improve the dtest user experience.

https://issues.apache.org/jira/browse/CASSANDRA-14134

i sent an email summarizing it earlier this morning in a separate thread.

i moved all the undocumented environment variables to command line arguments 
and added help strings for all of them.

there is now some very basic environmental validation that happens up front:

either --cassandra-dir or --cassandra-version is required as a command line 
argument. if you invoke with --help i even added a note that “ant clean jar” 
must be run beforehand on the cassandra dir that --cassandra-dir points 
at.
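
A rough sketch of that up-front check using argparse. The two flag names come from this email; the function name, the prog string, the help text, and the choice to accept both flags together are assumptions for illustration:

```python
import argparse


def parse_dtest_args(argv):
    """Parse the two source-selection flags and fail fast if neither is given."""
    parser = argparse.ArgumentParser(prog="run_dtests_sketch")  # hypothetical name
    parser.add_argument(
        "--cassandra-dir",
        help="path to a cassandra checkout (run 'ant clean jar' there first)",
    )
    parser.add_argument(
        "--cassandra-version",
        help="cassandra version to run the tests against",
    )
    args = parser.parse_args(argv)
    if not (args.cassandra_dir or args.cassandra_version):
        # parser.error prints usage plus the message and exits with status 2.
        parser.error("one of --cassandra-dir or --cassandra-version is required")
    return args
```

For example, parse_dtest_args(["--cassandra-dir", "/path/to/cassandra"]) returns normally, while parse_dtest_args([]) exits with a usage error.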

in addition:
-if you’re running on mac i check that the required loopback interfaces have 
been created (if not print an error message with the command to run and create 
the other loopback interfaces).
-upgrade tests aren’t invoked/collected by default
-tests marked with the “resource_intensive” annotation tend to invoke ccm by 
populating the test cluster with 9 instances, requiring a good chunk of ram. 
instead of making the user know if they should run these or not, i do a quick 
check at runtime to determine the amount of ram available on the system and 
dynamically enable or disable the resource_intensive annotated tests! (there 
are command line arguments of course to explicitly override this behavior if 
required for some reason).
-i added a bunch of extra documentation to README.md with the hope that it’s 
the first thing people see on GitHub and more likely to be read (how to start 
the tests, bootstrap and setup the required dependencies, and some tips on 
debugging tests)
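
As a sketch of the loopback check from the list above: one simple way to detect a missing alias is to try binding a socket to each address. The function names, the injectable probe, and the exact ifconfig command printed as a hint are assumptions for illustration; the real check in the patch may work differently:

```python
import socket


def can_bind(address):
    """Return True if a TCP socket can bind to `address` on an ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, 0))
            return True
        except OSError:
            return False


def missing_loopbacks(count, probe=can_bind):
    """List the 127.0.0.N addresses (N = 1..count) that cannot be bound."""
    wanted = ["127.0.0.%d" % i for i in range(1, count + 1)]
    return [addr for addr in wanted if not probe(addr)]


def loopback_hint(missing):
    """Build the commands a mac user could run to create the missing aliases."""
    # Assumed command form; adjust for the local system.
    return "\n".join("sudo ifconfig lo0 alias %s up" % addr for addr in missing)
```

Calling missing_loopbacks(9) before the run and printing loopback_hint(...) when the result is non-empty gives the user something they can paste into a terminal.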

curious on your thoughts from a user perspective of how these improvements 
will help someone like yourself who recently tried to test your patch against 
the dtests? any other areas i didn’t address yet that would make getting 
bootstrapped better? hopefully we can shortly get an updated and greatly 
enhanced circleci 2.0 yaml in the upstream repo that reliably will let even the 
most casual contributor make a change and run the unit and dtests against their 
branch in circleci (using just a free circleci OSS account) without any end 
user effort!

best,
kjellman

On Dec 13, 2017, at 12:28 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

i’ve been working on a story to improve this around the clock. including better 
documentation (and a --help flag with options to make it easy to know how to run 
dtests and a few runtime sanity checks about the environment)! stay tuned!

On Dec 13, 2017, at 3:24 AM, Sergey 
<cassandra.bu...@gmail.com> wrote:

Hi!

I am looking for a way to test the patch I made to specific version of 
Cassandra (3.0.15) by leveraging the dtest.

Documentation for the dtests says:
“The only thing the framework needs to know is the location of the (compiled) 
sources for Cassandra. There are two options:

Use existing sources:

CASSANDRA_DIR=~/path/to/cassandra nosetests”

So if I git clone the Cassandra sources to ~/path/to/Cassandra, switch to tag 
3.0.15, apply my patch and run ant – will this be enough for my purpose?

Best regards,
Sergey


Re: Cassandra Dtests: skip upgrade tests

2017-12-21 Thread Michael Kjellman
As part of the work i did for 
https://issues.apache.org/jira/browse/CASSANDRA-14134, one of the things I did 
was add a new command line argument “--execute-upgrade-tests”.

all the upgrade tests are now annotated with an upgrade_test pytest mark. 
by default they aren’t run. adding a single flag (easily discoverable in the 
--help output) will turn them on if necessary. or you can use the power features 
of pytest collection filtering when invoking pytest directly (look at the -m 
option).
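
For readers unfamiliar with how such an opt-in flag is typically wired up in pytest, here is a minimal sketch of a `conftest.py`. It is not the actual cassandra-dtest code: only the flag name `--execute-upgrade-tests` and the `upgrade_test` mark come from the message above, and the rest is an assumed, simplified implementation built on the standard `pytest_addoption` and `pytest_collection_modifyitems` hooks.

```python
# conftest.py -- hypothetical sketch, not the real cassandra-dtest conftest


def pytest_addoption(parser):
    # Register the opt-in flag; pytest calls this hook at startup.
    parser.addoption(
        "--execute-upgrade-tests",
        action="store_true",
        default=False,
        help="also run tests marked with @pytest.mark.upgrade_test",
    )


def pytest_collection_modifyitems(config, items):
    # With the flag set, leave the collected tests untouched.
    if config.getoption("--execute-upgrade-tests"):
        return
    selected, deselected = [], []
    for item in items:
        # get_closest_marker returns None when the mark is absent.
        if item.get_closest_marker("upgrade_test") is not None:
            deselected.append(item)
        else:
            selected.append(item)
    if deselected:
        # Report the dropped tests so pytest lists them as "deselected".
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Independently of any project-specific flag, pytest's own `-m` option filters by mark, e.g. `pytest -m upgrade_test` or `pytest -m "not upgrade_test"`.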

hope this helps going forward!!

best,
kjellman

On Dec 8, 2017, at 2:03 PM, Michael Shuler 
<mich...@pbandjelly.org> wrote:

Yep, that rm is a bit of a hack, since environment vars for
JDK{8,9}_HOME are not able to be set on the static slaves. The "proper"
way to skip them is just a normal nose exclude (drop --collect-only to
actually run 'em):

./run_dtests.py --nose-options="--collect-only -e upgrade_tests/"
 or
nosetests --collect-only -e upgrade_tests/

Also, to run only the upgrade_tests, since we're here :)

./run_dtests.py --nose-options="--collect-only upgrade_tests/"
 or
nosetests --collect-only upgrade_tests/

--
Michael

On 12/08/2017 12:07 PM, Jay Zhuang wrote:
Here is how the cassandra-builds Jenkins job does it: $ rm -r upgrade_tests/
https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L50

   On Friday, December 8, 2017, 1:28:34 AM PST, Sergey 
<cassandra.bu...@gmail.com> wrote:

Hi!

How can I completely skip the upgrade tests when running dtests?

Best regards,
Sergey



-
To unsubscribe, e-mail: 
dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
dev-h...@cassandra.apache.org



[Patch Available for Review!] CASSANDRA-14134: Migrate dtests to use pytest and python3

2017-12-21 Thread Michael Kjellman
I just created https://issues.apache.org/jira/browse/CASSANDRA-14134 which 
includes tons of details (and a patch available for review) on my efforts to 
migrate dtests from nosetest to pytest (which ultimately ended up also 
including porting the code from python 2.7 to python 3).

I'd love if people could pitch in in any way to help get this reviewed and 
committed so we can reduce the natural drift that will occur with a huge patch 
like this against the changes going into master. I apologize for sending this 
so close to the holidays, but I really have been working non-stop trying to get 
things into a completed and stable state.

The latest CircleCI runs I did took roughly 15 minutes to run all the dtests 
with only 6 failures remaining (when run with vnodes) and 12 failures remaining 
(when run without vnodes). For comparison the last ASF Jenkins Dtest job to 
successfully complete took nearly 10 hours (9:51) and we had 36 test failures. 
Of note, while I was working on this and trying to determine a baseline for the 
existing tests I found that the ASF Jenkins jobs were incorrectly configured 
due to a typo. The no-vnodes job is actually running with vnodes (meaning the 
no-vnodes job is identical to the with-vnodes ASF Jenkins job). There are some 
bootstrap tests that will 100% reliably hang both nosetest and pytest on test 
cleanup; however, these tests only run in the no-vnodes configuration. I've 
debugged and fixed a lot of these cases across many test cases over the past 
few weeks and I no longer know of any tests that can hang CI.

Thanks and I'm optimistic about making testing great for the project and most 
importantly for the OSS C* community!

best,
kjellman

Some highlights that I quickly thought of (in no particular order): {also 
included in the JIRA}
-Migrate dtests from executing using the nosetest framework to pytest
-Port the entire code base from Python 2.7 to Python 3.6
-Update run_dtests.py to work with pytest
-Add --dtest-print-tests-only option to run_dtests.py to get easily parsable 
list of all available collected tests
-Update README.md for executing the dtests with pytest
-Add new debugging tips section to README.md to help with some basics of 
debugging python3 and pytest
-Migrate all existing Environment Variable usage as a means to control dtest 
operation modes to argparse command line options with documented help on each 
toggle's intended usage
-Migration of old unittest and nose based test structure to modern pytest 
fixture approach
-Automatic detection of physical system resources to automatically determine if 
@pytest.mark.resource_intensive annotated tests should be collected and run on 
the system where they are being executed
-new pytest fixture replacements for @since and @pytest.mark.upgrade_test 
annotations
-Migration to python logging framework
-Upgrade thrift bindings to latest version with full python3 compatibility
-Remove deprecated cql and pycassa dependencies and migrate any remaining tests 
to fully remove those dependencies
-Fixed dozens of tests that would hang the pytest framework forever when run in 
CI environments
-Ran code nearly 300 times in CircleCI during the migration to find, 
identify, and fix any tests capable of hanging CI
-Upgrade Tests do not yet run in CI and still need additional migration work 
(although all upgrade test classes compile successfully)
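
To make the "automatic detection of physical system resources" item more concrete, here is a rough standard-library sketch of how such a check could look. The thresholds, the function names, and the skip-when-unknown policy are assumptions for illustration only; the actual patch may detect and decide differently.

```python
import os


def detect_resources():
    """Best-effort (cpu_count, total_memory_bytes); memory is None if unknown."""
    cpus = os.cpu_count() or 1
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        phys_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return cpus, None  # e.g. platforms without os.sysconf
    if page_size <= 0 or phys_pages <= 0:
        return cpus, None  # the value is not defined on this system
    return cpus, page_size * phys_pages


def can_run_resource_intensive(cpus, mem_bytes, min_cpus=8, min_mem_gib=32):
    """Hypothetical policy: require both enough cores and enough RAM."""
    if mem_bytes is None:
        return False  # unknown memory: err on the side of skipping
    return cpus >= min_cpus and mem_bytes >= min_mem_gib * 1024**3
```

A collection hook could call `can_run_resource_intensive(*detect_resources())` once and deselect tests carrying the `resource_intensive` mark when it returns `False`.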


Re: Test patch to Cassandra.3.0.15 using dtests

2017-12-13 Thread Michael Kjellman
i’ve been working on a story to improve this around the clock, including better 
documentation (and a --help flag with options to make it easy to know how to run 
dtests, plus a few runtime sanity checks about the environment)! stay tuned!

> On Dec 13, 2017, at 3:24 AM, Sergey  wrote:
> 
> Hi!
> 
> I am looking for a way to test a patch I made to a specific version of 
> Cassandra (3.0.15) by leveraging the dtests.
> 
> Documentation for the dtests says:
> “The only thing the framework needs to know is the location of the (compiled) 
> sources for Cassandra. There are two options:
> 
> Use existing sources:
> 
> CASSANDRA_DIR=~/path/to/cassandra nosetests”
> 
> So if I git clone the Cassandra sources to ~/path/to/Cassandra, switch to tag 
> 3.0.15, apply my patch and run ant – will this be enough for my purpose?
> 
> Best regards,
> Sergey


Re: CCM dependency in dtests

2017-11-30 Thread Michael Kjellman
Hey Stefan, any updates on this? Thanks.

best,
kjellman

> On Nov 27, 2017, at 7:34 AM, Michael Kjellman  
> wrote:
> 
> thanks for driving this, Stefan. this is definitely an issue that I 
> recently saw too while trying to get all the dtests passing. having logic 
> you need to fix in 3 repos isn’t ideal at all. 
> 
>> On Nov 27, 2017, at 4:05 AM, Stefan Podkowinski  wrote:
>> 
>> Just wanted to bring a recent discussion about how to use ccm from
>> dtests to your attention:
>> https://github.com/apache/cassandra-dtest/pull/13
>> 
>> Basically the idea is to not depend on a released ccm artifact, but to
>> use a dedicated git branch in the ccm repo instead for executing dtests.
>> Motivation and details can be found in the PR, please feel free to comment.
>> 
> 



Re: [PROPOSAL] Migrate to pytest from nosetests for dtests

2017-11-29 Thread Michael Kjellman
s/handling/hanging

> On Nov 29, 2017, at 9:54 AM, Michael Kjellman  
> wrote:
> 
> i keep seeing nose randomly hanging after a test successfully completes 
> execution. i’m very far from a python guru, but i spent a few hours with gdb 
> trying to debug the thing. i got python stacks and symbolicated native 
> stacks, but the hang is random, and the root cause of nose sitting on a lock 
> forever eludes me. some tests are more reproducible than others. some i see 
> fail 1 in 10 runs.
> 
> the net of it all though is this makes people not trust dtests because it 
> randomly hangs and shows tests with “failures” that actually succeeded.
> 
> i’m not a huge fan of just blindly upgrading to fix a problem but in this 
> case I found that there is quite a lot of mistrust and dislike for nosetests 
> in the python community with most projects already moving to pytest. and if 
> it is some complicated set of interactions between threads we use in the 
> tests and how nose works do we really want to even debug it when the project 
> appears to be abandoned?
> 
> i think regardless of the root cause for making things more stable it seems 
> like there is little motivation to stick around on nose...
> 
> lmk!
> 
> best,
> kjellman
> 
>> On Nov 29, 2017, at 5:33 AM, Philip Thompson  
>> wrote:
>> 
>> I don't have any objection to this, really. I know I rely on a handful of
>> nose plugins, and possibly others do, but those should be easy enough to
>> re-write. I am curious though, what's the impetus for this? Is there some
>> pytest feature we want that nose lacks? Is there some nosetest bug or
>> restriction getting in the way?
>> 
>>> On Tue, Nov 28, 2017 at 8:34 PM, Jon Haddad  wrote:
>>> 
>>> +1
>>> 
>>> I stopped using nose a long time ago in favor of py.test.  It’s a
>>> significant improvement.
>>> 
>>>> On Nov 28, 2017, at 10:49 AM, Michael Kjellman 
>>> wrote:
>>>> 
>>>> I'd like to propose we move from nosetest to pytest for the dtests. It
>>> looks like nosetests is basically abandoned, the python community doesn't
>>> like it, it hasn't been updated since 2015, and pytest even has nosetests
>>> support which would help us greatly during migration (
>>> https://docs.pytest.org/en/latest/nose.html).
>>>> 
>>>> Thoughts?
>>>> 
>>>> best,
>>>> kjellman
>>> 
>>> 
>>> 
>>> 
> 
> 


Re: [PROPOSAL] Migrate to pytest from nosetests for dtests

2017-11-29 Thread Michael Kjellman
i keep seeing nose randomly hanging after a test successfully completes 
execution. i’m very far from a python guru, but i spent a few hours with gdb 
trying to debug the thing. i got python stacks and symbolicated native 
stacks, but the hang is random, and the root cause of nose sitting on a lock 
forever eludes me. some tests are more reproducible than others. some i see 
fail 1 in 10 runs.
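
For hangs of this kind, Python's standard `faulthandler` module can sometimes produce Python-level stacks for every thread without attaching gdb. The following is a minimal sketch, assuming Python 3 (where `faulthandler` ships with the standard library); the `hang_watchdog` helper and its timeout are hypothetical, and nothing in this thread says this approach was tried.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, file=sys.stderr):
    """Dump all thread stacks to `file` if the body runs longer than timeout_s.

    `file` must be a real file object (it needs a file descriptor).
    """
    # Arms a watchdog thread; exit=False keeps the process alive after the
    # dump, so the hang can still be inspected with other tools.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog once the body finishes normally.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping a suspect test body or teardown in `with hang_watchdog(300):` would print where each thread is blocked if the block takes longer than five minutes.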

the net of it all though is this makes people not trust dtests because it 
randomly hangs and shows tests with “failures” that actually succeeded.

i’m not a huge fan of just blindly upgrading to fix a problem but in this case 
I found that there is quite a lot of mistrust and dislike for nosetests in the 
python community with most projects already moving to pytest. and if it is some 
complicated set of interactions between threads we use in the tests and how 
nose works do we really want to even debug it when the project appears to be 
abandoned?

i think regardless of the root cause for making things more stable it seems 
like there is little motivation to stick around on nose...

lmk!

best,
kjellman

> On Nov 29, 2017, at 5:33 AM, Philip Thompson  
> wrote:
> 
> I don't have any objection to this, really. I know I rely on a handful of
> nose plugins, and possibly others do, but those should be easy enough to
> re-write. I am curious though, what's the impetus for this? Is there some
> pytest feature we want that nose lacks? Is there some nosetest bug or
> restriction getting in the way?
> 
>> On Tue, Nov 28, 2017 at 8:34 PM, Jon Haddad  wrote:
>> 
>> +1
>> 
>> I stopped using nose a long time ago in favor of py.test.  It’s a
>> significant improvement.
>> 
>>> On Nov 28, 2017, at 10:49 AM, Michael Kjellman 
>> wrote:
>>> 
>>> I'd like to propose we move from nosetest to pytest for the dtests. It
>> looks like nosetests is basically abandoned, the python community doesn't
>> like it, it hasn't been updated since 2015, and pytest even has nosetests
>> support which would help us greatly during migration (
>> https://docs.pytest.org/en/latest/nose.html).
>>> 
>>> Thoughts?
>>> 
>>> best,
>>> kjellman
>> 
>> 
>> 
>> 




[PROPOSAL] Migrate to pytest from nosetests for dtests

2017-11-28 Thread Michael Kjellman
I'd like to propose we move from nosetest to pytest for the dtests. It looks 
like nosetests is basically abandoned, the python community doesn't like it, it 
hasn't been updated since 2015, and pytest even has nosetests support which 
would help us greatly during migration 
(https://docs.pytest.org/en/latest/nose.html).

Thoughts?

best,
kjellman


Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
do you know why this is the case? shouldn’t -all test...all?

> On Nov 27, 2017, at 7:39 PM, Michael Shuler  wrote:
> 
> The `test-cdc` target is not a dependent of `test-all`, so it was set up
> as a separate job in Jenkins:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-3.11-test-cdc/
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test-cdc/
> 
> -- 
> Michael
> 
>> On 11/27/2017 03:45 PM, Michael Kjellman wrote:
>> Hey Jay:
>> 
>> Thanks!! I just took a quick look at the JIRA and noticed that there is a 
>> “test-cdc” ant target? So, does that mean CDC gets no testing with ant 
>> test? Do you know any of the history around this?
>> 
>>> On Nov 27, 2017, at 9:44 AM, Jay Zhuang  
>>> wrote:
>>> 
>>> I fixed one CDC uTest, please 
>>> review:https://issues.apache.org/jira/browse/CASSANDRA-14066
>>> 
>>> 
>>>   On Friday, November 17, 2017 6:34 AM, Josh McKenzie 
>>>  wrote:
>>> 
>>> 
>>>> 
>>>> Do we have any volunteers to fix the broken Materialized Views and CDC
>>>> DTests?
>>> 
>>> I'll try to take a look at the CDC tests next week; looks like one of the
>>> base unit tests is failing as well.
>>> 
>>> On Fri, Nov 17, 2017 at 12:09 AM, Michael Kjellman <
>>> mkjell...@internalcircle.com> wrote:
>>> 
>>>> Quick update re: dtests and off-heap memtables:
>>>> 
>>>> I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException:
>>>> offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)
>>>> 
>>>> Looks like we’re gonna need to do some work to test this configuration and
>>>> right now it’s pretty broken...
>>>> 
>>>> Do we have any volunteers to fix the broken Materialized Views and CDC
>>>> DTests?
>>>> 
>>>> best,
>>>> kjellman
>>>> 
>>>> 
>>>>> On Nov 15, 2017, at 5:59 PM, Michael Kjellman <
>>>> mkjell...@internalcircle.com> wrote:
>>>>> 
>>>>> yes - true- some are flaky, but almost all of the ones i filed fail 100%
>>>> (💯) of the time. i look forward to triaging just the remaining flaky ones
>>>> (hopefully - with our powers combined - by the end of this month!!)
>>>>> 
>>>>> appreciate everyone’s help - no matter how small... i already personally
>>>> did a few “fun” random-python-class-is-missing-return-after-method stuff.
>>>>> 
>>>>> we’ve wanted this for a while and now is our time to actually execute
>>>> and make good on our previous dev list promises.
>>>>> 
>>>>> best,
>>>>> kjellman
>>>>> 
>>>>>> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
>>>>>> 
>>>>>> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>>>>>> 
>>>>>> If you haven't been paying attention to JIRA, you likely didn't notice
>>>> that
>>>>>> Josh went through and triage/categorized a bunch of issues by adding
>>>>>> components, and Michael took the time to open a bunch of JIRAs for
>>>> failing
>>>>>> tests.
>>>>>> 
>>>>>> How many is a bunch? Something like 35 or so just for tests currently
>>>>>> failing on trunk.  If you're a regular contributor, you already know
>>>> that
>>>>>> dtests are flakey - it'd be great if a few of us can go through and fix
>>>> a
>>>>>> few. Even incremental improvements are improvements. Here's an easy
>>>> search
>>>>>> to find them:
>>>>>> 
>>>>>> https://issues.apache.org/jira/secure/IssueNavigator.
>>>> jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+
>>>> component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+
>>>> DESC%2C+created+ASC&mode=hide
>>>>>> 
>>>>>> If you're a new contributor, fixing tests is often a good way to learn a
>>>>>> new part of the codebase. Many of these are dtests, which live in a
>>>>>> different repo ( https://github.com/apache/cassandra-dtest ) and are in
>>>>>> python, but have no fear, the repo has instructions for setting up and
>>>>>> running dtests(
>>>>>> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
>>>>>> 
>>>>>> Normal contribution workflow applies: self-assign the ticket if you
>>>> want to
>>>>>> work on it, click on 'start progress' to indicate that you're working on
>>>>>> it, mark it 'patch available' when you've uploaded code to be reviewed
>>>> (in
>>>>>> a github branch, or as a standalone patch file attached to the JIRA). If
>>>>>> you have questions, feel free to email the dev list (that's what it's
>>>> here
>>>>>> for).
>>>>>> 
>>>>>> Many thanks will be given,
>>>>>> - Jeff
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 


Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
and just to make it super clear how awesome this is: currently the dtests when 
executed via ASF Jenkins (if they actually run successfully at all) take 
roughly 15+ hours to execute. Being able to run *everything* reliably and 
stably in 28 minutes is obviously a massive improvement (more than 30 times faster).
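
As a quick sanity check on the scale of that improvement, the two figures quoted above can be compared directly. This sketch takes "15+ hours" as exactly 15 hours, so the result is a lower bound.

```python
import math

jenkins_minutes = 15 * 60   # "roughly 15+ hours" on ASF Jenkins, taken as 15 h
circleci_minutes = 28       # build + unit tests + dtests on 100 containers

speedup = jenkins_minutes / circleci_minutes
print(f"speedup: {speedup:.1f}x")                          # speedup: 32.1x
print(f"orders of magnitude: {math.log10(speedup):.2f}")   # orders of magnitude: 1.51
```

On these numbers the CircleCI run is about 32 times faster, i.e. roughly one and a half orders of magnitude.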

best,
kjellman

On Nov 27, 2017, at 2:43 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

(with 100 containers we can actually build the project, run all of the unit 
tests, and run all of the dtests in roughly 28 minutes!).



Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
Complicated question unfortunately — and something we’re actively working on 
improving:

Cassci is no longer being offered/run by Datastax and so we've needed to come up 
with a new solution, and what that ultimately is remains a WIP. its loss was 
obviously huge, and a testament to the awesome resource and effort that was 
put into providing it to the community for all those years.

 - Short Term/Current: Tests (both dtests and unit tests) are being run via the 
ASF Jenkins (https://builds.apache.org) - but that solution isn’t hugely 
helpful as it’s resource constrained.
 - Short-Medium Term: we hope to get a fully baked CircleCI solution to get 
reliable fast test runs.
 - Long Term: Actively being discussed but I’m optimistic that we can get 
something awesome for the project with some stable combination of CircleCI + 
ASF Jenkins, and once we do I’m sure this will change any long term plans.

For Unit Tests (a.k.a the Java ones in tree - 
https://github.com/apache/cassandra/tree/trunk/test/unit/org/apache/cassandra):
Take a look at 
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/… 
looks like the last successful job to finish was #389. 
(https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/389/testReport/).
 There are currently a total of 6 tests  (all from CompressedInputStreamTest) 
failing on trunk via ASF Jenkins. These specific test failures are 
environmental. The only *unit* test on trunk that I currently know to be flaky 
is org.apache.cassandra.cql3.ViewTest. testRegularColumnTimestampUpdates 
(tracked as https://issues.apache.org/jira/browse/CASSANDRA-14054)

For Distributed Tests (DTests) (a.k.a the Python ones - 
https://github.com/apache/cassandra-dtest):
The situation is a great deal more complicated due to the amount of time and 
resources that executing all of the dtests takes (and executing the tests 
across the various configurations)...

There are 4 dtest jobs on ASF Jenkins for trunk:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-large/
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-novnode/
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-offheap/

It looks like you’ll need to go back to run #353 
(https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/353/testReport/)
 to see the test results as the last 2 jobs that were triggered failed to 
execute. Depending on the environment variables set tests are executed or 
skipped — so you’ll see different tests being run on the no-vnode job/off-heap 
job/regular dtest job (or some tests might be run multiple times)


More recently we’ve been working on getting CircleCI running. Some sample runs 
from my personal fork can be seen at 
https://circleci.com/gh/mkjellman/cassandra/tree/trunk_circle. I’m personally 
using a paid account to get more CircleCI resources (with 100 containers we can 
actually build the project, run all of the unit tests, and run all of the 
dtests in roughly 28 minutes!). I’m actively working to determine exactly what 
can (and cannot) be executed reliably, routinely, and easily by anyone with 
just a simple free CircleCI account.

I’m also working on getting scheduled daily CircleCI runs set up against 
trunk/3.0; more on both of those when we’ve got that story fully baked. Hope 
this answers your question! There are quite a few dtests currently failing and, 
as Jeff mentioned, I’ve created JIRAs for a lot of them already, so any help (no 
matter how trivial or annoying it might be or seem) to get everything green 
again would be greatly appreciated.

best,
kjellman


On Nov 27, 2017, at 1:54 PM, Jaydeep Chovatia 
<chovatia.jayd...@gmail.com> wrote:

Is there a way to check which tests are failing in trunk currently?
Previously this URL  was giving such results
but is no longer working.

Jaydeep

On Wed, Nov 15, 2017 at 5:44 PM, Jeff Jirsa 
<jji...@gmail.com> wrote:

In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.

If you haven't been paying attention to JIRA, you likely didn't notice that
Josh went through and triage/categorized a bunch of issues by adding
components, and Michael took the time to open a bunch of JIRAs for failing
tests.

How many is a bunch? Something like 35 or so just for tests currently
failing on trunk.  If you're a regular contributor, you already know that
dtests are flakey - it'd be great if a few of us can go through and fix a
few. Even incremental improvements are improvements. Here's an easy search
to find them:

https://issues.apache.org/jira/secure/IssueNavigator.
jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+
component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+
DESC%2C+created+ASC&mode=hide

If you're a new contributor, fixing tests is often a good way to learn a
new part of the codebase. Many of these 

Re: Flakey Dtests

2017-11-27 Thread Michael Kjellman
Hey Jay:

Thanks!! I just took a quick look at the JIRA and noticed that there is a 
“test-cdc” ant target? So, does that mean CDC gets no testing with ant test? 
Do you know any of the history around this?

> On Nov 27, 2017, at 9:44 AM, Jay Zhuang  wrote:
> 
> I fixed one CDC uTest, please 
> review:https://issues.apache.org/jira/browse/CASSANDRA-14066
> 
> 
>On Friday, November 17, 2017 6:34 AM, Josh McKenzie  
> wrote:
> 
> 
>> 
>> Do we have any volunteers to fix the broken Materialized Views and CDC
>> DTests?
> 
> I'll try to take a look at the CDC tests next week; looks like one of the
> base unit tests is failing as well.
> 
> On Fri, Nov 17, 2017 at 12:09 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> Quick update re: dtests and off-heap memtables:
>> 
>> I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException:
>> offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)
>> 
>> Looks like we’re gonna need to do some work to test this configuration and
>> right now it’s pretty broken...
>> 
>> Do we have any volunteers to fix the broken Materialized Views and CDC
>> DTests?
>> 
>> best,
>> kjellman
>> 
>> 
>>> On Nov 15, 2017, at 5:59 PM, Michael Kjellman <
>> mkjell...@internalcircle.com> wrote:
>>> 
>>> yes - true- some are flaky, but almost all of the ones i filed fail 100%
>> (💯) of the time. i look forward to triaging just the remaining flaky ones
>> (hopefully - with our powers combined - by the end of this month!!)
>>> 
>>> appreciate everyone’s help - no matter how small... i already personally
>> did a few “fun” random-python-class-is-missing-return-after-method stuff.
>>> 
>>> we’ve wanted this for a while and now is our time to actually execute
>> and make good on our previous dev list promises.
>>> 
>>> best,
>>> kjellman
>>> 
>>>> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
>>>> 
>>>> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>>>> 
>>>> If you haven't been paying attention to JIRA, you likely didn't notice
>> that
>>>> Josh went through and triage/categorized a bunch of issues by adding
>>>> components, and Michael took the time to open a bunch of JIRAs for
>> failing
>>>> tests.
>>>> 
>>>> How many is a bunch? Something like 35 or so just for tests currently
>>>> failing on trunk.  If you're a regular contributor, you already know
>> that
>>>> dtests are flakey - it'd be great if a few of us can go through and fix
>> a
>>>> few. Even incremental improvements are improvements. Here's an easy
>> search
>>>> to find them:
>>>> 
>>>> https://issues.apache.org/jira/secure/IssueNavigator.
>> jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+
>> component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+
>> DESC%2C+created+ASC&mode=hide
>>>> 
>>>> If you're a new contributor, fixing tests is often a good way to learn a
>>>> new part of the codebase. Many of these are dtests, which live in a
>>>> different repo ( https://github.com/apache/cassandra-dtest ) and are in
>>>> python, but have no fear, the repo has instructions for setting up and
>>>> running dtests(
>>>> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
>>>> 
>>>> Normal contribution workflow applies: self-assign the ticket if you
>> want to
>>>> work on it, click on 'start progress' to indicate that you're working on
>>>> it, mark it 'patch available' when you've uploaded code to be reviewed
>> (in
>>>> a github branch, or as a standalone patch file attached to the JIRA). If
>>>> you have questions, feel free to email the dev list (that's what it's
>> here
>>>> for).
>>>> 
>>>> Many thanks will be given,
>>>> - Jeff
>> 
>> 
> 



Re: CCM dependency in dtests

2017-11-27 Thread Michael Kjellman
thanks for driving this, Stefan. this is definitely an issue that I recently 
saw too while trying to get all the dtests passing. having logic you need to 
fix in 3 repos isn’t ideal at all. 

> On Nov 27, 2017, at 4:05 AM, Stefan Podkowinski  wrote:
> 
> Just wanted to bring a recent discussion about how to use ccm from
> dtests to your attention:
> https://github.com/apache/cassandra-dtest/pull/13
> 
> Basically the idea is to not depend on a released ccm artifact, but to
> use a dedicated git branch in the ccm repo instead for executing dtests.
> Motivation and details can be found in the PR, please feel free to comment.
> 
> 




Re: Flakey Dtests

2017-11-17 Thread Michael Kjellman
I’m guessing this was part of 
https://issues.apache.org/jira/browse/CASSANDRA-5?

I see Sylvain left a comment about something that sounds pretty similar… was 
this actually resolved? looks like it was merged as 
https://github.com/pcmanus/ccm/commit/1c0bf62e0b21fc78ee09026882953a5436ccf0f0? 
when do ccm releases get published to PyPI?


On Nov 17, 2017, at 12:18 AM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

I see a ton of upgrade tests right now failing for:


Unexpected error in node1 log, error:
ERROR [main] 2017-11-17 07:57:54,477 CassandraDaemon.java:672 - Exception 
encountered during startup: Invalid yaml. Please remove properties [rpc_port] 
from your cassandra.yaml

I do see that rpc_port is in 3.0 and it seems to have been yanked from trunk.. 
So it seems like a legitimate failure.. I’m not sure I fully understand how the 
yaml upgrade path works for upgrade test dtests. I’ve taken a look at 
upgrade_tests/upgrade_manifest.py and upgrade_tests/README.md… can anyone shed 
any light on how this is supposed to work? Was handling rpc_port in the upgrade 
dtests just missed when this was removed for whatever reason from trunk?

thanks…


best,
kjellman

On Nov 16, 2017, at 9:09 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

Quick update re: dtests and off-heap memtables:

I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException: 
offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)

Looks like we’re gonna need to do some work to test this configuration and 
right now it’s pretty broken...

Do we have any volunteers to fix the broken Materialized Views and CDC DTests?

best,
kjellman


On Nov 15, 2017, at 5:59 PM, Michael Kjellman 
<mkjell...@internalcircle.com> wrote:

yes - true- some are flaky, but almost all of the ones i filed fail 100% (💯) of 
the time. i look forward to triaging just the remaining flaky ones (hopefully - 
with our powers combined - by the end of this month!!)

appreciate everyone’s help - no matter how small... i already personally did a 
few “fun” random-python-class-is-missing-return-after-method stuff.

we’ve wanted this for a while and now is our time to actually execute and make 
good on our previous dev list promises.

best,
kjellman

On Nov 15, 2017, at 5:45 PM, Jeff Jirsa 
<jji...@gmail.com> wrote:

In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.

If you haven't been paying attention to JIRA, you likely didn't notice that
Josh went through and triage/categorized a bunch of issues by adding
components, and Michael took the time to open a bunch of JIRAs for failing
tests.

How many is a bunch? Something like 35 or so just for tests currently
failing on trunk.  If you're a regular contributor, you already know that
dtests are flakey - it'd be great if a few of us can go through and fix a
few. Even incremental improvements are improvements. Here's an easy search
to find them:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide

If you're a new contributor, fixing tests is often a good way to learn a
new part of the codebase. Many of these are dtests, which live in a
different repo ( https://github.com/apache/cassandra-dtest ) and are in
python, but have no fear, the repo has instructions for setting up and
running dtests(
https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )

Normal contribution workflow applies: self-assign the ticket if you want to
work on it, click on 'start progress' to indicate that you're working on
it, mark it 'patch available' when you've uploaded code to be reviewed (in
a github branch, or as a standalone patch file attached to the JIRA). If
you have questions, feel free to email the dev list (that's what it's here
for).

Many thanks will be given,
- Jeff





Re: Flakey Dtests

2017-11-17 Thread Michael Kjellman
I see a ton of upgrade tests right now failing for:


Unexpected error in node1 log, error:
ERROR [main] 2017-11-17 07:57:54,477 CassandraDaemon.java:672 - Exception 
encountered during startup: Invalid yaml. Please remove properties [rpc_port] 
from your cassandra.yaml

I do see that rpc_port is in 3.0 and it seems to have been yanked from trunk, 
so it seems like a legitimate failure. I’m not sure I fully understand how the 
yaml upgrade path works for the upgrade dtests. I’ve taken a look at 
upgrade_tests/upgrade_manifest.py and upgrade_tests/README.md… can anyone shed 
any light on how this is supposed to work? Was handling rpc_port in the upgrade 
dtests just missed when this was removed for whatever reason from trunk?
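
To make the failure mode concrete, here is a minimal Python sketch of one way an upgrade test could drop options that the target version no longer accepts before restarting a node. This is an illustration only, not how upgrade_manifest.py or ccm actually handle it: `REMOVED_IN_TRUNK` and `strip_removed_options` are hypothetical names, the config values are made up, and the only removed property assumed is the rpc_port named in the error above.

```python
# Hypothetical helper: not part of cassandra-dtest or ccm.
# Assumption: the only option known to be rejected by trunk is rpc_port,
# taken from the startup error quoted above.
REMOVED_IN_TRUNK = {"rpc_port"}


def strip_removed_options(config, removed=REMOVED_IN_TRUNK):
    """Return a copy of a cassandra.yaml-style dict without removed options."""
    return {key: value for key, value in config.items() if key not in removed}


# Illustrative config; real node configs carry many more options.
old_config = {"cluster_name": "test", "rpc_port": 9160, "native_transport_port": 9042}
new_config = strip_removed_options(old_config)
print(sorted(new_config))
```

The original dict is left untouched, so the same test could still start the pre-upgrade node with the old options.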

thanks…


best,
kjellman

On Nov 16, 2017, at 9:09 PM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Quick update re: dtests and off-heap memtables:

I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException: 
offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)

Looks like we’re gonna need to do some work to test this configuration and 
right now it’s pretty broken...

Do we have any volunteers to fix the broken Materialized Views and CDC DTests?

best,
kjellman


On Nov 15, 2017, at 5:59 PM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

yes - true - some are flaky, but almost all of the ones i filed fail 100% (💯) of 
the time. i look forward to triaging just the remaining flaky ones (hopefully - 
with our powers combined - by the end of this month!!)

appreciate everyone’s help - no matter how small... i already personally did a 
few “fun” random-python-class-is-missing-return-after-method stuff.

we’ve wanted this for a while and now is our time to actually execute and make 
good on our previous dev list promises.

best,
kjellman

On Nov 15, 2017, at 5:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:

In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.

If you haven't been paying attention to JIRA, you likely didn't notice that
Josh went through and triaged/categorized a bunch of issues by adding
components, and Michael took the time to open a bunch of JIRAs for failing
tests.

How many is a bunch? Something like 35 or so just for tests currently
failing on trunk.  If you're a regular contributor, you already know that
dtests are flakey - it'd be great if a few of us can go through and fix a
few. Even incremental improvements are improvements. Here's an easy search
to find them:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide

If you're a new contributor, fixing tests is often a good way to learn a
new part of the codebase. Many of these are dtests, which live in a
different repo ( https://github.com/apache/cassandra-dtest ) and are in
python, but have no fear, the repo has instructions for setting up and
running dtests(
https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )

Normal contribution workflow applies: self-assign the ticket if you want to
work on it, click on 'start progress' to indicate that you're working on
it, mark it 'patch available' when you've uploaded code to be reviewed (in
a github branch, or as a standalone patch file attached to the JIRA). If
you have questions, feel free to email the dev list (that's what it's here
for).

Many thanks will be given,
- Jeff




Re: Flakey Dtests

2017-11-16 Thread Michael Kjellman
Quick update re: dtests and off-heap memtables:

I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException: 
offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)

Looks like we’re gonna need to do some work to test this configuration and 
right now it’s pretty broken...

Do we have any volunteers to fix the broken Materialized Views and CDC DTests?

best,
kjellman


> On Nov 15, 2017, at 5:59 PM, Michael Kjellman  
> wrote:
> 
> yes - true - some are flaky, but almost all of the ones i filed fail 100% (💯) 
> of the time. i look forward to triaging just the remaining flaky ones 
> (hopefully - with our powers combined - by the end of this month!!)
> 
> appreciate everyone’s help - no matter how small... i already personally did 
> a few “fun” random-python-class-is-missing-return-after-method stuff. 
> 
> we’ve wanted this for a while and now is our time to actually execute and 
> make good on our previous dev list promises. 
> 
> best,
> kjellman
> 
>> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
>> 
>> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>> 
>> If you haven't been paying attention to JIRA, you likely didn't notice that
>> Josh went through and triaged/categorized a bunch of issues by adding
>> components, and Michael took the time to open a bunch of JIRAs for failing
>> tests.
>> 
>> How many is a bunch? Something like 35 or so just for tests currently
>> failing on trunk.  If you're a regular contributor, you already know that
>> dtests are flakey - it'd be great if a few of us can go through and fix a
>> few. Even incremental improvements are improvements. Here's an easy search
>> to find them:
>> 
>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide
>> 
>> If you're a new contributor, fixing tests is often a good way to learn a
>> new part of the codebase. Many of these are dtests, which live in a
>> different repo ( https://github.com/apache/cassandra-dtest ) and are in
>> python, but have no fear, the repo has instructions for setting up and
>> running dtests(
>> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
>> 
>> Normal contribution workflow applies: self-assign the ticket if you want to
>> work on it, click on 'start progress' to indicate that you're working on
>> it, mark it 'patch available' when you've uploaded code to be reviewed (in
>> a github branch, or as a standalone patch file attached to the JIRA). If
>> you have questions, feel free to email the dev list (that's what it's here
>> for).
>> 
>> Many thanks will be given,
>> - Jeff



Re: Flakey Dtests

2017-11-15 Thread Michael Kjellman
yes - true - some are flaky, but almost all of the ones i filed fail 100% (💯) of 
the time. i look forward to triaging just the remaining flaky ones (hopefully - 
with our powers combined - by the end of this month!!)

appreciate everyone’s help - no matter how small... i already personally did a 
few “fun” random-python-class-is-missing-return-after-method stuff. 

we’ve wanted this for a while and now is our time to actually execute and make 
good on our previous dev list promises. 

best,
kjellman

> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
> 
> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
> 
> If you haven't been paying attention to JIRA, you likely didn't notice that
> Josh went through and triaged/categorized a bunch of issues by adding
> components, and Michael took the time to open a bunch of JIRAs for failing
> tests.
> 
> How many is a bunch? Something like 35 or so just for tests currently
> failing on trunk.  If you're a regular contributor, you already know that
> dtests are flakey - it'd be great if a few of us can go through and fix a
> few. Even incremental improvements are improvements. Here's an easy search
> to find them:
> 
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide
> 
> If you're a new contributor, fixing tests is often a good way to learn a
> new part of the codebase. Many of these are dtests, which live in a
> different repo ( https://github.com/apache/cassandra-dtest ) and are in
> python, but have no fear, the repo has instructions for setting up and
> running dtests(
> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
> 
> Normal contribution workflow applies: self-assign the ticket if you want to
> work on it, click on 'start progress' to indicate that you're working on
> it, mark it 'patch available' when you've uploaded code to be reviewed (in
> a github branch, or as a standalone patch file attached to the JIRA). If
> you have questions, feel free to email the dev list (that's what it's here
> for).
> 
> Many thanks will be given,
> - Jeff


Re: Making CommitLog pluggable

2017-11-01 Thread Michael Kjellman
Awesome!! You're two steps ahead ;)

Not sure if you're allowed to share, but can you highlight any details on 
endurance and performance? Are the pages 4kb or 16kb? How many writes do you 
expect to handle over a 1 year window of the device? I assume because you're 
directly accessing the hardware as a block device there are different rules in 
regards to fsync and how things are flushed? Any power loss protection features 
etc? If you write a commit log segment that's like 20 bytes (for example), will 
you post-pad the entire thing internally and still need to write 4kb (or 
whatever the physical page size is)?
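
As a rough illustration of why the padding question matters, the arithmetic can be sketched in Python. This assumes the device rounds every write up to a whole number of physical pages, which is exactly what is being asked here and is not confirmed; `padded_size` and `write_amplification` are hypothetical helper names, and 4096 is simply the 4kb page size mentioned above.

```python
def padded_size(payload_bytes, page_bytes=4096):
    """Round a write up to a whole number of physical pages."""
    if payload_bytes <= 0:
        return 0
    pages = -(-payload_bytes // page_bytes)  # ceiling division
    return pages * page_bytes


def write_amplification(payload_bytes, page_bytes=4096):
    """Ratio of bytes physically written to bytes of payload."""
    return padded_size(payload_bytes, page_bytes) / payload_bytes


# A 20-byte segment on 4kb pages still costs one full page.
print(padded_size(20))
print(write_amplification(20))
```

Under that assumption a 20-byte write costs a full 4096-byte page, roughly 205 times the payload, and the ratio grows fourfold if the page is 16kb instead.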

Thanks!

best,
kjellman

> On Nov 1, 2017, at 2:40 PM, 大平怜  wrote:
> 
> Hi Michael,
> 
> Yes, testing is always a problem, and that is exactly why we would like to
> release
> our code as a plugin, outside of the main source tree, so that the project
> won't
> need to test the hardware-dependent code.
> The pluggable CommitLog will allow this approach.
> 
> Actually, we have already released another plugin for CAPI-Flash-based
> RowCache,
> which takes advantage of the pluggable RowCache mechanism.
> https://github.com/ppc64le/capi-rowcache
> We would just like to repeat this approach in CommitLog.
> 
> 
> Thanks,
> Rei Odaira
> 
> 
> 2017-11-01 16:30 GMT-05:00 Michael Kjellman :
> 
>> Rei:
>> 
>> One thing that comes up when these types of conversations occur is how the
>> project can test hardware-dependent code. In the case of the PPC64 stuff,
>> hardware actually got donated to the ASF so Jenkins runs could be done to
>> check that things work. Any thoughts on this aspect? Might be a bit
>> premature, but I thought I'd at least mention it... On the flip side: if
>> CommitLog becomes pluggable enough, shipping an implementation compatible
>> with the hardware out of tree might also be viable.
>> 
>> best,
>> kjellman
>> 
>>> On Nov 1, 2017, at 2:25 PM, 大平怜  wrote:
>>> 
>>> Hi Ariel,
>>> 
>>> CommitLogSegment assumes commit log files stored on a regular file
>> system.
>>> Our CAPI Flash system bypasses the OS and directly accesses flash,
>>> so we cannot use the current framework of CommitLogSegment as it is.
>>> Intel's SPDK also bypasses a file system, so we think this kind of
>>> requirement
>>> is not uncommon.
>>> 
>>> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
>>> because the archiving and synchronization logic has to be decoupled.
>>> It would require major rework, and we don't think we should affect
>>> the existing implementation so much.
>>> 
>>> We do not change any existing format of CommitLog.  Our plugin will use
>>> its own format, as it must manage commit logs on the 4KB-block-oriented
>>> address spaces of flash devices.
>>> 
>>> 
>>> Regards,
>>> Rei Odaira
>>> 
>>> 
>>> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
>>> 
>>>> Hi,
>>>> 
>>>> There are pluggable elements to the commit log such as those used to
>>>> support mmap or compressed.
>>>> 
>>>> Can you describe at a high level what a new implementation would look
>>>> like and why it can't be a mode of the existing implementation?
>>>> 
>>>> You are not proposing changing the format correct?
>>>> 
>>>> Regards,
>>>> Ariel
>>>> 
>>>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>>>>> Hello,
>>>>> 
>>>>> We are developing a Cassandra plugin to store CommitLog on our
>>>>> low-latency
>>>>> Flash device (CAPI-Flash).  To do that, the original CommitLog
>> interface
>>>>> must be changed to allow plugins.  Anyone has any thoughts about it?
>> We
>>>>> have our codebase ready, but we think we should start with high-level
>>>>> discussion.
>>>>> 
>>>>> The runtime overhead will be minimal.  The only overhead will be
>> changing
>>>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
>>>>> etc.
>>>>> into interface invocations.
>>>>> 
>>>>> Synching to CommitLog is one of the performance bottlenecks in
>> Cassandra
>>>>> especially with batch commit.  I think the pluggable CommitLog will
>> allow
>>>>> other interesting alternatives, such as one using SPDK.  Appreciate any
>>>>> comments.
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> Rei Odaira
>>>> 
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>> 
>>>> 
>> 
>> 



Re: Making CommitLog pluggable

2017-11-01 Thread Michael Kjellman
Rei:

One thing that comes up when these types of conversations occur is how the 
project can test hardware-dependent code. In the case of the PPC64 stuff, 
hardware actually got donated to the ASF so Jenkins runs could be done to check 
that things work. Any thoughts on this aspect? Might be a bit premature, but I 
thought I'd at least mention it... On the flip side: if CommitLog becomes 
pluggable enough, shipping an implementation compatible with the hardware out 
of tree might also be viable.

best,
kjellman

> On Nov 1, 2017, at 2:25 PM, 大平怜  wrote:
> 
> Hi Ariel,
> 
> CommitLogSegment assumes commit log files stored on a regular file system.
> Our CAPI Flash system bypasses the OS and directly accesses flash,
> so we cannot use the current framework of CommitLogSegment as it is.
> Intel's SPDK also bypasses a file system, so we think this kind of
> requirement
> is not uncommon.
> 
> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> because the archiving and synchronization logic has to be decoupled.
> It would require major rework, and we don't think we should affect
> the existing implementation so much.
> 
> We do not change any existing format of CommitLog.  Our plugin will use
> its own format, as it must manage commit logs on the 4KB-block-oriented
> address spaces of flash devices.
> 
> 
> Regards,
> Rei Odaira
> 
> 
> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
> 
>> Hi,
>> 
>> There are pluggable elements to the commit log such as those used to
>> support mmap or compressed.
>> 
>> Can you describe at a high level what a new implementation would look
>> like and why it can't be a mode of the existing implementation?
>> 
>> You are not proposing changing the format correct?
>> 
>> Regards,
>> Ariel
>> 
>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>>> Hello,
>>> 
>>> We are developing a Cassandra plugin to store CommitLog on our
>>> low-latency
>>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
>>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
>>> have our codebase ready, but we think we should start with high-level
>>> discussion.
>>> 
>>> The runtime overhead will be minimal.  The only overhead will be changing
>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
>>> etc.
>>> into interface invocations.
>>> 
>>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
>>> especially with batch commit.  I think the pluggable CommitLog will allow
>>> other interesting alternatives, such as one using SPDK.  Appreciate any
>>> comments.
>>> 
>>> 
>>> Regards,
>>> Rei Odaira
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 



Re: Making CommitLog pluggable

2017-10-31 Thread Michael Kjellman
I think pluggable commit log is a great idea if we can do it right! 😃 
*Especially* given the intent here seems to be motivated by improving endurance 
and performance on the given storage you're running C* on. We've actually been 
looking at commit log a bunch recently and doing a pretty deep dive -- even 
getting as low level as looking at how commit log interacts with a drive with a 
SATA protocol analyzer and working to understand how this actually impacts the 
underlying NAND and various drives' GC algorithms. I totally think we should 
have different implementations of commit log optimized for the storage you're 
running on. Ideally, we would even automatically suggest the implementation 
that we think will work best for your hardware.

Looking forward to seeing your changes on a branch somewhere!

best,
kjellman

On Oct 31, 2017, at 1:09 PM, 大平怜 <rei.oda...@gmail.com> wrote:

Hello,

We are developing a Cassandra plugin to store CommitLog on our low-latency
Flash device (CAPI-Flash).  To do that, the original CommitLog interface
must be changed to allow plugins.  Does anyone have any thoughts about it?  We
have our codebase ready, but we think we should start with a high-level
discussion.

The runtime overhead will be minimal.  The only overhead will be changing
method invocations to CommitLog#add(), CommitLog#getCurrentPosition(), etc.
into interface invocations.
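
The idea of turning direct method calls into interface invocations can be sketched as follows. This is a Python illustration under stated assumptions, not Cassandra's Java code and not the plugin described here: `CommitLogBackend`, `InMemoryCommitLog` and `write_all` are hypothetical names, and the two methods only loosely mirror CommitLog#add() and CommitLog#getCurrentPosition().

```python
from abc import ABC, abstractmethod


class CommitLogBackend(ABC):
    """Hypothetical plugin interface; method names loosely mirror
    CommitLog#add() and CommitLog#getCurrentPosition()."""

    @abstractmethod
    def add(self, mutation: bytes) -> int:
        """Append a mutation and return the position it was written at."""

    @abstractmethod
    def get_current_position(self) -> int:
        """Return the position the next mutation would be written at."""


class InMemoryCommitLog(CommitLogBackend):
    """Toy backend, used only to show that callers depend on the interface."""

    def __init__(self):
        self._buffer = bytearray()

    def add(self, mutation: bytes) -> int:
        position = len(self._buffer)
        self._buffer.extend(mutation)
        return position

    def get_current_position(self) -> int:
        return len(self._buffer)


def write_all(log: CommitLogBackend, mutations):
    """Caller code sees only the interface, so any backend can be swapped in."""
    return [log.add(m) for m in mutations]


log = InMemoryCommitLog()
positions = write_all(log, [b"abc", b"de"])
print(positions, log.get_current_position())
```

The extra cost for callers is the dynamic dispatch on each call, which is the "only overhead" the message refers to; a file-backed or flash-backed class would implement the same two methods.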

Synching to CommitLog is one of the performance bottlenecks in Cassandra
especially with batch commit.  I think the pluggable CommitLog will allow
other interesting alternatives, such as one using SPDK.  Appreciate any
comments.


Regards,
Rei Odaira



Re: Integrating vendor-specific code and developing plugins

2017-05-18 Thread Michael Kjellman
That’s epic Jeff. Very cool.

Sent from my iPhone

On May 18, 2017, at 10:28 AM, Jeff Jirsa <jji...@gmail.com> wrote:

On Mon, May 15, 2017 at 5:25 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:



To me testable means that we can run the tests at the very least for every
release, but ideally they would be run more often than that.  Especially
with the push to not release unless the test board is all passing, we
should not be releasing features that we don’t have a test board for.
Ideally that means we have it in ASF CI.  If there is someone that can
commit to posting results of runs from an outside CI somewhere, then I
think that could work as well, but that gets pretty cumbersome if we have
to check 10 different CI dashboards at different locations before every
release.



It turns out there's a ppc64le jenkins slave @ asf, so I've setup
https://builds.apache.org/view/A-D/view/Cassandra/job/cassandra-devbranch-ppc64le-testall/
for testing.

Like our other devbranch-testall builds, it takes a repo+branch as
parameters, and runs unit tests. While the unit tests aren't passing, this
platform should now be considered testable.


Re: Dropped Mutation and Read messages.

2017-05-11 Thread Michael Kjellman
This discussion should be on the C* user mailing list. Thanks!

best,
kjellman

> On May 11, 2017, at 10:53 AM, Oskar Kjellin  wrote:
> 
> That seems way too low. Depending on what type of disk you have it should be 
> closer to 1-200MB.
> That's probably causing your problems. It would still take a while for you to 
> compact all your data tho 
> 
> Sent from my iPhone
> 
>> On 11 May 2017, at 19:50, varun saluja  wrote:
>> 
>> nodetool getcompactionthroughput
>> 
>> ./nodetool getcompactionthroughput
>> Current compaction throughput: 16 MB/s
>> 
>> Regards,
>> Varun Saluja
>> 
>>> On 11 May 2017 at 23:18, varun saluja  wrote:
>>> Hi,
>>> 
>>> PFB results for same. Numbers are scary here.
>>> 
>>> [root@WA-CASSDB2 bin]# ./nodetool compactionstats
>>> pending tasks: 137
>>>   compaction type   keyspace         table                 completed    total          unit    progress
>>>   Compaction        system           hints                 5762711108   837522028005   bytes    0.69%
>>>   Compaction        walletkeyspace   user_txn_history_v2   101477894    4722068388     bytes    2.15%
>>>   Compaction        walletkeyspace   user_txn_history_v2   1511866634   753221762663   bytes    0.20%
>>>   Compaction        walletkeyspace   user_txn_history_v2   3664734135   18605501268    bytes   19.70%
>>> Active compaction remaining time :  26h32m28s
>>> 
>>> 
>>> 
>>>> On 11 May 2017 at 23:15, Oskar Kjellin  wrote:
>>>> What does nodetool compactionstats show?
>>>> 
>>>> I meant compaction throttling. nodetool getcompactionthroughput
>>>> 
>>>> 
> On 11 May 2017, at 19:41, varun saluja  wrote:
> 
> Hi Oskar,
> 
> Thanks for response.
> 
> Yes, could see lot of threads for compaction. Actually we are loading 
> around 400GB data  per node on 3 node cassandra cluster.
> Throttling was set to write around 7k TPS per node. Job ran fine for 2 
> days and then, we start getting Mutation drops  , longer GC and very high 
> load on system.
> 
> System log reports:
> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) 
> off-heap
> 
> The job was stopped 12 hours back. But, still these failures can be seen. 
> Can you Please let me know how shall i proceed further. If possible, 
> Please suggest some parameters for high write intensive jobs.
> 
> 
> Regards,
> Varun Saluja
> 
> 
>> On 11 May 2017 at 23:01, Oskar Kjellin  wrote:
>> Do you have a lot of compactions going on? It sounds like you might've 
>> built up a huge backlog. Is your throttling configured properly?
>> 
>>> On 11 May 2017, at 18:50, varun saluja  wrote:
>>> 
>>> Hi Experts,
>>> 
>>> Seeking your help on a production issue.  We were running high write 
>>> intensive job on our 3 node cassandra cluster V 2.1.7.
>>> 
>>> TPS on nodes were high. Job ran for more than 2 days and thereafter, 
>>> loadavg on 1 of the node increased to very high number like loadavg : 
>>> 29.
>>> 
>>> System log reports:
>>> 
>>> INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 
>>> MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
>>> INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 
>>> MessagingService.java:888 - 2 READ messages dropped in last 5000ms
>>> INFO  [ScheduledTasks:1] 2017-05-11 22:11:04,466 
>>> MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 
>>> 5000ms
>>> 
>>> The job was stopped due to heavy load. But still, after 12 hours, we can 
>>> see mutation drop messages and a sudden increase in load average.
>>> 
>>> Are these hintedhandoff mutations? Can we stop these.
>>> Strangely this behaviour is seen only on 2 nodes. Node 1 does not show 
>>> any load or any such activity.
>>> 
>>> Due to heavy load and GC, there are intermittent gossip failures among 
>>> nodes. Can someone please help?
>>> 
>>> PS: Load job was stopped on cluster. Everything ran fine for a few hours 
>>> and later the issue started again, with mutation messages being dropped.
>>> 
>>> Thanks and Regards,
>>> Varun Saluja
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
> 
>>> 
>> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
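
For readers following the numbers in this thread: the "Active compaction remaining time" in the compactionstats output quoted above is consistent with dividing the remaining bytes by the 16 MB/s throttle. The Python sketch below reproduces that arithmetic. It assumes 1 MB = 1024 * 1024 bytes and a simple remaining-bytes-over-throughput estimate; that this is how nodetool computes the figure is an assumption, and `remaining_seconds` and `format_hms` are hypothetical helper names.

```python
# (completed, total) byte counts copied from the compactionstats output above.
compactions = [
    (5762711108, 837522028005),
    (101477894, 4722068388),
    (1511866634, 753221762663),
    (3664734135, 18605501268),
]
THROUGHPUT_MB_PER_S = 16  # value reported by getcompactionthroughput


def remaining_seconds(tasks, mb_per_s):
    """Whole seconds needed to compact the remaining bytes at a fixed rate.

    Assumes 1 MB = 1024 * 1024 bytes.
    """
    remaining = sum(total - done for done, total in tasks)
    return remaining // (mb_per_s * 1024 * 1024)


def format_hms(seconds):
    """Format a number of seconds as e.g. 1h02m03s."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


eta = remaining_seconds(compactions, THROUGHPUT_MB_PER_S)
print(format_hms(eta))  # 26h32m28s
```

Under the same assumptions the estimate scales inversely with the throttle, which is why the suggestion to raise the throughput matters with roughly 1.6 TB of compaction still outstanding.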



Re: Does partition size limitation still exists in Cassandra 3.10 given there is a B-tree implementation?

2017-05-11 Thread Michael Kjellman
I'm almost done with a rebased trunk patch. Hit a few snags. I want nothing 
more than to finish this thing... The latest issue was due to range tombstones 
and the fact that the deletion time was being stored in the index from 3.0 
onwards. I hope to have everything pushed very shortly. Sorry for the delay, 
I'm doing my best... there are never enough hours in the day. :)

best,
kjellman 

> On May 11, 2017, at 1:48 AM, Kant Kodali  wrote:
> 
> oh this looks like one I am looking for
> https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra
> 3.10 or merged somewhere?
> 
> On Thu, May 11, 2017 at 1:13 AM, Kant Kodali  wrote:
> 
>> Hi DuyHai,
>> 
>> I am trying to see what are the possible things we can do to get over this
>> limitation?
>> 
>> 1. Would this https://issues.apache.org/jira/browse/CASSANDRA-7447 help
>> at all?
>> 2. Can we have Merkle trees built for groups of rows in partition ? such
>> that we can stream only those groups where the hash is different?
>> 3. It would be interesting to see if we can spread a partition across
>> nodes.
>> 
>> I am just trying to validate some ideas that can help potentially get over
>> this 100MB limitation since we may not always fit into a time series model.
>> 
>> Thanks!
>> 
>> On Thu, May 11, 2017 at 12:37 AM, DuyHai Doan 
>> wrote:
>> 
>>> Yes the recommendation still applies
>>> 
>>> Wide partitions have huge impact on repair (over streaming), compaction
>>> and bootstrap
>>> 
>>> Le 10 mai 2017 23:54, "Kant Kodali"  a écrit :
>>> 
>>> Hi All,
>>> 
>>> Cassandra community had always been recommending 100MB per partition as a
>>> sweet spot however does this limitation still exist given there is a
>>> B-tree
>>> implementation to identify rows inside a partition?
>>> 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>>> apache/cassandra/db/rows/BTreeRow.java
>>> 
>>> Thanks!
>>> 
>>> 
>>> 
>> 



Re: unsubscribe

2017-04-06 Thread Michael Kjellman
http://apache.org/foundation/mailinglists.html

Sent from my iPhone

On Apr 6, 2017, at 9:57 AM, Nitija Patil <nitij...@gmail.com> wrote:

unsubscribe

On Thu, Apr 6, 2017 at 10:25 PM, Vineet Gadodia <gadodi...@gmail.com> wrote:

unsubscribe

On Wed, Apr 5, 2017 at 1:51 AM, Ksawery Glab <ksaweryg...@gmail.com> wrote:

unsubscribe

2017-04-05 9:45 GMT+01:00 Nitija Patil <nitij...@gmail.com>:

unsubscribe

On Wed, Apr 5, 2017 at 2:05 PM, 郑蒙家(蒙家) 

Re: Upgrade the jna version to 4.3.0

2017-03-05 Thread Michael Kjellman
Amit –

What changes/bug fixes specifically are you looking for in JNA 4.3.0? Thanks!

best,
kjellman

Sent from my iPhone

> On Mar 5, 2017, at 12:58 PM, Jason Brown  wrote:
> 
> Hi Amit,
> 
> Can you open a Jira for that? Also we can figure out which branches to
> target for the upgrade on the Jira.
> 
> Thanks,
> 
> Jason
>> On Sun, Mar 5, 2017 at 08:25 Amitkumar Ghatwal  wrote:
>> 
>> 
>> 
>> Hi All,
>> 
>> Could you please upgrade the jna version present in the github cassandra
>> location :
>> https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar
>> to below latest version   - 4.3.0 -
>> 
>> http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar
>> .
>> 
>> Let me know the process to upgrade the same .
>> 
>> Regards,
>> Amit
>> 
>> 
>> 


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Michael Kjellman
Jason has asked for review and feedback many times. Maybe be constructive and 
review his code instead of just complaining (once again)?

Sent from my iPhone

> On Nov 19, 2016, at 1:49 PM, Edward Capriolo  wrote:
> 
> I would say start with a mindset like 'people will run this in production'
> not like 'why would you expect this to work'.
> 
> Now how does this logic affect feature development? Maybe use gossip 2.0 as
> an example.
> 
> I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
> the logic of 'don't expect it to work' unleashing 4.0 onto hordes of newbies,
> with a Twitter announcement of the release, and letting bugs trickle in.
> 
> One could also do something comprehensive like test on clusters of 2 to
> 1000 nodes. Test with Jepsen to see what happens during partitions, inject
> things like JVM pauses and account for behavior. Log convergence times
> after given events.
> 
> Take a stand and say look "we engineered and beat the crap out of this
> feature. I deployed this release feature at my company and eat my dogfood.
> You are not my crash test dummy."
> 
> 
>> On Saturday, November 19, 2016, Jeff Jirsa  wrote:
>> 
>> Any proposal to solve the problem you describe?
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo wrote:
>>> 
>>> This is especially relevant if people wish to focus on removing things.
>>> 
>>> For example, gossip 2.0 sounds great, but seems geared toward huge clusters,
>>> which is not likely a majority of users. For those with a 20 node cluster
>>> are the indirect benefits worth it?
>>> 
>>> Also there seems to be a first push to remove things like compact storage
>>> or thrift. Fine great. But what is the realistic update path for someone.
>>> If the big players are running 2.1 and maintaining backports, the average
>>> shop without a dedicated team is going to be stuck saying (great features
>>> in 4.0 that improve performance, i would probably switch but it's not stable
>>> and we have that one compact storage cf and who knows what is going to
>>> happen performance wise when)
>>> 
>>> We really need to lose this 'release won't be stable for 6 minor versions'
>>> concept.
>>> On Saturday, November 19, 2016, Edward Capriolo wrote:
>>> 
>>>> On Friday, November 18, 2016, Jeff Jirsa wrote:
>>>> 
> We should assume that we’re ditching tick/tock. I’ll post a thread on
> 4.0-and-beyond here in a few minutes.
> 
> The advantage of a prod release every 6 months is less incentive to push
> unfinished work into a release.
> The disadvantage of a prod release every 6 months is then we either have
> a very short lifespan per-release, or we have to maintain lots of active
> releases.
> 
> 2.1 has been out for over 2 years, and a lot of people (including us) are
> running it in prod – if we have a release every 6 months, that means we’d
> be supporting 4+ releases at a time, just to keep parity with what we have
> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
> old branches.
> 
> 
> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
> Eggleston" wrote:
> 
>>> While stability is important if we push back large "core" changes
> until later we're just setting ourselves up to face the same issues
>> later on
>> 
>> In theory, yes. In practice, when incomplete features are earmarked
>> for
> a certain release, those features are often rushed out, and not always
> fully baked.
>> 
>> In any case, I don’t think it makes sense to spend too much time
> planning what goes into 4.0, and what goes into the next major release
>> with
> so many release strategy related decisions still up in the air. Are we
> going to ditch tick-tock? If so, what will its replacement look like?
> Specifically, when will the next “production” release happen? Without
> knowing that, it's hard to say if something should go in 4.0, or 4.5,
>> or
> 5.0, or whatever.
>> 
>> The reason I suggested a production release every 6 months is because
> (in my mind) it’s frequent enough that people won’t be tempted to rush
> features to hit a given release, but not so frequent that it’s not
> practical to support. It wouldn’t be the end of the world if some of
>> these
> tickets didn’t make it into 4.0, because 4.5 would be fine.
>> 
>> On November 18, 2016 at 1:57:21 PM, kurt Greaves (
>> k...@instaclustr.com )
> wrote:
>> 
>>> On 18 November 2016 at 18:25, Jason Brown > > wrote:
>>> 
>>> #11559 (enhanced node representation) - decided it's *not* something
>> we
>>> need wrt #7544 storage port configurable per node, so we are punting
>> on
>>> 
>> 
>> #12344 - Forward writes to replacement node with same address during
> replace
>> depends on #11559. To be honest I'd say #12344 i

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Michael Kjellman
Honest question: are you *ever* positive Ed? 

Maybe give it a shot once in a while. It will be good for your mental health. 


Sent from my iPhone

> On Nov 19, 2016, at 11:50 AM, Edward Capriolo  wrote:
> 
> This is especially relevant if people wish to focus on removing things.
> 
> For example, gossip 2.0 sounds great, but seems geared toward huge clusters,
> which is not likely a majority of users. For those with a 20 node cluster
> are the indirect benefits worth it?
> 
> Also there seems to be a first push to remove things like compact storage
> or thrift. Fine great. But what is the realistic update path for someone.
> If the big players are running 2.1 and maintaining backports, the average
> shop without a dedicated team is going to be stuck saying (great features
> in 4.0 that improve performance, i would probably switch but it's not stable
> and we have that one compact storage cf and who knows what is going to
> happen performance wise when)
> 
> We really need to lose this 'release won't be stable for 6 minor versions'
> concept.
> 
> On Saturday, November 19, 2016, Edward Capriolo 
> wrote:
> 
>> 
>> 
>> On Friday, November 18, 2016, Jeff Jirsa wrote:
>> 
>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>>> 4.0-and-beyond here in a few minutes.
>>> 
>>> The advantage of a prod release every 6 months is less incentive to push
>>> unfinished work into a release.
>>> The disadvantage of a prod release every 6 months is then we either have
>>> a very short lifespan per-release, or we have to maintain lots of active
>>> releases.
>>> 
>>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>>> running it in prod – if we have a release every 6 months, that means we’d
>>> be supporting 4+ releases at a time, just to keep parity with what we have
>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>>> old branches.
>>> 
>>> 
>>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>>> Eggleston"  wrote:
>>> 
>>>>> While stability is important if we push back large "core" changes
>>>>> until later we're just setting ourselves up to face the same issues later on
>>>> 
>>>> In theory, yes. In practice, when incomplete features are earmarked for
>>>> a certain release, those features are often rushed out, and not always
>>>> fully baked.
>>>> 
>>>> In any case, I don’t think it makes sense to spend too much time
>>>> planning what goes into 4.0, and what goes into the next major release with
>>>> so many release strategy related decisions still up in the air. Are we
>>>> going to ditch tick-tock? If so, what will its replacement look like?
>>>> Specifically, when will the next “production” release happen? Without
>>>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>>>> 5.0, or whatever.
>>>> 
>>>> The reason I suggested a production release every 6 months is because
>>>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>>>> features to hit a given release, but not so frequent that it’s not
>>>> practical to support. It wouldn’t be the end of the world if some of these
>>>> tickets didn’t make it into 4.0, because 4.5 would be fine.
>>>> 
>>>> On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>>>> wrote:
>>>> 
>>>>> On 18 November 2016 at 18:25, Jason Brown  wrote:
>>>>> 
>>>>> #11559 (enhanced node representation) - decided it's *not* something we
>>>>> need wrt #7544 storage port configurable per node, so we are punting on
>>>>> 
>>>> 
>>>> #12344 - Forward writes to replacement node with same address during
>>>> replace depends on #11559. To be honest I'd say #12344 is pretty important,
>>>> otherwise it makes it difficult to replace nodes without potentially
>>>> requiring client code/configuration changes. It would be nice to get #12344
>>>> in for 4.0. It's marked as an improvement but I'd consider it a bug and
>>>> thus think it could be included in a later minor release.
>>>> 
>>>>> Introducing all of these in a single release seems pretty risky. I think it
>>>>> would be safer to spread these out over a few 4.x releases (as they’re
>>>>> finished) and give them time to stabilize before including them in an LTS
>>>>> release. The downside would be having to maintain backwards compatibility
>>>>> across the 4.x versions, but that seems preferable to delaying the release
>>>>> of 4.0 to include these, and having another big bang release.
>>>> 
>>>> 
>>>> I don't think anyone expects 4.0.0 to be stable. It's a major version
>>>> change with lots of new features; in the production world people don't
>>>> normally move to a new major version until it has been out for quite some
>>>> time and several minor releases have passed. Really, most people are only
>>>> migrating to 3.0.x now. While stability is important if we push back large
>>>> "core" changes until later we're just setting ourselves up to face the same
>>>> issues later on. Th

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-11-08 Thread Michael Kjellman
Yes, we hit this as well. We have an internal patch that I wrote to mostly 
revert the behavior back to ByteBuffers with as small an amount of code change as 
possible. Performance of our build is now even with 2.0.x and we've also 
forward ported it to 3.x (although the 3.x patch was even more complicated due 
to Bounds, RangeTombstoneBound, ClusteringPrefix which actually increases the 
number of allocations to somewhere between 11 and 13 depending on how I count 
it per indexed block -- making it even worse than what you're observing in 2.1).

We haven't upstreamed it as 2.1 is obviously not taking any changes at this 
point and the longer term solution is 
https://issues.apache.org/jira/browse/CASSANDRA-9754 (which also includes the 
changes to go back to ByteBuffers and remove as much of the Composites from the 
storage engine as possible.) Also, the solution is a bit of a hack -- although 
it was a blocker from us deploying 2.1 -- so i'm not sure how "hacky" it is if 
it works..

best,
kjellman


On Nov 8, 2016, at 11:31 AM, Dikang Gu <dikan...@gmail.com> wrote:

This is very expensive:

"MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10 tid=0x7f2fd57e1800 nid=0x1cc510 runnable [0x7f2b971b]
   java.lang.Thread.State: RUNNABLE
        at org.apache.cassandra.db.marshal.IntegerType.compare(IntegerType.java:29)
        at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:98)
        at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:31)
        at java.util.TreeMap.put(TreeMap.java:545)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:254)
        at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:228)
        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:104)
        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:156)
        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)


Checked the git history, it comes from this jira: 
https://issues.apache.org/jira/browse/CASSANDRA-5417
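
For readers following the trace: every TreeSet.add during deserialization walks the tree and calls the comparator once per level, so whatever a single compare() costs gets multiplied by roughly log2(n) for each name in the filter. A minimal sketch of that effect, using a plain counting Comparator<Integer> as a stand-in. This is not Cassandra code; ComparatorCostSketch and comparisonsToBuildTreeSet are hypothetical names chosen for illustration.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class ComparatorCostSketch
{
    /** Inserts 0..n-1 into a TreeSet and returns how often the comparator ran. */
    static long comparisonsToBuildTreeSet(int n)
    {
        final long[] calls = { 0 };
        // Stand-in for an expensive comparator such as the one in the trace.
        Comparator<Integer> counting = (a, b) -> {
            calls[0]++;
            return Integer.compare(a, b);
        };
        TreeSet<Integer> set = new TreeSet<>(counting);
        for (int i = 0; i < n; i++)
            set.add(i);
        return calls[0];
    }

    public static void main(String[] args)
    {
        for (int n : new int[]{ 10, 100, 1000 })
            System.out.println(n + " inserts -> " + comparisonsToBuildTreeSet(n) + " comparator calls");
    }
}
```

The call count grows faster than the number of inserted elements, which is consistent with a per-compare regression showing up so prominently in the profile.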

Any thoughts?
​

On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
Haven't seen this before, but perhaps it's related to CASSANDRA-10433? This is 
just a wild guess as it's in a related codepath, but maybe worth trying out the 
patch available to see if it helps anything...

2016-10-28 15:03 GMT-02:00 Dikang Gu <dikan...@gmail.com>:
We are seeing a huge cpu regression when upgrading one of our 2.0.16 clusters to 
2.1.14 as well. The 2.1.14 node is not able to handle the same amount of read 
traffic as the 2.0.16 node; in fact, it handles less than 50% of it.

And in the perf results, the first line could go as high as 50%, as we turn up 
the read traffic, which never appeared in 2.0.16.

Any thoughts?
Thanks


Samples: 952K of event 'cycles', Event count (approx.): 229681774560
Overhead  Shared Object                    Symbol
   6.52%  perf-196410.map                  [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   4.84%  libzip.so                        [.] adler32
   2.88%  perf-196410.map                  [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
   2.39%  perf-196410.map                  [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
   2.03%  perf-196410.map                  [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
   1.65%  perf-196410.map                  [.] vtable chunks
   1.44%  perf-196410.map                  [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
   1.02%  perf-196410.map                  [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   1.00%  snappy-1.0.5.2-libsnappyjava.so  [.] 0x3804
   0.87%  perf-196410.map                  [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
   0.82%  snappy-1.0.5.2-libsnappyjava.so  [.] 0x36

Re: Review of Cassandra actions

2016-11-05 Thread Michael Kjellman
Thanks Jeff for your thoughtful comments. +100

Sent from my iPhone

> On Nov 5, 2016, at 6:26 PM, Jeff Jirsa  wrote:
> 
> I hope the other 7 members of the board take note of this response,
> and other similar reactions on dev@ today.
> 
> When Datastax violated trademark, they acknowledged it and worked to
> correct it. To their credit, they tried to do the right thing.
> When the PMC failed to enforce problems, we acknowledged it and worked
> to correct it. We aren't perfect, but we're trying.
> 
> When a few members of the board openly violate the code of conduct, being
> condescending and disrespectful under the auspices of "enforcing the
> rules" and "protecting the community", they're breaking the rules,
> damaging the community, and nobody seems willing to acknowledge it or
> work to correct it. It's not isolated, I'll link examples if it's
> useful.
> 
> In a time when we're all trying to do the right thing to protect the
> project and the community, it's unfortunate that high ranking, long
> time members within the ASF actively work to undermine trust and
> community while flouting the code of conduct, which requires
> friendliness, empathy, and professionalism, and the rest of the board
> is silent on the matter.
> 
> 
> 
> 
>> On Nov 5, 2016, at 4:08 PM, Dave Brosius  wrote:
>> 
>> I take this response (a second time) as a pompous way to trivialize the 
>> responses of others, to the point of their points being meaningless to 
>> you. So either explain what this means, or accept the fact that you, like 
>> Chris, are exactly what people are claiming you to be: obnoxious bullies more 
>> interested in throwing your weight around and causing havoc, destroying a 
>> community, rather than actually being motivated by improving the ASF.
>> 
>> 
>>> On 11/05/2016 06:16 PM, Jim Jagielski wrote:
>>> How about a nice game of chess?
>>> 
>>>> On Nov 5, 2016, at 1:15 PM, Aleksey Yeschenko  wrote:
>>>> 
>>>> I’m sorry, but this statement is so at odds with common sense that I have
>>>> to call it out.
>>>> 
>>>> Of course your position grants your voice extra power. A lot of extra
>>>> power, like it or not (I have a feeling you quite like it, though).
>>>> 
>>>> In an ideal world, that power would entail corresponding duties:
>>>> care and consideration in your actions at least.
>>>> Instead, you are being hotheaded, impulsive, antagonising, and immature.
>>>> 
>>>> In what possible universe is dropping that hammer threat from the “20% off”
>>>> email thread, then following up with a Game of Thrones youtube clip, alright?
>>>> 
>>>> That kind of behaviour is inappropriate for a board member. Frankly, it
>>>> wouldn’t be appropriate for a greeter at Walmart. If you don’t see this,
>>>> we do indeed have bigger problems.
>>>> 
>>>> --
>>>> AY
>>>> 
>>>> On 5 November 2016 at 14:57:13, Jim Jagielski (j...@jagunet.com) wrote:
>>>> 
>>>>>> But I love the ability of VP's and Board to simply pretend their
>>>>>> positions carried no weight.
>>>>>> 
>>>>> I would submit that whatever "weight" someone's position may
>>>>> carry, it is due to *who* they are, and not *what* they are.
>>>>> 
>>>>> If we have people here in the ASF or in PMCs which really think
>>>>> that titles matter in discussions like this, when one is NOT
>>>>> speaking ex cathedra, then we have bigger problems. :)
>>> 
>> 


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Michael Kjellman
And to add one additional thought to follow up: I generally am personally 
motivated to fix problems and bugs that reduce my chance of getting paged at 
3am in the morning. This is important for my mental health but also for the 
perceived stability of our products (obviously). 

Features are important as they provide gateways for adoption of all the other 
code by new customers.

Stability and performance is one of those things that doesn't "sell" well to 
new adopters (but sells very well to existing customers).

Luckily most people are on 2.1 and 3.0 and there are tons of features already 
in releases for people to adopt so we've got the "features" thing under control 
for at least a year in my opinion. 

best,
kjellman

Sent from my iPhone

> On Nov 4, 2016, at 10:33 AM, Michael Kjellman  
> wrote:
> 
> "Avalon. The database" yes autocorrect. That's exactly what I wanted. 
> 
> That should read "scaling the database and stability." Sorry. I'm typing this 
> while walking up a big ass hill in San Francisco heading to the office. 😜
> 
> Sent from my iPhone
> 
>> On Nov 4, 2016, at 10:31 AM, Michael Kjellman  
>> wrote:
>> 
>> Avalon. The database


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Michael Kjellman
"Avalon. The database" yes autocorrect. That's exactly what I wanted. 

That should read "scaling the database and stability." Sorry. I'm typing this 
while walking up a big ass hill in San Francisco heading to the office. 😜

Sent from my iPhone

> On Nov 4, 2016, at 10:31 AM, Michael Kjellman  
> wrote:
> 
> Avalon. The database


Re: Moderation

2016-11-04 Thread Michael Kjellman
Thank you Chris. 

I have just replied to her email myself.

Sent from my iPhone

> On Nov 4, 2016, at 10:24 AM, Chris Mattmann  wrote:
> 
> I'm sorry that you feel I'm promoting the arguing. The point of this thread 
> was a simple request - moderate the email. I suspected that there weren't 
> enough moderators but didn't spend the time to check. Honestly this is the 
> Apache Cassandra PMC's job to maintain a healthy set of moderators, ideally 
> in diverse timezones. Some may not feel 12 hours is a long time. I can only 
> provide answers from most of the other Apache communities I participate in 
> (Tika, Nutch, OODT, Hadoop, Solr/Lucene, Incubator, etc.) and in those 
> communities, that *is* a long time. That said, this was also elevated because 
> I saw someone with real questions asking them on Twitter and not on ASF lists 
> pertaining to the projects (Yes there were also denigrating statements about 
> both DataStax and ASF in there too). So, regarding that, the grown up thing 
> to do (and honestly the "Apache" thing to do) is to bring the conversation on 
> list, and talk, rather than sound off in 25+ different mediums. 
> 
> Now we've added at least one more moderator, which will help the root of the 
> problem. And Kelly's email is on list. I will try and reply with my knowledge 
> as a concerned ASF member first, and Board Member second. It would be great 
> for the Apache Cassandra PMC and community to reply as well.
> 
> Cheers,
> Chris
> 
> 
>> On 2016-11-04 10:14 (-0700), Michael Kjellman  
>> wrote: 
>> @Chris: instead of promoting the arguing going on on this thread could you 
>> please help lead by example and reply to Kelly's questions in her email? 
>> Thanks. 
>> 
>> I don't enjoy watching a community I care about continue to explode in front 
>> of my eyes ☹️
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone
>> 
>>> On Nov 4, 2016, at 10:10 AM, Chris Mattmann  wrote:
>>> 
>>> I have apmail karma and can add moderators. 
>>> 
>>> Jason I can add you - please confirm you would like to be added. Did you 
>>> file the ticket - if so point me to it. If you haven't yet, no worries I 
>>> can still add you. Let me know. Thanks.
>>> 
>>>> On 2016-11-04 09:54 (-0700), Jason Brown  wrote: 
>>>> Gary,
>>>> 
>>>> I've just started looking into the moderator component due to this thread;
>>>> I admit I did not know about it before (my fault). Yes, I would like to be
>>>> added. Apparently, I need to file an INFRA ticket (as per
>>>> https://www.apache.org/dev/committers.html#mailing-list-moderators), which
>>>> I will do in the next few minutes.
>>>> 
>>>> -Jason
>>>> 
>>>>> On Fri, Nov 4, 2016 at 9:51 AM, Gary Dusbabek  wrote:
>>>>> 
>>>>> I'm beginning to wonder if I'm the only one with moderator privs. Any 
>>>>> other
>>>>> committer/PMCs interested?
>>>>> 
>>>>> Sorry, it's a chore to begin with and I've been traveling this week.
>>>>> 
>>>>> Gary.
>>>>> 
>>>>> On Fri, Nov 4, 2016 at 3:47 PM, Chris Mattmann 
>>>>> wrote:
>>>>> 
>>>>>> Hi Folks,
>>>>>> 
>>>>>> Kelly Sommers sent a message to dev@cassandra and I'm trying to figure
>>>>>> out if it's in moderation.
>>>>>> 
>>>>>> Can the moderators speak up?
>>>>>> 
>>>>>> Cheers,
>>>>>> Chris
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Michael Kjellman
Hi Kelly-

I can't speak to many of your questions as it's not my position to do so. What 
I can say is that at Apple we are doubling down on open source. We have tons of 
code in flight -- really big ones in fact -- many already out for review. Our 
list of enhancements we want to do grows all the time so there is no shortage 
of work to do. We also have a really great team built up with an incredible 
amount of in house knowledge. 

The stuff we work on generally is focused on Avalon. The database and 
stabilizing it. I'm not sure how much "feature" work we will do in comparison 
(although things like SASI obviously is). 

It's unfortunate how things have played out -- but let's remind ourselves this 
is a database and we're in it for the long haul. The last thing we want is to 
have the project stagnate due to infighting. 

For the foreseeable future Apple and The Last Pickle will step up a bit more of 
an active role as much as we can. 

I have no doubt in my mind this will change the project. The rate of releases 
-- what gets worked on -- bandwidth to fix low hanging fruit tickets... but at 
least I see a path forward. 

So let's try to be positive here and lead by example. It's the only thing we 
can do right now. 

best,
kjellman



Sent from my iPhone

> On Nov 4, 2016, at 9:47 AM, Kelly Sommers  wrote:
> 
> I think the community needs some clarification about what's going on.
> There's a really concerning shift going on and the story about why is
> really blurry. I've heard all kinds of wild claims about what's going on.
> 
> I've heard people say the ASF is pushing DataStax out because they don't
> like how much control they have over Cassandra. I've heard other people say
> DataStax and the ASF aren't getting along. I've heard one person who has
> pull with a friend in the ASF complained about a feature not getting
> considered (who also didn't go down the correct path of proposing) kicked
> and screamed and started the ball rolling for control change.
> 
> I don't know what's going on, and I doubt the truth is in any of those, the
> truth is probably somewhere in between. As a former Cassandra MVP and
> builder of some of the larger Cassandra clusters in the last 3 years I'm
> concerned.
> 
> I've been really happy with Jonathan and DataStax's role in the Cassandra
> community. I think they have done a great job at investing time and money
> towards the good interest in the project. I think it is unavoidable a
> single company bootstraps large projects like this into popularity. It's
> those companies investments who give the ability to grow diversity in later
> stages. The committer list in my opinion is the most diverse it's ever been,
> hasn't it? Apple is a big player now.
> 
> I don't think reducing DataStax's role for the sake of diversity is smart.
> You grow diversity by opening up new opportunities for others. Grow the
> committer list perhaps. Mentor new people to join that list. You don't kick
> someone to the curb and hope things improve. You add.
> 
> I may be way off on what I'm seeing but there's not much to go by but
> gossip (ahaha :P) and some ASF meeting notes and DataStax blog posts.
> 
> August 17th 2016 ASF changed the Apache Cassandra chair
> https://www.apache.org/foundation/records/minutes/2016/board_minutes_2016_08_17.txt
> 
> "The Board expressed continuing concern that the PMC was not acting
> independently and that one company had undue influence over the project."
> 
> August 19th 2016 Jonathan Ellis steps down as chair
> http://www.datastax.com/2016/08/a-look-back-a-look-forward
> 
> November 2nd 2016 DataStax moves committers to DSE from Cassandra.
> http://www.datastax.com/2016/11/serving-customers-serving-the-community
> 
> I'm really concerned if indeed the ASF is trying to change control and
> diversity  of organizations by reducing DataStax's role. As I said earlier,
> I've been really happy at the direction DataStax and Jonathan has taken the
> project and I would much prefer see additional opportunities along side
> theirs grow instead of subtracting. The ultimate question that's really
> important is whether DataStax and Jonathan have been steering the project
> in the right direction. If the answer is yes, then is there really anything
> broken? Only if the answer is no should change happen, in my opinion.
> 
> Can someone at the ASF please clarify what is going on? The ASF meeting
> notes are very concerning.
> 
> Thank you for listening,
> Kelly Sommers


Re: Moderation

2016-11-04 Thread Michael Kjellman
@Chris: instead of promoting the arguing going on on this thread could you 
please help lead by example and reply to Kelly's questions in her email? 
Thanks. 

I don't enjoy watching a community I care about continue to explode in front of 
my eyes ☹️

best,
kjellman

Sent from my iPhone

> On Nov 4, 2016, at 10:10 AM, Chris Mattmann  wrote:
> 
> I have apmail karma and can add moderators. 
> 
> Jason I can add you - please confirm you would like to be added. Did you file 
> the ticket - if so point me to it. If you haven't yet, no worries I can still 
> add you. Let me know. Thanks.
> 
>> On 2016-11-04 09:54 (-0700), Jason Brown  wrote: 
>> Gary,
>> 
>> I've just started looking into the moderator component due to this thread;
>> I admit I did not know about it before (my fault). Yes, I would like to be
>> added. Apparently, I need to file an INFRA ticket (as per
>> https://www.apache.org/dev/committers.html#mailing-list-moderators), which
>> I will do in the next few minutes.
>> 
>> -Jason
>> 
>>> On Fri, Nov 4, 2016 at 9:51 AM, Gary Dusbabek  wrote:
>>> 
>>> I'm beginning to wonder if I'm the only one with moderator privs. Any other
>>> committer/PMCs interested?
>>> 
>>> Sorry, it's a chore to begin with and I've been traveling this week.
>>> 
>>> Gary.
>>> 
>>> On Fri, Nov 4, 2016 at 3:47 PM, Chris Mattmann 
>>> wrote:
>>> 
>>>> Hi Folks,
>>>> 
>>>> Kelly Sommers sent a message to dev@cassandra and I'm trying to figure
>>>> out if it's in moderation.
>>>> 
>>>> Can the moderators speak up?
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> 
>>> 
>> 


Re: 8099 Storage Format Documentation as used with PRIMARY_INDEX

2016-10-19 Thread Michael Kjellman
Ugh, just finally figured the "header" bit of my question out. Mega lame. :\

> On Oct 18, 2016, at 9:17 AM, Michael Kjellman  
> wrote:
> 
> I'm working on writing Birch for trunk and I noticed the following:
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java#L503
> 
> Prior to 3.0 the offset was the literal offset into the data file, yet now we 
> seem to be doing the position encoded with the key (for all rows regardless 
> of if they're > 64kb and thus have an index component) plus the serialized 
> offset. I also see there is now a "header" offset.
> 
> In RowIndexEntry there is:
> 
> 
> /**
> * @return the offset to the start of the header information for this row.
> * For some formats this may not be the start of the row.
> */
> public long headerOffset()
> {
>return 0;
> }
> 
> /**
> * The length of the row header (partition key, partition deletion and static 
> row).
> * This value is only provided for indexed entries and this method will throw
> * {@code UnsupportedOperationException} if {@code !isIndexed()}.
> */
> public long headerLength()
> {
>throw new UnsupportedOperationException();
> }
> 
> 
> In 2.1 we stored the partition key, deletion, but not static row -- but we 
> didn't need or use this so I'm guessing this is actually just to support 
> static rows? Is there any further documentation around the header in other 
> classes that I just haven't come across yet? Any thoughts on position + 
> offset and why this behavior changed? Thanks
> 
> best,
> kjellman



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Sorry, No. Always document your assumptions. I shouldn't need to git blame a 
thousand commits and read thru a billion tickets to maybe understand why 
something was done. Clearly thru the conversations on this topic I've had on 
IRC and the responses so far on this email thread it's not/still not obvious.

best,
kjellman

On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith <bened...@apache.org> wrote:

This is what JIRA is for.



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Yeah, it has been there for years -- that being said most of the community is 
just catching up to 2.1 and 3.0 now where the usage did appear to change over 
2.0-- and I'm more trying to figure out what the intent was in the various 
usages all over the codebase and make sure it's actually doing that. Maybe even 
add some comments about that intent. :)

In 2.1 I saw that we were doing this to get the file descriptor in some cases 
(which obviously will return the wrong file descriptor so most likely would 
have made this even more of a potential no-op than it already was?):

public static int getfd(String path)
{
    RandomAccessFile file = null;
    try
    {
        file = new RandomAccessFile(path, "r");
        return getfd(file.getFD());
    }
    catch (Throwable t)
    {
        JVMStabilityInspector.inspectThrowable(t);
        // ignore
        return -1;
    }
    finally
    {
        try
        {
            if (file != null)
                file.close();
        }
        catch (Throwable t)
        {
            // ignore
        }
    }
}
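
To make the concern concrete: the method above opens its own RandomAccessFile, reads the descriptor, and then closes the file in the finally block before the caller ever sees the returned int. A minimal, self-contained sketch (plain JDK only, not Cassandra code; ClosedFdSketch is a hypothetical name) showing that a FileDescriptor obtained this way is no longer valid once the file that owns it has been closed:

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.IOException;
import java.io.RandomAccessFile;

final class ClosedFdSketch
{
    /** Returns { descriptor valid while the file is open, valid after close() }. */
    static boolean[] validityBeforeAndAfterClose(File path) throws IOException
    {
        RandomAccessFile file = new RandomAccessFile(path, "r");
        FileDescriptor fd = file.getFD();
        boolean whileOpen = fd.valid();
        file.close();                      // same thing the finally block above does
        boolean afterClose = fd.valid();
        return new boolean[]{ whileOpen, afterClose };
    }

    public static void main(String[] args) throws IOException
    {
        File tmp = File.createTempFile("fd-sketch", ".db");
        tmp.deleteOnExit();
        boolean[] r = validityBeforeAndAfterClose(tmp);
        System.out.println("valid while open: " + r[0] + ", valid after close: " + r[1]);
    }
}
```

Any native call made later with the integer taken from such a descriptor would, at best, fail and, at worst, hit an unrelated file that happened to reuse the number. Whether that is what happens in every 2.1 code path is an assumption here, not something this sketch verifies.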


On Oct 18, 2016, at 9:34 AM, Jake Luciani <jak...@gmail.com> wrote:

Although given we have an in process page cache[1] now this may not be
needed anymore?
This is only for the data file though.  I think its been years? since we
showed it helped so perhaps someone should show if this is still
working/helping in the real world.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5863


On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Specifically regarding the behavior in different kernels, from `man
posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
this was interpreted literally as "zero bytes", rather than as meaning "all
bytes through to the end of the file"."

On Oct 18, 2016, at 8:57 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances have been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we
would immediately tidy() it and then call posix_fadvise with
POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
thinking is doing so in respect to the API behavior in modern Linux kernel
builds?). Am I reading things correctly here? Sorta hard as there are many
different code paths the reference could have tidy() called.

Why would we want to drop the segment we just wrote from the page cache --
wouldn't that most likely be the hottest data, and even if it turned out
not to be, wouldn't it be better in this case to let the kernel be smart at
what it's best at?

best,
kjellman

On Oct 18, 2016, at 8:50 AM, Jake Luciani <jak...@gmail.com> wrote:

The main point is to avoid keeping things in the page cache that are no
longer needed like compacted data that has been early opened elsewhere.

On Oct 18, 2016 11:29 AM, "Michael Kjellman" <mkjell...@internalcircle.com> wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone





--
http://twitter.com/tjake



Re: Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
Cool, as I would have assumed they would need to be. Given they were initially 
commented out on 6/30/15 maybe cleanup and removal of that dead code is still 
at least warranted.

On Oct 18, 2016, at 9:15 AM, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:

Unit tests will be completely rewritten I suspect.



8099 Storage Format Documentation as used with PRIMARY_INDEX

2016-10-18 Thread Michael Kjellman
I'm working on writing Birch for trunk and I noticed the following:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java#L503

Prior to 3.0 the offset was the literal offset into the data file, yet now we 
seem to be doing the position encoded with the key (for all rows regardless of 
if they're > 64kb and thus have an index component) plus the serialized offset. 
I also see there is now a "header" offset.
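
A tiny sketch of the two readings described above, only to pin down the arithmetic being asked about. It assumes the interpretation in this message is right (pre-3.0 offsets are absolute, 3.0+ offsets are relative to the partition's position in the data file); BlockPositionSketch and its method names are hypothetical and do not exist in the Cassandra codebase.

```java
final class BlockPositionSketch
{
    /** Pre-3.0 reading: the serialized offset is already an absolute data file position. */
    static long absolutePositionLegacy(long serializedOffset)
    {
        return serializedOffset;
    }

    /** 3.0+ reading: the serialized offset is relative to where the partition starts. */
    static long absolutePositionRelative(long partitionPosition, long serializedOffset)
    {
        // addExact turns a silent overflow into an ArithmeticException.
        return Math.addExact(partitionPosition, serializedOffset);
    }

    public static void main(String[] args)
    {
        System.out.println(absolutePositionLegacy(4224L));
        System.out.println(absolutePositionRelative(4096L, 128L));
    }
}
```

Under the second reading an index entry stays meaningful only together with the partition position it was written against, which is what an index implementation such as Birch would have to carry along.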

In RowIndexEntry there is:


/**
 * @return the offset to the start of the header information for this row.
 * For some formats this may not be the start of the row.
 */
public long headerOffset()
{
    return 0;
}

/**
 * The length of the row header (partition key, partition deletion and static row).
 * This value is only provided for indexed entries and this method will throw
 * {@code UnsupportedOperationException} if {@code !isIndexed()}.
 */
public long headerLength()
{
    throw new UnsupportedOperationException();
}


In 2.1 we stored the partition key, deletion, but not static row -- but we 
didn't need or use this so I'm guessing this is actually just to support static 
rows? Is there any further documentation around the header in other classes 
that I just haven't come across yet? Any thoughts on position + offset and why 
this behavior changed? Thanks

best,
kjellman


Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Within a single SegmentedFile?

On Oct 18, 2016, at 9:02 AM, Ariel Weisberg <ariel.weisb...@datastax.com> wrote:

With compaction there can be hot and cold data mixed together.



Re: Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
Gotcha, I didn't know we were actually bringing them back from the dead! 

That being said, won't the unit tests need to be rewritten (or at least 
refactored) after your work? Couldn't we use /* */ comments instead of every 
single line one by one? Given we use source control couldn't we remove the dead 
code and get it from the revision history if we need it in the future?

> On Oct 18, 2016, at 8:18 AM, Oleksandr Petrov  
> wrote:
> 
> I'm currently working on actually making Super Columns work in CQL context.
> Currently they do not really work[1].
> 
> It's not a very small piece of work. It was in the pipeline for some time,
> although there most likely were more important things that had to be worked
> on. I understand your disappointment and am sorry you stumbled upon this.
> But for now you may just disregard the commented tests. My branch is going
> to be ready for review soon.
> 
> [1] https://issues.apache.org/jira/browse/CASSANDRA-12373
> 
> 
> On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> There were a bunch of tests hastily and messily commented out line by line
>> (*whyy?*) in ColumnFamilyStoreTest with comments that they are pending
>> SuperColumns support post 8099.
>> 
>> Could those responsible please clean up after themselves? It's been a while
>> since 8099 was committed in the first place and I don't see us adding Super
>> Column support at this point and the unit tests surely will need to be
>> rewritten anyway.
>> 
>> As my mother always said, pick your dirty wet towel up off the floor and
>> put it in the hamper please
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone
> 
> -- 
> Alex Petrov



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Specifically regarding the behavior in different kernels, from `man 
posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then this 
was interpreted literally as "zero bytes", rather than as meaning "all bytes 
through to the end of the file"."
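
A minimal sketch of how a caller could guard against the behavior the man page describes, assuming the kernel release string is available (on Linux, `System.getProperty("os.version")` typically reports it). The class and method names are hypothetical and nothing here exists in CLibrary; the sketch only decides which length to hand to a posix_fadvise wrapper.

```java
/** Hypothetical helper; decides what length to pass to a posix_fadvise wrapper. */
final class FadviseLengthSketch
{
    /** Returns true if the kernel release is older than 2.6.6, where len == 0 meant "zero bytes". */
    static boolean zeroLengthIsLiteral(String osVersion)
    {
        // Split "2.6.32-431.el6.x86_64" into numeric-looking pieces.
        String[] parts = osVersion.split("[.\\-+_]");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++)
        {
            try
            {
                v[i] = Integer.parseInt(parts[i]);
            }
            catch (NumberFormatException e)
            {
                break; // stop at the first non-numeric component
            }
        }
        if (v[0] != 2)
            return v[0] < 2;
        if (v[1] != 6)
            return v[1] < 6;
        return v[2] < 6;
    }

    /** On old kernels pass the explicit file size; otherwise 0 means "to end of file". */
    static long lengthToPass(String osVersion, long fileSize)
    {
        return zeroLengthIsLiteral(osVersion) ? fileSize : 0L;
    }
}
```

A simpler alternative would be to always pass the explicit file size, which should mean the same thing on either side of the 2.6.6 change.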

On Oct 18, 2016, at 8:57 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances have been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we would 
immediately tidy() it and then call posix_fadvise with POSIX_FADV_DONTNEED with 
an offset of 0 and a length of 0 (which I'm thinking relies on the API behavior 
in modern Linux kernel builds?). Am I reading things correctly here? It's sorta 
hard to tell as there are many different code paths where the reference could 
have tidy() called.

Why would we want to drop the segment we just wrote from the page cache -- 
wouldn't that most likely be the hottest data, and even if it turned out not 
to be, wouldn't it be better in this case to let the kernel do what it's 
best at?

best,
kjellman

On Oct 18, 2016, at 8:50 AM, Jake Luciani <jak...@gmail.com> wrote:

The main point is to avoid keeping things in the page cache that are no
longer needed like compacted data that has been early opened elsewhere.

On Oct 18, 2016 11:29 AM, "Michael Kjellman" <mkjell...@internalcircle.com> wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone




Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances have been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we would 
immediately tidy() it and then call posix_fadvise with POSIX_FADV_DONTNEED with 
an offset of 0 and a length of 0 (which I'm thinking relies on the API behavior 
in modern Linux kernel builds?). Am I reading things correctly here? It's sorta 
hard to tell as there are many different code paths where the reference could 
have tidy() called.
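
To make the "tidy only once all instances have been released" idea from the code comment concrete, here is a minimal reference-counting sketch. It is an assumption about the general pattern, not the actual SSTableReader or GlobalTidy implementation; `RefCountedTidySketch` and its methods are hypothetical names, and the `tidy` callback stands in for whatever would call trySkipCache.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: runs a tidy action exactly once, when the last reference is released. */
final class RefCountedTidySketch
{
    // The creator holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable tidy;

    RefCountedTidySketch(Runnable tidy)
    {
        this.tidy = tidy;
    }

    /** Tries to take another reference; fails once the count has already reached zero. */
    boolean tryRef()
    {
        while (true)
        {
            int current = refs.get();
            if (current <= 0)
                return false;
            if (refs.compareAndSet(current, current + 1))
                return true;
        }
    }

    /** Drops one reference; the tidy action (e.g. dropping page cache) runs on the last release. */
    void release()
    {
        int remaining = refs.decrementAndGet();
        if (remaining == 0)
            tidy.run();
        else if (remaining < 0)
            throw new IllegalStateException("released more often than referenced");
    }
}
```

Under this model the page cache would only be dropped after the final `release()`, which is the property the question above is trying to confirm across the different code paths.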

Why would we want to drop the segment we just wrote from the page cache -- 
wouldn't that most likely be the hottest data, and even if it turned out not 
to be, wouldn't it be better in this case to let the kernel do what it's 
best at?

best,
kjellman

> On Oct 18, 2016, at 8:50 AM, Jake Luciani  wrote:
> 
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
> 
> On Oct 18, 2016 11:29 AM, "Michael Kjellman" 
> wrote:
> 
>> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
>> fashion no comments were provided.
>> 
>> There is a check the OS is Linux (okay, a start) but it turns out the
>> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
>> kernels. We don't check the kernel version -- or even note it.
>> 
>> What is the *expected* outcome of our use of posix_fadvise -- not what
>> does it do or not do today -- but what problem was it added to solve and
>> what's the expected behavior regardless of kernel versions.
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone



Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Sure -- my bad, I aggregated all of them for you:
https://github.com/apache/cassandra/search?utf8=✓&q=CLibrary.trySkipCache&type=Code
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/test/unit/org/apache/cassandra/utils/CLibraryTest.java#L34
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java#L102
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java#L218
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/hints/HintsWriter.java#L292
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/io/util/FileHandle.java#L167
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java#L174
https://github.com/apache/cassandra/blob/f2a354763877cfeaf1dd017b84a7c8ee9eafd885/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2281
https://github.com/apache/cassandra/blob/f2a354763877cfeaf1dd017b84a7c8ee9eafd885/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2282

Or if you use IDEA this should work pretty well too:
[inline image attachment not preserved]

best,
kjellman


On Oct 18, 2016, at 8:33 AM, Benedict Elliott Smith <bened...@apache.org> wrote:

... and continuing in the fashion of behaviours one might like to disabuse
people of, no code link is provided.



On 18 October 2016 at 16:28, Michael Kjellman <mkjell...@internalcircle.com> wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone



Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
We use posix_fadvise in a bunch of places, and in stereotypical Cassandra 
fashion no comments were provided.

There is a check that the OS is Linux (okay, a start) but it turns out the 
behavior of providing a length of 0 to posix_fadvise changed in some 2.6 
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what does it 
do or not do today -- but what problem was it added to solve and what's the 
expected behavior regardless of kernel versions?

best,
kjellman

Sent from my iPhone

Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
There were a bunch of tests hastily and messily commented out line by line 
(*whyy?*) in ColumnFamilyStoreTest with comments that they are pending 
SuperColumns support post 8099.

Could those responsible please clean up after themselves? It's been a while 
since 8099 was committed in the first place and I don't see us adding Super 
Column support at this point and the unit tests surely will need to be 
rewritten anyway.

As my mother always said, pick your dirty wet towel up off the floor and put 
it in the hamper please

best,
kjellman

Sent from my iPhone

Re: Question on assert

2016-09-21 Thread Michael Kjellman
Yeah, I understand what you're saying, don't get me wrong.

However, I just spent close to a year total working on and writing 
CASSANDRA-9754 and when you're dealing with IO, sometimes asserts are the right 
way to go. I put them there as sanity checks, mostly to ensure that code 
changes to other parts of the code don't have unexpected interactions with the 
input bounds expected by a method. I think asserts are fine (and correct) in 
these cases.
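
As an illustration of the two kinds of check being debated in this thread, here is a minimal sketch of a bounds-checked read helper. `BoundsCheckSketch.sum` is a hypothetical stand-in, not code from CASSANDRA-9754; it pairs an always-on argument check with an `assert` that is only evaluated when the JVM runs with `-ea`.

```java
/** Hypothetical stand-in for an IO read path that validates its input bounds. */
final class BoundsCheckSketch
{
    /** Sums {@code length} bytes of {@code buf} starting at {@code offset}. */
    static long sum(byte[] buf, int offset, int length)
    {
        // Always-on check: holds even when the JVM runs without -ea.
        if (offset < 0 || length < 0 || offset > buf.length - length)
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);

        long total = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++)
            total += buf[i];

        // Sanity check: only evaluated with -ea; documents an internal invariant
        // that later changes elsewhere in the code must not break.
        assert end <= buf.length : "read past end of buffer";
        return total;
    }
}
```

Which of the two forms a given check should take is exactly the judgment call under discussion; the sketch does not settle it.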


> On Sep 21, 2016, at 11:16 AM, Edward Capriolo  wrote:
> 
> You are essentially arguing, "if you turn off -ea you're screwed" which is a
> symptom of a larger problem that I am pointing out.
> 
> Forget the "5%" thing. I am having a discussion about use of assert.
> 
> You have:
> 1) checked exceptions
> 2) unchecked exceptions
> 3) Error (like ioError which we sometime have to track)
> 
> The common case for assert is to only be used in testing. This is why -ea
> is off by default.
> 
> My point is that using assert as an Apache Cassandra specific "pseudo
> exception" seems problematic. I can point at tickets in the Cassandra Jira
> where this is not trapped properly. It appears to me that having to deal
> with a 4th "pseudo exception" is a code smell.
> 
> Sometimes you see assert in place of a bounds check or a null check that
> you would never want to turn off. Other times it is used as a quasi
> IllegalStateException. Other times a class named "estimator" asserts when
> the "estimate" "overflows". This seems far away from the defined purpose of
> assert.
> 
> The glaring issue is that it bubbles through try catch so it hardly makes
> me feel "safe" either on or off.
> 
> 
> On Wed, Sep 21, 2016 at 1:34 PM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> Asserts have their place as sanity checks. Just like exceptions have their
>> place.
>> 
>> They can both live in harmony and they both serve a purpose.
>> 
>> What doesn't serve a purpose is that comment encouraging n00b users to get
>> a mythical 5% performance increase and then get silent corruption when
>> their disk/io goes sideways and the asserts might have caught things before
>> it went really wrong.
>> 
>> Sent from my iPhone
>> 
>> On Sep 21, 2016, at 10:31 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> 
>> " potential 5% performance win when you've corrupted all their data."
>> This is somewhat of my point. Why do assertions that sometimes are trapped
>> "protect my data" better than a checked exception?
>> 
>> On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
>> mkjell...@internalcircle.com> wrote:
>> 
>> I hate that comment with a passion. Please please please please do
>> yourself a favor and *always* run with asserts on. `-ea` for life. In
>> practice I'd be surprised if you actually got a reliable 5% performance win
>> and I doubt your customers will care about a potential 5% performance win
>> when you've corrupted all their data.
>> 
>> best,
>> kjellman
>> 
>> On Sep 21, 2016, at 10:21 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> 
>> There are a variety of assert usages in Cassandra. You can find several
>> tickets like mine.
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-12643
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-11537
>> 
>> Just to prove that I am not the only one who runs into these:
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-12484
>> 
>> To paraphrase another ticket that I read today and can not find,
>> "The problem is X throws Assertion which is not caught by the Exception
>> handler and it bubbles over and creates a thread death."
>> 
>> The jvm.properties file claims this:
>> 
>> # enable assertions.  disabling this in production will give a modest
>> # performance benefit (around 5%).
>> -ea
>> 
>> If assertions incur a "5% penalty" but are not always trapped what value do
>> they add?
>> 
>> These are common sentiments about how assert should be used: (not trying to
>> make this a this is what the internet says type debate)
>> 
>> http://stackoverflow.com/questions/2758224/what-does-the-java-assert-keyword-do-and-when-should-it-be-used
>> 
>> "Assertions
>> <http://docs.ora

Re: Question on assert

2016-09-21 Thread Michael Kjellman
Asserts have their place as sanity checks. Just like exceptions have their 
place.

They can both live in harmony and they both serve a purpose.

What doesn't serve a purpose is that comment encouraging n00b users to get a 
mythical 5% performance increase and then get silent corruption when their 
disk/io goes sideways and the asserts might have caught things before it went 
really wrong.

Sent from my iPhone

On Sep 21, 2016, at 10:31 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

" potential 5% performance win when you've corrupted all their data."
This is somewhat of my point. Why do assertions that sometimes are trapped
"protect my data" better than a checked exception?

On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

I hate that comment with a passion. Please please please please do
yourself a favor and *always* run with asserts on. `-ea` for life. In
practice I'd be surprised if you actually got a reliable 5% performance win
and I doubt your customers will care about a potential 5% performance win
when you've corrupted all their data.

best,
kjellman

On Sep 21, 2016, at 10:21 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

There are a variety of assert usages in Cassandra. You can find several
tickets like mine.

https://issues.apache.org/jira/browse/CASSANDRA-12643

https://issues.apache.org/jira/browse/CASSANDRA-11537

Just to prove that I am not the only one who runs into these:

https://issues.apache.org/jira/browse/CASSANDRA-12484

To paraphrase another ticket that I read today and can not find,
"The problem is X throws Assertion which is not caught by the Exception
handler and it bubbles over and creates a thread death."

The jvm.properties file claims this:

# enable assertions.  disabling this in production will give a modest
# performance benefit (around 5%).
-ea

If assertions incur a "5% penalty" but are not always trapped what value do
they add?

These are common sentiments about how assert should be used: (not trying to
make this a this is what the internet says type debate)

http://stackoverflow.com/questions/2758224/what-does-the-java-assert-keyword-do-and-when-should-it-be-used

"Assertions
<http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.10> (by
way of the *assert* keyword) were added in Java 1.4. They are used to
verify the correctness of an invariant in the code. They should never be
triggered in production code, and are indicative of a bug or misuse of a
code path. They can be activated at run-time by way of the -ea option on the
java command, but are not turned on by default."

http://stackoverflow.com/questions/1957645/when-to-use-an-assertion-and-when-to-use-an-exception

"An assertion would stop the program from running, but an exception would
let the program continue running."

I look at how Cassandra uses assert and how it manifests in how the code
operates in production. Assert is something like semi-unchecked exception.
All types of internal Util classes might throw it, downstream code is
essentially unaware and rarely specifically handles it. They do not always
result in the hard death one would expect from an assert.

I know this is a ballpark type figure, but would "5% performance penalty"
be in the ballpark of a checked exception? Being that they tend to bubble
through things uncaught, do they do more harm than good?




Re: Question on assert

2016-09-21 Thread Michael Kjellman
I hate that comment with a passion. Please please please please do yourself a 
favor and *always* run with asserts on. `-ea` for life. In practice I'd be 
surprised if you actually got a reliable 5% performance win and I doubt your 
customers will care about a potential 5% performance win when you've corrupted 
all their data.

best,
kjellman

> On Sep 21, 2016, at 10:21 AM, Edward Capriolo  wrote:
> 
> There are a variety of assert usages in Cassandra. You can find several
> tickets like mine.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-12643
> 
> https://issues.apache.org/jira/browse/CASSANDRA-11537
> 
> Just to prove that I am not the only one who runs into these:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-12484
> 
> To paraphrase another ticket that I read today and can not find,
> "The problem is X throws Assertion which is not caught by the Exception
> handler and it bubbles over and creates a thread death."
> 
> The jvm.properties file claims this:
> 
> # enable assertions.  disabling this in production will give a modest
> # performance benefit (around 5%).
> -ea
> 
> If assertions incur a "5% penalty" but are not always trapped what value do
> they add?
> 
> These are common sentiments about how assert should be used: (not trying to
> make this a this is what the internet says type debate)
> 
> http://stackoverflow.com/questions/2758224/what-does-the-java-assert-keyword-do-and-when-should-it-be-used
> 
> "Assertions
>  (by
> way of the *assert* keyword) were added in Java 1.4. They are used to
> verify the correctness of an invariant in the code. They should never be
> triggered in production code, and are indicative of a bug or misuse of a
> code path. They can be activated at run-time by way of the -ea option on the
> java command, but are not turned on by default."
> 
> http://stackoverflow.com/questions/1957645/when-to-use-an-assertion-and-when-to-use-an-exception
> 
> "An assertion would stop the program from running, but an exception would
> let the program continue running."
> 
> I look at how Cassandra uses assert and how it manifests in how the code
> operates in production. Assert is something like semi-unchecked exception.
> All types of internal Util classes might throw it, downstream code is
> essentially unaware and rarely specifically handles it. They do not always
> result in the hard death one would expect from an assert.
> 
> I know this is a ballpark type figure, but would "5% performance penalty"
> be in the ballpark of a checked exception? Being that they tend to bubble
> through things uncaught, do they do more harm than good?



Re: Failing tests 2016-08-24 [cassandra-3.9]

2016-08-25 Thread Michael Kjellman
Awesome Joel

Sent from my iPhone

On Aug 24, 2016, at 8:22 PM, Joel Knighton <joel.knigh...@datastax.com> wrote:

===
testall: All passed!

===
dtest: 2 failures
 scrub_test.TestScrubIndexes.test_standalone_scrub
   CASSANDRA-12337. I've root-caused this; the failure is cosmetic
   but user-facing, so I plan on fixing this soon.

 commitlog_test.TestCommitLog.test_commitlog_replay_on_startup
   CASSANDRA-12213. This is still being analyzed.

===
novnode: All passed!

===
upgrade: All passed!

While it is somewhat due to the stars aligning such that our flaky tests
all didn't fail this run, it is very exciting to see an upgrade test run
with 0 failures. This is 50+ fewer failures than two weeks ago.


Re: Failing tests 2016-08-22 [cassandra-3.9]

2016-08-23 Thread Michael Kjellman
Looks like some very nice progress here! Mucho Exciting!! 💃🏻💃🏻💃🏻

On Aug 22, 2016, at 10:44 PM, Joel Knighton <joel.knigh...@datastax.com> wrote:

===
testall: All passed!

===
dtest: 2 failures
 upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test
   Looks like a new, flaky failure. I'll follow up on this and get a ticket
   created tomorrow.

 materialized_views_test.TestMaterializedViews
 .add_dc_after_mv_network_replication_test
   CASSANDRA-12140. Known issue, still needs to be solved.

===
novnode: 6 failures
 6 failures in cql_tests.SlowQueryTester. This was a test regression
 quickly fixed in CASSANDRA-12514.

===
upgrade: 1 failure
 upgrade_tests.cql_tests
 .TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_3_x
 .bug_5732_test
 CASSANDRA-12457. A fix is in development.



Thanks for all the fish.

2016-08-19 Thread Michael Kjellman
Just wanted to say thank you publicly to Jonathan Ellis for his tireless work 
making this community and software what it is. He's always been level headed 
and I certainly wouldn't be where I am without his leadership.

So, Jonathan, thanks for all the fish.

best,
kjellman

Sent from my iPhone


Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Michael Kjellman
+1

Sent from my iPhone

On Aug 15, 2016, at 6:48 PM, Brandon Williams <dri...@gmail.com> wrote:

So will I, if that happens, which has never happened in the last ~7 years.

On Mon, Aug 15, 2016 at 4:27 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:


On 8/15/16, 2:15 PM, "Marvin Humphrey" <mar...@apache.org> wrote:

Julian Hyde, who made the proposal, is active in the Apache Incubator ...
  I propose that when a JIRA is created, we send an email to both dev@ and
  issues@. This will be an extra 40 emails per month on the dev list. I am
  really cautious about increasing the number of messages on the dev list,
  because I think high-volume lists discourage part-time contributors, but I
  think this change is worthwhile. It will make people aware of
  conversations that are happening and if it helps to channel conversations
  onto JIRA cases it could possibly even REDUCE the volume on the dev list.


That's a useful example. However, that's a project with 30-40 issues per
month (1300 over its lifetime) - Cassandra is sitting at 244 in the past 30
days, 12000 over its lifetime.

I think a lot of us part-time contributors appreciate efforts to increase
visibility and certainly welcome growing the project by making it easier to
recruit and retain more contributors, but is the noise of 10 more new email
threads per day going to get into the "high volume lists discourage
part-time contributors" range Julian discussed?

I'm a part time contributor. If this list gets ~10 threads per day with
2-3 replies each, I'm going to have to start filtering it out of necessity
(because I can't keep up with that volume).






Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Michael Kjellman
I get 2500+ emails a day and I don't filter dev as I like to stay engaged. If 
this list becomes too noisy everyone will just filter it into a black hole. Sad.

Sent from my iPhone

On Aug 15, 2016, at 3:05 PM, Russell Bradberry <rbradbe...@gmail.com> wrote:

So then what was the point of Ellis’s proposal, and this discussion, if there 
was never a choice in the matter in the first place?


On 8/15/16, 2:03 PM, "Chris Mattmann" <mattm...@apache.org> wrote:

   I’m sorry but you are massively confused if you believe that the ASF mailing
   lists aren’t the source of truth. They are. That’s not optional. If you are an
   ASF project, mailing lists are the source of truth. Period.

   On 8/15/16, 11:01 AM, "Michael Kjellman" <mkjell...@internalcircle.com> wrote:

   I'm a big fan of mailing lists, but google makes issues very findable for
   new people to the project as JIRA gets indexed. They won't be able to find
   the same thing on an email they didn't get -- because they weren't in the
   project in the first place.

   Mailing lists are good for broad discussion or bringing specific issues to
   the attention of the broader community. They should never be the source of
   truth.

   best,
   kjellman

   Sent from my iPhone

   On Aug 15, 2016, at 2:57 PM, Chris Mattmann <mattm...@apache.org> wrote:

   Realize it’s not just about committers and PMC members that are *already*
   on the PMC or that are developing the project. It’s about how to engage the
   *entire* community including those that are not yet on the committer or
   PMC roster. That is the future (and current) lifeblood of the project. The
   mailing lists aren’t just an unfortunate necessity of being an Apache
   project. They *are* the lifeblood of the Apache project.



   On 8/15/16, 10:44 AM, "Brandon Williams" <dri...@gmail.com> wrote:

  I too, use this method quite a bit, almost every single day.

  On Mon, Aug 15, 2016 at 12:43 PM, Yuki Morishita <mor.y...@gmail.com> wrote:

   As an active committer, the most important thing for me is to be able
   to *look up* design discussion and decision easily later.

   I often look up the git history or CHANGES.txt for changes that I'm
   interested in, then look up JIRA by following JIRA ticket number
   written to the comment or text.
   If we move to dev mailing list, I would request to post permalink to
   that thread posted to JIRA, which I think is just one extra step that
   isn't necessary if we simply use JIRA.

   So, I'm +1 to just post JIRA link to dev list.


   On Mon, Aug 15, 2016 at 12:35 PM, Chris Mattmann <mattm...@apache.org> wrote:
   This is a good outward flow of info to the dev list. However, there needs
   to be inward flow too – having the convo on the dev list will be a good
   start to that.
   I hope to see more inclusivity here.



   On 8/15/16, 10:26 AM, "Aleksey Yeschenko" <alek...@apache.org> wrote:

  Well, if you read carefully what Jeremiah and I have just proposed, it
  wouldn’t be an issue.

  The notable major changes would start off on dev@ (think, a summary, a
  link to the JIRA, and maybe an attached spec doc).

  No need to follow the JIRA feed. Watch dev@ for those announcements and
  start watching the individual JIRA tickets if interested.

  This creates the least amount of noise: you miss nothing important, and at
  the same time you won’t be receiving mail from dev@ for each individual
  comment - including those on proposals you don’t care about.

  We aren’t doing it currently, but we could, and probably should.

  --
  AY

  On 15 August 2016 at 18:22:36, Chris Mattmann (mattm...@apache.org) wrote:

  Discussion belongs on the dev list. Putting discussion in JIRA is fine, but
  realize, there is a lot of noise in that signal and people may or may not
  be watching the JIRA list. In fact, I don’t see JIRA sent to the dev list
  at all so you are basically forking the conversation to a high noise list
  by putting it all in JIRA.





  On 8/15/16, 10:11 AM, "Aleksey Yeschenko" <alek...@apache.org> wrote:

  I too feel like it would be sufficient to announce those major JIRAs
   on the dev@ list, but keep all discussion

Re: A proposal to move away from Jira-centric development

2016-08-15 Thread Michael Kjellman
I'm a big fan of mailing lists, but google makes issues very findable for new 
people to the project as JIRA gets indexed. They won't be able to find the same 
thing on an email they didn't get -- because they weren't in the project in the 
first place.

Mailing lists are good for broad discussion or bringing specific issues to the 
attention of the broader community. They should never be the source of truth.

best,
kjellman

Sent from my iPhone

On Aug 15, 2016, at 2:57 PM, Chris Mattmann <mattm...@apache.org> wrote:

Realize it’s not just about committers and PMC members that are *already*
on the PMC or that are developing the project. It’s about how to engage the
*entire* community including those that are not yet on the committer or
PMC roster. That is the future (and current) lifeblood of the project. The
mailing lists aren’t just an unfortunate necessity of being an Apache project.
They *are* the lifeblood of the Apache project.



On 8/15/16, 10:44 AM, "Brandon Williams" <dri...@gmail.com> wrote:

   I too, use this method quite a bit, almost every single day.

   On Mon, Aug 15, 2016 at 12:43 PM, Yuki Morishita <mor.y...@gmail.com> wrote:

As an active committer, the most important thing for me is to be able
to *look up* design discussion and decision easily later.

I often look up the git history or CHANGES.txt for changes that I'm
interested in, then look up JIRA by following JIRA ticket number
written to the comment or text.
If we move to dev mailing list, I would request to post permalink to
that thread posted to JIRA, which I think is just one extra step that
isn't necessary if we simply use JIRA.

So, I'm +1 to just post JIRA link to dev list.


On Mon, Aug 15, 2016 at 12:35 PM, Chris Mattmann <mattm...@apache.org> wrote:
This is a good outward flow of info to the dev list. However, there needs
to be inward flow too – having the convo on the dev list will be a good
start to that.
I hope to see more inclusivity here.



On 8/15/16, 10:26 AM, "Aleksey Yeschenko" <alek...@apache.org> wrote:

   Well, if you read carefully what Jeremiah and I have just proposed, it
   wouldn’t be an issue.

   The notable major changes would start off on dev@ (think, a summary, a
   link to the JIRA, and maybe an attached spec doc).

   No need to follow the JIRA feed. Watch dev@ for those announcements and
   start watching the individual JIRA tickets if interested.

   This creates the least amount of noise: you miss nothing important, and at
   the same time you won’t be receiving mail from dev@ for each individual
   comment - including those on proposals you don’t care about.

   We aren’t doing it currently, but we could, and probably should.

   --
   AY

   On 15 August 2016 at 18:22:36, Chris Mattmann (mattm...@apache.org) wrote:

   Discussion belongs on the dev list. Putting discussion in JIRA is fine, but
   realize, there is a lot of noise in that signal and people may or may not
   be watching the JIRA list. In fact, I don’t see JIRA sent to the dev list
   at all so you are basically forking the conversation to a high noise list
   by putting it all in JIRA.





   On 8/15/16, 10:11 AM, "Aleksey Yeschenko" <alek...@apache.org> wrote:

   I too feel like it would be sufficient to announce those major JIRAs on
   the dev@ list, but keep all discussion itself to JIRA, where it belongs.

   You don’t need to follow every ticket this way, just subscribe to dev@ and
   then start watching the select major JIRAs you care about.

   --
   AY

   On 15 August 2016 at 18:08:20, Jeremiah D Jordan (jeremiah.jor...@gmail.com) wrote:

   I like keeping things in JIRA because then everything is in one place, and
   it is easy to refer someone to it in the future.
   But I agree that JIRA tickets with a bunch of design discussion and POC’s
   and such in them can get pretty long and convoluted.

   I don’t really like the idea of moving all of that discussion to email,
   which makes it harder to point someone to it. Maybe a better idea would be
   to have a “design/POC” JIRA and an “implementation” JIRA. That way we
   could still keep things in JIRA, but the final decision would be kept
   “clean”.

   Though it would be nice if people would send an email to the dev list when
   proposing “design” JIRA’s, as not everyone has time to follow every JIRA
   ever made to see that a new design JIRA was created that they might be
   interested in participating on.

   My 2c.

   -Jeremiah


On Aug 15, 2016, at 9:22 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

A long time ago, I was a proponent of keeping most development
discussions
on Jira, where tickets can be self contained and the threadless
nature
helps keep discussions from getting sidetracked.

But Cassandra was a lot smaller then, and as we've grown it has
become
necessary to separate out the signal (discussions of new features
and major
changes) from the noise of routine bug reports.

I

Re: Jira down, again?

2016-06-20 Thread Michael Kjellman
Hm... weird, JIRA isn't working again? So bizarre!! 😂

> On Jun 15, 2016, at 5:38 PM, Michael Kjellman  
> wrote:
> 
> down. again.
> 
>> On Jun 14, 2016, at 11:14 AM, Alex Popescu  wrote:
>> 
>> I've been trying to get to a ticket for the last 2h and I only get service
>> unavailable :-(
>> 
>> On Tue, Jun 14, 2016 at 10:26 AM, Michael Kjellman <
>> mkjell...@internalcircle.com> wrote:
>> 
>>> and, it's down again. :(
>>> 
>>>> On Jun 14, 2016, at 4:48 AM, Dave Brosius  wrote:
>>>> 
>>>> They are aware of these things
>>>> 
>>>> https://twitter.com/infrabot
>>>> 
>>>> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>>>>> Hi to all,
>>>>> at the moment is the same for me. Is there a way to notify to someone
>>> this
>>>>> situation?
>>>>> 
>>>>> Giampaolo
>>>>> 
>>>>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi :
>>>>> 
>>>>>> And when it is not down, it is very slow for me.
>>>>>> 
>>>>>> Do others have the same experience?
>>>>>> 
>>>>>> Best Regards
>>>>>> 
>>>>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams 
>>>>>> wrote:
>>>>>> 
>>>>>>> Everyone.
>>>>>>> 
>>>>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>>>>> mkjell...@internalcircle.com> wrote:
>>>>>>> 
>>>>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time
>>> in
>>>>>>> the
>>>>>>>> last 2 months. Just me or everyone?
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Bests,
>> 
>> Alex Popescu | @al3xandru
>> Sen. Product Manager @ DataStax
>> 
>> <http://cassandrasummit.org/Email_Signature>
>> 
>> » DataStax Enterprise - the database for cloud applications. «
> 



Re: Jira down, again?

2016-06-15 Thread Michael Kjellman
down. again.

> On Jun 14, 2016, at 11:14 AM, Alex Popescu  wrote:
> 
> I've been trying to get to a ticket for the last 2h and I only get service
> unavailable :-(
> 
> On Tue, Jun 14, 2016 at 10:26 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> and, it's down again. :(
>> 
>>> On Jun 14, 2016, at 4:48 AM, Dave Brosius  wrote:
>>> 
>>> They are aware of these things
>>> 
>>> https://twitter.com/infrabot
>>> 
>>> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>>>> Hi to all,
>>>> at the moment is the same for me. Is there a way to notify to someone
>> this
>>>> situation?
>>>> 
>>>> Giampaolo
>>>> 
>>>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi :
>>>> 
>>>>> And when it is not down, it is very slow for me.
>>>>> 
>>>>> Do others have the same experience?
>>>>> 
>>>>> Best Regards
>>>>> 
>>>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams 
>>>>> wrote:
>>>>> 
>>>>>> Everyone.
>>>>>> 
>>>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>>>> mkjell...@internalcircle.com> wrote:
>>>>>> 
>>>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time
>> in
>>>>>> the
>>>>>>> last 2 months. Just me or everyone?
>>> 
>> 
>> 
> 
> 
> -- 
> Bests,
> 
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
> 
> <http://cassandrasummit.org/Email_Signature>
> 
> » DataStax Enterprise - the database for cloud applications. «



Re: NewBie Question

2016-06-15 Thread Michael Kjellman
This was forwarded to me yesterday... a helpful first step 
https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md

> On Jun 15, 2016, at 9:54 AM, Jonathan Haddad  wrote:
> 
> Maybe some brave soul will document the 3.0 on disk format as part of
> https://issues.apache.org/jira/browse/CASSANDRA-8700.
> 
> On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford 
> wrote:
> 
>> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
>> engine.
>> 
>> 
>> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
>> 
>> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey 
>> wrote:
>> 
>>>> http://wiki.apache.org/cassandra/ArchitectureSSTable
>>> 
>>> Be aware that this page hasn't been updated since 2013, so it doesn't
>>> reflect any changes to the SSTable format since then, including the
>>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
>>> 
>>> That said, I believe the linked Apache wiki page is the best
>>> documentation for the format. Unfortunately, if you want a better or
>>> more current understanding, you'll have to read the code and read some
>>> SSTables.
>>> 
>> 



Re: Jira down, again?

2016-06-14 Thread Michael Kjellman
and, it's down again. :(

> On Jun 14, 2016, at 4:48 AM, Dave Brosius  wrote:
> 
> They are aware of these things
> 
> https://twitter.com/infrabot
> 
> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>> Hi to all,
>> at the moment is the same for me. Is there a way to notify to someone this
>> situation?
>> 
>> Giampaolo
>> 
>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi :
>> 
>>> And when it is not down, it is very slow for me.
>>> 
>>> Do others have the same experience?
>>> 
>>> Best Regards
>>> 
>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams 
>>> wrote:
>>> 
>>>> Everyone.
>>>> 
>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>> mkjell...@internalcircle.com> wrote:
>>>> 
>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time in
>>>> the
>>>>> last 2 months. Just me or everyone?
> 



Jira down, again?

2016-06-13 Thread Michael Kjellman
Seems like Apache Jira is 100% down, again, for like the 500th time in the last 
2 months. Just me or everyone?

Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Michael Kjellman
Bhuvan,

You didn't disrespect anyone, so please don't apologize! Appreciate your 
positive and helpful comment for the OP :) 

best,
kjellman

> On Jun 13, 2016, at 8:50 AM, Bhuvan Rawal  wrote:
> 
> Hi Matt,
> 
> I suggested the resources keeping in mind the ease with which one can
> learn. My idea was not to disrespect Apache or community in any form, it
> was just to facilitate learning of a Newbie.
> While having a good wiki would be amazing and I believe we all agree on
> this Thread that current Documentation has a lot of scope for improvement.
> And I'm completely willing to contribute in whatever way possible to the
> docs and getting it reviewed.
> 
> Best Regards,
> Bhuvan
> 
> On Mon, Jun 13, 2016 at 8:17 PM, Eric Evans 
> wrote:
> 
>> On Mon, Jun 13, 2016 at 8:05 AM, Mattmann, Chris A (3980)
>>  wrote:
>>> However also see that besides the current documentation, there needs to
>> be
>>> a roadmap for making Apache Cassandra and *its* documentation (not
>> *DataStax’s*)
>>> up to par for a basic user to build, deploy and run Cassandra. I don’t
>> think that’s
>>> the current case, is it?
>> 
>> There is CASSANDRA-8700
>> (https://issues.apache.org/jira/browse/CASSANDRA-8700), which is a
>> step in this direction I hope.
>> 
>> One concern I do have though is that changing the tech used to
>> author/publish documentation won't in itself be enough to get good
>> docs.  In fact, moving the docs in-tree raises the barrier to
>> contribution in the sense that instead of mashing 'Edit', you have to
>> put together a patch and have it reviewed.
>> 
>> That said, I also think that we've historically set the bar way too
>> high to committer/PMC, and that this may be an opportunity to change
>> that; There ought to be a path to the PMC for documentation authors
>> and translators (and this is typical in other projects).  So, I will
>> personally do my best to set aside some time each week to review and
>> merge documentation changes, and to champion regular doc contributors
>> for committership.  Hopefully there are others willing to do the same!
>> 
>> 
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>> 



Re: Java Driver 3.0 for Apache Cassandra - Documentation Outdated?

2016-06-06 Thread Michael Kjellman
I think it comes down to having full time tech writers employed and paid. If 
Datastax has the $$ to provide a significant benefit to the community (well 
thought out documentation) that's better than little or no documentation (if it 
was only done via developers who most likely won't document or do a poor job at 
documentation).

Having some documentation is much better for the community than the alternative 
that "the code is the documentation".

Nothing is free.

On Jun 6, 2016, at 4:50 PM, Chris Mattmann <mattm...@apache.org> wrote:

Excellent, why am I the first person to ask that, and why didn’t
a PMC member point that out right away and why did it take me asking
to point to the Apache docs.

This is what I am talking about in terms of the Apache community..





On 6/6/16, 4:47 PM, "Michael Kjellman" <mkjell...@internalcircle.com> wrote:

http://cassandra.apache.org/doc/cql3/CQL.html

On Jun 6, 2016, at 4:42 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

Hi,

So, the core documentation for a key part of Cassandra is hosted
at DataStax?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++










On 6/6/16, 7:32 AM, "Mahdi Mohammadi" <mah...@gmail.com> wrote:

Team,

I was checking the documentation for TupleType in DataStax docs here
<https://docs.datastax.com/en/latest-java-driver/java-driver/reference/tupleTypes.html>
and
the code example was like this:

TupleType theType = TupleType.of(DataType.cint(), DataType.text(),
DataType.cfloat());


But in the code, the *TupleType.of* has two additional parameters not
mentioned in the documentation:


*public static TupleType of(ProtocolVersion protocolVersion, CodecRegistry
codecRegistry, DataType... types)*

Maybe I am looking in the wrong place. Could someone please explain how can
I instantiate a *TupleType*?

I have the same question for *Map* type.

Thanks for your help.

===
Best Regards





Re: Java Driver 3.0 for Apache Cassandra - Documentation Outdated?

2016-06-06 Thread Michael Kjellman
http://cassandra.apache.org/doc/cql3/CQL.html

On Jun 6, 2016, at 4:42 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

Hi,

So, the core documentation for a key part of Cassandra is hosted
at DataStax?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++










On 6/6/16, 7:32 AM, "Mahdi Mohammadi" <mah...@gmail.com> wrote:

Team,

I was checking the documentation for TupleType in DataStax docs here
<https://docs.datastax.com/en/latest-java-driver/java-driver/reference/tupleTypes.html>
and
the code example was like this:

TupleType theType = TupleType.of(DataType.cint(), DataType.text(),
DataType.cfloat());


But in the code, the *TupleType.of* has two additional parameters not
mentioned in the documentation:


*public static TupleType of(ProtocolVersion protocolVersion, CodecRegistry
codecRegistry, DataType... types)*

Maybe I am looking in the wrong place. Could someone please explain how can
I instantiate a *TupleType*?

I have the same question for *Map* type.

Thanks for your help.

===
Best Regards



Re: Cassandra Java Driver and DataStax

2016-06-04 Thread Michael Kjellman
No need to argue your point to me anymore. I've already tuned you out.

These are good people who I consider my friends and insulting people just shows 
your arguments really have no merit. 

Good luck with your new driver contribution! I look forward to reviewing the 
code. 

Sent from my iPhone

> On Jun 4, 2016, at 10:10 AM, James Carman  wrote:
> 
> I apologized else-thread about that one.  It was a low blow.  Anyway, to
> answer your question. The Cassandra community wins!  How do we know if they
> won't make you pay for the driver in the future (after all your code is
> written against it)?  It has happened before.  Also, the rest of the
> community can have a say in the direction (because that's the Apache Way).
> The driver can be more intimate with the database, because it's the same
> people developing it.
> 
>> On Sat, Jun 4, 2016 at 1:06 PM Aleksey Yeschenko  wrote:
>> 
>> An eloquent and powerful response, but please, reply to my points instead
>> of resorting to ad hominem arguments.
>> 
>> In practical terms, who would benefit from such a merge, and who is
>> suffering from the current state of affairs?
>> 
>> --
>> AY
>> 
>> On 4 June 2016 at 18:03:05, James Carman (ja...@carmanconsulting.com)
>> wrote:
>> 
>> "Sr. Software Engineer at DataStax", imagine that.
>> 
>> On Sat, Jun 4, 2016 at 1:01 PM Aleksey Yeschenko 
>> wrote:
>> 
>>> As a member of that governing body (Cassandra PMC), I would much prefer
>>> not to deal with the drivers as well.
>>> 
>>> And I’m just as certain that java-driver - and other driver communities -
>>> would much rather prefer to keep their process and organisation instead
>> of
>>> being forced to conform to ours.
>>> 
>>> I’m finding it hard to see a single party that would benefit from such a
>>> merge, and who suffers from the current state of things.
>>> 
>>> --
>>> AY
>>> 
>>> On 4 June 2016 at 17:46:48, James Carman (ja...@carmanconsulting.com)
>>> wrote:
>>> 
>>> How does it add more complexity by having one governing body (the PMC)?
>>> What I am suggesting is that the driver project be somewhat of a
>> subproject
>>> or a "module". It can still have its own life cycle, just like it does
>> now.
>>> 
>>> On Sat, Jun 4, 2016 at 12:44 PM Nate McCall 
>>> wrote:
>>> 
>>>> It doesnt. But then we add complexity in communicating and managing
>>>> versions, releases, etc. to the project. Again, from my experience with
>>>> hector, I just didnt want the hassle of owning that within the project
>>>> confines.
>>>> 
 On Sat, Jun 4, 2016 at 11:30 AM, James Carman <
>>> ja...@carmanconsulting.com>
 wrote:
 
> Who said the driver has to be released with the database?
> 
> On Sat, Jun 4, 2016 at 12:29 PM Nate McCall 
> wrote:
> 
>> On Sat, Jun 4, 2016 at 10:05 AM, James Carman <
> ja...@carmanconsulting.com>
>> wrote:
>> 
>>> So why not just donate the Java driver and keep that in house?
> Cassandra
>> is
>>> a Java project. Makes sense to me.
>>> 
>>> 
>> I won't deny there is an argument to be made here, but as a former
 client
>> maintainer (Hector), current ASF committer (Usergrid) and active
> community
>> member since late 2009, my opinion is that this would be a step
> backwards.
>> 
>> Maintaining Hector independently allowed me the freedom to release
 major
>> features with technology that I wanted to use while maintaining
 backwards
>> compatibility without having to be bound to the project's release
>>> cycle
> and
>> process. (And to use a build system that didnt suck).
>> 
>> The initial concern of the use of the word "controls" is *super*
>> not
 cool
>> and I hope that this is being fixed. That said, the reality, from
>> my
>> (external to DataStax) perspective, is that this is not the case. I
 like
>> the current project separation the way it is and don't feel like
>>> there
 is
>> any attempt at "control" of the java driver's direction and
 development.
>> 
>> -Nate
>> 
> 
 
 
 
 --
 -
 Nate McCall
 Austin, TX
 @zznate
 
 CTO
 Apache Cassandra Consulting
 http://www.thelastpickle.com
 
>>> 
>> 


Re: Cassandra 2.0.x OOM during startsup - schema version inconsistency after reboot

2016-05-08 Thread Michael Kjellman
I'd recommend you create a JIRA! That way you can get some traction on the 
issue. Obviously an OOM is never correct, even if your process is wrong in some 
way!

Best,
kjellman 

Sent from my iPhone

> On May 8, 2016, at 8:48 PM, Michael Fong  
> wrote:
> 
> Hi, all,
> 
> 
> Haven't heard any responses so far, and this issue has troubled us for quite 
> some time. Here is another update:
> 
> We have noticed several times that the schema version may change after 
> migration and reboot:
> 
> Here is the scenario:
> 
> 1.   Two-node cluster (1 & 2).
> 
> 2.   There are some schema changes, e.g. creating a few new column families. 
> The cluster will wait until both nodes have their schema versions in sync 
> (describe cluster) before moving on.
> 
> 3.   Right before node2 is rebooted, the schema version is consistent; 
> however, after node2 reboots and starts servicing requests, the 
> MigrationManager gossips a different schema version.
> 
> 4.   Afterwards, both nodes start exchanging schema messages 
> indefinitely until one of the nodes dies.
> 
> We currently suspect the schema change is due to replaying an old entry in 
> the commit log. We wish to keep digging, but need expert help on this.
> 
> I don't know if anyone has seen this before, or if there is anything wrong 
> with our migration flow though..
> 
> Thanks in advance.
> 
> Best regards,
> 
> 
> Michael Fong
> 
> From: Michael Fong [mailto:michael.f...@ruckuswireless.com]
> Sent: Thursday, April 21, 2016 6:41 PM
> To: u...@cassandra.apache.org; dev@cassandra.apache.org
> Subject: RE: Cassandra 2.0.x OOM during bootstrap
> 
> Hi, all,
> 
> Here is some more information on before the OOM happened on the rebooted node 
> in a 2-node test cluster:
> 
> 
> 1.   It seems the schema version has changed on the rebooted node after 
> reboot, i.e.
> Before reboot,
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> 
> After rebooting node 2,
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) 
> Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> 
> 
> 
> 2.   After reboot, both nodes repeatedly send MigrationTask to each other 
> - we suspect it is related to the schema version (Digest) mismatch after Node 
> 2 rebooted:
> Node 2 keeps submitting the migration task, 100+ times, to the other 
> node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node 
> /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) 
> Updating topology for /192.168.88.33
> INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544) 
> Node /192.168.88.33 state jump to normal
> INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414) 
> Updating topology for /192.168.88.33
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 
> 102) Submitting migration task for /192.168.88.33
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 
> 102) Submitting migration task for /192.168.88.33
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) 
> Can't send schema pull request: node /192.168.88.33 is down.
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) 
> Can't send schema pull request: node /192.168.88.33 is down.
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 
> 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 
> 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java 
> (line 102) Submitting migration task for /192.168.88.33
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java 
> (line 102) Submitting migration task for /192.168.88.33
> DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 
> 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,356 MigrationManager.java 
> (line 102) Submitting migration task for /192.168.88.33
> .
> 
> 
> On the other hand, Node 1 keeps updating its gossip information, followed by 
> receiving and submitting MigrationTask afterwards:
> DEBUG [R
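
[Editorial sketch, not part of the original thread.] The symptom reported above, where a node gossips one schema version before a reboot and a different one afterwards, can be checked mechanically from debug logs. The following is a minimal sketch that assumes the message text "Gossiping my schema version <uuid>" exactly as quoted in the thread. The class and method names are hypothetical and are not part of Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; nothing here is a Cassandra API.
final class SchemaVersionCheck {

    // Captures the UUID that follows the DEBUG message quoted in the thread.
    // \s+ tolerates the version landing on the next line after mail wrapping.
    private static final Pattern GOSSIPED_VERSION = Pattern.compile(
            "Gossiping my schema version\\s+"
            + "([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})");

    /** Returns the last schema version announced in the given log text, if any. */
    static Optional<String> latestVersion(String logText) {
        Matcher matcher = GOSSIPED_VERSION.matcher(logText);
        String last = null;
        while (matcher.find()) {
            last = matcher.group(1);
        }
        return Optional.ofNullable(last);
    }

    /** True when all nodes that announced a version last announced the same one. */
    static boolean inAgreement(Map<String, String> logTextByNode) {
        long distinctVersions = logTextByNode.values().stream()
                .map(SchemaVersionCheck::latestVersion)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .distinct()
                .count();
        return distinctVersions <= 1;
    }

    public static void main(String[] args) {
        // Log excerpts copied from the thread; node names are labels chosen here.
        Map<String, String> logs = new LinkedHashMap<>();
        logs.put("node1",
                "DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) "
                + "Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f");
        logs.put("node2",
                "DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) "
                + "Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f\n"
                + "DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) "
                + "Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b");

        logs.forEach((node, text) ->
                System.out.println(node + " -> " + latestVersion(text).orElse("unknown")));
        System.out.println("in agreement: " + inAgreement(logs));
    }
}
```

Comparing only the most recent announcement per node keeps the check simple; it says nothing about why the versions diverged, which is the open question in the thread.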

Re: [Proposal] Mandatory comments

2016-05-05 Thread Michael Kjellman
My vote is to start with BigTableScanner (SSTableScanner).. 5 iterators that 
all do something different with each other depending on how used with zero 
comments -- in a critical code path. What could go wrong!

> On May 5, 2016, at 11:26 AM, Dave Brosius  wrote:
> 
> A less controversial tack would be to actively solicit input from 
> contributors, etc, about what methods/classes are confusing, and put those 
> classes/methods on a priority list for adding good javadoc. When that list 
> goes to ~0, you've probably done enough.
> 
> The key tho is to actively solicit, and make it easy to do so. It's important 
> to differentiate the list being 0 because you've done a good job, and 0 
> because people didn't know about it, or it was too difficult to ask for.
> 
> --dave
> 
> ---
> 
> 
> On 2016-05-05 11:46, Jack Krupansky wrote:
>> FWIW, I recently wrote up a bunch of notes on Code Quality and published
>> them on Medium. There are notes on comments and consistency and boilerplate
>> buried in there.
>> WARNING: There's a lot of stuff there and it is not for the  faint of heart
>> or those not truly committed to code quality.
>> tl;dr - I'm not a fan of boiler plate just to say you did something, but...
>> I am a fan of consistency, but that doesn't mean every situation is the
>> same, just that similar situations should be treated similarly - unless
>> there is some reasonable reason to do otherwise.
>> See:
>> https://medium.com/@jackkrupansky/code-quality-preamble-932626a3131c#.ynrjbryus
>> https://medium.com/@jackkrupansky/software-and-product-quality-notes-no-1-346ab1d8df24#.xzg1ihuxb
>> https://medium.com/@jackkrupansky/code-quality-notes-no-1-4dc522a5e29c#.cm7tan2zu
>> https://medium.com/@jackkrupansky/code-quality-notes-no-2-7939377b73c6#.zco8oq3dj
>> -- Jack Krupansky
>> On Thu, May 5, 2016 at 10:55 AM, Eric Evans 
>> wrote:
>>> On Wed, May 4, 2016 at 12:14 PM, Jonathan Ellis  wrote:
>>> > On Wed, May 4, 2016 at 2:27 AM, Sylvain Lebresne 
>>> > wrote:
>>> >
>>> >> On Tue, May 3, 2016 at 6:57 PM, Eric Evans 
>>> >> wrote:
>>> >>
>>> >> > On Mon, May 2, 2016 at 11:26 AM, Sylvain Lebresne <
>>> sylv...@datastax.com>
>>> >> > wrote:
>>> >> > > Looking forward to other's opinions and feedbacks on this proposal.
>>> >> >
>>> >> > We might want to leave just a little wiggle room for judgment on the
>>> >> > part of the reviewer, for the very simple cases.  Documenting
>>> >> > something like setFoo(int) with "Sets foo" can get pretty tiresome for
>>> >> > everyone, and doesn't add any value.
>>> >> >
>>> >>
>>> >> I knew someone was going to bring this :). In principle, I don't really
>>> >> disagree. In practice though,
>>> >> I suspect it's sometimes just easier to adhere to such simple rule
>>> somewhat
>>> >> strictly. In particular,
>>> >> I can guarantee that we don't all agree where the border lies between
>>> what
>>> >> warrants a javadoc
>>> >> and what doesn't. Sure, there is a few cases where you're just
>>> paraphrasing
>>> >> the method name
>>> >> (and while it might often be the case for getters and setters, it's
>>> worth
>>> >> noting that we don't really
>>> >> do much of those in C*), but how hard is it to write a one line comment?
>>> >> Surely that's a negligible
>>> >> part of writing a patch and we're not that lazy.
>>> >>
>>> >
>>> > I'm more concerned that this kind of boilerplate commenting obscures
>>> rather
>>> > than clarifies.  When I'm reading code i look for comments to help me
>>> > understand key points, points that aren't self-evident.  If we institute
>>> a
>>> > boilerplate "comment everything" rule then I lose that signpost.
>>> This.
>>> Additionally you could also probably argue that it obscures the true
>>> purpose to leaving a comment; It becomes a check box to tick, having
>>> some javadoc attached to every method, rather than genuinely looking
>>> for the value that could be added with quality comments (or even
>>> altering the approach so that the code is more obvious in the absence
>>> of them).
>>> The reason I suggested "wiggle room", is that I think everyone
>>> basically agrees that the default should be to leave good comments
>>> (and that that hasn't been the case), that we should start making this
>>> a requirement to successful review, and that we can afford to leave
>>> some room for judgment on the part of the reviewer.  Worse-case is
>>> that we find in doing so that there isn't much common ground on what
>>> constitutes a quality comment versus useless boilerplate, and that we
>>> have to remove any wiggle room and make it 100% mandatory (I don't
>>> think that will (has to) be the case, though).
>>> --
>>> Eric Evans
>>> john.eric.ev...@gmail.com



Re: Criteria for upgrading to 3.x releases in PROD

2016-04-18 Thread Michael Kjellman
This is best for the users list. Test the releases yourself and then decide 
when it's ready for your use case, ops team, and organization. This is a 
personal decision and not one for *thousands* of others on this mailing list to 
make for you.

best,
kjellman

> On Apr 18, 2016, at 10:54 AM, Anuj Wadehra  
> wrote:
> 
> Hi All,
> For last several months, the "most stable version" question pops up on the 
> user mailing list and then people get all sorts of responses/suggestions..
> If you are conservative go for x if adventurous y..
> If you have good risk appetite go for x else y..
> If you want features go for x else y..
> 
> Unfortunately, none of the above responses help many users; they only reinforce 
> the low confidence in the latest releases. Who wants to be adventurous in 
> Production? Who wants to test his risk appetite in Production? And who would 
> want features over stability in Production? Not many, I am sure.
> So my question is:
> Would it be a wise decision to mention the "most stable/production ready" 
> version (as it used to be before 3.x) on the Apache website till tick-tock 
> release strategy evolves and matures?
> That will somewhat contradict the tick-tock philosophy of stable odd releases 
> but would be more realistic, as every big change needs time to stabilise. It's 
> slightly unfair if users are kept in a confused state till the strategy 
> matures and starts delivering solid stable builds.
> I think the question is more appropriate in dev list so I have kept it here.
> Thanks
> Anuj
> Sent from Yahoo Mail on Android 
> 
>  On Mon, 11 Apr, 2016 at 11:39 PM, Aleksey Yeschenko 
> wrote:   The answer will depend on how conservative you are.
> 
> The most conservative choice overall would be to go with the 2.2.x line.
> 
> 3.0.x if you want the new nice and shiny 3.0 things, but can tolerate some 
> risk (the branch has a lot of relatively new core code, and hasn’t yet been 
> tried out by as many users as the 2.x branch had).
> 
> The latest odd 3.x if you want the shiniest (3.5 to be released soon, with 
> features like the new SASI secondary indexes support). Also, there hasn’t yet 
> been that much divergence between 3.0.x and 3.x, so risk levels are around 
> the same, so long as you limit yourself to only the features present in 3.0.x.
> 
> Either way, make sure to properly test whatever release you go for in staging 
> first, as Michael says, and you’ll be alright.
> 
> -- 
> AY
> 
> On 11 April 2016 at 18:42:31, Anuj Wadehra (anujw_2...@yahoo.co.in.invalid) 
> wrote:
> 
> Can someone help me with this one?  
> Thanks
> Anuj
> 
> Sent from Yahoo Mail on Android  
> 
> On Sun, 10 Apr, 2016 at 11:07 PM, Anuj Wadehra wrote: 
> Hi,  
> Tick-Tock release strategy in 3.x was a good initiative to ensure frequent & 
> stable releases. While odd releases are supposed to get all the bug fixes and 
> should be most stable, many people like me, who got used to the comforting 
> "production ready/stable" tag on Apache website,  are still reluctant to take 
> latest 3.x odd releases into production. I think the hesitation is somewhat 
> justified as processes often take time to mature.  
> So here I would like to ask the experts, people who know the ground 
> situation, people who actively develop it and manage it. Considering the 
> current scenario, what would be reasonable criteria for taking 3.x releases 
> in production?   
> 
> 
> Thanks
> Anuj
> 
> 
> 
> 
> 



Re: A short guide on how to contribute patches to Cassandra

2016-02-09 Thread Michael Kjellman
Move to Wiki?

Sent from my iPhone

> On Feb 9, 2016, at 5:59 AM, Aleksey Yeschenko  wrote:
> 
> Hello everyone,
> 
> I’ve compiled a short guide for contributors (who aren’t committers yet) 
> about how to properly contribute Cassandra patches:
> 
> https://docs.google.com/document/d/1d_AzYQo74de9utbbpyXxW2w-b0__sFhC7b24ibiJqjw/edit?usp=sharing
> 
> Following the outlined recommendations makes the lives of committers much 
> easier, without adding much hassle to contributor process.
> 
> Follow the steps and feel the love.
> 
> -- 
> AY


Re: Versioning policy?

2016-01-16 Thread Michael Kjellman
Correct, this is an open source project. 

If you want an Enterprise support story Datastax has an Enterprise option for 
you. 

> On Jan 16, 2016, at 11:19 AM, Anuj Wadehra  wrote:
> 
> Hi Jonathan
> 
> It would be really nice if you could share your thoughts on the four points 
> raised regarding the Cassandra EOL process. I think similar things happen for 
> other open source products and it would be really nice if we could streamline 
> such things for Apache Cassandra.
> 
> Thanks
> Anuj
> 
> Sent from Yahoo Mail on Android 
> 
>  On Thu, 14 Jan, 2016 at 11:28 pm, Anuj Wadehra 
> wrote:   Hi Jonathan,
> Thanks for the crisp communication regarding the tick tock release & EOL.
> I think it's worth considering some points regarding EOL policy and it would 
> be great if you can share your thoughts on below points:
> 1.  EOL of a release should be based on "most stable"/"production ready" 
> version date rather than "GA" date of subsequent major releases.
> 2.  I think we should have "Formal EOL Announcement" on Apache Cassandra 
> website.  
> 3. "Formal EOL Announcement" should come at least 6 months before the EOL, so 
> that users get reasonable time to  upgrade.
> 4. EOL Policy (even if flexible) should be stated on Apache Cassandra website
> 
> EOL thread on users mailing list ended with the conclusion of raising a 
> Wishlist JIRA but I think above points are more about working on policy and 
> processes rather than just a wish list. 
> 
> ThanksAnuj
> 
> On Thu, 14 Jan, 2016 at 10:57 pm, Jonathan Ellis wrote:
> Hi Maciek,
> 
> First let's talk about the tick-tock series, currently 3.x.  This is pretty
> simple: outside of the regular monthly releases, we will release fixes for
> critical bugs against the most recent bugfix release, the way we did
> recently with 3.1.1 for CASSANDRA-10822 [1].  No older tick-tock releases
> will be patched.
> 
> Now, we also have three other release series currently being supported:
> 
> 2.1.x: supported with critical fixes only until 4.0 is released, projected
> in November 2016 [2]
> 2.2.x: maintained until 4.0 is released
> 3.0.x: maintained for 6 months after 4.0, i.e. projected until May 2017
> 
> I will add this information to the releases page [3].
> 
> [1]
> https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCAKkz8Q3StqRFHfMgCMRYaaPdg+HE5N5muBtFVt-=v690pzp...@mail.gmail.com%3E
> [2] 4.0 will be an ordinary tick-tock release after 3.11, but we will be
> sunsetting deprecated features like Thrift so bumping the major version
> seems appropriate
> [3] http://cassandra.apache.org/download/
> 
>> On Sun, Jan 10, 2016 at 9:29 PM, Maciek Sakrejda  wrote:
>> 
>> There was a discussion recently about changing the Cassandra EOL policy on
>> the users list [1], but it didn't really go anywhere. I wanted to ask here
>> instead to clear up the status quo first. What's the current versioning
>> policy? The tick-tock versioning blog post [2] states in passing that two
>> major releases are maintained, but I have not found this as an official
>> policy stated anywhere. For comparison, the Postgres project lays this out
>> very clearly [3]. To be clear, I'm not looking for any official support,
>> I'm just asking for clarification regarding the maintenance policy: if a
>> critical bug or security vulnerability is found in version X.Y.Z, when can
>> I expect it to be fixed in a bugfix patch to that major version, and when
>> do I need to upgrade to the next major version.
>> 
>> [1]: http://www.mail-archive.com/user@cassandra.apache.org/msg45324.html
>> [2]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
>> [3]: http://www.postgresql.org/support/versioning/
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
> 
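
The support windows quoted above reduce to simple month arithmetic. A minimal sketch, assuming the projected November 2016 date for 4.0 holds; the class and method names are hypothetical and exist only to illustrate the calculation, not to describe any project tooling:

```java
import java.time.YearMonth;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: restates the maintenance windows from the thread as arithmetic.
class SupportWindowSketch
{
    // Projected 4.0 date taken from the message above; a projection, not a commitment.
    static final YearMonth PROJECTED_4_0 = YearMonth.of(2016, 11);

    // Months of maintenance after 4.0 ships, per release series, as stated in the thread.
    static Map<String, Integer> monthsAfterFourZero()
    {
        Map<String, Integer> months = new LinkedHashMap<>();
        months.put("2.1.x", 0); // critical fixes only, until 4.0
        months.put("2.2.x", 0); // maintained until 4.0
        months.put("3.0.x", 6); // maintained for 6 months after 4.0
        return months;
    }

    // Returns the projected month in which support for the given series ends.
    static YearMonth projectedEndOfSupport(String series)
    {
        Integer months = monthsAfterFourZero().get(series);
        if (months == null)
            throw new IllegalArgumentException("unknown series: " + series);
        return PROJECTED_4_0.plusMonths(months);
    }

    public static void main(String[] args)
    {
        for (String series : monthsAfterFourZero().keySet())
            System.out.println(series + " -> " + projectedEndOfSupport(series));
    }
}
```

Because every window is expressed relative to the 4.0 date, a slip in that one projection shifts all three end dates together.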


Re: new version of Cassandra-stress doesn't support random read benchmark?

2015-05-25 Thread Michael Kjellman
So #cassandra party in LA? Drinks on you? 😅 Sweet!

> On May 25, 2015, at 2:34 AM, graham sanderson  wrote:
> 
> Hey Benedict;
> 
> I screwed up on email after a bachelor party, and sent something to external 
> cassandra-users not internal users (drunken drivel)
> 
> I never said anything about it because I hoped no one noticed it.
> 
> That said, I was wondering if my data was helpful for your injector post. We 
> haven’t played with it yet, but I passed it on to some other Austin companies 
> who probably have a need.
> 
> Currently on hold with Orbitz - wtf - I opted to be called back because the 
> wait time was 40 mins; they called me back and I’ve still been on hold for 20 
> mins. I have a bunch of friends in LA - computer games/movies … my flight 
> back was routed by LA, so I figure I could leave Thursday night instead of 
> Friday morning and catch up with them all
> 
> 
>> On May 23, 2015, at 2:13 AM, Benedict Elliott Smith wrote:
>> 
>> Hi Min,
>> 
>> The key selection occurs prior to this. The operation has been assigned one
>> (or more, in the case of user profile operations) partition keys, and this
>> is just it accessing that key. You should explore backwards for assignment
>> operations, and see where these happen, to understand how this behaves.
>>> On 23 May 2015 01:30, "Min Zhou"  wrote:
>>> 
>>> Hi all,
>>> 
>>> 
>>> 
>>> It seems there is only one implementation of getKey(); it is
>>> in PredefinedOperation.java (cassandra branch 2.2.0-beta1):
>>> 
>>> 
>>>   protected ByteBuffer getKey()
>>>   {
>>>       return (ByteBuffer) partitions.get(0).getPartitionKey(0);
>>>   }
>>> 
>>> The read operations will just use the same key for each iteration. Since
>>> that leads to a 100% cache hit rate on the storage side, the resulting
>>> throughput will be very high.
>>> 
>>> Please correct me if I am wrong.
>>> 
>>> 
>>> Min
> 
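
Benedict's point is that the fixed index in getKey() does not pin reads to a single key, because each operation is handed its partition keys before getKey() is ever called. A minimal sketch of that idea follows. It is a toy model: every class and method name here is hypothetical, and the seeded random assignment is an assumption standing in for whatever cassandra-stress actually does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model only: these classes are hypothetical and are NOT the cassandra-stress sources.
class KeySelectionSketch
{
    // Stand-in for an operation that has already been assigned its partition keys.
    static final class Operation
    {
        private final List<ByteBuffer> partitions;

        Operation(List<ByteBuffer> partitions)
        {
            this.partitions = partitions;
        }

        // Same shape as the quoted getKey(): always index 0 of whatever was assigned.
        ByteBuffer getKey()
        {
            return partitions.get(0);
        }
    }

    // Assumed assignment step: choose a key for each new operation before it runs.
    static Operation nextOperation(Random rng, int keySpace)
    {
        String key = "key" + rng.nextInt(keySpace);
        List<ByteBuffer> assigned = new ArrayList<>();
        assigned.add(ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)));
        return new Operation(assigned);
    }

    // Counts how many distinct keys a run of operations ends up reading.
    static int distinctKeysRead(long seed, int operations, int keySpace)
    {
        Random rng = new Random(seed);
        Set<ByteBuffer> seen = new HashSet<>();
        for (int i = 0; i < operations; i++)
            seen.add(nextOperation(rng, keySpace).getKey());
        return seen.size();
    }

    public static void main(String[] args)
    {
        System.out.println(distinctKeysRead(42L, 1000, 100) + " distinct keys read");
    }
}
```

In this model the variety of keys comes entirely from the assignment step, which is why reading getKey() in isolation suggests a single hot key.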


Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Michael Kjellman
Last I checked — and I could be wrong — we’ve never had to think about what to 
number a Cassandra version due to a ticket that could “impact” our users so 
dramatically due to the scope of the changes from a single ticket. Food for 
thought.

love,
kjellman

> On May 11, 2015, at 2:20 PM, Alex Popescu  wrote:
> 
> On Mon, May 11, 2015 at 2:16 PM, Jonathan Haddad  wrote:
> 
>> I'm not sure if the complications surrounding the versioning of the drivers
>> should be factored into the releases of Cassandra.
> 
> 
> I agree. If we could come up with a versioning scheme that would also work
> for drivers, that would be
> the ideal case as it will prove quite helpful to our users.
> 
> 
>> I think that 3.0
>> signals a massive change and calling the release containing 8099 a .1 would
>> be drastically underplaying how big of a release it is - from the
>> perspective of the end user it would be a disservice.
>> 
>> 
> I see. My last suggestion could work though as it signals both releases
> having significant impact.
> 
> 
> 
>> 
>> On Mon, May 11, 2015 at 2:09 PM Jonathan Ellis  wrote:
>> 
>>> I do like 2.2 and 3.0 over 3.0 and 3.1 because going from 2.x to 3.x
>>> signals that 8099 really is a big change.
>>> 
>>> On Mon, May 11, 2015 at 3:28 PM, Alex Popescu wrote:
>>> 
>>>> On Sun, May 10, 2015 at 2:14 PM, Robert Stupp wrote:
>>>> 
>>>>> Instead of labeling it 2.2, I’d like to propose to label it 3.0 (so
>>>>> basically just move 8099 to 3.1).
>>>>> In the end it’s “only a label”. But there are a lot of new user-facing
>>>>> features in it that justify a major release.
>>>>> 
>>>> 
>>>> +1 on labeling the proposed 2.2 as 3.0 and moving 8099 to 3.1
>>>> 
>>>> 1. Tons of new features that feel more than just a 2.2
>>>> 2. The majority of features planned for 3.0 are actually ready for this
>>>>    version
>>>> 3. In order to avoid compatibility questions (and version compatibility
>>>>    matrices), the drivers developed by DataStax have followed the
>>>>    Cassandra versions so far. The Python and C# drivers are already at
>>>>    2.5 as they added some major features.
>>>> 
>>>>    Renaming the proposed 2.2 as 3.0 would allow us to continue to use this
>>>>    versioning policy until all drivers are supporting the latest Cassandra
>>>>    version and continue to not require a user to check a compatibility
>>>>    matrix.
>>>> 
>>>> 
>>>> --
>>>> Bests,
>>>> 
>>>> Alex Popescu | @al3xandru
>>>> Sen. Product Manager @ DataStax
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder, http://www.datastax.com
>>> @spyced
>>> 
>> 
> 
> 
> 
> -- 
> Bests,
> 
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax



Re: 3.0 and the Cassandra release process

2015-03-18 Thread Michael Kjellman
For most of my life I’ve lived on the software bleeding edge both personally 
and professionally. Maybe it’s a personal weakness, but I guess I get a thrill 
out of the problem solving aspect?

Recently I came to a bit of an epiphany — the closer I keep to the daily build 
— generally the happier I am on a daily basis. Bugs happen, but for the most 
part (aside from show stopper bugs), pain points for myself in a given daily 
build can generally be debugged to 1 or maybe 2 root causes, fixed in ~24 
hours, and then life is better the next day again. In comparison, the old 
waterfall model generally means taking an “official” release at some point and 
waiting for some poor soul (or developer) to actually run the thing. No matter 
how good the QA team is, until it’s actually used in the real world, most bugs 
aren’t found.

If you and your organization can wait 24 hours * number of bugs discovered 
after people actually started using the thing, you end up with a “usable build” 
around the holy-grail minor X.X.5 release of Cassandra.

I love the idea of the LTS model Jonathan describes because it means more code 
can get real testing and “bake” for longer instead of sitting largely unused on 
some git repository in a datacenter far far away. A lot of code has changed 
between 2.0 and trunk today. The code has diverged to the point that if you 
write something for 2.0 (as the most stable major branch currently available), 
merging it forward to 3.0 or after generally means rewriting it. If the only 
thing that comes out of this is a smaller delta of LOC between the deployable 
version/branch and what we can develop against and what QA is focused on, I 
think that’s a massive win.

Something like CASSANDRA-8099 will need 2x the baking time of even many of the 
more risky changes the project has made. While I wouldn’t want to run a build 
with CASSANDRA-8099 in it anytime soon, there are now hundreds of other changes 
blocked, most likely many containing new bugs of their own, that have no 
exposure at all to even the most involved C* developers.

I really think this will be a huge win for the project and I’m super thankful 
for Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this change to a 
much more sustainable release model for the entire community.

best,
kjellman

 
> On Mar 18, 2015, at 3:02 PM, Ariel Weisberg wrote:
> 
> Hi,
> 
> Keep in mind it is a bug fix release every month and a feature release every 
> two months.
> 
> For development that is really a two month cycle with all bug fixes being 
> backported one release. As a developer if you want to get something in a 
> release you have two months and you should be sizing pieces of large tasks so 
> they ship at least every two months.
> 
> Ariel
>> On Mar 18, 2015, at 5:58 PM, Terrance Shepherd  wrote:
>> 
>> I like the idea but I agree that every month is a bit aggressive. I have no
>> say but:
>> 
>> I would say 4 releases a year instead of 12, with 2 months of new features
>> and 1 month of bug squashing per release, and the 4th quarter just bugs.
>> 
>> I would also propose 2 year LTS releases for the releases after the 4th
>> quarter. So everyone could get a new feature release every quarter and the
>> stability of super major versions for 2 years.
>> 
>> On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius wrote:
>> 
>>> It would seem the practical implication of this is that there would be
>>> significantly more development on branches, with potentially more
>>> significant delays on merging these branches. This would imply to me that
>>> more Jenkins servers would need to be set up to handle auto-testing of more
>>> branches: if feature work spends more time on external branches, it is
>>> likely to be less tested (even if by accident), as fewer developers
>>> would be working on that branch. Only when a feature was blessed to make it
>>> to the release-tracked branch would it become exposed to the majority of
>>> developers/testers, etc. doing normal running/playing/testing.
>>> 
>>> This isn't to knock the idea in any way, just wanted to mention what I
>>> think the outcome would be.
>>> 
>>> dave
>>> 
>>> 
>>> 
 
>> On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis wrote:
>>> Cassandra 2.1 was released in September, which means that if we were on
>>> track with our stated goal of six month releases, 3.0 would be done about
>>> now.  Instead, we haven't even delivered a beta.  The immediate cause this
>>> time is blocking for 8099, but the reality is that nobody should really be
>>> surprised.  Something always comes up -- we've averaged about nine months
>>> since 1.0, with 2.1 taking an entire year.
>>> 
>>> We could make theory align with reality by acknowledging, "if nine months
>>> is our 'natural' release schedule, then so be it."  But I think we
> 
