Re: CASSANDRA-14482

2019-03-01 Thread Jonathan Haddad
I plan on doing some more thorough testing; it’s just a matter of finding
time.


On Fri, Mar 1, 2019 at 7:17 PM Dinesh Joshi 
wrote:

> Thanks for testing this out, Jon. Apart from the compression ratio, could
> you also test the insertion rate and GC? 14482 allows you to use the full
> range of Zstd compression levels.
>
> Dinesh
>
> > On Mar 1, 2019, at 6:41 PM, Jonathan Haddad  wrote:
> >
> > Hey all,
> >
> > I finally got around to doing some testing.  Nothing too crazy, I had it
> > run on my laptop while I did other things around the house.
> >
> > Test 1: Inserting Random Data in a K/V table, 10 million inserts
> >
> > LZ4 compression rate: 0.909857609644112
> > ZStd: 0.6136099401596449
> >
> > Test 2: Inserting fairly compressible text data into K/V table: 20 million
> > inserts
> > LZ4: 0.6950432247957942
> > ZStd: 0.4797311577845362
> >
> > Definitely worth it at first glance.  More testing to come.
> >
> > Jon
> >
> >
> >
> >
> > On Sun, Feb 17, 2019 at 8:46 PM dinesh.jo...@yahoo.com.INVALID
> >  wrote:
> >
> >> Thanks all for your input. The consensus is to go forward with this
> ticket.
> >> Dinesh
> >>
> >>On Friday, February 15, 2019, 12:54:20 PM PST, Sumanth Pasupuleti <
> >> spasupul...@netflix.com.INVALID> wrote:
> >>
> >> +1
> >>
> >>> On Fri, Feb 15, 2019 at 12:14 PM Dikang Gu  wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Feb 15, 2019 at 10:27 AM Vinay Chella  >
> >>> wrote:
> >>>
>  We have been using Zstd compressor across different products/services
> >>> here
>  and have seen significant improvements, getting this in 4.0 would be a
> >>> big
>  win.
> 
>  +1
> 
>  Thanks,
>  Vinay Chella
> 
> 
> > On Fri, Feb 15, 2019 at 10:19 AM Jeff Jirsa 
> wrote:
> >
> > +1
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Feb 15, 2019, at 9:35 AM, Jonathan Ellis 
> >>> wrote:
> >>
> >> IMO "add a new compression class that has demonstrable benefits to
>  Sushma
> >> and Joseph" is sufficiently noninvasive that we should allow it
> >> into
>  4.0.
> >>
> >> On Fri, Feb 15, 2019 at 10:48 AM Dinesh Joshi
> >>  wrote:
> >>
> >>> Hey folks,
> >>>
> >>> Just wanted to get a pulse on whether we can proceed with ZStd
>  support.
> >>> The consensus on the ticket was that it’s a very valuable addition
> > without
> >>> any risk of destabilizing 4.0. It’s ready to go if there aren’t
> >> any
> >>> objections.
> >>>
> >>> Dinesh
> >>>
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>>
> >>
> >> --
> >> Jonathan Ellis
> >> co-founder, http://www.datastax.com
> >> @spyced
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
> 
> >>>
> >>>
> >>> --
> >>> Dikang
> >>>
> >
> >
> >
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: CASSANDRA-14482

2019-03-01 Thread Dinesh Joshi
Thanks for testing this out, Jon. Apart from the compression ratio, could you
also test the insertion rate and GC? 14482 allows you to use the full range of
Zstd compression levels.

Dinesh

> On Mar 1, 2019, at 6:41 PM, Jonathan Haddad  wrote:
> 
> Hey all,
> 
> I finally got around to doing some testing.  Nothing too crazy, I had it
> run on my laptop while I did other things around the house.
> 
> Test 1: Inserting Random Data in a K/V table, 10 million inserts
> 
> LZ4 compression rate: 0.909857609644112
> ZStd: 0.6136099401596449
> 
> Test 2: Inserting fairly compressible text data into K/V table: 20 million
> inserts
> LZ4: 0.6950432247957942
> ZStd: 0.4797311577845362
> 
> Definitely worth it at first glance.  More testing to come.
> 
> Jon
> 
> 
> 
> 
> On Sun, Feb 17, 2019 at 8:46 PM dinesh.jo...@yahoo.com.INVALID
>  wrote:
> 
>> Thanks all for your input. The consensus is to go forward with this ticket.
>> Dinesh
>> 
>>On Friday, February 15, 2019, 12:54:20 PM PST, Sumanth Pasupuleti <
>> spasupul...@netflix.com.INVALID> wrote:
>> 
>> +1
>> 
>>> On Fri, Feb 15, 2019 at 12:14 PM Dikang Gu  wrote:
>>> 
>>> +1
>>> 
>>> On Fri, Feb 15, 2019 at 10:27 AM Vinay Chella 
>>> wrote:
>>> 
 We have been using Zstd compressor across different products/services
>>> here
 and have seen significant improvements, getting this in 4.0 would be a
>>> big
 win.
 
 +1
 
 Thanks,
 Vinay Chella
 
 
> On Fri, Feb 15, 2019 at 10:19 AM Jeff Jirsa  wrote:
> 
> +1
> 
> --
> Jeff Jirsa
> 
> 
>> On Feb 15, 2019, at 9:35 AM, Jonathan Ellis 
>>> wrote:
>> 
>> IMO "add a new compression class that has demonstrable benefits to
 Sushma
>> and Joseph" is sufficiently noninvasive that we should allow it
>> into
 4.0.
>> 
>> On Fri, Feb 15, 2019 at 10:48 AM Dinesh Joshi
>>  wrote:
>> 
>>> Hey folks,
>>> 
>>> Just wanted to get a pulse on whether we can proceed with ZStd
 support.
>>> The consensus on the ticket was that it’s a very valuable addition
> without
>>> any risk of destabilizing 4.0. It’s ready to go if there aren’t
>> any
>>> objections.
>>> 
>>> Dinesh
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>> 
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
> 
 
>>> 
>>> 
>>> --
>>> Dikang
>>> 
> 
> 
> 
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-14482

2019-03-01 Thread Jonathan Haddad
Hey all,

I finally got around to doing some testing.  Nothing too crazy, I had it
run on my laptop while I did other things around the house.

Test 1: Inserting Random Data in a K/V table, 10 million inserts

LZ4 compression rate: 0.909857609644112
ZStd: 0.6136099401596449

Test 2: Inserting fairly compressible text data into K/V table: 20 million
inserts
LZ4: 0.6950432247957942
ZStd: 0.4797311577845362

Definitely worth it at first glance.  More testing to come.

Jon
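
To put the numbers above in relative terms, here is a small sketch of the arithmetic. It assumes that "compression rate" means compressed size divided by uncompressed size (lower is better); that reading is an assumption, as are the helper names `relative_saving` and `summarize`, which exist only for this illustration.

```python
# Ratios copied from the test results above.
# Assumption: each value is compressed size / uncompressed size.
RESULTS = {
    "Test 1 (random data, 10M inserts)": {
        "LZ4": 0.909857609644112,
        "ZStd": 0.6136099401596449,
    },
    "Test 2 (compressible text, 20M inserts)": {
        "LZ4": 0.6950432247957942,
        "ZStd": 0.4797311577845362,
    },
}


def relative_saving(baseline_ratio, candidate_ratio):
    """Fraction of on-disk space saved by the candidate relative to the baseline."""
    return 1.0 - candidate_ratio / baseline_ratio


def summarize(results):
    """Map each test name to the space ZStd saves relative to LZ4."""
    return {
        name: relative_saving(ratios["LZ4"], ratios["ZStd"])
        for name, ratios in results.items()
    }


if __name__ == "__main__":
    for name, saving in summarize(RESULTS).items():
        print(f"{name}: ZStd output is about {saving:.1%} smaller than LZ4")
```

Under that assumption, ZStd produces output roughly a third smaller than LZ4 in both tests. This says nothing about insertion rate or GC, which were not measured here.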




On Sun, Feb 17, 2019 at 8:46 PM dinesh.jo...@yahoo.com.INVALID
 wrote:

> Thanks all for your input. The consensus is to go forward with this ticket.
> Dinesh
>
> On Friday, February 15, 2019, 12:54:20 PM PST, Sumanth Pasupuleti <
> spasupul...@netflix.com.INVALID> wrote:
>
>  +1
>
> On Fri, Feb 15, 2019 at 12:14 PM Dikang Gu  wrote:
>
> > +1
> >
> > On Fri, Feb 15, 2019 at 10:27 AM Vinay Chella 
> > wrote:
> >
> > > We have been using Zstd compressor across different products/services
> > here
> > > and have seen significant improvements, getting this in 4.0 would be a
> > big
> > > win.
> > >
> > > +1
> > >
> > > Thanks,
> > > Vinay Chella
> > >
> > >
> > > On Fri, Feb 15, 2019 at 10:19 AM Jeff Jirsa  wrote:
> > >
> > > > +1
> > > >
> > > > --
> > > > Jeff Jirsa
> > > >
> > > >
> > > > > On Feb 15, 2019, at 9:35 AM, Jonathan Ellis 
> > wrote:
> > > > >
> > > > > IMO "add a new compression class that has demonstrable benefits to
> > > Sushma
> > > > > and Joseph" is sufficiently noninvasive that we should allow it
> into
> > > 4.0.
> > > > >
> > > > > On Fri, Feb 15, 2019 at 10:48 AM Dinesh Joshi
> > > > >  wrote:
> > > > >
> > > > >> Hey folks,
> > > > >>
> > > > >> Just wanted to get a pulse on whether we can proceed with ZStd
> > > support.
> > > > >> The consensus on the ticket was that it’s a very valuable addition
> > > > without
> > > > >> any risk of destabilizing 4.0. It’s ready to go if there aren’t
> any
> > > > >> objections.
> > > > >>
> > > > >> Dinesh
> > > > >>
> > > > >>
> > -
> > > > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > > >>
> > > > >>
> > > > >
> > > > > --
> > > > > Jonathan Ellis
> > > > > co-founder, http://www.datastax.com
> > > > > @spyced
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >
> > > >
> > >
> >
> >
> > --
> > Dikang
> >



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Audit logging to tables.

2019-03-01 Thread Jonathan Haddad
Instead of logging to tables, putting a virtual table around the audit /
query logs might be an option. The same goes for the commit log for CDC.

On Fri, Mar 1, 2019 at 5:25 PM Sagar  wrote:

> Thanks all for the pointers. Really insightful.
>
> Subroto I think that’s part of the enterprise version but yeah even I have
> seen it. Again not sure of the performance implications.
>
> Sagar.
>
> On Sat, 2 Mar 2019 at 5:15 AM, Subroto Barua 
> wrote:
>
> > Datastax version has an option to store audit info to dse_audit.audit_log
> > table; I do not know the performance impact since I use the file option
> >
> > Subroto
> >
> > > On Mar 1, 2019, at 9:40 AM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com>
> > wrote:
> > >
> > > AFAIK the Full Query Logging binary format was already made more
> general
> > in order to support using that format for the audit logging.
> > >
> > > -Jeremiah
> > >
> > >> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie 
> > wrote:
> > >>
> > >> Is there a world in which a general purpose, side-channel file storage
> > >> format for transient things like this (hints, batches, audit logs,
> etc)
> > >> could be useful as a first class citizen in the codebase? i.e. a world
> > in
> > >> which we refactored some of the hints-specific reader/writer code to
> be
> > >> used for things like this if/when they come up?
> > >>
> > >>> On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad  > > wrote:
> > >>>
> > >>> Agreed with Dinesh and Josh.  I would *never* put the audit log back
> in
> > >>> Cassandra.
> > >>>
> > >>> This is extendable, Sagar, so you're free to do as you want, but I'm
> > very
> > >>> opposed to putting a ticking time bomb in Cassandra proper.
> > >>>
> > >>> Jon
> > >>>
> > >>>
> > >>> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi
> > 
> > >>> wrote:
> > >>>
> >  I strongly echo Josh’s sentiment. Imagine losing audit entries
> > because C*
> >  is overloaded? It’s fine if you don’t care about losing audit
> entries.
> > 
> >  Dinesh
> > 
> > > On Feb 28, 2019, at 6:41 AM, Joshua McKenzie  >
> >  wrote:
> > >
> > > One of the things we've run into historically, on a *lot* of axes,
> is
> >  that
> > > "just put it in C*" for various functionality looks great from a
> user
> > >>> and
> > > usability perspective, and proves to be something of a nightmare
> from
> > >>> an
> > > admin / cluster behavior perspective.
> > >
> > > i.e. - cluster suffering so you're writing hints? Write them to C*
> > >>> tables
> > > and watch the cluster suffer more! :)
> > > Same thing probably holds true for audit logging - at a time frame
> > when
> > > things are getting hairy w/a cluster, if you're writing that audit
> >  logging
> > > into C* proper (and dealing with ser/deser, compaction pressure,
> > >>> flushing
> > > pressure, etc) from that, there's a compounding effect of pressure
> > and
> >  pain
> > > on the cluster.
> > >
> > > So the TL;DR we as a project kind of philosophically have been
> moving
> > > towards (I think that's valid to say?) is: use C* for the things
> it's
> > > absolutely great at, and try to side-channel other recovery
> > operations
> > >>> as
> > > much as you can (see: file-based hints) to stay out of its way.
> > >
> > > Same thing held true w/design of CDC - I debated "materialize in
> > memory
> >  for
> > > consumer to take over socket", and "keep the data in another C*
> > table",
> >  but
> > > the ramifications to perf and core I/O operations in C* the moment
> > >>> things
> > > start to go badly were significant enough that the route we went
> was
> > >>> "do
> >  no
> > > harm". For better or for worse, as there's obvious tradeoffs there.
> > >
> > >> On Thu, Feb 28, 2019 at 7:46 AM Sagar 
> >  wrote:
> > >>
> > >> Thanks all for the pointers.
> > >>
> > >> @Joseph,
> > >>
> > >> I have gone through the links shared by you. Also, I have been
> > looking
> >  at
> > >> the code base.
> > >>
> > >> I understand the fact that pushing the logs to ES or Solr is a lot
> >  easier
> > >> to do. Having said that, the only reason I thought having
> something
> > >>> like
> > >> this might help is, if I don't want to add more pieces and still
> >  provide a
> > >> central piece of audit logging within Cassandra itself and still
> be
> > >> queryable.
> > >>
> > >> In terms of usages, one of them could definitely be CDC related
> use
> >  cases.
> > >> With data being stored in tables and being queryable, it can
> become
> > a
> >  lot
> > >> more easier to expose this data to external systems like Kafka
> > >>> Connect,
> > >> Debezium which have the ability to push data to Kafka for example.
> > >>> Note
> > >> that pushing data to Kafka is just an example, but what I mean is,
> > if
> > >>> 

Re: Audit logging to tables.

2019-03-01 Thread Sagar
Thanks all for the pointers. Really insightful.

Subroto, I think that’s part of the enterprise version, but yeah, I have
seen it too. Again, I'm not sure of the performance implications.

Sagar.

On Sat, 2 Mar 2019 at 5:15 AM, Subroto Barua 
wrote:

> Datastax version has an option to store audit info to dse_audit.audit_log
> table; I do not know the performance impact since I use the file option
>
> Subroto
>
> > On Mar 1, 2019, at 9:40 AM, Jeremiah D Jordan 
> wrote:
> >
> > AFAIK the Full Query Logging binary format was already made more general
> in order to support using that format for the audit logging.
> >
> > -Jeremiah
> >
> >> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie 
> wrote:
> >>
> >> Is there a world in which a general purpose, side-channel file storage
> >> format for transient things like this (hints, batches, audit logs, etc)
> >> could be useful as a first class citizen in the codebase? i.e. a world
> in
> >> which we refactored some of the hints-specific reader/writer code to be
> >> used for things like this if/when they come up?
> >>
> >>> On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad  > wrote:
> >>>
> >>> Agreed with Dinesh and Josh.  I would *never* put the audit log back in
> >>> Cassandra.
> >>>
> >>> This is extendable, Sagar, so you're free to do as you want, but I'm
> very
> >>> opposed to putting a ticking time bomb in Cassandra proper.
> >>>
> >>> Jon
> >>>
> >>>
> >>> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi
> 
> >>> wrote:
> >>>
>  I strongly echo Josh’s sentiment. Imagine losing audit entries
> because C*
>  is overloaded? It’s fine if you don’t care about losing audit entries.
> 
>  Dinesh
> 
> > On Feb 28, 2019, at 6:41 AM, Joshua McKenzie 
>  wrote:
> >
> > One of the things we've run into historically, on a *lot* of axes, is
>  that
> > "just put it in C*" for various functionality looks great from a user
> >>> and
> > usability perspective, and proves to be something of a nightmare from
> >>> an
> > admin / cluster behavior perspective.
> >
> > i.e. - cluster suffering so you're writing hints? Write them to C*
> >>> tables
> > and watch the cluster suffer more! :)
> > Same thing probably holds true for audit logging - at a time frame
> when
> > things are getting hairy w/a cluster, if you're writing that audit
>  logging
> > into C* proper (and dealing with ser/deser, compaction pressure,
> >>> flushing
> > pressure, etc) from that, there's a compounding effect of pressure
> and
>  pain
> > on the cluster.
> >
> > So the TL;DR we as a project kind of philosophically have been moving
> > towards (I think that's valid to say?) is: use C* for the things it's
> > absolutely great at, and try to side-channel other recovery
> operations
> >>> as
> > much as you can (see: file-based hints) to stay out of its way.
> >
> > Same thing held true w/design of CDC - I debated "materialize in
> memory
>  for
> > consumer to take over socket", and "keep the data in another C*
> table",
>  but
> > the ramifications to perf and core I/O operations in C* the moment
> >>> things
> > start to go badly were significant enough that the route we went was
> >>> "do
>  no
> > harm". For better or for worse, as there's obvious tradeoffs there.
> >
> >> On Thu, Feb 28, 2019 at 7:46 AM Sagar 
>  wrote:
> >>
> >> Thanks all for the pointers.
> >>
> >> @Joseph,
> >>
> >> I have gone through the links shared by you. Also, I have been
> looking
>  at
> >> the code base.
> >>
> >> I understand the fact that pushing the logs to ES or Solr is a lot
>  easier
> >> to do. Having said that, the only reason I thought having something
> >>> like
> >> this might help is, if I don't want to add more pieces and still
>  provide a
> >> central piece of audit logging within Cassandra itself and still be
> >> queryable.
> >>
> >> In terms of usages, one of them could definitely be CDC related use
>  cases.
> >> With data being stored in tables and being queryable, it can become
> a
>  lot
> >> more easier to expose this data to external systems like Kafka
> >>> Connect,
> >> Debezium which have the ability to push data to Kafka for example.
> >>> Note
> >> that pushing data to Kafka is just an example, but what I mean is,
> if
> >>> we
> >> can have data in tables, then instead of everyone writing custom
> >>> custom
> >> loggers, they can hook into this table info and take action.
> >>
> >> Regarding the infinite loop question, I have done some analysis, and
> >>> in
>  my
> >> opinion, instead of tweaking the behaviour of Binlog and the way it
> >> functions currently, we can actually spin up another tailer thread
> to
>  the
> >> same Chronicle Queue which can do the needful. This way the config

Re: Audit logging to tables.

2019-03-01 Thread Subroto Barua
The DataStax version has an option to store audit info in the
dse_audit.audit_log table; I do not know the performance impact since I use
the file option.

Subroto 

> On Mar 1, 2019, at 9:40 AM, Jeremiah D Jordan  
> wrote:
> 
> AFAIK the Full Query Logging binary format was already made more general in 
> order to support using that format for the audit logging.
> 
> -Jeremiah
> 
>> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie  wrote:
>> 
>> Is there a world in which a general purpose, side-channel file storage
>> format for transient things like this (hints, batches, audit logs, etc)
>> could be useful as a first class citizen in the codebase? i.e. a world in
>> which we refactored some of the hints-specific reader/writer code to be
>> used for things like this if/when they come up?
>> 
>>> On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad >> > wrote:
>>> 
>>> Agreed with Dinesh and Josh.  I would *never* put the audit log back in
>>> Cassandra.
>>> 
>>> This is extendable, Sagar, so you're free to do as you want, but I'm very
>>> opposed to putting a ticking time bomb in Cassandra proper.
>>> 
>>> Jon
>>> 
>>> 
>>> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi 
>>> wrote:
>>> 
 I strongly echo Josh’s sentiment. Imagine losing audit entries because C*
 is overloaded? It’s fine if you don’t care about losing audit entries.
 
 Dinesh
 
> On Feb 28, 2019, at 6:41 AM, Joshua McKenzie 
 wrote:
> 
> One of the things we've run into historically, on a *lot* of axes, is
 that
> "just put it in C*" for various functionality looks great from a user
>>> and
> usability perspective, and proves to be something of a nightmare from
>>> an
> admin / cluster behavior perspective.
> 
> i.e. - cluster suffering so you're writing hints? Write them to C*
>>> tables
> and watch the cluster suffer more! :)
> Same thing probably holds true for audit logging - at a time frame when
> things are getting hairy w/a cluster, if you're writing that audit
 logging
> into C* proper (and dealing with ser/deser, compaction pressure,
>>> flushing
> pressure, etc) from that, there's a compounding effect of pressure and
 pain
> on the cluster.
> 
> So the TL;DR we as a project kind of philosophically have been moving
> towards (I think that's valid to say?) is: use C* for the things it's
> absolutely great at, and try to side-channel other recovery operations
>>> as
> much as you can (see: file-based hints) to stay out of its way.
> 
> Same thing held true w/design of CDC - I debated "materialize in memory
 for
> consumer to take over socket", and "keep the data in another C* table",
 but
> the ramifications to perf and core I/O operations in C* the moment
>>> things
> start to go badly were significant enough that the route we went was
>>> "do
 no
> harm". For better or for worse, as there's obvious tradeoffs there.
> 
>> On Thu, Feb 28, 2019 at 7:46 AM Sagar 
 wrote:
>> 
>> Thanks all for the pointers.
>> 
>> @Joseph,
>> 
>> I have gone through the links shared by you. Also, I have been looking
 at
>> the code base.
>> 
>> I understand the fact that pushing the logs to ES or Solr is a lot
 easier
>> to do. Having said that, the only reason I thought having something
>>> like
>> this might help is, if I don't want to add more pieces and still
 provide a
>> central piece of audit logging within Cassandra itself and still be
>> queryable.
>> 
>> In terms of usages, one of them could definitely be CDC related use
 cases.
>> With data being stored in tables and being queryable, it can become a
 lot
>> more easier to expose this data to external systems like Kafka
>>> Connect,
>> Debezium which have the ability to push data to Kafka for example.
>>> Note
>> that pushing data to Kafka is just an example, but what I mean is, if
>>> we
>> can have data in tables, then instead of everyone writing custom
>>> custom
>> loggers, they can hook into this table info and take action.
>> 
>> Regarding the infinite loop question, I have done some analysis, and
>>> in
 my
>> opinion, instead of tweaking the behaviour of Binlog and the way it
>> functions currently, we can actually spin up another tailer thread to
 the
>> same Chronicle Queue which can do the needful. This way the config
 options
>> etc. all remain the same (apart from the logger, of course).
>> 
>> Let me know if any of it makes sense :D
>> 
>> Thanks!
>> Sagar.
>> 
>> 
>> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi
>>>  
>> wrote:
>> 
>>> 
>>> 
 On Feb 27, 2019, at 10:41 AM, Joseph Lynch 
>>> wrote:
 
 Vinay can confirm, but as far as I am aware we have no current plans
 to
 implement audit 

RE: Looking for common cases in cassandra fit to autoheal

2019-03-01 Thread Kenneth Brotman
Michael is right.  It's not the topic of this list.  Write me offline and I'll 
be glad to direct you to some resources with the information you are looking 
for.

Kenneth Brotman 

-Original Message-
From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael Shuler
Sent: Friday, March 01, 2019 12:49 PM
To: dev@cassandra.apache.org
Subject: Re: Looking for common cases in cassandra fit to autoheal

Your request is not related to the topic of this mailing list, the
development of Apache Cassandra. Your question would be better suited
for the user@ mailing list. Your question is also quite vague, and you
might include some detail or context about what exactly you are looking
for. You may also wish to drop the unenforceable email footer or use a
personal email address, if it's appended by your mail server.

-- 
Kind regards,
Michael
http://www.pbandjelly.org/2011/03/to-whom-it-may-concern/

On 3/1/19 2:33 PM, Sundaramoorthy, Natarajan wrote:
> Can someone please reply? Thanks
>  
>  
> 
> 
> -Original Message-
> From: Sundaramoorthy, Natarajan [mailto:natarajan_sundaramoor...@optum.com] 
> Sent: Wednesday, February 27, 2019 3:17 PM
> To: dev@cassandra.apache.org
> Subject: Looking for common cases in cassandra fit to autoheal
> 
> Can someone please provide some common cases in cassandra which can be 
> candidate for autoheal? Looking for log files and issue..Both from VM and/or 
> pod world welcome...Thanks in advance for help
> 
> This e-mail, including attachments, may include confidential and/or
> proprietary information, and may be used only by the person or entity
> to which it is addressed. If the reader of this e-mail is not the intended
> recipient or his or her authorized agent, the reader is hereby notified
> that any dissemination, distribution or copying of this e-mail is
> prohibited. If you have received this e-mail in error, please notify the
> sender by replying to this message and delete this e-mail immediately.
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org





Re: Looking for common cases in cassandra fit to autoheal

2019-03-01 Thread Dinesh Joshi
Natarajan, it looks like your company is trying to build a monetizable solution 
around Cassandra. It would be helpful for you to look at the literature around 
Cassandra and consult with folks who run Cassandra. Perhaps also contribute 
back learnings to our community so we might benefit from it too.

Dinesh

> On Mar 1, 2019, at 12:49 PM, Michael Shuler  wrote:
> 
> Your request is not related to the topic of this mailing list, the
> development of Apache Cassandra. Your question would be better suited
> for the user@ mailing list. Your question is also quite vague, and you
> might include some detail or context about what exactly you are looking
> for. You may also wish to drop the unenforceable email footer or use a
> personal email address, if it's appended by your mail server.
> 
> -- 
> Kind regards,
> Michael
> http://www.pbandjelly.org/2011/03/to-whom-it-may-concern/
> 
>> On 3/1/19 2:33 PM, Sundaramoorthy, Natarajan wrote:
>> Can someone please reply? Thanks
>>  
>>  
>> 
>> 
>> -Original Message-
>> From: Sundaramoorthy, Natarajan [mailto:natarajan_sundaramoor...@optum.com] 
>> Sent: Wednesday, February 27, 2019 3:17 PM
>> To: dev@cassandra.apache.org
>> Subject: Looking for common cases in cassandra fit to autoheal
>> 
>> Can someone please provide some common cases in cassandra which can be 
>> candidate for autoheal? Looking for log files and issue..Both from VM and/or 
>> pod world welcome...Thanks in advance for help
>> 
>> This e-mail, including attachments, may include confidential and/or
>> proprietary information, and may be used only by the person or entity
>> to which it is addressed. If the reader of this e-mail is not the intended
>> recipient or his or her authorized agent, the reader is hereby notified
>> that any dissemination, distribution or copying of this e-mail is
>> prohibited. If you have received this e-mail in error, please notify the
>> sender by replying to this message and delete this e-mail immediately.
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Looking for common cases in cassandra fit to autoheal

2019-03-01 Thread Michael Shuler
Your request is not related to the topic of this mailing list, the
development of Apache Cassandra. Your question would be better suited
for the user@ mailing list. Your question is also quite vague, and you
might include some detail or context about what exactly you are looking
for. You may also wish to drop the unenforceable email footer or use a
personal email address, if it's appended by your mail server.

-- 
Kind regards,
Michael
http://www.pbandjelly.org/2011/03/to-whom-it-may-concern/

On 3/1/19 2:33 PM, Sundaramoorthy, Natarajan wrote:
> Can someone please reply? Thanks
>  
>  
> 
> 
> -Original Message-
> From: Sundaramoorthy, Natarajan [mailto:natarajan_sundaramoor...@optum.com] 
> Sent: Wednesday, February 27, 2019 3:17 PM
> To: dev@cassandra.apache.org
> Subject: Looking for common cases in cassandra fit to autoheal
> 
> Can someone please provide some common cases in cassandra which can be 
> candidate for autoheal? Looking for log files and issue..Both from VM and/or 
> pod world welcome...Thanks in advance for help
> 
> This e-mail, including attachments, may include confidential and/or
> proprietary information, and may be used only by the person or entity
> to which it is addressed. If the reader of this e-mail is not the intended
> recipient or his or her authorized agent, the reader is hereby notified
> that any dissemination, distribution or copying of this e-mail is
> prohibited. If you have received this e-mail in error, please notify the
> sender by replying to this message and delete this e-mail immediately.
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



RE: Looking for common cases in cassandra fit to autoheal

2019-03-01 Thread Sundaramoorthy, Natarajan
Can someone please reply? Thanks
 
 


-Original Message-
From: Sundaramoorthy, Natarajan [mailto:natarajan_sundaramoor...@optum.com] 
Sent: Wednesday, February 27, 2019 3:17 PM
To: dev@cassandra.apache.org
Subject: Looking for common cases in cassandra fit to autoheal

Can someone please provide some common cases in Cassandra that could be
candidates for autoheal? I am looking for log files and issues. Examples from
both the VM and pod worlds are welcome. Thanks in advance for your help.

This e-mail, including attachments, may include confidential and/or
proprietary information, and may be used only by the person or entity
to which it is addressed. If the reader of this e-mail is not the intended
recipient or his or her authorized agent, the reader is hereby notified
that any dissemination, distribution or copying of this e-mail is
prohibited. If you have received this e-mail in error, please notify the
sender by replying to this message and delete this e-mail immediately.




Re: Audit logging to tables.

2019-03-01 Thread Jeremiah D Jordan
AFAIK the Full Query Logging binary format was already made more general in 
order to support using that format for the audit logging.

-Jeremiah

> On Mar 1, 2019, at 11:38 AM, Joshua McKenzie  wrote:
> 
> Is there a world in which a general purpose, side-channel file storage
> format for transient things like this (hints, batches, audit logs, etc)
> could be useful as a first class citizen in the codebase? i.e. a world in
> which we refactored some of the hints-specific reader/writer code to be
> used for things like this if/when they come up?
> 

Re: Audit logging to tables.

2019-03-01 Thread Joshua McKenzie
Is there a world in which a general purpose, side-channel file storage
format for transient things like this (hints, batches, audit logs, etc)
could be useful as a first class citizen in the codebase? i.e. a world in
which we refactored some of the hints-specific reader/writer code to be
used for things like this if/when they come up?
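
A minimal sketch of what such a side-channel format could look like, for
readers following the thread. Everything here is hypothetical (the
`SideChannelLog` and `Tailer` names, the 4-byte length prefix); it is neither
Cassandra's hints reader/writer code nor Chronicle Queue's API. It only shows
the two ideas under discussion: an append-only file that stays outside the
regular write path, and independent readers that each track their own
position, in the spirit of the extra tailer suggested in the quoted message
below.

```python
import os
import struct
import tempfile


class SideChannelLog:
    """Append-only file of length-prefixed records (hypothetical format)."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, payload: bytes) -> None:
        # 4-byte big-endian length header, followed by the raw payload.
        with open(self.path, "ab") as f:
            f.write(struct.pack(">I", len(payload)) + payload)

    def tailer(self):
        return Tailer(self.path)


class Tailer:
    """Independent reader that remembers its own byte offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def poll(self):
        """Return every complete record appended since the last poll."""
        records = []
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break  # no further complete header
                (size,) = struct.unpack(">I", header)
                payload = f.read(size)
                if len(payload) < size:
                    break  # partially written record; pick it up next poll
                records.append(payload)
                self.offset = f.tell()
        return records


log_path = os.path.join(tempfile.mkdtemp(), "side_channel.log")
log = SideChannelLog(log_path)
hints_reader = log.tailer()
audit_reader = log.tailer()

log.append(b"hint:node3")
log.append(b"audit:SELECT")
first_batch = hints_reader.poll()    # both records written so far
log.append(b"audit:INSERT")
second_batch = hints_reader.poll()   # only the record added since then
audit_batch = audit_reader.poll()    # the second tailer starts from offset 0
```

Because each tailer owns its offset, adding another consumer does not change
how records are written, which is the property the "same config, different
logger" argument relies on.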

On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad  wrote:

> Agreed with Dinesh and Josh.  I would *never* put the audit log back in
> Cassandra.
>
> This is extendable, Sagar, so you're free to do as you want, but I'm very
> opposed to putting a ticking time bomb in Cassandra proper.
>
> Jon
>
>
> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi wrote:
>
> > I strongly echo Josh’s sentiment. Imagine losing audit entries because C*
> > is overloaded? It’s fine if you don’t care about losing audit entries.
> >
> > Dinesh
> >
> > > On Feb 28, 2019, at 6:41 AM, Joshua McKenzie wrote:
> > >
> > > One of the things we've run into historically, on a *lot* of axes, is
> > > that "just put it in C*" for various functionality looks great from a
> > > user and usability perspective, and proves to be something of a
> > > nightmare from an admin / cluster behavior perspective.
> > >
> > > i.e. - cluster suffering so you're writing hints? Write them to C*
> > > tables and watch the cluster suffer more! :)
> > > Same thing probably holds true for audit logging - at a time frame when
> > > things are getting hairy w/a cluster, if you're writing that audit
> > > logging into C* proper (and dealing with ser/deser, compaction
> > > pressure, flushing pressure, etc) from that, there's a compounding
> > > effect of pressure and pain on the cluster.
> > >
> > > So the TL;DR we as a project kind of philosophically have been moving
> > > towards (I think that's valid to say?) is: use C* for the things it's
> > > absolutely great at, and try to side-channel other recovery operations
> > > as much as you can (see: file-based hints) to stay out of its way.
> > >
> > > Same thing held true w/design of CDC - I debated "materialize in memory
> > > for consumer to take over socket", and "keep the data in another C*
> > > table", but the ramifications to perf and core I/O operations in C* the
> > > moment things start to go badly were significant enough that the route
> > > we went was "do no harm". For better or for worse, as there's obvious
> > > tradeoffs there.
> > >
> > >> On Thu, Feb 28, 2019 at 7:46 AM Sagar wrote:
> > >>
> > >> Thanks all for the pointers.
> > >>
> > >> @Joseph,
> > >>
> > >> I have gone through the links shared by you. Also, I have been looking
> > >> at the code base.
> > >>
> > >> I understand the fact that pushing the logs to ES or Solr is a lot
> > >> easier to do. Having said that, the only reason I thought having
> > >> something like this might help is, if I don't want to add more pieces
> > >> and still provide a central piece of audit logging within Cassandra
> > >> itself and still be queryable.
> > >>
> > >> In terms of usages, one of them could definitely be CDC related use
> > >> cases. With data being stored in tables and being queryable, it can
> > >> become a lot easier to expose this data to external systems like
> > >> Kafka Connect, Debezium which have the ability to push data to Kafka
> > >> for example. Note that pushing data to Kafka is just an example, but
> > >> what I mean is, if we can have data in tables, then instead of
> > >> everyone writing custom loggers, they can hook into this table info
> > >> and take action.
> > >>
> > >> Regarding the infinite loop question, I have done some analysis, and
> > >> in my opinion, instead of tweaking the behaviour of Binlog and the way
> > >> it functions currently, we can actually spin up another tailer thread
> > >> to the same Chronicle Queue which can do the needful. This way the
> > >> config options etc all remain the same (apart from the logger of
> > >> course).
> > >>
> > >> Let me know if any of it makes sense :D
> > >>
> > >> Thanks!
> > >> Sagar.
> > >>
> > >>
> > >> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi wrote:
> > >>
> > >>>
> > >>>
> > >>>> On Feb 27, 2019, at 10:41 AM, Joseph Lynch wrote:
> > >>>>
> > >>>> Vinay can confirm, but as far as I am aware we have no current plans
> > >>>> to implement audit logging to a table directly, but the
> > >>>> implementation is fully pluggable (like compaction, compression,
> > >>>> etc ...). Check out the blog post [1] and documentation [2] Vinay
> > >>>> wrote for more details, but the short
> > >>>
> > >>> +1. I am still curious as to why you'd want to store audit log entries
> > >>> back in Cassandra? Depending on the scale it can generate a lot of
> > >>> load and I think you'd end up in an infinite loop because as you're
> > >>> inserting the audit log entry you'll generate a new one and so on
> > >>> unless you black list
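
Dinesh's sentence is cut off in the archive, so the following is only an
illustration of the loop he describes, assuming the guard is an exclusion
list on the keyspace that holds the audit table. `ToyCluster`,
`AUDIT_KEYSPACE` and the `max_writes` safety cap are made up for the example;
no real Cassandra API is involved.

```python
AUDIT_KEYSPACE = "audit_ks"  # hypothetical keyspace holding the audit table


class ToyCluster:
    """Toy model: every audited statement causes one write to the audit table."""

    def __init__(self, excluded_keyspaces=(), max_writes=1000):
        self.excluded = set(excluded_keyspaces)
        self.max_writes = max_writes  # safety cap so the demo terminates
        self.audit_rows = 0

    def execute(self, keyspace, statement):
        pending = [(keyspace, statement)]
        while pending and self.audit_rows < self.max_writes:
            ks, stmt = pending.pop()
            if ks in self.excluded:
                continue  # not audited, so it triggers no follow-up write
            self.audit_rows += 1
            # The audit write is itself a statement and gets audited in turn.
            pending.append((AUDIT_KEYSPACE, "INSERT audit row for: " + stmt))
        return self.audit_rows


# Without an exclusion, one user query keeps generating audit writes
# until the safety cap stops the demo.
unguarded = ToyCluster().execute("app", "SELECT * FROM users")

# Excluding the audit keyspace breaks the cycle after a single row.
guarded = ToyCluster(excluded_keyspaces={AUDIT_KEYSPACE}).execute(
    "app", "SELECT * FROM users"
)
```

The guard only stops the self-amplification; it does nothing about the load
concern raised in the same message.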

Re: How Apache Cassandra handles flaky tests

2019-03-01 Thread Stanislav Kozlovski
Those are all helpful tips and all make complete sense to me. Thank you very 
much for sharing your experience! :)
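
Of the tips quoted below, the bounded retry is the easiest to show in code. A
minimal sketch, assuming plain Python rather than any particular test
framework; `retry` and `flaky_check` are hypothetical names. It returns the
attempt count alongside the result so that a test which only passes on the
third try stays visible instead of being silently absorbed.

```python
import time


def retry(action, attempts=3, delay_seconds=0.0, retry_on=(AssertionError,)):
    """Run action up to `attempts` times; return (result, attempts_used)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action(), attempt
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last failure instead of hiding it.
    raise last_error


calls = {"count": 0}


def flaky_check():
    """Stand-in for a test step that depends on a slow external resource."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise AssertionError("resource not ready yet")
    return "ok"


result, attempts_used = retry(flaky_check, attempts=5)
```

Logging `attempts_used` whenever it is greater than 1 keeps a record of which
tests still need a root-cause fix.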

On 2019/02/27 00:55:34, "d...@yahoo.com.INVALID" wrote:
> +1 to everything Jeff said. As someone who has worked on flaky tests not
> just in Cassandra's context, I know it can be hard to deal with them.
>
> However, it's best to root cause them. I have found some flaky tests were
> genuine issues that needed fixing in Cassandra. Sometimes the flakiness is
> due to underpowered VMs running low on resources, or in one case tests
> failed due to kernel settings that differed between systems. Explore
> tuning the VM settings used for the test execution. I usually prefer not
> to add retries, but in some cases retries can be helpful. Rewriting the
> tests to reduce dependencies on external systems, or using mocks, is
> another useful method for reducing the flakiness. Try breaking up tests if
> they're too big. Finally, deleting tests can also be a solution, but use it
> sparingly.
>
> I believe in the broken windows theory, so it is critical that you spend
> time fixing them; otherwise everyone ignores them and attributes all
> failures to "flakiness", leading to real issues sneaking in.
>
> Dinesh
>
> On Tuesday, February 26, 2019, 12:06:10 PM PST, Jeff Jirsa wrote:
>
> > On Feb 26, 2019, at 8:26 AM, Stanislav Kozlovski wrote:
> >
> > Hey there Cassandra community,
> >
> > I work on a fellow open-source project - Apache Kafka - and there we have
> > been fighting flaky tests a lot. We run Java 8 and Java 11 builds on every
> > Pull Request and due to test flakiness, almost all of them turn out red
> > with 1 or 2 tests (completely unrelated to the change in the PR) failing.
> > This has resulted in committers ignoring them and merging the changes
> > either way, or in the worst case - rerunning the hour-long build until it
> > becomes green.
>
> I hope most committers won't commit unless the flaky test is definitely not
> in the subsystem they touched. But yes, one of the motivations for speeding
> up tests (parallelized on a containerized hosted CI platform) was to cut
> down the time for (re-)running.
>
> > This test flakiness has also slowed down our releases significantly.
> >
> > In general, I was just curious to understand if this is a problem that
> > Cassandra faces as well.
>
> Yes
>
> > Does your project have a lot of intermittently failing tests,
>
> Sometimes more than others. There were a few big pushes to get green,
> though it naturally regresses a bit over time.
>
> > do you have any active process of addressing such tests (during the
> > initial review, after realizing it is flaky, etc). Any pointers will be
> > greatly appreciated!
>
> I don’t think we’ve solved this convincingly. Different large (corporate)
> contributors have done long one-time passes, and that helped a ton, but I
> don’t think there are any silver bullets yet.