Re: [DISCUSS] Diagnostic events in virtual tables

2021-07-21 Thread Stefan Miklosovic
Thank you both for the answers, really appreciate it.

My concern is that they could hold "too much data", growing without bound
while a node is live. That said, the current implementation already
enforces a hard cap of 200 events per event type (1), so the store is
bounded for now, which might be just fine. Nevertheless, that limit
should at least be configurable (at runtime).

Ideally, there should be some way to persist them anywhere, not only to
an in-memory structure; virtual tables and bin logs are clearly two
candidate sinks. Extending this idea, persistence should simply be
pluggable, or some kind of extension point should exist.

I will give some more thought to how this might be generalised, and I'll
try to come up with a CEP covering it.

Thank you once again for the response.

(1) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/diag/store/DiagnosticEventMemoryStore.java#L39
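For illustration only, the hard-cap behaviour referenced in (1) can be sketched as a capped ordered map that drops its oldest entry once the limit is exceeded. This is a hypothetical sketch, not the actual DiagnosticEventMemoryStore code, and all names are invented:

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of a capped per-event-type store; the real logic
 * lives in DiagnosticEventMemoryStore, and all names here are invented.
 */
public class CappedEventStore<T>
{
    private final int maxEvents; // e.g. 200, ideally configurable at runtime
    private final AtomicLong keys = new AtomicLong();
    private final ConcurrentSkipListMap<Long, T> events = new ConcurrentSkipListMap<>();

    public CappedEventStore(int maxEvents)
    {
        this.maxEvents = maxEvents;
    }

    public void store(T event)
    {
        events.put(keys.incrementAndGet(), event);
        while (events.size() > maxEvents) // evict oldest once over the cap
            events.pollFirstEntry();
    }

    public int size()
    {
        return events.size();
    }

    public static void main(String[] args)
    {
        CappedEventStore<String> store = new CappedEventStore<>(3);
        for (int i = 0; i < 10; i++)
            store.store("event-" + i);
        if (store.size() != 3)
            throw new AssertionError("expected cap of 3, got " + store.size());
    }
}
```

Making the cap runtime-configurable would then only require a setter for maxEvents rather than the final field used here.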

On Wed, 21 Jul 2021 at 18:22, Jeremiah D Jordan
 wrote:
>
> Yes, I think it would make sense to have the events available in a virtual 
> table, especially if we are trying to move our operational management in that 
> direction.
>
> But, why does it need to be bin log or virtual tables?  Why not both?  The 
> virtual tables could even return the data stored in the bin log making them 
> persistent if wanted.
>
> -Jeremiah
>
> > On Jul 21, 2021, at 7:45 AM, Paulo Motta  wrote:
> >
> > I'm not very familiar with diagnostic events, but I think there's great
> > value in providing a Virtual Table implementation to diagnostic events,
> > since this will help in the long term objective of providing a unified
> > interface to node monitoring, so +1 from my side. I think it would
> > definitely help to write a CEP to detail the proposal and cast a vote if
> > there are no objections.
> >
> > In case you are not aware, Apache has the concept of lazy consensus <
> > https://community.apache.org/committers/lazyConsensus.html>, so as long as
> > nobody opposes your proposal you're good to move forward with it, so people
> > not caring to even dropping an e-mail can actually be a good thing. ;)
> >
> > On Wed, 21 Jul 2021 at 03:39, Stefan Miklosovic <
> > stefan.mikloso...@instaclustr.com> wrote:
> >
> >> Hi,
> >>
> >> should I create CEP first or people just absolutely do not care to
> >> even drop an email and it does not make sense to them?
> >>
> >> Regards
> >>
> >> On Mon, 19 Jul 2021 at 15:32, Stefan Miklosovic
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I wonder if people would be interested in having diagnostic events in
> >>> virtual tables?
> >>>
> >>> I put something together here (heavy wip) (1) but that is the idea I
> >> have.
> >>>
> >>> I know that Stefan Podkowinski did a spectacular job under
> >>> CASSANDRA-12944 and the possibility to persist locally was elaborated
> >>> on in (2) where the conclusion was made that maybe it is more suitable
> >>> to put it into chronicle queues via BinLog path and so on.
> >>>
> >>> The benefits of bin log is that we have events persisted and they can
> >>> be inspected further when node is offline.
> >>>
> >>> While data in virtual tables cease to exist after nodes are down, one
> >>> nice benefit of having it in virtual tables is that we can query it
> >>> comfortably via CQL and I think that this solution is more suitable to
> >>> have on an every day basis from operator's point of view. There is
> >>> still a way to dump it somewhere else anyway if one is really
> >>> interested in doing so.
> >>>
> >>> Do you think that the bin log solution is overall superior to the
> >>> virtual tables approach and we should just forget about having it in
> >>> virtual tables?
> >>>
> >>> If that is the case, what I would like to see is to have some
> >>> pluggability here so I might implement this on my own and configure a
> >>> node to put it to virtual tables for me, it does not necessarily have
> >>> to be the part of Cassandra code base if we are strongly against that.
> >>>
> >>> (1)
> >> https://github.com/instaclustr/cassandra/commit/0dd60dc0a847619fdb704b700154e624b21a0c35
> >>> (2) https://issues.apache.org/jira/browse/CASSANDRA-13460
> >>>
> >>> Regards
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
>
>
>




Re: [DISCUSS] Diagnostic events in virtual tables

2021-07-21 Thread Jeremiah D Jordan
Yes, I think it would make sense to have the events available in a virtual 
table, especially if we are trying to move our operational management in that 
direction.

But why does it need to be bin log or virtual tables? Why not both? The
virtual tables could even return the data stored in the bin log, making
them persistent if wanted.

-Jeremiah

> On Jul 21, 2021, at 7:45 AM, Paulo Motta  wrote:
> 
> I'm not very familiar with diagnostic events, but I think there's great
> value in providing a Virtual Table implementation to diagnostic events,
> since this will help in the long term objective of providing a unified
> interface to node monitoring, so +1 from my side. I think it would
> definitely help to write a CEP to detail the proposal and cast a vote if
> there are no objections.
> 
> In case you are not aware, Apache has the concept of lazy consensus <
> https://community.apache.org/committers/lazyConsensus.html>, so as long as
> nobody opposes your proposal you're good to move forward with it, so people
> not caring to even dropping an e-mail can actually be a good thing. ;)
> 
> On Wed, 21 Jul 2021 at 03:39, Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
> 
>> Hi,
>> 
>> should I create CEP first or people just absolutely do not care to
>> even drop an email and it does not make sense to them?
>> 
>> Regards
>> 
>> On Mon, 19 Jul 2021 at 15:32, Stefan Miklosovic
>>  wrote:
>>> 
>>> Hi,
>>> 
>>> I wonder if people would be interested in having diagnostic events in
>>> virtual tables?
>>> 
>>> I put something together here (heavy wip) (1) but that is the idea I
>> have.
>>> 
>>> I know that Stefan Podkowinski did a spectacular job under
>>> CASSANDRA-12944 and the possibility to persist locally was elaborated
>>> on in (2) where the conclusion was made that maybe it is more suitable
>>> to put it into chronicle queues via BinLog path and so on.
>>> 
>>> The benefits of bin log is that we have events persisted and they can
>>> be inspected further when node is offline.
>>> 
>>> While data in virtual tables cease to exist after nodes are down, one
>>> nice benefit of having it in virtual tables is that we can query it
>>> comfortably via CQL and I think that this solution is more suitable to
>>> have on an every day basis from operator's point of view. There is
>>> still a way to dump it somewhere else anyway if one is really
>>> interested in doing so.
>>> 
>>> Do you think that the bin log solution is overall superior to the
>>> virtual tables approach and we should just forget about having it in
>>> virtual tables?
>>> 
>>> If that is the case, what I would like to see is to have some
>>> pluggability here so I might implement this on my own and configure a
>>> node to put it to virtual tables for me, it does not necessarily have
>>> to be the part of Cassandra code base if we are strongly against that.
>>> 
>>> (1)
>> https://github.com/instaclustr/cassandra/commit/0dd60dc0a847619fdb704b700154e624b21a0c35
>>> (2) https://issues.apache.org/jira/browse/CASSANDRA-13460
>>> 
>>> Regards
>> 
>> 
>> 





Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread Branimir Lambov
> Why is flushing control bad to do in CFS and better in the memtable?

I wonder why you would understand this as something that takes away
control instead of giving it. The CFS is not configurable; with the CEP,
memtables are configurable at the table level. It is entirely possible to
implement a memtable wrapper that provides any of the functionalities you
mention as examples, and that wrapper would be fully configurable (just as
an example, one could very well select a time-series-optimized-flush
wrapper over a skip-list memtable).
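As a sketch of the wrapper idea, under an invented and heavily simplified Memtable interface (Cassandra's real interface is far larger), a flush policy can be layered over any memtable implementation:

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical minimal memtable interface; Cassandra's real one is much larger.
interface Memtable
{
    void put(String key, String value);
    long liveDataSize();
    boolean shouldFlush();
}

// Plain skip-list-backed memtable with no flush policy of its own.
class SkipListMemtable implements Memtable
{
    private final ConcurrentSkipListMap<String, String> data = new ConcurrentSkipListMap<>();
    private long size = 0;

    public void put(String key, String value)
    {
        data.put(key, value);
        size += key.length() + value.length();
    }

    public long liveDataSize() { return size; }
    public boolean shouldFlush() { return false; }
}

// The wrapper idea from the discussion above: flush policy is layered over
// an arbitrary memtable implementation instead of being baked into CFS.
public class ThresholdFlushWrapper implements Memtable
{
    private final Memtable wrapped;
    private final long flushThresholdBytes;

    public ThresholdFlushWrapper(Memtable wrapped, long flushThresholdBytes)
    {
        this.wrapped = wrapped;
        this.flushThresholdBytes = flushThresholdBytes;
    }

    public void put(String key, String value) { wrapped.put(key, value); }
    public long liveDataSize() { return wrapped.liveDataSize(); }

    // The policy decision lives in the wrapper, not in the wrapped memtable.
    public boolean shouldFlush() { return wrapped.liveDataSize() >= flushThresholdBytes; }

    public static void main(String[] args)
    {
        Memtable memtable = new ThresholdFlushWrapper(new SkipListMemtable(), 10);
        memtable.put("k1", "v1"); // 4 bytes, below the threshold
        if (memtable.shouldFlush())
            throw new AssertionError("should not flush yet");
        memtable.put("k2", "a-longer-value"); // pushes the total past 10 bytes
        if (!memtable.shouldFlush())
            throw new AssertionError("should flush now");
    }
}
```

A time-series-optimized-flush wrapper would follow the same shape, only with a different shouldFlush() decision.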



> is this proposal going to take an angle towards per-range memtables?

This is another question that the proposal leaves to the memtable
implementation (or wrapper), but it does make sense to ensure the
interfaces provide the necessary support for sharding, e.g. by providing
suitable shard boundaries that split the owned space. Note that we
already have sstable/compaction-per-range functionality with multiple
data directories, and it makes sense to ensure that the provided splits
are in some agreement with the data directory boundaries.
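A toy sketch of such shard-boundary support, using a plain long token span (real Cassandra token ownership is more involved; all names here are invented):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (invented names): compute interior boundaries that split a
 * locally owned token span evenly, so memtable shards could, for example,
 * be aligned with per-data-directory boundaries. Real Cassandra tokens
 * and ownership ranges are more involved than a plain long span.
 */
public class ShardBoundaries
{
    // Returns shardCount - 1 interior boundaries splitting [min, max] evenly.
    public static List<Long> split(long min, long max, int shardCount)
    {
        List<Long> boundaries = new ArrayList<>();
        long span = max - min;
        for (int i = 1; i < shardCount; i++)
            boundaries.add(min + span / shardCount * i);
        return boundaries;
    }

    public static void main(String[] args)
    {
        if (!split(0, 100, 4).equals(List.of(25L, 50L, 75L)))
            throw new AssertionError("unexpected boundaries");
    }
}
```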



> Why would the write process of the table not ask the table what settings
> it has and instead asks the memtable what settings the table has?

The reason for this is that memtables are the primary reason the
commit log needs to preserve data. The question of whether or not the
memtable needs its content to be present and retained in the commit
log until flush (writesAreDurable) is a question that only the
memtable can answer.

writesShouldSkipCommitLog is a result of scope reduction (call it
laziness on my part). I could not find a way to tell if commit log
data may be required for point-in-time-restore or any other feature,
and the existing method of turning the commit log off does not have
the right granularity. I am very open to suggestions here.
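One possible reading of these flags, sketched with invented write-path code (the interface below mirrors the two method names from the CEP but is otherwise hypothetical, and the exact semantics are my interpretation of this thread, not the CEP's definitive contract):

```java
// Hypothetical sketch: the write path asks the memtable whether commit-log
// retention is needed, since only the memtable knows how it persists data.
interface MemtablePersistence
{
    boolean writesAreDurable();          // the memtable persists writes itself
    boolean writesShouldSkipCommitLog(); // the memtable opts out of the commit log
}

public class WritePathSketch
{
    // A mutation goes to the commit log only if the memtable neither makes
    // writes durable on its own nor explicitly opts out.
    public static boolean mustAppendToCommitLog(MemtablePersistence memtable)
    {
        return !memtable.writesAreDurable() && !memtable.writesShouldSkipCommitLog();
    }

    public static void main(String[] args)
    {
        MemtablePersistence standard = new MemtablePersistence()
        {
            public boolean writesAreDurable() { return false; }
            public boolean writesShouldSkipCommitLog() { return false; }
        };
        if (!mustAppendToCommitLog(standard))
            throw new AssertionError("a standard memtable needs the commit log");

        MemtablePersistence selfDurable = new MemtablePersistence()
        {
            public boolean writesAreDurable() { return true; }
            public boolean writesShouldSkipCommitLog() { return false; }
        };
        if (mustAppendToCommitLog(selfDurable))
            throw new AssertionError("a self-durable memtable can skip the commit log");
    }
}
```

The granularity question raised above is visible here too: the decision is per-memtable, not per-node, which the existing commit log on/off switch cannot express.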



> Why is streaming in the memtable? [...] the wanted behavior is just
> disabling automated flushing

Yes, if zero-copy-streaming is not enabled. And that's exactly what
this method is there for -- to make sure sstables are not copied
whole, and that a flush is not done at the end.

Regards,
Branimir

On Wed, Jul 21, 2021 at 4:33 PM bened...@apache.org 
wrote:

> I would love to help out with this in any way that I can, FYI. Definitely
> one of the more impactful performance improvements to the codebase, given
> the benefits to compaction and memory behaviour.
>
> From: bened...@apache.org 
> Date: Wednesday, 21 July 2021 at 14:32
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > memtable-as-a-commitlog-index
>
> Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually
> there was a paper that did this a long time ago), and it could be very nice
> (if for no other benefit than reducing heap utilisation). I don’t think
> this requires that they be modelled as the same concept, however, only that
> the Memtable must be able to receive an address into a commit log entry and
> to adopt partial ownership over the entry’s lifecycle.
>
>
> From: Branimir Lambov 
> Date: Wednesday, 21 July 2021 at 14:28
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > In general, I think we need to make up our mind as to whether we consider
> > the Memtable and CommitLog one logical entity [...], or whether we want to
> > further untangle those two components from an architectural perspective
> > which we started down that road on with the pluggable storage engine work.
>
> This CEP is intentionally not attempting to answer this question. FWIW
> I do not see them as separable (there's evidence to this fact in the
> codebase), but there are valid secondary uses of the commit log that
> are served well enough by the current architecture.
>
> It is important, however, to let the memtable implementation opt out,
> to permit it to provide its own solution for data persistence.
>
> We should revisit this in the future, especially if Benedict's shared
> log facility and my plans for a memtable-as-a-commitlog-index
> evolve.
>
> Regards,
> Branimir
>
> On Wed, Jul 21, 2021 at 1:34 PM Michael Burman  wrote:
>
> > Hi,
> >
> > It is nice to see these going forward (and a great use of CEP) so thanks
> > for the proposal. I have my reservations regarding the linking of
> memtable
> > to CommitLog and flushing and should not leak abstraction from one to
> > another. And I don't see the reasoning why they should be, it doesn't
> seem
> > to add anything else than tight coupling of components, reducing reuse
> and
> > making things unnecessarily complicated. Also, the streaming notions seem
> > weird to me - how are they related to memtable? Why should memtable care
> > about the behavior outside memtable's responsibility?
> >
> > Some misc (with some thoughts split / duplicated to different parts)
> quotes
> > and comments:
> >
> > > Tight coupling between CFS and memtable will be reduced: flushing
> > > functionality is to be extracted, controlling memtable memory and period
> > > expiration will be handled by the memtable.

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread bened...@apache.org
I would love to help out with this in any way that I can, FYI. Definitely one 
of the more impactful performance improvements to the codebase, given the 
benefits to compaction and memory behaviour.

From: bened...@apache.org 
Date: Wednesday, 21 July 2021 at 14:32
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> memtable-as-a-commitlog-index

Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually there 
was a paper that did this a long time ago), and it could be very nice (if for 
no other benefit than reducing heap utilisation). I don’t think this requires 
that they be modelled as the same concept, however, only that the Memtable must 
be able to receive an address into a commit log entry and to adopt partial 
ownership over the entry’s lifecycle.


From: Branimir Lambov 
Date: Wednesday, 21 July 2021 at 14:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> In general, I think we need to make up our mind as to whether we consider
> the Memtable and CommitLog one logical entity [...], or whether we want to
> further untangle those two components from an architectural perspective
> which we started down that road on with the pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence to this fact in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

On Wed, Jul 21, 2021 at 1:34 PM Michael Burman  wrote:

> Hi,
>
> It is nice to see these going forward (and a great use of CEP) so thanks
> for the proposal. I have my reservations regarding the linking of memtable
> to CommitLog and flushing and should not leak abstraction from one to
> another. And I don't see the reasoning why they should be, it doesn't seem
> to add anything else than tight coupling of components, reducing reuse and
> making things unnecessarily complicated. Also, the streaming notions seem
> weird to me - how are they related to memtable? Why should memtable care
> about the behavior outside memtable's responsibility?
>
> Some misc (with some thoughts split / duplicated to different parts) quotes
> and comments:
>
> > Tight coupling between CFS and memtable will be reduced: flushing
> functionality is to be extracted, controlling memtable memory and period
> expiration will be handled by the memtable.
>
> Why is flushing control bad to do in CFS and better in the memtable? Doing
> it outside memtable would allow to control the flushing regardless of how
> the actual memtable is implemented. For example, lets say someone would
> want to implement the HBase's accordion to Cassandra. It shouldn't matter
> what the implementation of memtable is as the compaction of different
> memtables could be beneficial to all implementations. Or the flushing would
> push the memtable to a proper caching instead of only to disk.
>
> Or if we had per table caching structure, we could control the flushing of
> memtables and the cache structure separately. Some data benefits from LRU
> and some from MRW (most-recently-written) caching strategies. But both
> could benefit from the same memtable implementation, it's the data and how
> its used that could control how the flushing should work. For example time
> series data behaves quite differently in terms of data accesses to
> something more "random".
>
> Or even "total memory control" which would check which tables need more
> memory to do their writes and which do not. Or that the memory doesn't grow
> over a boundary and needs to manually maintain how much is dedicated to
> caching and how much to memtables waiting to be flushed. Or delay flushing
> because the disks can't keep up etc. Not to be implemented in this CEP, but
> pushing this strategy to memtable would prevent many features.
>
> > Beyond thread-safety, the concurrency constraints of the memtable are
> intentionally left unspecified.
>
> I like this. I could see use-cases where a single-thread implementation
> could actually outperform some concurrent data structures. But it also
> provides me with a question, is this proposal going to take an angle
> towards per-range memtables? There are certainly benefits to splitting the
> memtables as it would reduce the "n" in the operations, thus providing less
> overhead in lookups and writes. Although, taking it one step backwards I
> could see the benefit of having a commitlog per range also, which would
> allow higher utilization of NVME drives with larger queue depths. And why
> not per-range-sstables for faster scale-outs and .. a bit outside the scope
> of CEP, but just to ensure that the implementation does not block such
> improvement.

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread bened...@apache.org
> memtable-as-a-commitlog-index

Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually there 
was a paper that did this a long time ago), and it could be very nice (if for 
no other benefit than reducing heap utilisation). I don’t think this requires 
that they be modelled as the same concept, however, only that the Memtable must 
be able to receive an address into a commit log entry and to adopt partial 
ownership over the entry’s lifecycle.


From: Branimir Lambov 
Date: Wednesday, 21 July 2021 at 14:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> In general, I think we need to make up our mind as to whether we consider
> the Memtable and CommitLog one logical entity [...], or whether we want to
> further untangle those two components from an architectural perspective
> which we started down that road on with the pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence to this fact in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

On Wed, Jul 21, 2021 at 1:34 PM Michael Burman  wrote:

> Hi,
>
> It is nice to see these going forward (and a great use of CEP) so thanks
> for the proposal. I have my reservations regarding the linking of memtable
> to CommitLog and flushing and should not leak abstraction from one to
> another. And I don't see the reasoning why they should be, it doesn't seem
> to add anything else than tight coupling of components, reducing reuse and
> making things unnecessarily complicated. Also, the streaming notions seem
> weird to me - how are they related to memtable? Why should memtable care
> about the behavior outside memtable's responsibility?
>
> Some misc (with some thoughts split / duplicated to different parts) quotes
> and comments:
>
> > Tight coupling between CFS and memtable will be reduced: flushing
> functionality is to be extracted, controlling memtable memory and period
> expiration will be handled by the memtable.
>
> Why is flushing control bad to do in CFS and better in the memtable? Doing
> it outside memtable would allow to control the flushing regardless of how
> the actual memtable is implemented. For example, lets say someone would
> want to implement the HBase's accordion to Cassandra. It shouldn't matter
> what the implementation of memtable is as the compaction of different
> memtables could be beneficial to all implementations. Or the flushing would
> push the memtable to a proper caching instead of only to disk.
>
> Or if we had per table caching structure, we could control the flushing of
> memtables and the cache structure separately. Some data benefits from LRU
> and some from MRW (most-recently-written) caching strategies. But both
> could benefit from the same memtable implementation, it's the data and how
> its used that could control how the flushing should work. For example time
> series data behaves quite differently in terms of data accesses to
> something more "random".
>
> Or even "total memory control" which would check which tables need more
> memory to do their writes and which do not. Or that the memory doesn't grow
> over a boundary and needs to manually maintain how much is dedicated to
> caching and how much to memtables waiting to be flushed. Or delay flushing
> because the disks can't keep up etc. Not to be implemented in this CEP, but
> pushing this strategy to memtable would prevent many features.
>
> > Beyond thread-safety, the concurrency constraints of the memtable are
> intentionally left unspecified.
>
> I like this. I could see use-cases where a single-thread implementation
> could actually outperform some concurrent data structures. But it also
> provides me with a question, is this proposal going to take an angle
> towards per-range memtables? There are certainly benefits to splitting the
> memtables as it would reduce the "n" in the operations, thus providing less
> overhead in lookups and writes. Although, taking it one step backwards I
> could see the benefit of having a commitlog per range also, which would
> allow higher utilization of NVME drives with larger queue depths. And why
> not per-range-sstables for faster scale-outs and .. a bit outside the scope
> of CEP, but just to ensure that the implementation does not block such
> improvement.
>
> Interfaces:
>
> > boolean writesAreDurable()
> > boolean writesShouldSkipCommitLog()
>
> The placement inside memtable implementation for these methods just feels
> incredibly wrong to me. The writing pipeline should have these configured
> and they could differ for each table even with the same memtable
> implementation.

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread Branimir Lambov
> In general, I think we need to make up our mind as to whether we consider
> the Memtable and CommitLog one logical entity [...], or whether we want to
> further untangle those two components from an architectural perspective
> which we started down that road on with the pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence to this fact in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

On Wed, Jul 21, 2021 at 1:34 PM Michael Burman  wrote:

> Hi,
>
> It is nice to see these going forward (and a great use of CEP) so thanks
> for the proposal. I have my reservations regarding the linking of memtable
> to CommitLog and flushing and should not leak abstraction from one to
> another. And I don't see the reasoning why they should be, it doesn't seem
> to add anything else than tight coupling of components, reducing reuse and
> making things unnecessarily complicated. Also, the streaming notions seem
> weird to me - how are they related to memtable? Why should memtable care
> about the behavior outside memtable's responsibility?
>
> Some misc (with some thoughts split / duplicated to different parts) quotes
> and comments:
>
> > Tight coupling between CFS and memtable will be reduced: flushing
> functionality is to be extracted, controlling memtable memory and period
> expiration will be handled by the memtable.
>
> Why is flushing control bad to do in CFS and better in the memtable? Doing
> it outside memtable would allow to control the flushing regardless of how
> the actual memtable is implemented. For example, lets say someone would
> want to implement the HBase's accordion to Cassandra. It shouldn't matter
> what the implementation of memtable is as the compaction of different
> memtables could be beneficial to all implementations. Or the flushing would
> push the memtable to a proper caching instead of only to disk.
>
> Or if we had per table caching structure, we could control the flushing of
> memtables and the cache structure separately. Some data benefits from LRU
> and some from MRW (most-recently-written) caching strategies. But both
> could benefit from the same memtable implementation, it's the data and how
> its used that could control how the flushing should work. For example time
> series data behaves quite differently in terms of data accesses to
> something more "random".
>
> Or even "total memory control" which would check which tables need more
> memory to do their writes and which do not. Or that the memory doesn't grow
> over a boundary and needs to manually maintain how much is dedicated to
> caching and how much to memtables waiting to be flushed. Or delay flushing
> because the disks can't keep up etc. Not to be implemented in this CEP, but
> pushing this strategy to memtable would prevent many features.
>
> > Beyond thread-safety, the concurrency constraints of the memtable are
> intentionally left unspecified.
>
> I like this. I could see use-cases where a single-thread implementation
> could actually outperform some concurrent data structures. But it also
> provides me with a question, is this proposal going to take an angle
> towards per-range memtables? There are certainly benefits to splitting the
> memtables as it would reduce the "n" in the operations, thus providing less
> overhead in lookups and writes. Although, taking it one step backwards I
> could see the benefit of having a commitlog per range also, which would
> allow higher utilization of NVME drives with larger queue depths. And why
> not per-range-sstables for faster scale-outs and .. a bit outside the scope
> of CEP, but just to ensure that the implementation does not block such
> improvement.
>
> Interfaces:
>
> > boolean writesAreDurable()
> > boolean writesShouldSkipCommitLog()
>
> The placement inside memtable implementation for these methods just feels
> incredibly wrong to me. The writing pipeline should have these configured
> and they could differ for each table even with the same memtable
> implementation. Lets take the example of an in-memory memtable use case
> that's never written to a SSTable. We could have one table with just simply
> in-memory cached storage and another one with a Redis style persistence of
> AOF, where writes would be written to the commitlog for fast recovery, but
> the data is otherwise always only kept in the memtable instead of writing
> to the SSTable (for performance reasons). Same implementation of memtable
> still.
>
> Why would the write process of the table not ask the table what settings it
> has and instead asks the memtable what settings the table has?

Re: [DISCUSS] Diagnostic events in virtual tables

2021-07-21 Thread Paulo Motta
I'm not very familiar with diagnostic events, but I think there's great
value in providing a Virtual Table implementation for diagnostic events,
since this will help the long-term objective of providing a unified
interface for node monitoring, so +1 from my side. I think it would
definitely help to write a CEP detailing the proposal and to cast a vote
if there are no objections.

In case you are not aware, Apache has the concept of lazy consensus <
https://community.apache.org/committers/lazyConsensus.html>, so as long as
nobody opposes your proposal you're good to move forward with it; people
not caring enough even to drop an e-mail can actually be a good thing. ;)

On Wed, 21 Jul 2021 at 03:39, Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi,
>
> should I create CEP first or people just absolutely do not care to
> even drop an email and it does not make sense to them?
>
> Regards
>
> On Mon, 19 Jul 2021 at 15:32, Stefan Miklosovic
>  wrote:
> >
> > Hi,
> >
> > I wonder if people would be interested in having diagnostic events in
> > virtual tables?
> >
> > I put something together here (heavy wip) (1) but that is the idea I
> have.
> >
> > I know that Stefan Podkowinski did a spectacular job under
> > CASSANDRA-12944 and the possibility to persist locally was elaborated
> > on in (2) where the conclusion was made that maybe it is more suitable
> > to put it into chronicle queues via BinLog path and so on.
> >
> > The benefits of bin log is that we have events persisted and they can
> > be inspected further when node is offline.
> >
> > While data in virtual tables cease to exist after nodes are down, one
> > nice benefit of having it in virtual tables is that we can query it
> > comfortably via CQL and I think that this solution is more suitable to
> > have on an every day basis from operator's point of view. There is
> > still a way to dump it somewhere else anyway if one is really
> > interested in doing so.
> >
> > Do you think that the bin log solution is overall superior to the
> > virtual tables approach and we should just forget about having it in
> > virtual tables?
> >
> > If that is the case, what I would like to see is to have some
> > pluggability here so I might implement this on my own and configure a
> > node to put it to virtual tables for me, it does not necessarily have
> > to be the part of Cassandra code base if we are strongly against that.
> >
> > (1)
> https://github.com/instaclustr/cassandra/commit/0dd60dc0a847619fdb704b700154e624b21a0c35
> > (2) https://issues.apache.org/jira/browse/CASSANDRA-13460
> >
> > Regards
>
>
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread Michael Burman
Hi,

It is nice to see these going forward (and a great use of the CEP
process), so thanks for the proposal. I have my reservations regarding the
linking of the memtable to the CommitLog and to flushing: abstractions
should not leak from one into the other. And I don't see the reasoning why
they should be linked; it doesn't seem to add anything other than tight
coupling of components, reducing reuse and making things unnecessarily
complicated. Also, the streaming notions seem weird to me: how are they
related to the memtable? Why should the memtable care about behavior
outside the memtable's responsibility?

Some misc (with some thoughts split / duplicated to different parts) quotes
and comments:

> Tight coupling between CFS and memtable will be reduced: flushing
functionality is to be extracted, controlling memtable memory and period
expiration will be handled by the memtable.

Why is flushing control bad to do in the CFS and better in the
memtable? Doing it outside the memtable would allow flushing to be
controlled regardless of how the actual memtable is implemented. For
example, let's say someone wanted to implement HBase's Accordion in
Cassandra. It shouldn't matter what the memtable implementation is, as
compacting different memtables could be beneficial to all
implementations. Or flushing could push the memtable to a proper cache
instead of only to disk.

Or, if we had a per-table caching structure, we could control the
flushing of memtables and the cache structure separately. Some data
benefits from LRU and some from MRW (most-recently-written) caching
strategies, but both could benefit from the same memtable
implementation; it's the data and how it is used that should control
how flushing works. For example, time series data behaves quite
differently in terms of data access from something more "random".

Or even "total memory control", which would check which tables need
more memory for their writes and which do not; or ensure that memory
doesn't grow over a boundary, maintaining how much is dedicated to
caching and how much to memtables waiting to be flushed; or delay
flushing because the disks can't keep up, etc. None of this needs to be
implemented in this CEP, but pushing this strategy into the memtable
would prevent many such features.
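To make the argument concrete, here is a minimal sketch of what such
externalised flush control could look like. All names here
(FlushScheduler, MemtableHandle) are hypothetical and purely
illustrative, not part of the CEP or of the Cassandra code base:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical handle: any memtable implementation could satisfy this
// without knowing how or when flushing decisions are made.
interface MemtableHandle {
    String table();
    long liveBytes();   // memory currently held by this memtable
}

// The flush policy lives outside the memtable: it looks only at global
// memory pressure and picks the largest memtables to flush first.
class FlushScheduler {
    private final long memoryLimitBytes;
    private final List<MemtableHandle> memtables = new ArrayList<>();

    FlushScheduler(long memoryLimitBytes) {
        this.memoryLimitBytes = memoryLimitBytes;
    }

    void register(MemtableHandle m) {
        memtables.add(m);
    }

    long totalBytes() {
        return memtables.stream().mapToLong(MemtableHandle::liveBytes).sum();
    }

    // Returns memtables to flush, largest first, until the projected
    // total fits under the memory limit again.
    List<MemtableHandle> pickFlushCandidates() {
        List<MemtableHandle> sorted = new ArrayList<>(memtables);
        sorted.sort(Comparator.comparingLong(MemtableHandle::liveBytes).reversed());
        List<MemtableHandle> picked = new ArrayList<>();
        long projected = totalBytes();
        for (MemtableHandle m : sorted) {
            if (projected <= memoryLimitBytes) break;
            picked.add(m);
            projected -= m.liveBytes();
        }
        return picked;
    }
}
```

The point is that nothing in this loop depends on the memtable's
internal data structure, which is exactly why the strategy need not
live inside it.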

> Beyond thread-safety, the concurrency constraints of the memtable are
intentionally left unspecified.

I like this. I could see use cases where a single-threaded
implementation could actually outperform some concurrent data
structures. But it also raises a question: is this proposal going to
take an angle towards per-range memtables? There are certainly benefits
to splitting the memtables, as it would reduce the "n" in the
operations, giving less overhead in lookups and writes. Taking it one
step further, I could see the benefit of having a commit log per range
as well, which would allow higher utilization of NVMe drives with
larger queue depths. And why not per-range SSTables for faster
scale-outs? That is a bit outside the scope of this CEP, but just to
ensure that the implementation does not block such improvements.

Interfaces:

> boolean writesAreDurable()
> boolean writesShouldSkipCommitLog()

Placing these methods inside the memtable implementation just feels
wrong to me. The writing pipeline should have these configured, and
they could differ per table even with the same memtable implementation.
Let's take the example of an in-memory memtable use case that is never
written to an SSTable. We could have one table with simple in-memory
cached storage and another with Redis-style AOF persistence, where
writes go to the commit log for fast recovery but the data is otherwise
kept only in the memtable instead of being written to an SSTable (for
performance reasons); the memtable implementation is still the same.

Why would the write process of the table not ask the table what
settings it has, instead of asking the memtable what settings the table
has? That seems counterintuitive to me. Even the persistent memory case
is a bit questionable: why not simply disable the commit log in the
writing process? Why ask the memtable?
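A rough sketch of the alternative being argued for: durability and
commit-log decisions hang off per-table write options that the write
path consults, and the memtable is never asked. The names here
(TableWriteOptions, WritePath, CommitLogMode) are hypothetical,
invented for illustration only:

```java
// Hypothetical per-table settings, owned by the table / write path,
// not by the memtable implementation.
enum CommitLogMode { NORMAL, SKIP, RECOVERY_ONLY }

class TableWriteOptions {
    final CommitLogMode commitLogMode;
    // false => Redis-AOF style: commit log only, data lives in the memtable
    final boolean flushToSSTables;

    TableWriteOptions(CommitLogMode commitLogMode, boolean flushToSSTables) {
        this.commitLogMode = commitLogMode;
        this.flushToSSTables = flushToSSTables;
    }
}

class WritePath {
    // The write path asks the *table* for its settings; the same
    // memtable implementation can back tables with different
    // durability choices.
    boolean shouldAppendToCommitLog(TableWriteOptions opts) {
        return opts.commitLogMode != CommitLogMode.SKIP;
    }
}
```

Under this shape, an in-memory-only table and a Redis-AOF-style table
differ only in their TableWriteOptions, not in their memtable class.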

This feels like the memtable is going to become the write pipeline, but
to me that doesn't seem like the correct architectural decision. I'd
rather see these decisions made outside the memtable. Even a
persistent-memory memtable user might want the commit log enabled for
change data capture / log shipping, or for layered persistence speed.
Persistent memory without any commercially known future is a bit of an
odd bet at the moment (even Optane is no longer being manufactured,
with the last factory being dismantled, based on public information).

> boolean streamToMemtable()

And that one I don't understand. Why is streaming in the memtable? This
smells like scope creep from something else. The explanation suggests
to me that the wanted behavior is just disabling automated flushing.

But these are just some questions that came to mind.

Re: [DISCUSS] Diagnostic events in virtual tables

2021-07-21 Thread Stefan Miklosovic
Hi,

should I create a CEP first, or do people simply not care enough to
drop an email because the idea does not make sense to them?

Regards

On Mon, 19 Jul 2021 at 15:32, Stefan Miklosovic
 wrote:
>
> Hi,
>
> I wonder if people would be interested in having diagnostic events in
> virtual tables?
>
> I put something together here (heavy WIP) (1), but that is the general idea.
>
> I know that Stefan Podkowinski did a spectacular job under
> CASSANDRA-12944 and the possibility to persist locally was elaborated
> on in (2), where the conclusion was that it may be more suitable
> to put them into Chronicle queues via the BinLog path.
>
> The benefit of the bin log is that events are persisted and can
> be inspected even when the node is offline.
>
> While data in virtual tables ceases to exist once a node is down,
> one nice benefit of having it in virtual tables is that we can
> query it comfortably via CQL, and I think this solution is more
> suitable for every-day use from an operator's point of view. There
> is still a way to dump it somewhere else if one is really
> interested in doing so.
>
> Do you think that the bin log solution is overall superior to the
> virtual tables approach, and that we should just forget about
> having it in virtual tables?
>
> If that is the case, what I would like to see is some pluggability
> here, so that I can implement this on my own and configure a node
> to put events into virtual tables for me; it does not necessarily
> have to be part of the Cassandra code base if we are strongly
> against that.
>
> (1) 
> https://github.com/instaclustr/cassandra/commit/0dd60dc0a847619fdb704b700154e624b21a0c35
> (2) https://issues.apache.org/jira/browse/CASSANDRA-13460
>
> Regards
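The pluggability asked for above might look like a sink interface that
the diagnostics subsystem writes through, so a virtual-table-backed
store, a Chronicle/bin-log store, or both can be configured. Everything
below (DiagnosticEventSink, BoundedMemorySink) is a hypothetical
sketch, not existing Cassandra code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension point: anything that can receive diagnostic events.
interface DiagnosticEventSink {
    void accept(String eventType, String payload);
}

// In-memory sink with a configurable cap per event type, roughly the
// shape a virtual table could expose; oldest events are evicted first.
class BoundedMemorySink implements DiagnosticEventSink {
    private final int maxEventsPerType;
    private final Map<String, Deque<String>> events = new HashMap<>();

    BoundedMemorySink(int maxEventsPerType) {
        this.maxEventsPerType = maxEventsPerType;
    }

    @Override
    public void accept(String eventType, String payload) {
        Deque<String> q = events.computeIfAbsent(eventType, t -> new ArrayDeque<>());
        if (q.size() == maxEventsPerType) q.removeFirst();  // evict oldest
        q.addLast(payload);
    }

    // Snapshot for the read side, e.g. a virtual table scan.
    List<String> snapshot(String eventType) {
        return new ArrayList<>(events.getOrDefault(eventType, new ArrayDeque<>()));
    }
}
```

A bin-log-backed implementation of the same interface could then sit
alongside (or behind) the in-memory one, which is the "why not both"
point raised earlier in the thread.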
