> In general, I think we need to make up our mind as to whether we
> consider the Memtable and CommitLog one logical entity [...], or
> whether we want to further untangle those two components from an
> architectural perspective, a road we started down with the
> pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence of this in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.
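
A minimal sketch of what such an opt-out could look like, assuming a
simplified write path. Only the method name writesShouldSkipCommitLog()
appears in the interfaces quoted further down this thread; the shape of
the Memtable interface, the two implementations, CommitLog, and write()
are hypothetical stand-ins, not Cassandra's actual classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MemtableOptOutSketch {
    /** Hypothetical, trimmed-down memtable interface (not the real Cassandra API). */
    public interface Memtable {
        void put(String key, String value);

        // Method name taken from the thread; the semantics shown here are assumed.
        boolean writesShouldSkipCommitLog();
    }

    /** Classical in-heap memtable: relies on the commit log for durability. */
    public static class HeapMemtable implements Memtable {
        private final Map<String, String> data = new HashMap<>();

        @Override public void put(String key, String value) { data.put(key, value); }
        @Override public boolean writesShouldSkipCommitLog() { return false; }
    }

    /** Memtable that claims to persist data itself and so opts out of the commit log. */
    public static class SelfPersistingMemtable implements Memtable {
        // A plain map stands in for whatever durable store a real implementation would use.
        private final Map<String, String> data = new HashMap<>();

        @Override public void put(String key, String value) { data.put(key, value); }
        @Override public boolean writesShouldSkipCommitLog() { return true; }
    }

    /** Toy commit log that just records the appended entries. */
    public static class CommitLog {
        public final List<String> entries = new ArrayList<>();

        public void append(String record) { entries.add(record); }
    }

    /** Assumed write path: consult the memtable before touching the commit log. */
    public static void write(Memtable memtable, CommitLog log, String key, String value) {
        if (!memtable.writesShouldSkipCommitLog()) {
            log.append(key + "=" + value);
        }
        memtable.put(key, value);
    }

    public static void main(String[] args) {
        CommitLog log = new CommitLog();
        write(new HeapMemtable(), log, "k1", "v1");
        write(new SelfPersistingMemtable(), log, "k2", "v2");
        System.out.println(log.entries); // prints [k1=v1]
    }
}
```

In this sketch the commit log itself is untouched and remains available
to any memtable that does not opt out.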

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

On Wed, Jul 21, 2021 at 1:34 PM Michael Burman <y...@iki.fi> wrote:

> Hi,
>
> It is nice to see these going forward (and a great use of CEP), so thanks
> for the proposal. I have my reservations about linking the memtable to the
> CommitLog and to flushing; abstractions should not leak from one to
> another. I also don't see the reasoning for why they should be linked: it
> doesn't seem to add anything other than tight coupling of components,
> reduced reuse, and unnecessary complication. Also, the streaming notions
> seem weird to me: how are they related to the memtable? Why should the
> memtable care about behavior outside its responsibility?
>
> Some misc (with some thoughts split / duplicated to different parts) quotes
> and comments:
>
> > Tight coupling between CFS and memtable will be reduced: flushing
> > functionality is to be extracted, controlling memtable memory and period
> > expiration will be handled by the memtable.
>
> Why is flushing control bad to do in CFS and better in the memtable? Doing
> it outside the memtable would allow controlling the flushing regardless of
> how the actual memtable is implemented. For example, let's say someone
> wants to implement HBase's Accordion in Cassandra. It shouldn't matter
> what the implementation of the memtable is, as the compaction of different
> memtables could be beneficial to all implementations. Or the flushing could
> push the memtable to a proper cache instead of only to disk.
>
> Or if we had a per-table caching structure, we could control the flushing
> of memtables and the cache structure separately. Some data benefits from
> LRU and some from MRW (most-recently-written) caching strategies. But both
> could benefit from the same memtable implementation; it's the data and how
> it's used that could control how the flushing should work. For example,
> time series data behaves quite differently in terms of data accesses from
> something more "random".
>
> Or even "total memory control", which would check which tables need more
> memory to do their writes and which do not. Or ensure that memory doesn't
> grow over a boundary, maintaining how much is dedicated to caching and how
> much to memtables waiting to be flushed. Or delay flushing because the
> disks can't keep up, etc. None of this needs to be implemented in this
> CEP, but pushing this strategy into the memtable would prevent many such
> features.
>
> > Beyond thread-safety, the concurrency constraints of the memtable are
> > intentionally left unspecified.
>
> I like this. I could see use cases where a single-threaded implementation
> could actually outperform some concurrent data structures. But it also
> raises a question: is this proposal going to take an angle towards
> per-range memtables? There are certainly benefits to splitting the
> memtables, as it would reduce the "n" in the operations, thus providing
> less overhead in lookups and writes. Although, taking one step back, I
> could see the benefit of having a commitlog per range also, which would
> allow higher utilization of NVMe drives with larger queue depths. And why
> not per-range SSTables for faster scale-outs? That is a bit outside the
> scope of this CEP, but I just want to ensure that the implementation does
> not block such improvements.
>
> Interfaces:
>
> > boolean writesAreDurable()
> > boolean writesShouldSkipCommitLog()
>
> The placement of these methods inside the memtable implementation just
> feels incredibly wrong to me. The writing pipeline should have these
> configured, and they could differ for each table even with the same
> memtable implementation. Let's take the example of an in-memory memtable
> use case that's never written to an SSTable. We could have one table with
> simply in-memory cached storage and another one with Redis-style AOF
> persistence, where writes would be written to the commitlog for fast
> recovery, but the data is otherwise always kept only in the memtable
> instead of being written to an SSTable (for performance reasons). Same
> implementation of memtable still.
>
> Why would the write process of the table not ask the table what settings it
> has, and instead ask the memtable what settings the table has? This seems
> counterintuitive to me. Even the persistent memory case is a bit
> questionable: why not simply disable the commitlog in the writing process?
> Why ask the memtable?
>
> This feels like memtable is going to be the write pipeline, but to me that
> doesn't feel like the correct architectural decision. I'd rather see these
> decisions done outside the memtable. Even a persistent memory memtable user
> might want to have a commitlog enabled for data capture / shipping logs, or
> layers of persistence speed. The whole persistent memory without any
> commercially known future is a bit weird at the moment (even Optane has no
> known manufacturing anymore with last factory being dismantled based on
> public information).
>
> > boolean streamToMemtable()
>
> And that one I don't understand. Why is streaming in the memtable? This
> smells like scope creep from something else. The explanation suggests to
> me that the wanted behavior is just disabling automated flushing.
>
> But these are just some questions that came to my mind while reading this.
> And I don't want to sound too negative (most of the features are really
> something I'd like to see), perhaps I just misunderstood some of the
> motivations why stuff should be brought to memtable instead of being
> implemented outside memtable. Perhaps there's something else in the write
> pipeline arch that needs fixing but is now masquerading inside this CEP.
>
> I'm definitely interested to hear more.
>
>   - Micke
>
> On Wed, 21 Jul 2021 at 08:24, Berenguer Blasi <berenguerbl...@gmail.com>
> wrote:
>
> > +1. De-tangling, going more modular and clean interfaces sgtm.
> >
> > On 20/7/21 21:45, Nate McCall wrote:
> > > Yay for pluggable memtables!! I haven't gone over this in detail yet,
> > > but personally I've always thought integrating something like Arrow
> > > would be cool for sharing data (that's as far as I've gotten, but
> > > anything that makes that kind of experimentation easier would also
> > > help with mocking test plumbing, so +1 from me).
> > >
> > > Thanks for putting this together!
> > >
> > > -Nate
> > >
> > > On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <
> > > branimir.lam...@datastax.com> wrote:
> > >
> > >> Proposal for a mechanism for plugging in memtable implementations:
> > >>
> > >>
> > >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
> > >>
> > >> The proposal supports using custom memtable implementations to support
> > >> development and testing of improved alternatives, but also enables a
> > >> broader definition of "memtable" to better support more advanced use
> > >> cases like persistent memory. To this end, memtable implementations are
> > >> given control over flushing and storing data in the commit log, enabling
> > >> solutions that implement their own durability mechanisms and live much
> > >> longer than their classical counterparts. Taken to the extreme, this
> > >> also enables memtables that never flush (in other words, alternative
> > >> storage engines) in a minimally-invasive manner.
> > >>
> > >> I am curious to hear your thoughts on the proposal.
> > >>
> > >> Regards,
> > >> Branimir
> > >>


-- 
Branimir Lambov
e. branimir.lam...@datastax.com
w. www.datastax.com
