Re: [DISCUSS] CEP-11: Pluggable memtable implementations

Branimir Lambov Fri, 23 Jul 2021 06:37:14 -0700

> CEP indicates the flushing behavior is suddenly more tied to the Memtable
> implementation level rather than being configurable at the table level


The specific things that change with the proposal are:
- Flushes are supplied with a reason (e.g. memory full, schema change,
  prepare to stream).
- The memtable can reject a flush request.
- The logic to initiate "memory full" and "period expired" flushes moves to
  the memtable, where it conceptually belongs.

Is the latter what worries you? For reusability, the current logic is
extracted into a base class that the skiplist/trie/7282 implementations
derive from.
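A minimal Java sketch of the changes listed above, assuming hypothetical names throughout (FlushReason, TriggeringMemtableBase, flushTrigger, acceptFlush and the two concrete classes are illustrations, not the CEP's or Cassandra's actual API). The reusable "memory full" and "period expired" triggers live in a base class that a skip-list-style memtable derives from, and a memtable with its own persistence overrides a hook to reject flush requests. The memory limit is simplified to a per-memtable byte count for illustration.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Illustrative flush reasons; hypothetical names, not Cassandra's.
enum FlushReason { MEMORY_FULL, PERIOD_EXPIRED, SCHEMA_CHANGE, PREPARE_TO_STREAM }

// Hypothetical base class holding the reusable trigger logic.
abstract class TriggeringMemtableBase {
    private final long memoryLimitBytes;  // simplified per-memtable limit
    private final long flushPeriodMillis; // 0 disables periodic flushes
    private final long createdAtMillis;

    protected TriggeringMemtableBase(long memoryLimitBytes, long flushPeriodMillis, long createdAtMillis) {
        this.memoryLimitBytes = memoryLimitBytes;
        this.flushPeriodMillis = flushPeriodMillis;
        this.createdAtMillis = createdAtMillis;
    }

    // Each implementation (skip-list, trie, ...) reports its own memory usage.
    protected abstract long ownedBytes();

    // Shared logic: which flush, if any, this memtable asks for at nowMillis.
    public Optional<FlushReason> flushTrigger(long nowMillis) {
        if (ownedBytes() >= memoryLimitBytes)
            return Optional.of(FlushReason.MEMORY_FULL);
        if (flushPeriodMillis > 0 && nowMillis - createdAtMillis >= flushPeriodMillis)
            return Optional.of(FlushReason.PERIOD_EXPIRED);
        return Optional.empty();
    }

    // Hook for flushes requested from outside; classic behaviour honours every request.
    public boolean acceptFlush(FlushReason reason) {
        return true;
    }
}

// Stand-in for a skip-list memtable that only tracks its size.
class SkipListMemtableSketch extends TriggeringMemtableBase {
    private long bytes;

    SkipListMemtableSketch(long limitBytes, long periodMillis, long createdAtMillis) {
        super(limitBytes, periodMillis, createdAtMillis);
    }

    void write(long size) { bytes += size; }

    @Override
    protected long ownedBytes() { return bytes; }
}

// A memtable with its own persistence never triggers flushes itself and
// accepts only the requests other components depend on (here: schema change).
class PersistentMemtableSketch extends SkipListMemtableSketch {
    private static final Set<FlushReason> ACCEPTED = EnumSet.of(FlushReason.SCHEMA_CHANGE);

    PersistentMemtableSketch(long createdAtMillis) {
        super(Long.MAX_VALUE, 0, createdAtMillis);
    }

    @Override
    public boolean acceptFlush(FlushReason reason) {
        return ACCEPTED.contains(reason);
    }
}
```

Which reasons a non-flushing memtable should accept is the implementation's choice; the set used here is only an example.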


> I'm not sure if the "isDurable" + "shouldSkip" is interesting instead
> of "shouldWrite" (etc). But I also wonder in cases where point-in-time
> restore is required how one could achieve it without a commit log (can
> persistent memory memtable be rolled back?).

That's exactly the reason why the two flags are separate. To use PITR, you
use the commit log but make sure that it does not treat the segments covered
by the persistent memtable as dirty (i.e. writesAreDurable but not
writesShouldSkipCommitLog); commit log segments are then written only to be
archived, and PITR restores a memtable snapshot and applies the mutations
logged after it.
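A minimal sketch of how a write path could combine the two flags to get the PITR arrangement described above. Only the method names writesAreDurable() and writesShouldSkipCommitLog() come from the CEP; DurabilityFlags, CommitLogDecision and CommitLogPolicy are hypothetical.

```java
// Only the two method names come from CEP-11; everything else is hypothetical.
interface DurabilityFlags {
    boolean writesAreDurable();
    boolean writesShouldSkipCommitLog();
}

// Hypothetical outcome of the write-path decision for one mutation.
final class CommitLogDecision {
    final boolean appendToCommitLog; // write the mutation to a commit log segment
    final boolean keepSegmentDirty;  // retain the segment until this memtable flushes

    CommitLogDecision(boolean appendToCommitLog, boolean keepSegmentDirty) {
        this.appendToCommitLog = appendToCommitLog;
        this.keepSegmentDirty = keepSegmentDirty;
    }
}

final class CommitLogPolicy {
    private CommitLogPolicy() {}

    static CommitLogDecision decide(DurabilityFlags memtable) {
        if (memtable.writesShouldSkipCommitLog()) {
            // No commit log at all, e.g. a persistent memtable with no PITR/CDC needs.
            return new CommitLogDecision(false, false);
        }
        // The mutation is logged. If the memtable persists its own data, the segment
        // is kept only for archiving (PITR, CDC) and is not dirty for this memtable.
        return new CommitLogDecision(true, !memtable.writesAreDurable());
    }

    // Convenience factory for fixed flag values.
    static DurabilityFlags flags(boolean durable, boolean skipCommitLog) {
        return new DurabilityFlags() {
            @Override public boolean writesAreDurable() { return durable; }
            @Override public boolean writesShouldSkipCommitLog() { return skipCommitLog; }
        };
    }
}
```

With flags(true, false) the mutation is still logged for archiving, but the segment is not held dirty on the persistent memtable's behalf, which is the combination described for PITR.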

Am I misunderstanding the question?


> Although I do feel like persistent memory exceptions make stuff more
> complex.

The persistent memtables were the reason that drove this functionality, but
think about it also as an easy way to do pluggable storage engines. I may not
be up to date with the consensus in the community on this, but I don't see us
investing the effort to have fully-fledged pluggable storage engines of the
CASSANDRA-13475 type any time soon.

To make the memtable a storage engine you need two things:
- an opt-out of flushing, so that the memtable is the only component that
  serves reads;
- an opt-out of the commit log, so that the memtable is the only component
  that serves writes;

plus some solutions for the secondary uses of sstables (streaming) and
commit log (PITR, CDC).

The proposal provides both, with a little more control than a plain opt-out.
It can work for the pmem (opt out of both) and RocksDB (opt out of flushing
only) use cases, but for me it will also be useful to experiment with a
memtable that includes its own version of a commit log (opt out of commit log
only).
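The three use cases above map onto the two opt-outs roughly as follows. This is a minimal sketch; the enum and its member names are hypothetical and only restate the text.

```java
// Hypothetical profiles restating which opt-outs each use case takes.
enum MemtableProfile {
    CLASSIC(true, true),             // flushes to sstables, relies on the commit log
    PERSISTENT_MEMORY(false, false), // opts out of both
    ROCKSDB(false, true),            // opts out of flushing only
    OWN_COMMIT_LOG(true, false);     // opts out of the shared commit log only

    final boolean flushesToSSTables;
    final boolean usesSharedCommitLog;

    MemtableProfile(boolean flushesToSSTables, boolean usesSharedCommitLog) {
        this.flushesToSSTables = flushesToSSTables;
        this.usesSharedCommitLog = usesSharedCommitLog;
    }

    // Without flushing, the memtable is the only component that serves reads.
    boolean memtableServesAllReads() { return !flushesToSSTables; }

    // Without the shared commit log, the memtable alone handles write durability.
    boolean memtableOwnsDurability() { return !usesSharedCommitLog; }
}
```

Every profile other than CLASSIC still needs its own answer for streaming, PITR or CDC.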


On Thu, Jul 22, 2021 at 4:00 PM Michael Burman <y...@iki.fi> wrote:

> On Wed, 21 Jul 2021 at 17:24, Branimir Lambov <branimir.lam...@datastax.com>
> wrote:
>
> > > Why is flushing control bad to do in CFS and better in the memtable?
> >
> > I wonder why you would understand this as something that takes away
> > control instead of giving it. The CFS is not configurable. With the
> > CEP, memtables are configurable at the table level. It is entirely
> > possible to implement a memtable wrapper that provides any of the
> > examples of functionalities you mention -- and that would be fully
> > configurable (just as example, one could very well select a
> > time-series-optimized-flush wrapper over skip-list memtable).
> >
> >
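To illustrate the wrapper idea above, a minimal sketch assuming a hypothetical three-method SimpleMemtable interface (not the CEP's): a decorator adds a time-window flush policy on top of a skip-list-backed memtable without changing it. It is single-threaded for brevity; all class names are illustrative.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical minimal memtable interface for this sketch.
interface SimpleMemtable {
    void put(String key, String value, long timestampMillis);
    String get(String key);
    boolean wantsFlush(long nowMillis);
}

// Stand-in for a skip-list memtable with a size-based flush policy.
class SkipListSimpleMemtable implements SimpleMemtable {
    private final ConcurrentSkipListMap<String, String> data = new ConcurrentSkipListMap<>();
    private final int maxEntries;

    SkipListSimpleMemtable(int maxEntries) { this.maxEntries = maxEntries; }

    @Override public void put(String key, String value, long timestampMillis) { data.put(key, value); }
    @Override public String get(String key) { return data.get(key); }
    @Override public boolean wantsFlush(long nowMillis) { return data.size() >= maxEntries; }
}

// Wrapper adding a time-window flush policy to any memtable (not thread-safe).
class TimeWindowFlushWrapper implements SimpleMemtable {
    private final SimpleMemtable delegate;
    private final long windowMillis;
    private boolean empty = true;
    private long windowStart; // start of the window containing the first write

    TimeWindowFlushWrapper(SimpleMemtable delegate, long windowMillis) {
        this.delegate = delegate;
        this.windowMillis = windowMillis;
    }

    @Override public void put(String key, String value, long timestampMillis) {
        if (empty) {
            windowStart = timestampMillis - Math.floorMod(timestampMillis, windowMillis);
            empty = false;
        }
        delegate.put(key, value, timestampMillis);
    }

    @Override public String get(String key) { return delegate.get(key); }

    // Flush when the time window closes, or earlier if the wrapped memtable is full.
    @Override public boolean wantsFlush(long nowMillis) {
        boolean windowClosed = !empty && nowMillis >= windowStart + windowMillis;
        return windowClosed || delegate.wantsFlush(nowMillis);
    }
}
```

Flushing at window boundaries tends to produce sstables that each cover one time window, which is the kind of time-series optimization the text mentions.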
> I think this was a bit of miscommunication. I'm not in favor of keeping it
> in the CFS, but at least to me (as a reader) CEP indicates the flushing
> behavior is suddenly more tied to the Memtable implementation level rather
> than being configurable at the table level. Thus that would not reduce
> coupling of different flush strategies, but instead just move it from CFS
> to the Memtable implementation. And especially with multiple Memtable
> implementations, that would mean the reusable parts of flushing could end
> up being difficult to reuse. If that is not the intention, then good.
>
>
> >
> > This is another question that the proposal leaves to the memtable
> > implementation (or wrapper), but it does make sense to make sure the
> > interfaces provide the necessary support for sharding.
> >
>
> +1 to this; that's a good limitation of scope to move forward. I think this
> was originally touched on in 7282 (where I had it in the memtable impl), but
> then got pushed one step outside.
>
> > writesShouldSkipCommitLog is a result of scope reduction (call it
> > laziness on my part). I could not find a way to tell if commit log
> > data may be required for point-in-time-restore or any other feature,
> > and the existing method of turning the commit log off does not have
> > the right granularity. I am very open to suggestions here.
> >
>
> Could this be limited to a single parameter? I'm not sure if the
> "isDurable" + "shouldSkip" is interesting instead of "shouldWrite" (etc).
> But I also wonder in cases where point-in-time restore is required how one
> could achieve it without a commit log (can persistent memory memtable be
> rolled back?). That does have an effect on backups. I have to read your
> impl to see how you intended to rewrite the process from Keyspace (where
> the requirement for "isDurable" starts from).
>
> Although I do feel like persistent memory exceptions make stuff more
> complex.
>
>
>
> >
> >
> >
> > > Why is streaming in the memtable? [...] the wanted behavior is just
> > > disabling automated flushing
> >
> > Yes, if zero-copy-streaming is not enabled. And that's exactly what
> > this method is there for -- to make sure sstables are not copied
> > whole, and that a flush is not done at the end.
> >
> > Regards,
> > Branimir
> >
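A minimal sketch of the receiving-side decision described above, assuming hypothetical names (StreamPlan, StreamReceiverSketch, plan); only streamToMemtable() comes from the CEP. When the memtable opts in, received data goes to it, sstables are not copied whole, and no flush is forced at the end.

```java
// Only the method name streamToMemtable() comes from CEP-11.
interface StreamingAwareMemtable {
    boolean streamToMemtable();
}

// Hypothetical ways a receiver could handle incoming data.
enum StreamPlan { ZERO_COPY_SSTABLES, WRITE_SSTABLES, APPLY_TO_MEMTABLE }

final class StreamReceiverSketch {
    private StreamReceiverSketch() {}

    static StreamPlan plan(StreamingAwareMemtable memtable, boolean zeroCopyEnabled) {
        if (memtable.streamToMemtable()) {
            // Data is applied to the memtable: no whole-sstable copy,
            // and no flush is forced when the stream completes.
            return StreamPlan.APPLY_TO_MEMTABLE;
        }
        return zeroCopyEnabled ? StreamPlan.ZERO_COPY_SSTABLES : StreamPlan.WRITE_SSTABLES;
    }
}
```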
> > On Wed, Jul 21, 2021 at 4:33 PM bened...@apache.org <bened...@apache.org>
> > wrote:
> >
> > > I would love to help out with this in any way that I can, FYI. Definitely
> > > one of the more impactful performance improvements to the codebase, given
> > > the benefits to compaction and memory behaviour.
> > >
> > > From: bened...@apache.org <bened...@apache.org>
> > > Date: Wednesday, 21 July 2021 at 14:32
> > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > > memtable-as-a-commitlog-index
> > >
> > > Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually
> > > there was a paper that did this a long time ago), and it could be very
> > > nice (if for no other benefit than reducing heap utilisation). I don’t
> > > think this requires that they be modelled as the same concept, however,
> > > only that the Memtable must be able to receive an address into a commit
> > > log entry and to adopt partial ownership over the entry’s lifecycle.
> > >
> > >
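One way to read the requirement above, as a minimal single-threaded sketch with hypothetical classes: the memtable stores only addresses into commit log segments and holds a reference on each segment it points into, so a segment can be recycled only once no memtable needs it. Whole-value overwrites per key are assumed for simplicity; real mutations merge.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical address of an entry inside a commit log segment.
final class LogAddress {
    final long segmentId;
    final int offset;

    LogAddress(long segmentId, int offset) {
        this.segmentId = segmentId;
        this.offset = offset;
    }
}

// Hypothetical segment whose lifecycle is shared with the memtables indexing it.
final class LogSegment {
    private int references;

    void retain() { references++; }

    void release() {
        if (references == 0) throw new IllegalStateException("segment already released");
        references--;
    }

    boolean recyclable() { return references == 0; }
}

// Memtable that indexes log entries instead of copying their data onto the heap.
final class LogIndexMemtable {
    private final TreeMap<String, LogAddress> index = new TreeMap<>();
    private final Map<Long, LogSegment> segments;

    LogIndexMemtable(Map<Long, LogSegment> segments) { this.segments = segments; }

    void put(String key, LogAddress address) {
        segments.get(address.segmentId).retain();       // adopt partial ownership
        LogAddress previous = index.put(key, address);
        if (previous != null)
            segments.get(previous.segmentId).release(); // overwritten entry no longer needed
    }

    LogAddress lookup(String key) { return index.get(key); }

    // After a flush (or discard), give up all ownership so segments can be recycled.
    void release() {
        for (LogAddress address : index.values())
            segments.get(address.segmentId).release();
        index.clear();
    }
}
```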
> > > From: Branimir Lambov <branimir.lam...@datastax.com>
> > > Date: Wednesday, 21 July 2021 at 14:28
> > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > > In general, I think we need to make up our mind as to whether we
> > >   consider the Memtable and CommitLog one logical entity [...], or
> > >   whether we want to further untangle those two components from an
> > >   architectural perspective which we started down that road on with
> > >   the pluggable storage engine work.
> > >
> > > This CEP is intentionally not attempting to answer this question. FWIW
> > > I do not see them as separable (there's evidence to this fact in the
> > > codebase), but there are valid secondary uses of the commit log that
> > > are served well enough by the current architecture.
> > >
> > > It is important, however, to let the memtable implementation opt out,
> > > to permit it to provide its own solution for data persistence.
> > >
> > > We should revisit this in the future, especially if Benedict's shared
> > > log facility and my plans for a memtable-as-a-commitlog-index
> > > evolve.
> > >
> > > Regards,
> > > Branimir
> > >
> > > On Wed, Jul 21, 2021 at 1:34 PM Michael Burman <y...@iki.fi> wrote:
> > >
> > > > Hi,
> > > >
> > > > It is nice to see these going forward (and a great use of CEP), so
> > > > thanks for the proposal. I have my reservations regarding the linking
> > > > of memtable to CommitLog and flushing; these should not leak
> > > > abstractions from one to another. And I don't see the reasoning why
> > > > they should be linked; it doesn't seem to add anything other than
> > > > tight coupling of components, reducing reuse and making things
> > > > unnecessarily complicated. Also, the streaming notions seem weird to
> > > > me - how are they related to memtable? Why should memtable care about
> > > > the behavior outside memtable's responsibility?
> > > >
> > > > Some misc (with some thoughts split / duplicated to different parts)
> > > > quotes and comments:
> > > >
> > > > > Tight coupling between CFS and memtable will be reduced: flushing
> > > > > functionality is to be extracted, controlling memtable memory and
> > > > > period expiration will be handled by the memtable.
> > > >
> > > > Why is flushing control bad to do in CFS and better in the memtable?
> > > > Doing it outside the memtable would allow controlling the flushing
> > > > regardless of how the actual memtable is implemented. For example,
> > > > let's say someone wanted to implement HBase's Accordion in Cassandra.
> > > > It shouldn't matter what the implementation of memtable is, as the
> > > > compaction of different memtables could be beneficial to all
> > > > implementations. Or the flushing could push the memtable into a
> > > > proper cache instead of only to disk.
> > > >
> > > > Or if we had a per-table caching structure, we could control the
> > > > flushing of memtables and the cache structure separately. Some data
> > > > benefits from LRU and some from MRW (most-recently-written) caching
> > > > strategies. But both could benefit from the same memtable
> > > > implementation; it's the data and how it's used that could control how
> > > > the flushing should work. For example, time series data behaves quite
> > > > differently in terms of data accesses from something more "random".
> > > >
> > > > Or even "total memory control", which would check which tables need
> > > > more memory to do their writes and which do not. Or ensuring that
> > > > memory doesn't grow over a boundary, which requires manually
> > > > maintaining how much is dedicated to caching and how much to memtables
> > > > waiting to be flushed. Or delaying flushing because the disks can't
> > > > keep up, etc. Not to be implemented in this CEP, but pushing this
> > > > strategy into the memtable would prevent many features.
> > > >
> > > > > Beyond thread-safety, the concurrency constraints of the memtable are
> > > > > intentionally left unspecified.
> > > >
> > > > I like this. I could see use-cases where a single-threaded
> > > > implementation could actually outperform some concurrent data
> > > > structures. But it also raises a question: is this proposal going to
> > > > take an angle towards per-range memtables? There are certainly
> > > > benefits to splitting the memtables, as it would reduce the "n" in the
> > > > operations, thus providing less overhead in lookups and writes.
> > > > Although, taking it one step backwards, I could see the benefit of
> > > > having a commitlog per range also, which would allow higher
> > > > utilization of NVMe drives with larger queue depths. And why not
> > > > per-range sstables for faster scale-outs, and so on? This is a bit
> > > > outside the scope of the CEP; I just want to ensure that the
> > > > implementation does not block such improvements.
> > > >
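For the per-range question above, a minimal sketch of a memtable sharded by token range, assuming signed 64-bit tokens (as produced by Cassandra's Murmur3Partitioner) and equal-width shards. ShardedMemtableSketch is hypothetical; nothing here is taken from the CEP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical memtable split into equal-width token ranges.
final class ShardedMemtableSketch {
    private final int shardCount;
    private final long shardWidth; // unsigned width of each token range
    private final List<ConcurrentSkipListMap<Long, String>> shards = new ArrayList<>();

    ShardedMemtableSketch(int shardCount) {
        if (shardCount < 1) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
        // floor((2^64 - 1) / n) + 1 in unsigned arithmetic; unused when n == 1
        this.shardWidth = shardCount == 1 ? 0 : Long.divideUnsigned(-1L, shardCount) + 1;
        for (int i = 0; i < shardCount; i++)
            shards.add(new ConcurrentSkipListMap<>());
    }

    // Maps tokens in [Long.MIN_VALUE, Long.MAX_VALUE] onto shards 0..n-1 in token order.
    int shardFor(long token) {
        if (shardCount == 1) return 0;
        long offset = token - Long.MIN_VALUE; // unsigned distance from the smallest token
        return (int) Long.divideUnsigned(offset, shardWidth);
    }

    void put(long token, String value) { shards.get(shardFor(token)).put(token, value); }

    String get(long token) { return shards.get(shardFor(token)).get(token); }
}
```

Because shards are contiguous in token order, a range read can visit them sequentially, and each shard could in principle be owned by a single thread.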
> > > > Interfaces:
> > > >
> > > > > boolean writesAreDurable()
> > > > > boolean writesShouldSkipCommitLog()
> > > >
> > > > The placement of these methods inside the memtable implementation just
> > > > feels incredibly wrong to me. The writing pipeline should have these
> > > > configured, and they could differ for each table even with the same
> > > > memtable implementation. Let's take the example of an in-memory
> > > > memtable use case that's never written to an SSTable. We could have one
> > > > table with simple in-memory cached storage and another one with
> > > > Redis-style AOF persistence, where writes would be written to the
> > > > commitlog for fast recovery, but the data is otherwise always kept
> > > > only in the memtable instead of being written to an SSTable (for
> > > > performance reasons). Still the same memtable implementation.
> > > >
> > > > Why would the write process of the table not ask the table what
> > > > settings it has, instead of asking the memtable what settings the
> > > > table has? This seems counterintuitive to me. Even the persistent
> > > > memory case is a bit questionable: why not simply disable the
> > > > commitlog in the writing process? Why ask the memtable?
> > > >
> > > > This feels like the memtable is going to be the write pipeline, but to
> > > > me that doesn't feel like the correct architectural decision. I'd
> > > > rather see these decisions made outside the memtable. Even a
> > > > persistent memory memtable user might want to have a commitlog enabled
> > > > for data capture / shipping logs, or layers of persistence speed. The
> > > > whole persistent memory angle, without any commercially known future,
> > > > is a bit weird at the moment (even Optane has no known manufacturing
> > > > anymore, with the last factory being dismantled based on public
> > > > information).
> > > >
> > > > > boolean streamToMemtable()
> > > >
> > > > And that one I don't understand. Why is streaming in the memtable?
> > > > This smells like scope creep from something else. The explanation
> > > > would indicate to me that the wanted behavior is just disabling
> > > > automated flushing.
> > > >
> > > > But these are just some questions that came to my mind while reading
> > > > this. And I don't want to sound too negative (most of the features are
> > > > really something I'd like to see); perhaps I just misunderstood some
> > > > of the motivations why stuff should be brought into the memtable
> > > > instead of being implemented outside it. Perhaps there's something
> > > > else in the write pipeline architecture that needs fixing but is now
> > > > masquerading inside this CEP.
> > > >
> > > > I'm definitely interested to hear more.
> > > >
> > > >   - Micke
> > > >
> > > > On Wed, 21 Jul 2021 at 08:24, Berenguer Blasi <berenguerbl...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1. De-tangling, going more modular and clean interfaces sgtm.
> > > > >
> > > > > On 20/7/21 21:45, Nate McCall wrote:
> > > > > > Yay for pluggable memtables!! I haven't gone over this in detail
> > > > > > yet, but personally I've always thought integrating something like
> > > > > > Arrow would be cool for sharing data (that's as far as I've gotten,
> > > > > > but anything that makes that kind of experimentation easier would
> > > > > > also help with mocking test plumbing, so +1 from me).
> > > > > >
> > > > > > Thanks for putting this together!
> > > > > >
> > > > > > -Nate
> > > > > >
> > > > > > On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <branimir.lam...@datastax.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Proposal for a mechanism for plugging in memtable implementations:
> > > > > >>
> > > > > >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
> > > > > >>
> > > > > >> The proposal supports using custom memtable implementations to
> > > > > >> support development and testing of improved alternatives, but also
> > > > > >> enables a broader definition of "memtable" to better support more
> > > > > >> advanced use cases like persistent memory. To this end, memtable
> > > > > >> implementations are given control over flushing and storing data in
> > > > > >> the commit log, enabling solutions that implement their own
> > > > > >> durability mechanisms and live much longer than their classical
> > > > > >> counterparts. Taken to the extreme, this also enables memtables that
> > > > > >> never flush (in other words, alternative storage engines) in a
> > > > > >> minimally-invasive manner.
> > > > > >>
> > > > > >> I am curious to hear your thoughts on the proposal.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Branimir
> > > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Branimir Lambov
> > > e. branimir.lam...@datastax.com
> > > w. www.datastax.com
> > >
> >
> >
> >
>


-- 
Branimir Lambov
e. branimir.lam...@datastax.com
w. www.datastax.com
