From the CEP:

Batches (including unconditional batches) on transactional tables will receive
ACID properties, and grammatically correct conditional batch operations that
would be rejected for operating over multiple CQL partitions will now be
supported.
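A toy model of the semantics quoted above may help readers picture the feature. It is a minimal single-process sketch built on a hypothetical `ToyStore` keyed by partition. It says nothing about CQL syntax or about how Accord coordinates replicas. It only shows the all-or-nothing outcome a client would observe when one conditional batch spans two partitions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in: partition key -> {column: value}."""

    partitions: dict = field(default_factory=dict)

    def apply_conditional_batch(self, conditions: dict, writes: dict) -> bool:
        """Apply all writes only if all conditions hold, across any partitions.

        conditions and writes both map (partition_key, column) -> value.
        """
        # Evaluate every condition before touching anything.
        for (pk, col), expected in conditions.items():
            if self.partitions.get(pk, {}).get(col) != expected:
                return False  # a single failed condition rejects the batch
        # All conditions held: apply every write, in every partition.
        for (pk, col), value in writes.items():
            self.partitions.setdefault(pk, {})[col] = value
        return True


# Usage: move 30 units between accounts stored in two different partitions.
store = ToyStore({"alice": {"balance": 100}, "bob": {"balance": 0}})
applied = store.apply_conditional_batch(
    conditions={("alice", "balance"): 100, ("bob", "balance"): 0},
    writes={("alice", "balance"): 70, ("bob", "balance"): 30},
)
```

Atomicity is trivial here because everything runs in one process. What the CEP promises is the same observable outcome when the two partitions live on different replicas.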


From: Paulo Motta <pauloricard...@gmail.com>
Date: Friday, 1 October 2021 at 15:30
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Can you just answer what palpable feature will be available once this CEP
lands? This is still not clear to me (and perhaps to others) from the
current CEP structure. The current document describes the protocol
thoroughly, but in my view it fails to illustrate what specific APIs,
methods and modules will become available to developers, how it fits into
the larger picture and interacts with existing modules (if at all), and how
it can be used to build features on top, perhaps with a few examples.

On Fri, 1 Oct 2021 at 11:10, bened...@apache.org <bened...@apache.org> wrote:

> I’m not, though it might seem that way. I disagree with your views about
> how CEPs should be structured. Since the CEP process was itself codified
> via the CEP process, if you want to recodify how CEPs work, the correct way
> is via the CEP process itself.
>
> The discussion is being drawn in multiple directions away from the CEP
> itself, and I am trying to keep this particular thread focused on the
> business at hand, not on meta discussions about CEP structure (which will
> no doubt be unproductive given our likely irreconcilable views on the
> topic), nor on discussions about other CEPs that could have been.
>
> If you want to start a separate exploratory discussion thread about CEP
> structure without filing a CEP, feel free to do so.
>
>
> From: Paulo Motta <pauloricard...@gmail.com>
> Date: Friday, 1 October 2021 at 15:04
> To: Cassandra DEV <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > If you want to impose your views on CEP structure on others, please file
> a CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
> This sounds very Kafkaesque. You know I won't file a meta-CEP to change the
> structure of CEPs, so you're just using this as an excuse to shut down the
> discussion about the lack of clarity on what actual palpable feature will
> be available once the CEP lands. :-)
>
> I'm just providing my humble feedback on how a CEP could be more digestible
> and easier to consume from an external point of view, and this seems like
> an appropriate and contextualized place to voice this opinion, which is
> perhaps shared by others.
>
> On Fri, 1 Oct 2021 at 10:55, bened...@apache.org <bened...@apache.org> wrote:
>
> > I disagree with you. However, this is the wrong forum to have a meta
> > discussion about how CEPs should be structured.
> >
> > If you want to impose your views on CEP structure on others, please file a
> > CEP with the additional restrictions and guidance you want to impose and
> > start a discussion thread. I can then respond in detail to why I perceive
> > this approach to be flawed, in a dedicated context.
> >
> >
> > From: Paulo Motta <pauloricard...@gmail.com>
> > Date: Friday, 1 October 2021 at 14:48
> > To: Cassandra DEV <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > The proposal as it stands today is exceptionally thorough, more so than
> > > any other CEP to date, or than any CEP is likely to be in the near future.
> >
> > The protocol is thoroughly described, but in my view a CEP is a forum to
> > discuss the high-level architecture and plan for adding a full end-to-end
> > enhancement to the database, breaking it into sub-CEPs if needed, as long
> > as the full plan is known in advance; otherwise the community will not
> > have the context to judge the full extent and impact of the proposed
> > enhancement.
> >
> > > Since it remains unclear to me what either yourself or Jonathan want to
> > > see as an alternative
> >
> > I would personally like to see something along these lines:
> >
> > CEP1: Add ACID-compliant atomic batches
> > - UX changes needed: none; CQL already provides the grammar we need.
> > - Distributed transaction protocol needed: Accord (link to the white paper
> > if you want specific details about the protocol)
> > - High-level architecture: what new components will be added, how existing
> > components will be modified, what new messages will be added, what new
> > configuration knobs will be introduced, what the milestones of the
> > project are, etc.
> >
> > CEP2: Make LWT faster and more reliable
> > - UX changes needed: none
> > - Distributed transaction protocol needed: Accord, already added by the
> > previous CEP.
> > - High-level architecture: blablabla... and so on.
> >
> > On Fri, 1 Oct 2021 at 10:19, bened...@apache.org <bened...@apache.org> wrote:
> >
> > > I think this is getting circular and unproductive. I am inclined to leave
> > > basic disagreements about whether the CEP specifies a feature for a vote.
> > > In my view the CEP specifies several features: immediate ones for the
> > > user (ACID batches and multi-key LWTs), and developer-focused ones around
> > > the ground-breaking semantics that will be enabled.
> > >
> > > The proposal as it stands today is exceptionally thorough, more so than
> > > any other CEP to date, or than any CEP is likely to be in the near future.
> > >
> > > This is a Cassandra Enhancement *Proposal*, and at some point we have to
> > > engage with what is proposed, not what you might like to be proposed.
> > > Since it remains unclear to me what either yourself or Jonathan want to
> > > see as an alternative, at this point it would seem more productive to
> > > produce your own proposals for the community to consider. It is possible
> > > for multiple transaction systems to co-exist, if you feel this is
> > > necessary.
> > >
> > >
> > >
> > > From: Paulo Motta <pauloricard...@gmail.com>
> > > Date: Friday, 1 October 2021 at 13:58
> > > To: Cassandra DEV <dev@cassandra.apache.org>
> > > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > I share jbellis's feeling that this proposal seems to focus on the
> > > protocol itself but lacks the actual feature that will use the protocol,
> > > which IMO is a key element to discuss in a CEP.
> > >
> > > It's similar to saying: hey, I want to add this Tries Serialization
> > > Protocol to Cassandra, but not providing specific details of how this
> > > protocol is going to be used.
> > >
> > > I think the right route for a CEP is to describe the feature that will be
> > > added to the database, with the protocol as a mere requirement of the
> > > high-level feature, for example:
> > >
> > > CEP: Add Trie-backed memtable
> > > - Trie Serialization Protocol: implementation detail of the above CEP
> > >
> > > What is the difficulty of taking this approach, picking one of the
> > > myriad of features that will be enabled by Accord and using that as the
> > > initial CEP to introduce the protocol to the database?
> > >
> > > On Fri, 1 Oct 2021 at 08:37, bened...@apache.org <bened...@apache.org> wrote:
> > >
> > > > Actually, thinking about it again, the simple optimistic protocol would
> > > > in fact guarantee system forward progress (i.e. independent of
> > > > transaction formulation).
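The forward-progress claim can be illustrated with a toy simulation. It rests on several assumptions. Every transaction reads a set of partitions and writes back to the same set. Each partition carries a version counter, a hypothetical stand-in for the timestamps discussed in this thread. Validate-and-commit requests are handled one at a time in arbitrary order. Under those assumptions, the first transaction validated in any round cannot have been invalidated by an earlier commit. So at least one transaction commits per round, even though any individual transaction may keep aborting.

```python
import random


def run_round(versions, txns, rng):
    """One round of timestamp-validating optimistic concurrency control.

    versions: partition -> version of its last committed write
    txns: list of partition sets; each transaction reads and rewrites its set
    Returns the indices of the transactions that committed.
    """
    # Worst case: every transaction takes its snapshot before anyone commits.
    snapshots = [{p: versions[p] for p in parts} for parts in txns]
    order = list(range(len(txns)))
    rng.shuffle(order)  # commit requests arrive in an arbitrary order
    committed = []
    for i in order:
        # Validate: commit only if nothing this transaction read has changed.
        if all(versions[p] == v for p, v in snapshots[i].items()):
            for p in snapshots[i]:
                versions[p] += 1  # install the write at a newer version
            committed.append(i)
    return committed


# Usage: 200 rounds of heavily overlapping transactions over 4 partitions.
rng = random.Random(42)
versions = {p: 0 for p in "abcd"}
commits_per_round = []
for _ in range(200):
    txns = [set(rng.sample("abcd", rng.randint(1, 4))) for _ in range(8)]
    commits_per_round.append(len(run_round(versions, txns, rng)))
```

This shows system-wide progress only. Per-transaction progress (starvation-freedom) is not guaranteed by this scheme.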
> > > >
> > > >
> > > > From: bened...@apache.org <bened...@apache.org>
> > > > Date: Friday, 1 October 2021 at 09:14
> > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > > Hi Jonathan,
> > > >
> > > > It would be great if we could achieve a bandwidth higher than one or two
> > > > short emails per week. It remains unclear to me what your goal is, and it
> > > > would help if you could make a statement like “I want Cassandra to be
> > > > able to do X” so that we can respond directly to it. I am also available
> > > > for another call in which we can have a back-and-forth; please feel free
> > > > to propose a London-compatible time within the next week that suits you.
> > > >
> > > > In my opinion we are at risk of veering off-topic, though. This CEP is
> > > > not intended to deliver interactive transactions, and to my knowledge
> > > > nobody is proposing a CEP for interactive transactions. So, for the CEP
> > > > at hand, the salient question seems to be: does this CEP prevent us from
> > > > implementing interactive transactions with properties X, Y, Z in future?
> > > > To which the answer is almost certainly no.
> > > >
> > > > However, to continue the discussion and respond directly to your
> > > > queries, I believe we agree on the definition of an interactive
> > > > transaction.
> > > >
> > > > Two protocols were loosely outlined. The first, using timestamps for
> > > > optimistic concurrency control, would indeed involve the possibility of
> > > > aborts. It would not, however, inherently share the LWT problem where no
> > > > transaction is able to make progress. Whether or not progress is
> > > > guaranteed (in a livelock-free sense) would depend on the structure of
> > > > the transactions that were interfering.
> > > >
> > > > This approach has the advantage of being very simple to implement, so
> > > > that we could realistically support interactive transactions quite
> > > > quickly. It has the additional advantage that transactions would execute
> > > > very quickly by avoiding the WAN during construction, and as a result
> > > > may in practice experience fewer aborts than protocols that guarantee
> > > > livelock-freedom.
> > > >
> > > > The second protocol proposed using read/write intents and would be able
> > > > to support almost any behaviour you want. We could even utilise
> > > > pessimistic concurrency control, or anything in-between. This is its own
> > > > huge design space, and discussion of this approach and the trade-offs
> > > > that could be made is (in my opinion) entirely out of scope for this
> > > > CEP.
> > > >
> > > >
> > > > From: Jonathan Ellis <jbel...@gmail.com>
> > > > Date: Friday, 1 October 2021 at 05:00
> > > > To: dev <dev@cassandra.apache.org>
> > > > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > > The obstacle for me is that you've provided a protocol but not a fully
> > > > fleshed-out architecture, so it's hard to fill in some of the blanks.
> > > > But it looks to me like optimistic concurrency control for interactive
> > > > transactions applied to Accord would leave you in an LWT-like situation
> > > > under fairly light contention, where nobody actually makes progress due
> > > > to retries.
> > > >
> > > > To make sure we're talking about the same thing, as Henrik pointed out,
> > > > interactive transactions mean multiple round trips from the client
> > > > within a transaction. For example, here
> > > > <https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
> > > > is a simple implementation of the TPC-C New Order transaction. The high
> > > > level logic (via
> > > > <https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
> > > > is:
> > > >
> > > >    1. Get records describing a warehouse, customer, & district
> > > >    2. Update the district
> > > >    3. Increment next available order number
> > > >    4. Insert record into Order and New-Order tables
> > > >    5. For 5-15 items, get Item record, get/update Stock record
> > > >    6. Insert Order-Line Record
> > > >
> > > > As you can see, this requires a lot of client-side logic mixed in with
> > > > the actual SQL commands.
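To make the "client-side logic mixed in" point concrete, here is a heavily simplified New Order in which every `read`/`write` call stands in for one client round trip. It is a sketch, not TPC-C. Taxes, discounts, the stock-restocking rule and most columns are omitted, and the `read`/`write` callables and key layout are hypothetical. The keys written in steps 3 to 6 depend on values read earlier (the order number comes from the district row). That dependency is exactly what a system requiring the full read/write set up front cannot express directly.

```python
def new_order(read, write, w_id, d_id, c_id, items):
    """Simplified TPC-C New Order; each read/write models one round trip.

    items: list of (item_id, quantity) pairs chosen by the client.
    Returns (order_id, total_amount).
    """
    # 1. Read warehouse, customer and district records.
    read(("warehouse", w_id))
    read(("customer", w_id, d_id, c_id))
    district = read(("district", w_id, d_id))
    # 2-3. Take the next order number and bump it on the district row.
    o_id = district["next_o_id"]
    write(("district", w_id, d_id), {**district, "next_o_id": o_id + 1})
    # 4. Insert the Order and New-Order records (their keys depend on o_id).
    write(("order", w_id, d_id, o_id), {"c_id": c_id, "ol_cnt": len(items)})
    write(("new_order", w_id, d_id, o_id), {})
    total = 0
    for line_no, (i_id, qty) in enumerate(items, start=1):
        # 5. Read the item, then read and update its stock row.
        item = read(("item", i_id))
        stock = read(("stock", w_id, i_id))
        write(("stock", w_id, i_id), {**stock, "quantity": stock["quantity"] - qty})
        # 6. Insert the Order-Line record.
        amount = item["price"] * qty
        write(("order_line", w_id, d_id, o_id, line_no), {"i_id": i_id, "amount": amount})
        total += amount
    return o_id, total


# Usage against a plain dict (no isolation at all; only the shape of the logic).
db = {
    ("warehouse", 1): {}, ("customer", 1, 1, 7): {},
    ("district", 1, 1): {"next_o_id": 3001},
    ("item", 10): {"price": 5}, ("item", 11): {"price": 2},
    ("stock", 1, 10): {"quantity": 50}, ("stock", 1, 11): {"quantity": 20},
}
result = new_order(db.__getitem__, db.__setitem__, 1, 1, 7, [(10, 2), (11, 3)])
```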
> > > >
> > > >
> > > > On Thu, Sep 30, 2021 at 2:30 AM bened...@apache.org <bened...@apache.org> wrote:
> > > >
> > > > > Essentially this, although I think in practice we will need to track
> > > > > each partition’s timestamp separately (or optionally, for reduced
> > > > > conflicts, each row's or datum's), and make them all part of the
> > > > > conditional application of the transaction, at least for strict
> > > > > serializability.
> > > > >
> > > > > The alternative is to insert read/write intents for the transaction
> > > > > during each step, and to confirm they are still valid on commit, but
> > > > > this approach would require a WAN round-trip for each step in the
> > > > > interactive transaction, whereas the timestamp-validating approach can
> > > > > use a LAN round-trip for each step besides the final one, and is also
> > > > > much simpler to implement.
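The timestamp-validating approach described above can be sketched in a single process. The sketch assumes hypothetical `Store` and `InteractiveTxn` classes. Each read records the timestamp its partition carried when first read; in a real deployment that would be one LAN round trip per step. Writes are buffered on the coordinator. Commit applies the buffered writes only if no partition that was read has since been overwritten. In the proposal that final conditional commit would itself run as an Accord transaction; here a plain method call stands in for it. Tracking is per partition rather than per row or datum.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical replica state: value and last-write timestamp per partition."""

    data: dict = field(default_factory=dict)
    ts: dict = field(default_factory=dict)
    clock: int = 0

    def read(self, pk):
        return self.data.get(pk), self.ts.get(pk, 0)

    def commit_if_unchanged(self, read_ts, writes):
        # Conditional application: every partition read must still carry
        # the timestamp the transaction observed.
        if any(self.ts.get(pk, 0) != t for pk, t in read_ts.items()):
            return False
        self.clock += 1
        for pk, value in writes.items():
            self.data[pk] = value
            self.ts[pk] = self.clock
        return True


class InteractiveTxn:
    def __init__(self, store):
        self.store = store
        self.read_ts = {}  # partition -> timestamp observed at first read
        self.writes = {}   # buffered on the coordinator until commit

    def read(self, pk):
        if pk in self.writes:
            return self.writes[pk]  # read-your-own-writes from the buffer
        value, t = self.store.read(pk)
        self.read_ts.setdefault(pk, t)
        return value

    def write(self, pk, value):
        self.writes[pk] = value

    def commit(self):
        return self.store.commit_if_unchanged(self.read_ts, self.writes)


# Usage: two concurrent increments of the same partition; one must abort.
store = Store(data={"x": 0})
t1, t2 = InteractiveTxn(store), InteractiveTxn(store)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 1)
first, second = t1.commit(), t2.commit()
```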
> > > > >
> > > > >
> > > > > From: Blake Eggleston <beggles...@apple.com.INVALID>
> > > > > Date: Thursday, 30 September 2021 at 05:47
> > > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > > > You could establish a lower timestamp bound and buffer transaction
> > > > > state on the coordinator, then make the commit an operation that only
> > > > > applies if none of the partitions involved has been changed at a more
> > > > > recent timestamp. You could also implement MVCC, either in the storage
> > > > > layer or, for some period of time, by buffering commits on each replica
> > > > > before applying them.
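Blake's second suggestion, multi-version storage, lets every read in an interactive transaction be served at a single snapshot timestamp (the lower bound). Concurrent commits at later timestamps then do not disturb what the transaction sees. Below is a minimal sketch built on a hypothetical `VersionedStore` that keeps each key's versions sorted by timestamp. It leaves out version garbage collection (Blake's "for some period of time") and the commit-time check.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Hypothetical multi-version store: per key, timestamps kept sorted."""

    def __init__(self):
        self._ts = defaultdict(list)    # key -> sorted write timestamps
        self._vals = defaultdict(list)  # key -> values, parallel to _ts

    def write(self, key, ts, value):
        i = bisect.bisect_right(self._ts[key], ts)
        self._ts[key].insert(i, ts)
        self._vals[key].insert(i, value)

    def read_at(self, key, ts):
        """Return the newest value written at or before ts, else None."""
        stamps = self._ts.get(key, [])
        i = bisect.bisect_right(stamps, ts)
        return self._vals[key][i - 1] if i else None


# Usage: a transaction reading at snapshot 3 ignores a commit at timestamp 5.
store = VersionedStore()
store.write("a", 1, "a@1")
store.write("b", 1, "b@1")
snapshot = 3                    # the transaction's lower timestamp bound
store.write("a", 5, "a@5")      # concurrent commit after the snapshot
seen = (store.read_at("a", snapshot), store.read_at("b", snapshot))
```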
> > > > >
> > > > > > On Sep 29, 2021, at 6:18 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > > > > >
> > > > > > How are interactive transactions possible with Accord?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 21, 2021 at 11:56 PM bened...@apache.org <bened...@apache.org> wrote:
> > > > > >
> > > > > >> Could you explain why you believe this trade-off is necessary? We
> > > > > >> can support full SQL just fine with Accord, and I hope that we
> > > > > >> eventually do so.
> > > > > >>
> > > > > >> This domain is incredibly complex, so it is easy to reach wrong
> > > > > >> conclusions. I would invite you again to propose a system for
> > > > > >> discussion that you think offers something Accord is unable to, and
> > > > > >> that you consider desirable, and we can work from there.
> > > > > >>
> > > > > >> To pre-empt some possible discussions, I am not aware of anything we
> > > > > >> cannot do with Accord that we could do with either Calvin or
> > > > > >> Spanner. Interactive transactions are possible on top of Accord, as
> > > > > >> are transactions with an unknown read/write set. In each case the
> > > > > >> only cost is that they would use optimistic concurrency control,
> > > > > >> which is no worse than the Spanner derivatives anyway (which I have
> > > > > >> to assume is your benchmark in this regard). I do not expect to
> > > > > >> deliver either functionality initially, but Accord takes us most of
> > > > > >> the way there for both.
> > > > > >>
> > > > > >>
> > > > > >> From: Jonathan Ellis <jbel...@gmail.com>
> > > > > >> Date: Wednesday, 22 September 2021 at 05:36
> > > > > >> To: dev <dev@cassandra.apache.org>
> > > > > >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > > > >> Right, I'm looking for exactly a discussion on the high-level goals.
> > > > > >> Instead of saying "here are the goals and we ruled out X because Y",
> > > > > >> we should start with a discussion around "Approach A allows X and W,
> > > > > >> approach B allows Y and Z", and decide together what the goals
> > > > > >> should be and what we are willing to trade to get those goals, e.g.,
> > > > > >> are we willing to give up global strict serializability to get the
> > > > > >> ability to support full SQL? Both of these are nice to have!
> > > > > >>
> > > > > >> On Tue, Sep 21, 2021 at 9:52 PM bened...@apache.org <bened...@apache.org> wrote:
> > > > > >>
> > > > > >>> Hi Jonathan,
> > > > > >>>
> > > > > >>> These other systems are incompatible with the goals of the CEP. I
> > > > > >>> do discuss them (besides 2PC) in both the whitepaper and the CEP,
> > > > > >>> and will summarise that discussion below. A true and accurate
> > > > > >>> comparison of these other systems is essentially intractable, as
> > > > > >>> there are complex subtleties to each flavour, and those who are
> > > > > >>> interested would be better served by performing their own research.
> > > > > >>>
> > > > > >>> I think it is more productive to focus on what we want to achieve
> > > > > >>> as a community. If you believe the goals of this CEP are wrong for
> > > > > >>> the project, let’s focus on that. If you want to compare and
> > > > > >>> contrast specific facets of alternative systems that you consider
> > > > > >>> to be preferable in some dimension, let’s do that here or in a Q&A
> > > > > >>> as proposed by Joey.
> > > > > >>>
> > > > > >>> The relevant goals are that we:
> > > > > >>>
> > > > > >>>  1.  Guarantee strict serializable isolation on commodity hardware
> > > > > >>>  2.  Scale to any cluster size
> > > > > >>>  3.  Achieve optimal latency
> > > > > >>>
> > > > > >>> The approach taken by Spanner derivatives is rejected by (1)
> > > > > >>> because they guarantee only Serializable isolation (they
> > > > > >>> additionally fail (3)). From watching talks by YugaByte, and
> > > > > >>> inferring from Cockroach’s panic-cluster-death under clock skew,
> > > > > >>> this is clearly considered by everyone to be undesirable but
> > > > > >>> necessary to achieve scalability.
> > > > > >>>
> > > > > >>> The approach taken by FaunaDB (Calvin) is rejected by (2) because
> > > > > >>> its sequencing layer requires a global leader process for the
> > > > > >>> cluster, which is incompatible with Cassandra’s scalability
> > > > > >>> requirements. It additionally fails (3) for global clients.
> > > > > >>>
> > > > > >>> Two-phase commit fails (3). As an aside, AFAICT DynamoDB is today a
> > > > > >>> Spanner clone for its multi-key transaction functionality, not 2PC.
> > > > > >>>
> > > > > >>> Systems such as RAMP with even weaker isolation are not considered,
> > > > > >>> for the simple reason that they do not even claim to meet (1).
> > > > > >>>
> > > > > >>> If we want to additionally offer isolation levels weaker than
> > > > > >>> Serializable, such as that provided by the recent RAMP-TAO paper,
> > > > > >>> Cassandra is likely able to support multiple distinct transaction
> > > > > >>> layers that operate independently. I would encourage you to file a
> > > > > >>> CEP to explore how we can meet these distinct use cases, but I
> > > > > >>> consider them to be niche. I expect that a majority of our user base
> > > > > >>> desires strict serializable isolation, and certainly no less than
> > > > > >>> serializable isolation, to augment the existing weaker isolation
> > > > > >>> offered by quorum reads and writes.
> > > > > >>>
> > > > > >>> I would tangentially note that we are not an AP database under
> > > > > >>> normal recommended operation. A minority in any network partition
> > > > > >>> cannot reach QUORUM, so under recommended usage we are a
> > > > > >>> high-availability leaderless CP database.
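The QUORUM arithmetic behind that last point can be checked directly. The sketch below assumes the usual definition of QUORUM: a majority of the replication factor summed across data centres, i.e. floor(RF/2) + 1 replicas. It shows that the two sides of a network partition can never both assemble a quorum for the same replica set.

```python
def quorum(rf: int) -> int:
    """Replicas required for QUORUM at total replication factor rf."""
    return rf // 2 + 1


def can_reach_quorum(reachable: int, rf: int) -> bool:
    return reachable >= quorum(rf)


# For every split of rf replicas into two sides, at most one side has QUORUM,
# because two disjoint quorums would need 2 * (rf // 2 + 1) > rf replicas.
splits_with_two_quorums = [
    (rf, k)
    for rf in range(1, 10)
    for k in range(rf + 1)
    if can_reach_quorum(k, rf) and can_reach_quorum(rf - k, rf)
]
```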
> > > > > >>>
> > > > > >>>
> > > > > >>> From: Jonathan Ellis <jbel...@gmail.com>
> > > > > >>> Date: Tuesday, 21 September 2021 at 23:45
> > > > > >>> To: dev <dev@cassandra.apache.org>
> > > > > >>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > > > >>> Benedict, thanks for taking the lead in putting this together.
> > > > > >>> Since Cassandra is the only relevant database today designed around
> > > > > >>> a leaderless architecture, it's quite likely that we'll be better
> > > > > >>> served by a custom transaction design than by trying to retrofit one
> > > > > >>> from CP systems.
> > > > > >>>
> > > > > >>> The whitepaper here is a good description of the consensus
> > > > > >>> algorithm itself as well as its robustness and stability
> > > > > >>> characteristics, and its comparison with other state-of-the-art
> > > > > >>> consensus algorithms is very useful. In the context of Cassandra,
> > > > > >>> where a consensus algorithm is only part of what will be
> > > > > >>> implemented, I'd like to see a more complete evaluation of the
> > > > > >>> transactional side of things as well, including performance
> > > > > >>> characteristics, the types of transactions that can be supported,
> > > > > >>> and at least a general idea of what it would look like applied to
> > > > > >>> Cassandra. This will allow the PMC to make a more informed decision
> > > > > >>> about what tradeoffs are best for the entire long-term project of
> > > > > >>> first supplementing and ultimately replacing LWT.
> > > > > >>>
> > > > > >>> (Allowing users to mix LWT and AP Cassandra operations against the
> > > > > >>> same rows was probably a mistake, so in contrast with LWT we’re not
> > > > > >>> looking for something fast enough for occasional use but rather
> > > > > >>> something within a reasonable factor of AP operations, appropriate
> > > > > >>> to being the only way to interact with tables declared as such.)
> > > > > >>>
> > > > > >>> Besides Accord, this should cover:
> > > > > >>>
> > > > > >>> - Calvin and FaunaDB
> > > > > >>> - A Spanner derivative (no opinion on whether that should be
> > > > > >>> Cockroach or Yugabyte; I don’t think it’s necessary to cover both)
> > > > > >>> - A 2PC implementation (the Accord paper mentions DynamoDB but I
> > > > > >>> suspect there is more public information about MongoDB)
> > > > > >>> - RAMP
> > > > > >>>
> > > > > >>> Here’s an example of what I mean:
> > > > > >>>
> > > > > >>> =Calvin=
> > > > > >>>
> > > > > >>> Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to
> > > > > >>> order transactions, then replicas execute the transactions
> > > > > >>> independently with no further coordination. No SPOF. Transactions
> > > > > >>> are batched by each sequencer to keep this from becoming a
> > > > > >>> bottleneck.
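The Calvin approach above can be sketched in a few lines. Once a global consensus step has fixed a total order over batches of transactions, any replica that applies the same log of deterministic transactions reaches the same state with no further coordination. The sketch uses hypothetical transaction functions over a dict. A plain concatenation of batches stands in for the Paxos/Raft sequencer.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]  # must be deterministic: no clocks, randomness or I/O


def sequence(batches: List[List[Txn]]) -> List[Txn]:
    """Stand-in for global consensus: concatenate batches into one total order."""
    return [txn for batch in batches for txn in batch]


def replay(log: List[Txn]) -> State:
    """What each replica does independently: apply the agreed log in order."""
    state: State = {}
    for txn in log:
        txn(state)
    return state


def deposit(key: str, amount: int) -> Txn:
    def txn(state: State) -> None:
        state[key] = state.get(key, 0) + amount
    return txn


def transfer_if_funded(src: str, dst: str, amount: int) -> Txn:
    def txn(state: State) -> None:
        # The branch depends only on state, so every replica takes the same one.
        if state.get(src, 0) >= amount:
            state[src] -= amount
            state[dst] = state.get(dst, 0) + amount
    return txn


log = sequence([
    [deposit("alice", 100), transfer_if_funded("alice", "bob", 60)],
    [transfer_if_funded("alice", "bob", 60), deposit("bob", 5)],
])
replica_1, replica_2 = replay(log), replay(log)
```

The second transfer is skipped identically on both replicas because `alice` holds only 40 by then. No replica needs to ask another what happened.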
> > > > > >>>
> > > > > >>> Performance: the Calvin paper (published 2012) reports linear
> > > > > >>> scaling of TPC-C New Order up to 500,000 transactions/s on 100
> > > > > >>> machines (EC2 XL machines with 7GB RAM and 8 virtual cores). Note
> > > > > >>> that TPC-C New Order is composed of four reads and four writes, so
> > > > > >>> this is effectively 2M reads and 2M writes per second as we normally
> > > > > >>> measure them in C*.
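The conversion in that paragraph, plus the per-machine figure it implies, can be worked through with the numbers quoted above. This is a back-of-envelope check, not a benchmark.

```python
# Figures as quoted above from the Calvin paper.
txns_per_sec = 500_000
machines = 100
reads_per_txn = 4
writes_per_txn = 4

reads_per_sec = txns_per_sec * reads_per_txn      # 2,000,000
writes_per_sec = txns_per_sec * writes_per_txn    # 2,000,000
txns_per_machine = txns_per_sec // machines       # 5,000
ops_per_machine = txns_per_machine * (reads_per_txn + writes_per_txn)  # 40,000
```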
> > > > > >>>
> > > > > >>> Calvin supports mixed read/write transactions, but because the
> > > > > >>> transaction execution logic requires knowing all partition keys in
> > > > > >>> advance to ensure that all replicas can reproduce the same results
> > > > > >>> with no coordination, reads against non-PK predicates must be done
> > > > > >>> ahead of time (transparently, by the server) to determine the set of
> > > > > >>> keys, and this must be retried if the set of rows affected is
> > > > > >>> updated before the actual transaction executes.
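The pre-read-and-retry step described above can be sketched as follows. The sketch assumes a hypothetical dict-of-rows store and an `interleave` hook that simulates transactions ordered between the reconnaissance read and execution. The key set is computed first, re-checked when the transaction actually runs, and the transaction is retried if the set has changed. This mirrors the description above rather than Calvin's actual code.

```python
def run_dependent_txn(store, predicate, update, interleave=lambda s: None,
                      max_attempts=5):
    """Run `update` on every row matching a non-key predicate, Calvin-style.

    Returns (attempts_used, keys_updated).
    """
    for attempt in range(1, max_attempts + 1):
        # Reconnaissance: a plain read to discover the key set up front.
        planned = {k for k, row in store.items() if predicate(row)}
        # The transaction is ordered with `planned` as its declared key set;
        # other transactions may run in between.
        interleave(store)
        # Execution: re-evaluate the predicate deterministically.
        actual = {k for k, row in store.items() if predicate(row)}
        if actual == planned:
            for k in sorted(planned):
                update(store[k])
            return attempt, planned
        # The declared key set turned out to be wrong: retry from scratch.
    raise RuntimeError("key set kept changing; giving up")


store = {"a": {"status": "open", "n": 0}, "b": {"status": "closed", "n": 0}}
calls = []


def reopen_b_once(s):
    """Simulates one concurrent transaction that opens row b."""
    if not calls:
        s["b"]["status"] = "open"
    calls.append(1)


def bump(row):
    row["n"] += 1


attempts, keys = run_dependent_txn(
    store, lambda row: row["status"] == "open", bump, interleave=reopen_b_once
)
```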
> > > > > >>>
> > > > > >>> Batching and global consensus add latency: 100ms in the Calvin
> > > > > >>> paper and apparently about 50ms in FaunaDB. Glass half full: all
> > > > > >>> transactions (including multi-partition updates) are equally
> > > > > >>> performant in Calvin, since the coordination is handled up front in
> > > > > >>> the sequencing step. Glass half empty: even single-row reads and
> > > > > >>> writes have to pay the full coordination cost. Fauna has optimized
> > > > > >>> this away for reads, but I am not aware of a description of how they
> > > > > >>> changed the design to allow this.
> > > > > >>>
> > > > > >>> Functionality and limitations: since the entire transaction must be
> > > > > >>> known in advance to allow coordination-less execution at the
> > > > > >>> replicas, Calvin cannot support interactive transactions at all.
> > > > > >>> FaunaDB mitigates this by allowing server-side logic to be
> > > > > >>> included, but a Calvin approach will never be able to offer SQL
> > > > > >>> compatibility.
> > > > > >>>
> > > > > >>> Guarantees: Calvin transactions are strictly serializable. There is
> > > > > >>> no additional complexity or performance hit to generalizing to
> > > > > >>> multiple regions, apart from the speed of light. And since Calvin
> > > > > >>> is already paying a batching latency penalty, this is less painful
> > > > > >>> than for other systems.
> > > > > >>>
> > > > > >>> Application to Cassandra: B-. Distributed transactions are handled
> > > > > >>> by the sequencing and scheduling layers, which are leaderless, and
> > > > > >>> Calvin’s requirements for the storage layer are easily met by C*.
> > > > > >>> But Calvin also requires a global consensus protocol and LWT is
> > > > > >>> almost certainly not sufficiently performant, so this would require
> > > > > >>> ZK or etcd (reasonable for a library approach but not for replacing
> > > > > >>> LWT in C* itself), or an implementation of Accord. I don’t believe
> > > > > >>> Calvin would require additional table-level metadata in Cassandra.
> > > > > >>>
> > > > > >>> On Sun, Sep 5, 2021 at 9:33 AM bened...@apache.org <bened...@apache.org> wrote:
> > > > > >>>
> > > > > >>>> Wiki:
> > > > > >>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> > > > > >>>> Whitepaper:
> > > > > >>>> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> > > > > >>>> Prototype: https://github.com/belliottsmith/accord
> > > > > >>>>
> > > > > >>>> Hi everyone, I’d like to propose this CEP for adoption by the
> > > > > >>>> community.
> > > > > >>>>
> > > > > >>>> Cassandra has benefitted from LWTs for many years, but application
> > > > > >>>> developers who want to ensure consistency for complex operations
> > > > > >>>> must either accept the scalability bottleneck of serializing all
> > > > > >>>> related state through a single partition, or layer a complex state
> > > > > >>>> machine on top of the database. These are sophisticated and costly
> > > > > >>>> activities that our users should not be expected to undertake.
> > > > > >>>> Since distributed databases are beginning to offer distributed
> > > > > >>>> transactions with fewer caveats, it is past time for Cassandra to
> > > > > >>>> do so as well.
> > > > > >>>>
> > > > > >>>> This CEP proposes the use of several novel techniques that build
> > > > > >>>> upon research (that followed EPaxos) to deliver (non-interactive)
> > > > > >>>> general purpose distributed transactions. The approach is outlined
> > > > > >>>> in the wiki page and in more detail in the linked whitepaper.
> > > > > >>>> Importantly, by adopting this approach we will be the _only_
> > > > > >>>> distributed database to offer global, scalable, strict serializable
> > > > > >>>> transactions in one wide area round-trip. This would represent a
> > > > > >>>> significant improvement in the state of the art, both in the
> > > > > >>>> academic literature and in commercial or open source offerings.
> > > > > >>>>
> > > > > >>>> This work has been partially realised in a prototype. This partial
> > > > > >>>> prototype has been verified against Jepsen.io’s Maelstrom library
> > > > > >>>> and dedicated in-tree strict serializability verification tools,
> > > > > >>>> but much work remains before it is production-capable and
> > > > > >>>> integrated into Cassandra.
> > > > > >>>>
> > > > > >>>> I propose including the prototype in the project as a new source
> > > > > >>>> repository, to be developed as a standalone library for integration
> > > > > >>>> into Cassandra. I hope the community sees the important value
> > > > > >>>> proposition of this proposal, and will adopt the CEP after this
> > > > > >>>> discussion, so that the library and its integration into Cassandra
> > > > > >>>> can be developed in parallel and with the involvement of the wider
> > > > > >>>> community.
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Jonathan Ellis
> > > > > >>> co-founder, http://www.datastax.com
> > > > > >>> @spyced
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
