Hi Miles,

Thanks for your interest and your questions. So far as I can tell from your 
questions, you are basing your understanding on the wiki page – in which case I 
would recommend reading the whitepaper, which answers most of your questions. 
Unfortunately I did not understand every question, but I have tried my best to 
respond to every point.

> is there any plan to ensure safety under scaling operations or DC 
> (de)commissioning?

This is a topic that has come up before under regular Paxos. Many of the 
topology changes will be safe for Paxos in the near future, and we certainly 
expect Accord to be safe under all topology changes. Some of this will require 
improvements to Cassandra that colleagues will be proposing more generally in 
the near future, but the whitepaper outlines Accord’s side of the equation in 
section 5.

> What consistency levels will be supported under Accord?

This is related to the above discussion, which I expect to be addressed by 
colleagues in the near future. I can discuss my beliefs about how the project 
should move forwards on this, but since I’ve already done so in the past, and I 
don’t think it is core to Accord, I think it is probably best handled in a 
dedicated discussion.

> Further explanation here would be good.

I think on this topic it would be best to consult the whitepaper, as it 
describes dependencies quite precisely – better than I can do here. Some 
familiarity with the literature, e.g. EPaxos, would also help.

In short: there is no dependency graph for any transaction, just a dependency 
set. This is a set of transactions that _may_ execute before us. It includes 
all transactions that _will_ execute before us, but each coordinator for the 
transaction (e.g. if the coordinator fails and another takes its place) may 
assemble a different super-set of the true execution dependencies (i.e. those 
that will actually execute before us).
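
To make that concrete, here is a toy Java sketch (my own illustrative names, 
not Accord’s API). Both coordinators below hold acceptable dependency sets, 
because each contains every transaction that will actually execute before us:

    import java.util.Set;
    import java.util.TreeSet;

    // Toy example only: a dependency "graph" is really just a set of
    // transaction ids that may execute before us; different coordinators may
    // compute different super-sets of the true execution dependencies.
    final class DependencySets
    {
        public static void main(String[] args)
        {
            Set<String> trueDeps = Set.of("tx1", "tx2"); // will execute before us
            Set<String> fromCoordinatorA = new TreeSet<>(Set.of("tx1", "tx2", "tx3"));
            Set<String> fromCoordinatorB = new TreeSet<>(Set.of("tx1", "tx2", "tx4"));

            // Both are acceptable: each contains every true execution dependency
            System.out.println(fromCoordinatorA.containsAll(trueDeps)); // true
            System.out.println(fromCoordinatorB.containsAll(trueDeps)); // true
        }
    }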

> What is t0 here?

This is again described precisely in the whitepaper, but in essence it is 
any value the coordinator would like to propose as an execution timestamp (so 
long as it is globally unique). Normally this is simply (now, coordinator-id).
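
To illustrate the shape of t0 (hypothetical names only, not Accord’s actual 
types):

    // Hypothetical sketch: t0 is any globally unique value the coordinator
    // cares to propose, normally the pair (now, coordinator-id).
    final class TxnTimestamp
    {
        final long nowMicros;     // wall-clock (or hybrid logical) component
        final int coordinatorId;  // unique per coordinator: guarantees global uniqueness

        TxnTimestamp(long nowMicros, int coordinatorId)
        {
            this.nowMicros = nowMicros;
            this.coordinatorId = coordinatorId;
        }

        static TxnTimestamp proposeT0(int coordinatorId)
        {
            return new TxnTimestamp(System.currentTimeMillis() * 1000, coordinatorId);
        }
    }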

> Do we have one clock-per-shard-per-node? Or is there a single clock for all 
> transactions on a node?

One per unique global identity, so for simplicity probably per node, but it 
could easily be per shard, or more granular – another global identity would 
simply need to be issued to ensure the logical clock produces globally unique 
values.
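
For illustration only, a per-node logical clock along these lines might look 
like the following sketch (assumed names, not the prototype’s):

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch: one logical clock per global identity (here, per node). Every
    // value it issues is strictly greater than anything it has issued or
    // witnessed, and pairing it with the node id keeps values globally unique.
    final class NodeClock
    {
        private final int nodeId;
        private final AtomicLong last = new AtomicLong();

        NodeClock(int nodeId) { this.nodeId = nodeId; }

        long next()
        {
            long now = System.currentTimeMillis() * 1000;
            return last.updateAndGet(prev -> Math.max(prev + 1, now));
        }

        // Witnessing a larger timestamp from another node advances the clock
        void witness(long timestamp)
        {
            last.accumulateAndGet(timestamp, Math::max);
        }
    }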

> What happens in network partitions?

I’m sorry, I don’t understand the question.

> In a cross-shard transaction does maintaining simple majorities of replicas 
> protect you from potential inconsistencies arising when a transaction W10 
> addressing partitions p1, p2 comes from a different majority (potentially 
> isolated due to a network partition) from earlier writes W[1,9] to p1 only?
> It seems that this may cause a sudden change to the dependency graph for 
> partition p2 which may render it vulnerable to strange effects?

I don’t really follow the question, but there are no sudden changes to any 
dependencies, so I think the answer is no.

> Do we consider adversarial cases or any sort of Byzantine faults? (That’s a 
> bit out of left field, feel free to kick me.)

Specified in the paper, and the answer is “no”.

> Why do we prefer Lamport clocks to vector clocks or other types of logical 
> clock?

I’m unsure how to answer this question. Vector clocks are substantially more 
costly, and it’s unclear what they would offer us here. Perhaps you could 
explain why you would consider them, and what other logical clocks you are 
considering?

I will note that I did not start from the concept of Lamport clocks; however, 
they are a widely understood concept, and really Accord boils down to using 
Lamport clocks to derive _some_ total order, and then ensuring that total order 
is durable. This seemed like a simple way to explain the protocol, since it’s 
only a sentence or two.

> Related to the earlier point: when we say `union` here - what set are we 
> forming a union over? Is it a union of all dependencies t_n < t as seen by 
> all coordinators? I presume that the logic precludes the possibility that 
> these dependencies will conflict, since all foregoing transactions which are 
> in progress as dependencies must be non-conflicting with earlier transactions 
> in the dependency graph?

I don’t understand all of this question, sorry. To the first point, the 
coordinator will receive responses from some fast-path quorum of replicas. The 
responses will each contain the dependency set computed by the replica that 
sent it, and the coordinator will use the union of these sets as the total set 
of dependencies.
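
In other words, something as simple as this sketch (illustrative names only):

    import java.util.Collection;
    import java.util.Set;
    import java.util.TreeSet;

    // Sketch: the coordinator's total dependency set is just the union of the
    // dependency sets reported by the replicas in the fast-path quorum.
    final class CoordinatorUnion
    {
        static Set<String> unionOf(Collection<Set<String>> quorumResponses)
        {
            Set<String> union = new TreeSet<>();
            for (Set<String> deps : quorumResponses)
                union.addAll(deps);
            return union;
        }
    }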

> In any case, further information about how the dependency graph is computed 
> would be interesting.

It is defined precisely in the paper, but it is simply all those conflicting 
transactions the replica has seen that were initially proposed with a lower 
execution timestamp.
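
A rough sketch of what a replica reports (hypothetical types; the paper’s 
definition is the authoritative one):

    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    // Sketch: the replica reports every conflicting transaction it has
    // witnessed whose originally proposed timestamp (its t0) is lower than
    // the incoming transaction's t0.
    final class ReplicaDependencies
    {
        static Set<String> depsFor(long incomingT0, Map<String, Long> witnessedConflictingT0s)
        {
            Set<String> deps = new TreeSet<>();
            for (Map.Entry<String, Long> e : witnessedConflictingT0s.entrySet())
                if (e.getValue() < incomingT0)
                    deps.add(e.getKey());
            return deps;
        }
    }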

> Every replica? Or only those participating in the transaction?

Those participating in the transaction.

> When speaking about the simple majority of nodes to whom the max(t) value 
> returned will be proposed - It sounds like this need not be the same 
> majority from whom the original sets of T_n and dependencies were obtained?
> Is there a proof to show that the dependencies created from the union of the 
> first set of replicas resolves to an acceptable dependency graph for an 
> arbitrary majority of replicas? (Especially given that a majority of replicas 
> is not a majority of nodes, given we are in a cross-shard scenario here).

I think there’s some confusion about how dependencies work, as well as some 
issues in nomenclature, which means I unfortunately can’t parse all of your 
questions. I think it might be better to revisit after digesting these answers. 
One confusion that I infer might be arising is that these majorities are _per 
shard_ not _global_. All quorums are obtained per-shard, and a cross-shard 
operation must achieve simultaneous quorums in every shard.
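
To put the per-shard point in code form (illustrative only):

    import java.util.Map;

    // Sketch: quorums are counted shard by shard; a cross-shard operation only
    // proceeds once every shard it touches has reached its own quorum.
    final class PerShardQuorums
    {
        static boolean quorumInEveryShard(Map<String, Integer> repliesPerShard,
                                          Map<String, Integer> replicasPerShard)
        {
            for (Map.Entry<String, Integer> shard : replicasPerShard.entrySet())
            {
                int replies = repliesPerShard.getOrDefault(shard.getKey(), 0);
                int simpleMajority = shard.getValue() / 2 + 1;
                if (replies < simpleMajority)
                    return false; // this shard has not yet reached quorum
            }
            return true;
        }
    }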

The paper has a brief proof of correctness you can read that I think is 
adequately compelling. We have a detailed proof that needs to be cleaned up 
before being published as an appendix.

> What happens in cases where the replica set has changed due to (a) scaling RF 
> in a single DC (b) adding a whole new DC?

> Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me that 
> Lamport clocks only impose partial, not total order. I’m guessing we’re 
> thinking of a different type of logical clock when we speak of Lamport clocks 
> here (but my expertise is sketchy on this topic).

The original paper is available for you to skim. With the addition of a 
per-process id component a total order is achieved.
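
The construction is the standard one: pair the counter with a unique 
per-process id and break ties with it. Sketched in Java (my own names):

    import java.util.Comparator;

    // Sketch: Lamport counters alone give only a partial order (two processes
    // can issue the same counter value); breaking ties with a unique process
    // id makes the order total.
    final class LamportStamp
    {
        final long counter;
        final int processId;

        LamportStamp(long counter, int processId)
        {
            this.counter = counter;
            this.processId = processId;
        }

        static final Comparator<LamportStamp> TOTAL_ORDER =
            Comparator.<LamportStamp>comparingLong(s -> s.counter)
                      .thenComparingInt(s -> s.processId);
    }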

> I would be interested in further exploration of the unhappy path (where 'a 
> newer ballot has been issued by a recovery coordinator to take over the 
> transaction’). I understand that this may be partially covered in the 
> pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot has 
> been issued’ language with the ‘any R in responses had X as Applied, 
> Committed, or Accepted’ language.

Again, here I would refer you to the whitepaper. However, the pseudocode does 
mostly cover it; it helps if you are conversant with consensus protocols, 
especially leaderless ones.

The “any R in responses has X as Applied, Committed or Accepted” refers to the 
boolean (Applied|Committed|Accepted)[X]=true that is set on a replica during 
the execution of the protocol, as specified in the pseudocode.

The reference to newer ballots is simply classic Paxos leader election, so that 
only one coordinator may complete the transaction.
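
A very rough sketch of how the two mechanisms fit together (my own illustrative 
names, not the whitepaper’s pseudocode):

    import java.util.Collection;

    // Sketch: ballots work as in classic Paxos (a replica ignores anything
    // older than what it has promised), while the per-replica booleans
    // Applied/Committed/Accepted tell a recovery coordinator how far the
    // transaction already progressed on that replica.
    final class RecoverySketch
    {
        static final class ReplicaResponse
        {
            final boolean accepted, committed, applied; // the flags referred to above

            ReplicaResponse(boolean accepted, boolean committed, boolean applied)
            {
                this.accepted = accepted;
                this.committed = committed;
                this.applied = applied;
            }
        }

        // Classic Paxos promise: a replica ignores any message whose ballot is
        // older than one it has already promised, so only one coordinator can
        // complete the transaction.
        static boolean acceptsBallot(long promisedBallot, long incomingBallot)
        {
            return incomingBallot >= promisedBallot;
        }

        // "any R in responses has X as Applied, Committed or Accepted"
        static boolean anyProgressed(Collection<ReplicaResponse> responses)
        {
            for (ReplicaResponse r : responses)
                if (r.applied || r.committed || r.accepted)
                    return true;
            return false;
        }
    }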



From: Miles Garnsey <miles.garn...@datastax.com>
Date: Monday, 20 September 2021 at 09:34
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
If Accord can fulfil its aims it sounds like a huge improvement to the state of 
the art in distributed transaction processing. Congrats to all involved in 
pulling the proposal together.

I was holding off on feedback since this is quite in depth and I don’t want to 
bike-shed; I still haven’t spent as much time understanding this as I’d like.

Regardless, I’ll make the following notes in case they’re helpful. My feedback 
is more to satisfy my own curiosity and stimulate discussion than to suggest 
that there are any flaws here. I applaud the proposed testing approach and 
think it is the only way to be certain that the proposed consistency guarantees 
will be upheld.

General

I’m curious if/how this proposal addresses issues we have seen when scaling; I 
see reference to simple majorities of nodes - is there any plan to ensure 
safety under scaling operations or DC (de)commissioning?

What consistency levels will be supported under Accord? Will it simply be a 
single CL representing a majority of nodes across the whole cluster? (This at 
least would mitigate the issues I’ve seen when folks want to switch from 
EACH_SERIAL to SERIAL).

Accord

> Accord instead assembles an inconsistent set of dependencies.


Further explanation here would be good. Do we mean to say that the dependencies 
may differ according to which transactions the coordinator has witnessed at the 
time the incoming transaction is first seen? This would make sense if some 
nodes had not fully committed a foregoing transaction.

Is it correct to think of this step as assembling a dependency graph of 
foregoing transactions which must be completed ahead of progressing the 
incoming new transaction?

Fast Path

> A coordinator C proposes a timestamp t0 to at least a quorum of a fast path 
> electorate. If t0 is larger than all timestamps witnessed for all prior 
> conflicting transactions, t0 is accepted by a replica. If a fast path quorum 
> of responses accept, the transaction is agreed to execute at t0. Replicas 
> respond with the set of transactions they have witnessed that may execute 
> with a lower timestamp, i.e. those with a lower t0.

What is t0 here? I’m guessing it is the Lamport clock time of the most recent 
mutation to the partition? May be worth clarifying, because otherwise the 
perception may be that it is the commencement time of the current transaction, 
which may not be the intention.

Regarding the use of logical clocks in general -

Do we have one clock-per-shard-per-node? Or is there a single clock for all 
transactions on a node?

What happens in network partitions?

In a cross-shard transaction does maintaining simple majorities of replicas 
protect you from potential inconsistencies arising when a transaction W10 
addressing partitions p1, p2 comes from a different majority (potentially 
isolated due to a network partition) from earlier writes W[1,9] to p1 only? 
It seems that this may cause a sudden change to the dependency graph for 
partition p2 which may render it vulnerable to strange effects?

Do we consider adversarial cases or any sort of Byzantine faults? (That’s a bit 
out of left field, feel free to kick me.)

Why do we prefer Lamport clocks to vector clocks or other types of logical 
clock?

Slow Path

> This value is proposed to at least a simple majority of nodes, along with the 
> union of the dependencies received


Related to the earlier point: when we say `union` here - what set are we 
forming a union over? Is it a union of all dependencies t_n < t as seen by all 
coordinators? I presume that the logic precludes the possibility that these 
dependencies will conflict, since all foregoing transactions which are in 
progress as dependencies must be non-conflicting with earlier transactions in 
the dependency graph?

In any case, further information about how the dependency graph is computed 
would be interesting.

> The inclusion of dependencies in the proposal is solely to facilitate 
> Recovery of other transactions that may be incomplete - these are stored on 
> each replica to facilitate decisions at recovery.


Every replica? Or only those participating in the transaction?

> If C fails to reach fast path consensus it takes the highest t it witnessed 
> from its responses, which constitutes a simple Lamport clock value imposing a 
> valid total order. This value is proposed to at least a simple majority of 
> nodes,


When speaking about the simple majority of nodes to whom the max(t) value 
returned will be proposed - 
It sounds like this need not be the same majority from whom the original sets 
of T_n and dependencies were obtained?

Is there a proof to show that the dependencies created from the union of the 
first set of replicas resolves to an acceptable dependency graph for an 
arbitrary majority of replicas? (Especially given that a majority of replicas 
is not a majority of nodes, given we are in a cross-shard scenario here).

What happens in cases where the replica set has changed due to (a) scaling RF 
in a single DC (b) adding a whole new DC?

Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me that 
Lamport clocks only impose partial, not total order. I’m guessing we’re 
thinking of a different type of logical clock when we speak of Lamport clocks 
here (but my expertise is sketchy on this topic).

Recovery

I would be interested in further exploration of the unhappy path (where 'a 
newer ballot has been issued by a recovery coordinator to take over the 
transaction’). I understand that this may be partially covered in the 
pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot has 
been issued’ language with the ‘any R in responses had X as Applied, Committed, 
or Accepted’ language.

Well done again and thank you for pushing the envelope in this area Benedict.

Miles

> On 15 Sep 2021, at 11:33 pm, bened...@apache.org wrote:
>
>> I would kind of expect this work, if it pans out, to _replace_ the current 
>> paxos implementation
>
> That’s a good point. I think the clear direction of travel would be total 
> replacement of Paxos, but I anticipate that this will be feature-flagged at 
> least initially. So for some period of time we may maintain both options, 
> with the advanced CQL functionality disabled if you opt for classic Paxos.
>
> I think this is a necessary corollary of a requirement to support live 
> upgrades – something that is non-negotiable IMO, but that I have also 
> neglected to discuss in the CEP. I will rectify this. An open question is if 
> we want to support live downgrades back to Classic Paxos. I kind of expect 
> that we will, though that will no doubt be informed by the difficulty of 
> doing so.
>
> Either way, this means the deprecation cycle for Classic Paxos is probably a 
> separate and future decision for the community. We could choose to maintain 
> it indefinitely, but I would vote to retire it the following major version.
>
> A related open question is defaults – I would probably vote for new clusters 
> to default to Accord, and existing clusters to need to run a migration 
> command after fully upgrading the cluster.
>
> From: Sylvain Lebresne <lebre...@gmail.com>
> Date: Wednesday, 15 September 2021 at 14:13
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Fwiw, it makes sense to me to talk about CQL syntax evolution separately.
>
> It's pretty clear to me that we _can_ extend CQL to make use of a general
> purpose transaction mechanism, so I don't think deciding if we want a
> general purpose transaction mechanism has to depend on deciding on the
> syntax. Especially since the syntax question can get pretty far on its own
> and could be a serious upfront distraction.
>
> And as you said, there are even queries that can be expressed with the
> current syntax that we refuse now and would be able to accept with this, so
> those could be "ground zero" of what this work would allow.
>
> But outside of pure syntax questions, one thing that I don't see discussed
> in the CEP (or did I miss it) is what the relationship of this new
> mechanism with the existing paxos implementation would be? I would kind of
> expect this work, if it pans out, to _replace_ the current paxos
> implementation (because 1) why not and 2) the idea of having 2
> serialization mechanisms that serialize separately sounds like a nightmare
> from the user POV) but it isn't stated clearly. If replacement is indeed
> the intent, then I think there needs to be a plan for the upgrade path. If
> that's not the intent, then what?
> --
> Sylvain
>
>
> On Wed, Sep 15, 2021 at 12:09 PM bened...@apache.org <bened...@apache.org>
> wrote:
>
>> Ok, so the act of typing out an example was actually a really good
>> reminder of just how limited our functionality is today, even for single
>> partition operations.
>>
>> I don’t want to distract from any discussion around the underlying
>> protocol, but we could kick off a separate conversation about how to evolve
>> CQL sooner than later if there is the appetite. There are no concrete
>> proposals to discuss, it would be brainstorming.
>>
>> Do people also generally agree this work warrants a distinct CEP, or would
>> people prefer to see this developed under the same umbrella?
>>
>>
>>
>> From: bened...@apache.org <bened...@apache.org>
>> Date: Wednesday, 15 September 2021 at 09:19
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>>> perhaps we can prepare these as examples
>>
>> There are grammatically correct CQL queries today that cannot be executed,
>> that this work will naturally remove the restrictions on. I’m certainly
>> happy to specify one of these for the CEP if it will help the reader.
>>
>> I want to exclude “new CQL commands” or any other enhancement to the
>> grammar from the scope of the CEP, however. This work will enable a range
>> of improvements to the UX, but I think this work is a separate, long-term
>> project of evolution that deserves its own CEPs, and will likely involve
>> input from a wider range of contributors and users. If nobody else starts
>> such CEPs, I will do so in due course (much further down the line).
>>
>> Assuming there is not significant dissent on this point I will update the
>> CEP to reflect this non-goal.
>>
>>
>>
>> From: C. Scott Andreas <sc...@paradoxica.net>
>> Date: Wednesday, 15 September 2021 at 00:31
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Cc: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Adding a few notes from my perspective as well –
>>
>> Re: the UX question, thanks for asking this.
>>
>> I agree that offering a set of example queries and use cases may help make
>> the specific use cases more understandable; perhaps we can prepare these as
>> examples to be included in the CEP.
>>
>> I do think that all potential UX directions begin with the specification
>> of the protocol that will underly them, as what can be expressed by it may
>> be a superset of what's immediately exposed by CQL. But at minimum it's
>> great to have a sense of the queries one might be able to issue to focus a
>> reading of the whitepaper.
>>
>> Re: "Can we not start using it as an external dependency, and later
>> re-evaluate if it's necessary to bring it into the project or even incubate
>> it as another Apache project"
>>
>> I think it would be valuable to the project for the work to be incubated
>> in a separate repository as part of the Apache Cassandra project itself,
>> much like the in-JVM dtest API and Harry. This pattern worked well for
>> those projects as they incubated as it allowed them to evolve outside the
>> primary codebase, but subject to the same project governance, set of PMC
>> members, committers, and so on. Like those libraries, it also makes sense
>> as the Cassandra project is the first (and at this time) only known
>> intended consumer of the library, though there may be more in the future.
>>
>> If the proposal is accepted, the time horizon envisioned for this work's
>> completion is ~9 months to a standard of production readiness. The
>> contributors see value in the work being donated to and governed by the
>> contribution practices of the Foundation. Doing so ensures that it is being
>> developed openly and with full opportunity for review and contribution of
>> others, while also solidifying contribution of the IP to the project.
>>
>> Spinning up a separate ASF incubation project is an interesting idea, but
>> I feel that doing so would introduce a far greater overhead in process and
>> governance, and that the most suitable governance and set of committers/PMC
>> members are those of the Apache Cassandra project itself.
>>
>> On Sep 14, 2021, at 3:53 PM, "bened...@apache.org" <bened...@apache.org>
>> wrote:
>>
>>
>> Hi Paulo,
>>
>> First and foremost, I believe this proposal in its current form focuses on
>> the protocol details (HOW?) but lacks the bigger picture on how this is
>> going to be exposed to the user (WHAT)?
>>
>> In my opinion this CEP embodies a coherent distinct and complex piece of
>> work, that requires specialist expertise. You have after all just suggested
>> a month to read only the existing proposal ??
>>
>> UX is a whole other kind of discussion, that can be quite opinionated, and
>> requires different expertise. It is in my opinion helpful to break out work
>> that is not tightly coupled, as well as work that requires different
>> expertise. As you point out, multi-key UX features are largely independent
>> of any underlying implementation, likely can be done in parallel, and even
>> with different contributors.
>>
>> Can we not start using it as an external dependency
>>
>> I would love to understand your rationale, as this is a surprising
>> suggestion to me. This is just like any other subsystem, but we would be
>> managing it as a separate library primarily for modularity reasons. The
>> reality is that this option should anyway be considered unavailable. This
>> is a proposed contribution to the Cassandra project, which we can either
>> accept or reject.
>>
>> Isn't this a good chance to make the serialization protocol pluggable
>> with clearly defined integration points
>>
>> It has recently been demonstrated to be possible to build a system that
>> can safely switch between different consensus protocols. However, this was
>> very sophisticated work that would require its own CEP, one that we would
>> be unable to resource. Even if we could this would be insufficient. This
>> goal has never been achieved for a multi-shard transaction protocol to my
>> knowledge, and multi-shard transaction protocols are much more divergent in
>> implementation detail than consensus protocols.
>>
>> so we could easily switch implementations with different guarantees… (ie.
>> Apache Ratis)
>>
>> As far as I know, there are no other strict serializable protocols
>> available to plug in today. Apache Ratis appears to be a straightforward
>> Raft implementation, and therefore it is a linearizable consensus protocol.
>> It is not a multi-shard transaction protocol at all, let alone strict
>> serializable. It could be used in place of Paxos, but not Accord.
>>
>>
>>
>> From: Paulo Motta <pauloricard...@gmail.com>
>> Date: Tuesday, 14 September 2021 at 22:55
>> To: Cassandra DEV <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> I can start with some preliminary comments while I get more familiarized
>> with the proposal:
>>
>> - First and foremost, I believe this proposal in its current form focuses
>> on the protocol details (HOW?) but lacks the bigger picture on how this is
>> going to be exposed to the user (WHAT)? Is exposing linearizable
>> transactions to the user not a goal of this proposal? If not, I think the
>> proposal is missing the UX (ie. what CQL commands are going to be added
>> etc) on how these transactions are going to be exposed.
>>
>> - Why do we need to bring the library into the project umbrella? Can we not
>> start using it as an external dependency, and later re-evaluate if it's
>> necessary to bring it into the project or even incubate it as another
>> Apache project? I feel we may be importing unnecessary management overhead
>> into the project while only a small subset of contributors will be involved
>> with the core protocol.
>>
>> - Isn't this a good chance to make the serialization protocol pluggable
>> with clearly defined integration points, so we could easily switch
>> implementations with different guarantees, trade-offs and performance
>> considerations while leaving the UX intact? This would also allow us to
>> easily benchmark the protocol against alternatives (ie. Apache Ratis) and
>> validate the performance claims. I think the best way to do that would be
>> to define what the feature will look like to the end user (UX), define the
>> integration points necessary to support this feature, and use accord as the
>> first implementation of these integration points.
>>
>> Em ter., 14 de set. de 2021 às 17:57, Paulo Motta <
>> pauloricard...@gmail.com>
>> escreveu:
>>
>> Given the extensiveness and complexity of the proposal I'd suggest leaving
>> it a little longer (perhaps 4 weeks from the publish date?) for people to
>> get a bit more familiarized and have the chance to comment before casting a
>> vote. I glanced through the proposal - and it looks outstanding, very
>> promising work guys! - but would like a bit more time to take a deeper look
>> and digest it before potentially commenting on it.
>>
>> Em ter., 14 de set. de 2021 às 17:30, bened...@apache.org <
>> bened...@apache.org> escreveu:
>>
>> Has anyone had a chance to read the drafts, and has any feedback or
>> questions? Does anybody still anticipate doing so in the near future? Or
>> shall we move to a vote?
>>
>> From: bened...@apache.org <bened...@apache.org>
>> Date: Tuesday, 7 September 2021 at 21:27
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Hi Jake,
>>
>>> What structural changes are planned to support an external dependency
>> project like this
>>
>> To add to Blake’s answer, in case there’s some confusion over this, the
>> proposal is to include this library within the Apache Cassandra project. So
>> I wouldn’t think of it as an external dependency. This PMC and community
>> will still have the usual oversight over direction and development, and
>> APIs will be developed solely with the intention of their integration with
>> Cassandra.
>>
>>> Will this effort eventually replace consistency levels in C*?
>>
>> I hope we’ll have some very related discussions around consistency levels
>> in the coming months more generally, but I don’t think that is tightly
>> coupled to this work. I agree with you both that we won’t want to
>> perpetuate the problems you’ve highlighted though.
>>
>> Henrik:
>>> I was referring to the property that Calvin transactions also need to
>> be sent to the cluster in a single shot
>>
>> Ah, yes. In that case I agree, and I tried to point to this direction in
>> an earlier email, where I discussed the use of scripting languages (i.e.
>> transactionally modifying the database with some subset of arbitrary
>> computation). I think the JVM is particularly suited to offering quite
>> powerful distributed transactions in this vein, and it will be interesting
>> to see what we might develop in this direction in future.
>>
>>
>> From: Jake Luciani <jak...@gmail.com>
>> Date: Tuesday, 7 September 2021 at 19:27
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Great thanks for the information
>>
>> On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
>> <beggles...@apple.com.invalid> wrote:
>>
>>> Hi Jake,
>>>
>>>> 1. Will this effort eventually replace consistency levels in C*? I
>> ask
>>>> because one of the shortcomings of our paxos today is
>>>> it can be easily mixed with non serialized consistencies and therefore
>>>> users commonly break consistency by for example reading at CL.ONE
>> while
>>>> also
>>>> using LWTs.
>>>
>>> This will likely require CLs to be specified at the schema level for
>>> tables using multi partition transactions. I’d expect this to be
>> available
>>> for other tables, but not required.
>>>
>>>> 2. What structural changes are planned to support an external
>> dependency
>>>> project like this? Are there some high level interfaces you expect
>> the
>>>> project to adhere to?
>>>
>>> There will be some interfaces that need to be implemented in C* to
>> support
>>> the library. You can find the current interfaces in the accord.api
>> package,
>>> but these were written to support some initial testing, and not intended
>>> for integration into C* as is. Things are pretty fluid right now and
>> will
>>> be rewritten / refactored multiple times over the next few months.
>>>
>>> Thanks,
>>>
>>> Blake
>>>
>>>
>>>> On Sun, Sep 5, 2021 at 10:33 AM bened...@apache.org <
>> bened...@apache.org
>>>>
>>>> wrote:
>>>>
>>>>> Wiki:
>>>>>
>>>
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
>>>>> Whitepaper:
>>>>>
>>>
>>
>> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
>>>>> <
>>>>>
>>>
>>
>> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf?version=1&modificationDate=1630847736966&api=v2
>>>>>>
>>>>> Prototype: https://github.com/belliottsmith/accord
>>>>>
>>>>> Hi everyone, I’d like to propose this CEP for adoption by the
>> community.
>>>>>
>>>>> Cassandra has benefitted from LWTs for many years, but application
>>>>> developers that want to ensure consistency for complex operations
>> must
>>>>> either accept the scalability bottleneck of serializing all related
>>> state
>>>>> through a single partition, or layer a complex state machine on top
>> of
>>> the
>>>>> database. These are sophisticated and costly activities that our
>> users
>>>>> should not be expected to undertake. Since distributed databases are
>>>>> beginning to offer distributed transactions with fewer caveats, it is
>>> past
>>>>> time for Cassandra to do so as well.
>>>>>
>>>>> This CEP proposes the use of several novel techniques that build upon
>>>>> research (that followed EPaxos) to deliver (non-interactive) general
>>>>> purpose distributed transactions. The approach is outlined in the
>>> wikipage
>>>>> and in more detail in the linked whitepaper. Importantly, by adopting
>>> this
>>>>> approach we will be the _only_ distributed database to offer global,
>>>>> scalable, strict serializable transactions in one wide area
>> round-trip.
>>>>> This would represent a significant improvement in the state of the
>> art,
>>>>> both in the academic literature and in commercial or open source
>>> offerings.
>>>>>
>>>>> This work has been partially realised in a prototype. This partial
>>>>> prototype has been verified against Jepsen.io’s Maelstrom library and
>>>>> dedicated in-tree strict serializability verification tools, but much
>>> work
>>>>> remains for the work to be production capable and integrated into
>>> Cassandra.
>>>>>
>>>>> I propose including the prototype in the project as a new source
>>>>> repository, to be developed as a standalone library for integration
>> into
>>>>> Cassandra. I hope the community sees the important value proposition
>> of
>>>>> this proposal, and will adopt the CEP after this discussion, so that
>> the
>>>>> library and its integration into Cassandra can be developed in
>> parallel
>>> and
>>>>> with the involvement of the wider community.
>>>>>
>>>>
>>>>
>>>> --
>>>> http://twitter.com/tjake
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>
>>>
>>
>> --
>> http://twitter.com/tjake
>>
