Hi Sam,

Great to see this CEP. I have been documenting a few common 'patterns of
distributed systems', including one called 'consistent core'
<https://martinfowler.com/articles/patterns-of-distributed-systems/consistent-core.html>,
based on the source code of various systems that use a linearizable
metadata store. I have also documented patterns like 'lease'
<https://martinfowler.com/articles/patterns-of-distributed-systems/time-bound-lease.html>
and 'state watch'
<https://martinfowler.com/articles/patterns-of-distributed-systems/state-watch.html>,
which are commonly used by a consistent core. I also recently documented
how partition assignment and partition movement are typically implemented in
systems that use a consistent-core-based metadata store (for example
YugabyteDB, CockroachDB and Kafka).
It might be useful as a quick reference for comparing this CEP
with other systems that use a similar architecture.
A quick question about using the existing Paxos machinery. As I understand
it, implementing a Replicated Log
<https://martinfowler.com/articles/patterns-of-distributed-systems/#PatternSequenceForImplementingReplicatedLog>
needs significant changes, particularly in how the two phases of Paxos are
implemented over the entire log. Would it be better to use Raft instead?
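
To make the ordering requirement concrete, here is a minimal Python sketch of
how I picture a node consuming such a log in order. It is only an
illustration: the class name OrderedLogFollower, the apply callback and the
integer epoch numbering are my own assumptions, not anything taken from the
CEP or the prototype, and it says nothing about how the log itself is agreed
upon (Paxos, Raft or otherwise).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OrderedLogFollower:
    """Applies log entries strictly in epoch order, buffering early arrivals."""
    apply: Callable[[int, str], None]   # hypothetical callback that enacts a change
    next_epoch: int = 1                 # epoch of the next entry that may be applied
    pending: Dict[int, str] = field(default_factory=dict)

    def receive(self, epoch: int, change: str) -> None:
        if epoch < self.next_epoch:
            return                      # already applied; ignore duplicate delivery
        self.pending[epoch] = change    # hold until all earlier epochs are applied
        # Drain every entry that is now contiguous with what has been applied.
        while self.next_epoch in self.pending:
            self.apply(self.next_epoch, self.pending.pop(self.next_epoch))
            self.next_epoch += 1


applied: List[str] = []
follower = OrderedLogFollower(
    apply=lambda epoch, change: applied.append(f"{epoch}:{change}")
)

# Entries arrive out of order and with one duplicate (all values are made up).
for epoch, change in [(2, "add-node"), (1, "alter-schema"),
                      (2, "add-node"), (3, "move-token")]:
    follower.receive(epoch, change)

print(applied)
```

Whatever consensus mechanism produces the log, each node only needs this kind
of gap-free, in-order application on the consuming side.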


Thanks,
Unmesh

On 2022/08/23 08:50:27 Sam Tunnicliffe wrote:
> Thanks!
> The core of the proposal is around the sequencing metadata changes and
ensuring that they're delivered to/processed by nodes in the right order
and at the right time. The actual mechanisms for imposing that order and
for maintaining the log are pretty simple to implement. We envision using
the existing Paxos machinery by default, but swapping that for an
alternative implementation would not be difficult.
>
>
> > On 22 Aug 2022, at 19:14, Derek Chen-Becker <de...@chen-becker.org>
wrote:
> >
> > This looks really interesting; thanks for putting this together! Just
so I'm clear on CEP nomenclature, having external management of metadata as
a non-goal doesn't preclude some future use, correct? Coincidentally, I'm
working on my ApacheCon talk on improving modularity in Cassandra and one
of the ideas I'm discussing is pluggably (?) replacing gossip with
something(s) that allow us to externalize some of the complexity of
maintaining consistency. I need to digest the proposal you've made, but I
don't see the two ideas being at odds on my first read.
> >
> > Cheers,
> >
> > Derek
> >
> > On Mon, Aug 22, 2022 at 6:45 AM Sam Tunnicliffe <s...@beobal.com <
ma...@beobal.com>> wrote:
> > Hi,
> >
> > I'd like to open discussion about this CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
<
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21:+Transactional+Cluster+Metadata>

> > Cluster metadata in Cassandra comprises a number of disparate elements
including, but not limited to, distributed schema, topology and token
ownership. Following the general design principles of Cassandra, the
mechanisms for coordinating updates to cluster state have favoured eventual
consistency, with probabilistic delivery via gossip being a prime example.
Undoubtedly, this approach has benefits, not least in terms of resilience,
particularly in highly fluid distributed environments. However, this is not
the reality of most Cassandra deployments, where the total number of nodes
is relatively small (i.e. in the low thousands) and the rate of change
tends to be low.
> >
> > Historically, a significant proportion of issues affecting operators
and users of Cassandra have been due, at least in part, to a lack of
strongly consistent cluster metadata. In response to this, we propose a
design which aims to provide linearizability of metadata changes whilst
ensuring that the effects of those changes are made visible to all nodes in
a strongly consistent manner. At its core, it is also pluggable, enabling
Cassandra-derived projects to supply their own implementations if desired.
> >
> > In addition to the CEP document itself, we aim to publish a working
prototype of the proposed design. Obviously, this does not implement the
entire proposal and there are several parts which remain only partially
complete. It does include the core of the system, including a good deal of
test infrastructure, so may serve as both illustration of the design and a
starting point for real implementation.
> >
> >
> >
> > --
> > +---------------------------------------------------------------+
> > | Derek Chen-Becker |
> > | GPG Key available at https://keybase.io/dchenbecker <
https://keybase.io/dchenbecker> and |
> > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org <
https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org> |
> > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC |
> > +---------------------------------------------------------------+
> >
>
>
