Hi!
Alexey,
> If we want to support etcd as a metastorage - let's do this as a concrete
configuration option, a
> first-class citizen of the system rather than an SPI implementation with
a rigid interface.
On the one hand, this is quite reasonable. On the other hand, if someone
wants to adopt, for example, Apache ZooKeeper or
some other proprietary external lock service, we could provide basic
interfaces to do the job.
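
For example, something as small as this could already cover the basics (a
purely hypothetical sketch; the names and signatures are made up and are
not a concrete proposal):

import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical minimal abstraction over an external metastorage/lock service. */
public interface MetastorageClient extends AutoCloseable {
    /** A versioned key-value entry. */
    interface Entry {
        byte[] key();
        byte[] value();
        long revision();
    }

    /** Reads the current value and revision of a key. */
    CompletableFuture<Entry> get(byte[] key);

    /** Writes a value only if the expected revision still matches (CAS-style update). */
    CompletableFuture<Boolean> putIfRevision(byte[] key, byte[] value, long expectedRevision);

    /** Subscribes to updates of a key; returns a handle that stops the watch when closed. */
    AutoCloseable watch(byte[] key, Consumer<Entry> listener);
}

An etcd-backed or ZooKeeper-backed implementation would map naturally onto
such an interface.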

> Thus, by default, they will be mixed which will significantly simplify
cluster setup and usability.
According to the Raft spec, the leader processes all requests from clients.
The leader's response latency is crucial for the stability of the whole
cluster.
Cluster setup simplicity is a matter of documentation, scripts and so on;
e.g. starting Kafka is quite easy.

Also, if we use the mixed approach, a service discovery protocol has to be
implemented. This is necessary because we must discover nodes first in
order to choose a finite subset of them for the Raft ensemble.
For example, Consul by HashiCorp uses a gossip protocol to do this job (the
nodes participating in Raft are called servers) [1].

If we use the separated approach, we could use the service discovery
pattern that is common for ZooKeeper or etcd: a data node creates a record
with a TTL and keeps renewing it (the EPHEMERAL node approach in
ZooKeeper), while other data nodes watch for new records.
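
Just to illustrate the pattern, here is a rough sketch with the plain
ZooKeeper Java client (the paths, address and timeout are made up for the
example):

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkDiscoverySketch {
    public static void main(String[] args) throws Exception {
        // The session timeout plays the role of the TTL: the ephemeral node
        // disappears automatically when the owning session expires.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5_000, event -> {});

        // A data node registers itself under a well-known parent path
        // (assumed to exist already for the sake of this sketch).
        zk.create("/cluster/nodes/node-1", "host:port".getBytes(),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Other data nodes list the parent and set a watch that fires
        // whenever the membership changes.
        List<String> alive = zk.getChildren("/cluster/nodes",
            event -> System.out.println("Membership changed: " + event));

        System.out.println("Alive data nodes: " + alive);
    }
}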

A few words about PacificA: the article [2] gives just brief descriptions
and ideas. Alexey, is there any formal specification of this protocol,
preferably in TLA+?


[1] -- https://www.consul.io/docs/architecture/gossip
[2] -- https://www.microsoft.com/en-us/research/wp-content/uploads/2008/02/tr-2008-25.pdf




Fri, Oct 23, 2020 at 13:05, Alexey Goncharuk <alexey.goncha...@gmail.com>:

> Hello Ivan,
>
> Thanks for the feedback, see my comments inline:
>
> Thu, Oct 22, 2020 at 17:59, Ivan Daschinsky <ivanda...@gmail.com>:
>
> > Hi!
> > Alexey, your proposal looks great. Can I ask you some questions?
> > 1. Are the nodes that take part in the metastorage replication group
> > (Raft candidates and leader) also expected to bear cache data and
> > participate in cache transactions?
> >    As for me, it seems quite dangerous to mix roles. For example, heavy
> > load from users can cause long GC pauses on the leader of the
> > replication group and, therefore, failures, new leader elections, etc.
> >
> I think both ways should be possible. The set of nodes that hold the
> metastorage should be defined declaratively at runtime, as well as the set
> of nodes holding table data. Thus, by default, they will be mixed, which
> will significantly simplify cluster setup and usability, but when needed,
> this should be easily adjusted at runtime by the cluster administrator.
>
>
> > 2. If the previous statement is true, another question arises. If one of
> > the candidates or the leader fails, how will a replacement node be chosen
> > from the regular nodes to form a full ensemble? A random one?
> >
> Similarly - by default, a 'best' node will be chosen from the available
> ones, but the administrator can override this.
>
>
> > 3. Do you think that this metastorage implementation can be pluggable? It
> > could be implemented on top of etcd, for example.
>
> I think the metastorage abstraction must be clearly separated so it is
> possible to change the implementation. Moreover, I was thinking that we may
> use etcd to speed up the development of other system components while we
> are working on our own protocol implementation. However, I do not think we
> should expose it as a pluggable public API. If we want to support etcd as a
> metastorage - let's do this as a concrete configuration option, a
> first-class citizen of the system rather than an SPI implementation with a
> rigid interface.
>
> WDYT?
>


-- 
Sincerely yours, Ivan Daschinskiy
