Re: [DISCUSSION] KIP-650: Enhance Kafkaesque Raft semantics

2020-12-06 Thread Unmesh Joshi
Hi Boyang,

Thanks for the KIP.
I think there are two aspects of linearizable read implementations in Raft.

1. Providing linearizable reads from the leader
   Even read requests served by the leader might not return the latest
committed results if the leader is partitioned. The leader needs to make
sure that it has not been partitioned and superseded by a new leader. So it
needs to send heartbeats and wait until it gets responses from a quorum of
followers before returning a response to read requests.
(I think this is the issue found in etcd and Consul by the Jepsen tests:
https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul)
etcd has implemented a ReadIndex mechanism for this:
https://github.com/etcd-io/etcd/pull/6212/commits/e3e39930229830b2991ec917ec5d2ba625febd3f
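
As a rough illustration of the leader-side check described in point 1, here
is a minimal sketch, not the etcd or Kafka implementation. All names
(Leader, confirm_leadership, linearizable_read) are hypothetical, followers
are modelled as plain callables, and a real leader would wait for its state
machine to catch up instead of raising an error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class NotLeaderError(Exception):
    """Raised when the node cannot confirm that it is still the leader."""


@dataclass
class Leader:
    # Hypothetical model of a Raft leader; not an existing API.
    epoch: int
    commit_index: int
    applied_index: int
    state: Dict[str, str]
    # Each follower is a callable taking the leader's epoch. It returns True
    # if it still accepts this leader, False if it has seen a newer epoch,
    # and None if it is unreachable (for example, partitioned away).
    followers: List[Callable[[int], Optional[bool]]] = field(default_factory=list)

    def confirm_leadership(self) -> bool:
        acks = 1  # the leader counts itself
        for heartbeat in self.followers:
            if heartbeat(self.epoch) is True:
                acks += 1
        cluster_size = len(self.followers) + 1
        return acks > cluster_size // 2  # strict majority

    def linearizable_read(self, key: str) -> Optional[str]:
        read_index = self.commit_index       # 1. remember the commit index
        if not self.confirm_leadership():    # 2. heartbeat round to a quorum
            raise NotLeaderError("no quorum reachable; refusing to serve read")
        if self.applied_index < read_index:  # 3. state machine must catch up
            raise RuntimeError("state machine has not applied read_index yet")
        return self.state.get(key)           # 4. serve from local state


def reachable(epoch: int) -> Optional[bool]:
    return True


def unreachable(epoch: int) -> Optional[bool]:
    return None


if __name__ == "__main__":
    healthy = Leader(5, 10, 10, {"topic": "created"}, [reachable, unreachable])
    print(healthy.linearizable_read("topic"))

    cut_off = Leader(5, 10, 10, {"topic": "created"}, [unreachable, unreachable])
    try:
        cut_off.linearizable_read("topic")
    except NotLeaderError as err:
        print("read rejected:", err)
```

The point of the sketch is only the ordering: the commit index is recorded
first, and the read is answered only after a quorum has confirmed the
leader's epoch.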

2. Providing safe and strictly ordered reads from followers
When reading from followers, it's important to make sure that external
clients reading from two different followers see results in strict order.
In ZooKeeper this is provided by sync requests. As described in the
ZooKeeper discussion linked below, sync does not provide linearizable
reads, because a new entry can always be committed on the leader between
the sync and the return of the read.
https://mail-archives.apache.org/mod_mbox/zookeeper-user/201303.mbox/%3CCAJwFCa0Hoekc14Zy6i0LyLj=eraf8jimqmzadohokqjntmt...@mail.gmail.com%3E
The discussion of follower reads in the Raft thesis is mainly about making
sure that clients are not affected by partitioned followers and still get
the latest updates.
This is an important concern, and I think the KIP talks mostly about this.
It would probably be useful to describe the situation with a Kafka-specific
scenario rather than an Alice and Bob scenario.
For example, a client or observer reading from a partitioned follower might
never receive the TopicRecords and PartitionRecords written when a topic is
created on the controller quorum. Similarly, a producer or consumer might
never learn that a broker has been fenced.
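
One way to picture the partitioned-follower concern is a follower that asks
the leader for a read index before answering, roughly along the lines of
the follower-read idea in the Raft thesis. The sketch below is only an
illustration under that assumption; Follower, fetch_read_index and the
record values are hypothetical names, and the leader RPC is modelled as a
plain callable that returns None when the leader is unreachable.

```python
from typing import Callable, Dict, Optional


class FollowerPartitionedError(Exception):
    """The follower could not obtain a read index from the leader."""


class Follower:
    # Hypothetical model; not an existing Kafka, etcd or ZooKeeper interface.
    def __init__(self, fetch_read_index: Callable[[], Optional[int]]) -> None:
        # fetch_read_index stands in for an RPC to the leader.
        self.fetch_read_index = fetch_read_index
        self.applied_index = 0
        self.state: Dict[str, str] = {}

    def apply(self, index: int, key: str, value: str) -> None:
        """Apply one replicated record to the local state."""
        self.state[key] = value
        self.applied_index = index

    def read(self, key: str) -> Optional[str]:
        read_index = self.fetch_read_index()
        if read_index is None:
            # A partitioned follower fails the read instead of silently
            # serving metadata that may be arbitrarily old.
            raise FollowerPartitionedError("cannot reach the leader")
        if self.applied_index < read_index:
            # A real follower would wait here; the sketch just reports it.
            raise RuntimeError("still catching up to the read index")
        return self.state.get(key)


if __name__ == "__main__":
    connected = Follower(lambda: 1)
    connected.apply(1, "topic-a", "TopicRecord")
    print(connected.read("topic-a"))

    partitioned = Follower(lambda: None)
    try:
        partitioned.read("topic-a")
    except FollowerPartitionedError as err:
        print("read rejected:", err)
```

Without such a check, the partitioned follower would simply return None
for "topic-a" and the client could not tell a missing topic from a stale
replica.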

It would also be good to mention which types of clients are affected by
this scenario. Brokers in the Kafka cluster always need to talk to the
leader of the controller quorum to read metadata. Because their
heartbeats and leases are tracked by the leader of the controller quorum,
they cannot talk to followers.
It is mostly producers and consumers, which need to read metadata from
the controller quorum, that might be affected by a partitioned follower.

Thanks,
Unmesh







Re: [DISCUSSION] KIP-650: Enhance Kafkaesque Raft semantics

2020-12-04 Thread Guozhang Wang
Hi Boyang,

Thanks for bringing this up. I have a few thoughts about the "Non-leader
Linearizable Read", and I think there are two goals we can consider
achieving here from a user's perspective:

1) that each of your queries on the raft log is always going to return the
latest committed result.

2) that your consecutive queries would not "go back".

And as we can see 1) is a stronger guarantee than 2), meaning that
achieving 1) would always guarantee 2) as well.

In your football example, in order for Alice and Bob to agree with each
other, we would have to achieve 1) above; but practically it may be okay
for Alice and Bob to see different results temporarily. However, for a
single user like Alice, it is usually required that if she issues the query
twice, and the first returns "final result 1:0", the second should not
return "no final result yet". To achieve this, only 2) is needed, and 2)
is easier to achieve without the proposed new request/response. For
example, we can let the client attach to its query the offset it got from
the previous query result, and require whoever answers that query to have
applied the state machine at least up to that offset.
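
The offset-based approach for goal 2) can be sketched as follows. This is a
minimal illustration under stated assumptions, not a proposed Kafka API:
Replica, Client, ReplicaBehindError and the in-memory log are all
hypothetical, and a real replica could wait to catch up instead of
rejecting the query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class ReplicaBehindError(Exception):
    """The replica has not yet applied the offset the client already saw."""


@dataclass
class Replica:
    # Hypothetical replica holding committed (key, value) records.
    log: List[Tuple[str, str]]
    applied_offset: int = 0  # number of records applied so far
    state: Dict[str, str] = field(default_factory=dict)

    def apply_up_to(self, offset: int) -> None:
        """Apply log records until applied_offset reaches the given offset."""
        while self.applied_offset < min(offset, len(self.log)):
            key, value = self.log[self.applied_offset]
            self.state[key] = value
            self.applied_offset += 1

    def read(self, key: str, min_offset: int) -> Tuple[Optional[str], int]:
        # Refuse to answer from a state older than what the client has seen.
        if self.applied_offset < min_offset:
            raise ReplicaBehindError(
                f"applied {self.applied_offset}, client needs {min_offset}")
        return self.state.get(key), self.applied_offset


@dataclass
class Client:
    last_seen_offset: int = 0

    def query(self, replica: Replica, key: str) -> Optional[str]:
        value, offset = replica.read(key, self.last_seen_offset)
        # Remember the newest offset so later queries cannot go back.
        self.last_seen_offset = max(self.last_seen_offset, offset)
        return value


if __name__ == "__main__":
    log = [("match", "no final result yet"), ("match", "final result 1:0")]
    fresh, stale = Replica(log), Replica(log)
    fresh.apply_up_to(2)
    stale.apply_up_to(1)

    alice = Client()
    print(alice.query(fresh, "match"))
    try:
        alice.query(stale, "match")
    except ReplicaBehindError as err:
        print("stale replica rejected:", err)
```

Note that this only gives each client monotonic reads. Two different
clients can still see different results for a while, which is exactly the
gap between goals 1) and 2).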

In Kafka's case, I feel that in most scenarios just achieving 2) is good
enough, e.g. for querying metadata. Today many of the error cases actually
come from the fact that if you query two different brokers, your results
may "go back in time". So I'd like to play devil's advocate here and ask
whether achieving stronger semantics is indeed needed in Kafka.

Guozhang





-- 
-- Guozhang


[DISCUSSION] KIP-650: Enhance Kafkaesque Raft semantics

2020-12-02 Thread Boyang Chen
Hey there,

I would like to start a discussion thread for a KIP to improve on our
existing Kafka Raft semantics, specifically adding pre-vote and
linearizable read:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics

Let me know what you think, thank you!