Hi,
I have built a small prototype of how the controller backed by consensus
implementation will look like.
https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/Kip631Controller.scala
This is a miniature Kafka like implementation I use to teach distributed
systems at ThoughtWorks, and also experiment with my work on patterns of
distributed systems
<https://martinfowler.com/articles/patterns-of-distributed-systems/>.
It has a working end to end test to demonstrate how the overall system will
possibly look like
https://github.com/unmeshjoshi/distrib-broker/blob/master/src/test/scala/com/dist/simplekafka/ProducerConsumerKIP500Test.scala
The Kip631Controller itself depends on a Consensus module, to demonstrate
how possible interactions with the consensus module will look like
 (The Consensus can be pluggable really, with an API to allow reading
replicated log upto HighWaterMark)
As of now I do not have complete state transition implementation of broker
states. But it has implementation of LeaseTracker
https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/LeaderLeaseTracker.scala
to demonstrate LeaseTracker's interaction with the consensus module. (It
needs to propose and commit a FenceBroker record for expired leases)
I am hoping to add more specific tests for broker state transition here
https://github.com/unmeshjoshi/distrib-broker/blob/master/src/test/scala/com/dist/simplekafka/kip500/Kip631ControllerTest.scala

I hope it will be helpful in driving some discussions and doing quick
experimentation outside of the Kafka codebase.

Thanks,
Unmesh






On Tue, Aug 4, 2020 at 7:52 AM Unmesh Joshi <unmeshjo...@gmail.com> wrote:

> Hi,
>
> >>>>The LeaseStartTimeMs is expected to be the broker's
> 'System.currentTimeMillis()' at the point of the request. The active
> controller will add its lease period to this in order >>>>to compute
> the LeaseEndTimeMs.
> I think this is a bit confusing.  Monotonic clock on the active controller
> should be used to track leases.
> For example,
> https://issues.apache.org/jira/browse/ZOOKEEPER-1616
>
> https://github.com/etcd-io/etcd/pull/6888/commits/e7f4010ccaf28b6ce64fe514d25a4b2fa459d114
>
> Then we will not need LeaseStartTimeMs?
> Instead of LeaseStartTimeMs, can we call it LeaseTTL? The active
> controller can then calculate LeaseEndTime = System.nanoTime() + LeaseTTL.
> In this case we might just drop LeaseEndTimeMs from the response, as the
> broker already knows about the TTL and can send heartbeats at some fraction
> of TTL, say every TTL/4 milliseconds.(elapsed time on the broker measured
> by System.nanoTime)
>
> Thanks,
> Unmesh
>
>
>
>
>
> On Tue, Aug 4, 2020 at 4:48 AM Colin McCabe <cmcc...@apache.org> wrote:
>
>> On Mon, Aug 3, 2020, at 15:51, Jose Garcia Sancio wrote:
>> > Thanks for the KIP Colin,
>> >
>> > Here is a partial review:
>> >
>> > > 1. Even when a broker and a controller are co-located in the same
>> JVM, they must
>> > > have different node IDs
>> >
>> > Why? What problem are you trying to solve?
>> >
>>
>> Hi Jose,
>>
>> Thanks for the comments.
>>
>> We did talk about this a bit earlier in the thread.  The controller is
>> always on its own port, even when it is running in the same JVM as a
>> broker.  So it would not make sense to share the same ID here-- your
>> messages would not get through to the controller if you sent them to the
>> broker port instead.  And clearly if the controller is on a separate host
>> from any broker, it can't share a broker id.
>>
>> >
>> > > 2. Node IDs must be set in the configuration file for brokers and
>> controllers.
>> >
>> > I understand that controller IDs must be static and in the
>> > configuration file to be able to generate consensus in KIP-595. Why
>> > are the broker nodes which are observers in KIP-595 cannot discover
>> > their ID on first boot and persist their ID for consistency in future
>> > restarts?
>> >
>>
>> This is discussed in the rejected alternatives section.  Basically, node
>> ID auto-assignment is complicated and antiquated in the age of Kubernetes,
>> Chef, Puppet, Ansible, etc.
>>
>> >
>> > > 3. Controller processes will listen on a separate endpoint from
>> brokers
>> >
>> > Why is this? Kafka supports multi endpoints. For example, one broker
>> > can have one endpoint for listening to connections by other brokers
>> > and another endpoint for connections from admin, producer and consumer
>> > clients.
>> >
>>
>> The reason for having separate ports is discussed in KIP-590.  Basically,
>> it is so that control plane traffic can be isolated from data plane
>> traffic, as much as possible.  This is the existing situation with
>> ZooKeeper.  Since ZK is on a separate port, the client cannot disrupt the
>> cluster by flooding it with traffic (unless the admin has unwisely exposed
>> all internal ports, but this would cause bigger security issues).  We want
>> to preserve this property.
>>
>> > > 4. In the case of controller RPCs like AlterIsr, the controller
>> handles this by not sending
>> > > back a response until the designated change has been persisted.
>> >
>> > Should we enumerate these RPCs? For example, we also have
>> > `ListPartitionReassignments` which is a read operation and goes
>> > directly to the controller. The naive solution would be to return the
>> > uncommitted state in the controller.
>> >
>>
>> Hmm.  The KIP says that the "active controller must not make [...] future
>> state "visible" to the rest of the cluster until it has been made
>> persistent."  So we don't return uncommitted state for read operations.
>>
>> >
>> > 5.
>> > This KIP mentions a topic named __kafka_metadata. KIP-595 and KIP-630
>> > mention a partition named __cluster_metadata. We should reconcile this
>> > difference.
>> >
>>
>> Jason did write that the name of the controller topic is "not a formal
>> part of [the KIP-595] proposal."  I think he wanted to avoid having the
>> discussion about the topic name in two different KIPs.  :)
>>
>> Let's discuss the metadata topic name here in KIP-631, and update KIP-595
>> as required once this one is accepted.
>>
>> best,
>> Colin
>>
>> >
>> > --
>> > -Jose
>> >
>>
>

Reply via email to