Re: [DISCUSS] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect

Boyang Chen Wed, 06 Feb 2019 21:59:02 -0800

Thanks Konstantine for the great summary! +1 for having a separate KIP 
discussing the trade-offs of using a new serialization format for the protocol 
encoding. We could probably discuss a wider range of options and benchmark 
their performance before reaching a final decision.

Best,
Boyang
________________________________
From: Konstantine Karantasis <konstant...@confluent.io>
Sent: Tuesday, February 5, 2019 4:23 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-415: Incremental Cooperative Rebalancing in Kafka 
Connect

Hi all,

Thank you for your comments so far.
Now that KIP freeze and feature freeze are behind us for version 2.2, I'd
like to bring this thread back to the top of the email stack, with the
following suggestion:

I'll be changing KIP-415's description to include a serialization format
that extends the current scheme and is based on Kafka structs.

The initial suggestion to transition to an alternative serialization
format (e.g. flatbuffers) was made in case we saw that it had good
potential and we could arrive at a quick consensus on this matter. I
believe the arguments for such a transition make sense, but the pros are
probably not enough to outweigh the introduction of a dependency at this
point and to justify changes in every client that will potentially use
incremental cooperative rebalancing in the future. Changes to the
rebalancing protocol have not been very frequent so far.

Admittedly, even more important is the fact that the discussion around the
serialization format of the new protocol is only tangentially related to
the core of KIP-415. Thus, in order to keep the discussion focused on the
essential changes required by KIP-415, which are expected to have a
significant impact in addressing the stop-the-world effect, I'd like to
punt on any optimizations to the serialization format and change the KIP to
describe a schema that depends on Kafka structs as the current (V0) version
does.
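
For readers unfamiliar with this style of encoding, here is a minimal
sketch of a versioned, append-only, struct-like binary layout. It is only
an illustration: the field names (url, config_offset, revoked) and the
helper functions are hypothetical and are not the schema that KIP-415
defines. It assumes big-endian integers and length-prefixed UTF-8 strings.

```python
import struct

def put_string(s):
    # Length-prefixed UTF-8 string: 2-byte big-endian length, then the bytes.
    data = s.encode("utf-8")
    return struct.pack(">h", len(data)) + data

def get_string(buf, pos):
    # Returns the decoded string and the position just past it.
    (n,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    return buf[pos:pos + n].decode("utf-8"), pos + n

def encode(version, url, config_offset, revoked=None):
    # V0 layout (hypothetical): version, url, config_offset.
    out = struct.pack(">h", version) + put_string(url)
    out += struct.pack(">q", config_offset)
    if version >= 1:
        # V1 only appends to the V0 layout: a count followed by strings.
        items = revoked or []
        out += struct.pack(">i", len(items))
        for item in items:
            out += put_string(item)
    return out

def decode(buf):
    # The leading version tells the reader which trailing fields to expect.
    (version,) = struct.unpack_from(">h", buf, 0)
    url, pos = get_string(buf, 2)
    (config_offset,) = struct.unpack_from(">q", buf, pos)
    pos += 8
    msg = {"version": version, "url": url, "config_offset": config_offset}
    if version >= 1:
        (count,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        revoked = []
        for _ in range(count):
            item, pos = get_string(buf, pos)
            revoked.append(item)
        msg["revoked"] = revoked
    return msg

v0 = encode(0, "http://worker-1:8083", 42)
v1 = encode(1, "http://worker-1:8083", 42, ["connector-a-0"])
print(decode(v0))
print(decode(v1))
```

In a layout like this, a new version can only add fields at the end, and
every reader has to branch on the version number. That is the trade-off
the flatbuffers proposal tried to relax.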

I hope this will allow us to make progress more easily and bring the changes of
this new rebalancing protocol to Kafka clients, beginning with Kafka
Connect, in a more applicable and less disruptive way.

I'll change the schema descriptions by end of day.

Looking forward to your next comments!

Konstantine

On Mon, Jan 28, 2019 at 5:22 PM Konstantine Karantasis <
konstant...@confluent.io> wrote:

>
> Hi Ismael,
> thanks for bringing up serialization in the discussion!
>
> Indeed, JSON was considered given it's the prevalent text-based
> serialization option.
>
> In comparison to flatbuffers, most of the generic pros and cons are valid in
> this context too: higher performance during serde, small size, optional
> fields, strong typing and others.
>
> Specifically, for Connect's use case, flatbuffers serialization, although
> it introduces a single dependency, appears more appealing for the
> following reasons:
>
> * The protocol is evolving from one binary format to another binary one.
> * Although new fields, nested or not, are expected to be introduced (as in
> KIP-415) or old fields may get deprecated, the protocol schemas are
> expected to be simple, mostly flat and manageable. We won't need to process
> arbitrarily nested structures during runtime, for which JSON would be a
> better fit. The current proposal aims to make the current append-only
> format a bit more flexible.
> * It's good to keep performance tight because the loop that includes
> subprotocol serde will need to accommodate resource release and assignment.
> Also, rebalancing in its incremental cooperative form, which is expected to
> be lighter, has the potential to start happening more frequently. Parsing
> JSON with Jackson has been a hotspot on certain occasions in the past, if I
> remember correctly.
> * Evolution will be facilitated by handling or ignoring optional fields
> easily. The protocol may evolve with fewer hard version bumps like the one
> proposed here from V0 to V1.
> * Optional fields are omitted, not just compressed.
> * Unpacking of fields does not require deserialization of the whole
> message, making transition between versions or flavors of the protocol easy
> and performant.
> * Flatbuffers' specification is simple and can be implemented, even in the
> absence of appropriate clients.
>
> I hope the above highlights why flatbuffers is a good candidate for this
> use case and, thus, worth adding as a dependency.
> Strictly speaking, yes, they introduce a new compile-time dependency. But
> during runtime, such a dependency seems equivalent to introducing a JSON
> parser (such as Jackson that is already being used in AK).
>
> Your question is very valid. It's probably worth adding an item under
> rejected alternatives, once we agree how we want to move forward.
>
> Best,
> Konstantine
>
>
>
> On Fri, Jan 25, 2019 at 11:13 PM Ismael Juma <isma...@gmail.com> wrote:
>
>> Thanks for the KIP Konstantine. Quick question: introducing a new
>> serialization format (i.e. flatbuffers) has major implications. Have we
>> considered json? If so, why did we reject it?
>>
>> Ismael
>>
>> On Fri, Jan 11, 2019, 3:44 PM Konstantine Karantasis <
>> konstant...@confluent.io wrote:
>>
>> > Hi all,
>> >
>> > I just published KIP-415: Incremental Cooperative Rebalancing in Kafka
>> > Connect
>> > on the wiki here:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >
>> > This is the first KIP to suggest an implementation of incremental and
>> > cooperative rebalancing in the context of Kafka Connect. It aims to provide
>> > an adequate solution to the stop-the-world effect that occurs in a Connect
>> > cluster whenever a new connector configuration is submitted or a Connect
>> > Worker is added or removed from the cluster.
>> >
>> > Looking forward to your insightful feedback!
>> >
>> > Regards,
>> > Konstantine
>> >
>>
>
