Hi Radim,

Thanks for the KIP! CRaC is an interesting project and it could be a
useful feature in Kafka clients.

The KIP is pretty vague in terms of the expected behavior of clients
when checkpointing and restoring. For example:

1. A consumer may have pre-fetched records in memory. When it is
checkpointed, its group will rebalance and another consumer will
consume the same records. When the initial consumer is restored, will
it process its pre-fetched records and basically reprocess records
already handled by other consumers?

2. Producers may have records in-flight or in the producer buffer when
they are checkpointed. How do you propose to handle these cases?

3. Clients may have loaded plugins such as serializers. These plugins
may establish network connections too. How are these expected to
automatically reconnect when the application is restored?
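
To make these three questions concrete, here is a minimal sketch of the
kind of checkpoint/restore hooks I have in mind. It is not Kafka code and
not what the KIP proposes: CheckpointHooks and ToyClient are hypothetical
names, and the hook interface is a simplified stand-in for the org.crac
Resource callbacks (which also take a Context argument). Flushing the
buffer and discarding pre-fetched records are only one possible set of
answers; the KIP should say which behavior is intended.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a checkpoint/restore callback. The org.crac
// Resource callbacks also take a Context argument, omitted here.
interface CheckpointHooks {
    void beforeCheckpoint();
    void afterRestore();
}

// Toy client, NOT a Kafka class: it only models the kinds of state
// raised in the questions above.
class ToyClient implements CheckpointHooks {
    private final Deque<String> prefetched = new ArrayDeque<>(); // fetched, not yet processed
    private final List<String> unsent = new ArrayList<>();       // buffered producer records
    private final List<String> delivered = new ArrayList<>();    // stands in for the broker
    private boolean connected = true;

    void prefetch(String record) { prefetched.add(record); }
    void send(String record) { unsent.add(record); }

    @Override
    public void beforeCheckpoint() {
        // One possible answer to (2): drain the producer buffer first.
        delivered.addAll(unsent);
        unsent.clear();
        // One possible answer to (1): discard pre-fetched records so they
        // are not replayed after another group member has consumed them.
        prefetched.clear();
        // Assumption: open connections are closed before the snapshot.
        connected = false;
    }

    @Override
    public void afterRestore() {
        // Reconnect. A real client would also have to rejoin its group and
        // fetch again from the committed offsets; plugins (3) would need
        // an equivalent hook of their own.
        connected = true;
    }

    int prefetchedCount() { return prefetched.size(); }
    int unsentCount() { return unsent.size(); }
    List<String> delivered() { return delivered; }
    boolean isConnected() { return connected; }
}

class CheckpointSketchDemo {
    public static void main(String[] args) {
        ToyClient client = new ToyClient();
        client.prefetch("record-1");
        client.send("record-2");

        client.beforeCheckpoint();
        System.out.println("prefetched after checkpoint: " + client.prefetchedCount());
        System.out.println("delivered after checkpoint: " + client.delivered());

        client.afterRestore();
        System.out.println("connected after restore: " + client.isConnected());
    }
}
```

Even in this toy form, each hook has to make a decision about
client-visible semantics (delivery guarantees, reprocessing), which is
why I think the KIP needs to spell them out.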

Thanks,
Mickael


On Wed, Apr 26, 2023 at 8:27 AM Radim Vansa <rva...@azul.com.invalid> wrote:
>
> Hi all,
>
> I haven't seen much reaction to this proposal. Is there any general
> policy regarding dependencies, or a prior decision that would hint at this?
>
> Thanks!
>
> Radim
>
>
> On 21. 04. 23 10:10, Radim Vansa wrote:
> >
> > Thank you,
> >
> > now to be tracked as KIP-921:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-921%3A+OpenJDK+CRaC+support
> >
> >
> > Radim
> >
> > On 20. 04. 23 15:26, Josep Prat wrote:
> >> Hi Radim,
> >> You should have now permissions to create a KIP.
> >>
> >> Best,
> >>
> >> On Thu, Apr 20, 2023 at 2:22 PM Radim Vansa <rva...@azul.com.invalid>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> upon filing a PR [1] with some initial support for OpenJDK CRaC [2][3]
> >>> I was directed here to raise a KIP (I don't have the permissions in
> >>> wiki/JIRA to create the KIP page yet, though).
> >>>
> >>> In a nutshell, CRaC intends to provide a way to checkpoint (snapshot)
> >>> and persist a running Java application and later restore it, possibly
> >>> on a different computer. This can be used to significantly speed up
> >>> the boot process (from seconds or minutes to tens of milliseconds),
> >>> or for live replication or migration of the warmed-up application.
> >>> This is not entirely transparent to the application; the application
> >>> can register for notification when this is happening, and sometimes
> >>> has to assist with that to prevent unexpected state after restore,
> >>> e.g. by closing network connections and files.
> >>>
> >>> CRaC is not integrated into the mainline JDK yet; a JEP is being
> >>> prepared, and users are welcome to try out our builds. However, even
> >>> when this gets into the JDK we can't expect users to jump onto the
> >>> latest release immediately; therefore we provide a facade package
> >>> org.crac [4] that delegates to the implementation, if it is present
> >>> in the running JDK, or provides a no-op implementation.
> >>>
> >>> With or without the implementation, the support for CRaC in the
> >>> application should be designed to have a minimal impact on
> >>> performance (a few extra objects, some volatile reads...). On the
> >>> other hand, the checkpoint operation itself can be non-trivial in
> >>> this respect. Therefore the main consideration should be the
> >>> maintenance costs: keeping a small JAR in dependencies and some
> >>> extra code in networking and persistence.
> >>>
> >>> The support for CRaC does not have to be all-in for all components;
> >>> maybe it does not make sense to snapshot a Broker. My PR was for Kafka
> >>> Clients because the open network connections need to be handled in a
> >>> web application (in my case I am enabling CRaC in the Quarkus Super
> >>> Heroes [5] demo). The PR does not handle all possible client-side
> >>> uses; as I am not familiar with Kafka, I follow the whack-a-mole
> >>> strategy.
> >>>
> >>> It is possible that the C/R could be handled in a different layer, e.g.
> >>> in Quarkus integration code. However, our intent is to push the changes
> >>> as low in the technology stack as possible, to provide the best fanout
> >>> to users without duplicating maintenance efforts. Also, having the
> >>> support higher up can be fragile and break encapsulation.
> >>>
> >>> Thank you for your consideration, I hope that you'll appreciate our
> >>> attempt to innovate the Java ecosystem.
> >>>
> >>> Radim Vansa
> >>>
> >>> PS: I'd appreciate it if someone could give me the permissions on wiki
> >>> to create a proper KIP! Username: rvansa (both Confluence and JIRA).
> >>>
> >>> [1] https://github.com/apache/kafka/pull/13619
> >>>
> >>> [2] https://wiki.openjdk.org/display/crac
> >>>
> >>> [3] https://github.com/openjdk/crac
> >>>
> >>> [4] https://github.com/CRaC/org.crac
> >>>
> >>> [5] https://quarkus.io/quarkus-workshops/super-heroes/
> >>>
> >>>
> >> --
> >> *Josep Prat*
> >> Open Source Engineering Director, *Aiven*
> >>
