Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-08 Thread Abhijeet Kumar
Hi Christo,

Please find my comments inline.

On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
wrote:

> Hello Abhijeet and Jun,
>
> I have been mulling this KIP over a bit more in recent days!
>
> re: Jun
>
> I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> retrospect it should have been fairly obvious. I would need to go and update
> KIP-1005 myself then, thank you for giving the useful reference!
>
> 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> which experienced a disk failure) to decrease the time a partition spends
> in under-replicated state. In other words, a follower which has just fallen
> out of ISR, but has local data will continue using today's Tiered Storage
> replication protocol (i.e. fetching from earliest-local). I further believe
> he has taken this approach so that local state of replicas which have just
> fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> Abhijeet, have I understood (and summarised) what you are proposing
> correctly?
>
Yes, your understanding is correct. We want to limit the behavior changes
only to new replicas.


> 5. I think in today's Tiered Storage we know the leader's log-start-offset
> from the FetchResponse and we can learn its local-log-start-offset from the
> ListOffsets by asking for earliest-local timestamp (-4). But granted, this
> ought to be added as an additional API call in the KIP.
>
>
Yes, I clarified this in my reply to Jun. I will add this missing detail in
the KIP.


> re: Abhijeet
>
> 101. I am still a bit confused as to why you want to include a new offset
> (i.e. pending-upload-offset) when you yourself mention that you could use
> an already existing offset (i.e. last-tiered-offset + 1). In essence, you
> end your Motivation with "In this KIP, we will focus only on the follower
> fetch protocol using the *last-tiered-offset*" and then in the following
> sections you talk about pending-upload-offset. I understand this might be
> classified as an implementation detail, but if you introduce a new offset
> (i.e. pending-upload-offset) you have to make a change to the ListOffsets
> API (i.e. introduce -6) and thus document it in this KIP as such. However,
> the last-tiered-offset ought to already be exposed as part of KIP-1005
> (under implementation). Am I misunderstanding something here?
>

I have tried to clarify this in my reply to Jun.

> The follower needs to build the local data starting from the offset
> Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> corresponding leader-epoch.
> There are two ways to do this:
>    1. We add support in ListOffsetsRequest to be able to fetch this offset
> (and leader epoch) from the leader.
>    2. Or, fetch the tiered-offset (which is already supported). From this
> offset, we can get the Earliest-Pending-Upload-Offset: we can just add 1 to
> the tiered-offset. However, we still need the leader epoch for that offset,
> since there is no guarantee that the leader epoch for
> Earliest-Pending-Upload-Offset will be the same as the leader epoch for
> tiered-offset. We may need another API call to the leader for this.
> I prefer the first approach. The only problem with the first approach is
> that it introduces one more offset. The second approach avoids this problem
> but is a little complicated.



>
> Best,
> Christo
>
> On Thu, 4 Apr 2024 at 19:37, Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the KIP. Left a few comments.
> >
> > 1. "A drawback of using the last-tiered-offset is that this new follower
> > would possess only a limited number of locally stored segments. Should it
> > ascend to the role of leader, there is a risk of needing to fetch these
> > segments from the remote storage, potentially impacting broker
> > performance."
> > Since we support consumers fetching from followers, this is a potential
> > issue on the follower side too. In theory, it's possible for a segment to
> > be tiered immediately after rolling. In that case, there could be very
> > little data after last-tiered-offset. It would be useful to think through
> > how to address this issue.
> >
> > 2. ListOffsetsRequest:
> > 2.1 Typically, we need to bump up the version of the request if we add a
> > new value for timestamp. See
> >
> >
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> > .
> > 2.2 Since this changes the inter broker request protocol, it would be
> > useful to have a section on upgrade (e.g. new IBP/metadata.version).
> >
> > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch
> the
> > last-tiered-offset from the leader, and make a separate leader call to
> > fetch leader epoch for the following offset."
> > Why do we need to make a separate call for the leader epoch?
> > ListOffsetsResponse includes both the offset and the corresponding epoch.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-08 Thread Abhijeet Kumar
Hi Jun,

Thank you for taking the time to review the KIP. Please find my comments
inline.

On Fri, Apr 5, 2024 at 12:09 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. Left a few comments.
>
> 1. "A drawback of using the last-tiered-offset is that this new follower
> would possess only a limited number of locally stored segments. Should it
> ascend to the role of leader, there is a risk of needing to fetch these
> segments from the remote storage, potentially impacting broker
> performance."
> Since we support consumers fetching from followers, this is a potential
> issue on the follower side too. In theory, it's possible for a segment to
> be tiered immediately after rolling. In that case, there could be very
> little data after last-tiered-offset. It would be useful to think through
> how to address this issue.
>

We plan to have a follow-up KIP that will address both the deprioritization
of these brokers for leadership, as well as their deprioritization for
consumption (when fetching from followers is allowed).


>
> 2. ListOffsetsRequest:
> 2.1 Typically, we need to bump up the version of the request if we add a
> new value for timestamp. See
>
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> .
>

Yes, let me update the KIP to include this change. We will need a new
timestamp corresponding to Earliest-Pending-Upload-Offset.


> 2.2 Since this changes the inter broker request protocol, it would be
> useful to have a section on upgrade (e.g. new IBP/metadata.version).
>
Makes sense. I will update the KIP to capture this.


> 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
> last-tiered-offset from the leader, and make a separate leader call to
> fetch leader epoch for the following offset."
> Why do we need to make a separate call for the leader epoch?
> ListOffsetsResponse includes both the offset and the corresponding epoch.
>
I understand there is some confusion here. Let me try to explain this.

The follower needs to build the local data starting from the offset
Earliest-Pending-Upload-Offset. Hence it needs the offset and the
corresponding leader-epoch.
There are two ways to do this:
   1. We add support in ListOffsetsRequest to be able to fetch this offset
(and leader epoch) from the leader.
   2. Or, fetch the tiered-offset (which is already supported). From this
offset, we can get the Earliest-Pending-Upload-Offset: we can just add 1 to
the tiered-offset. However, we still need the leader epoch for that offset,
since there is no guarantee that the leader epoch for
Earliest-Pending-Upload-Offset will be the same as the leader epoch for
tiered-offset. We may need another API call to the leader for this.

I prefer the first approach. The only problem with the first approach is
that it introduces one more offset. The second approach avoids this problem
but is a little complicated.
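
To make option 1 concrete, here is a rough, purely illustrative sketch. It
assumes a new sentinel timestamp (-6, as discussed in this thread) alongside
the existing earliest-local timestamp (-4) and the latest-tiered timestamp
(-5) being added by KIP-1005. The admin client is used here only to show the
request/response shape (inside a method that may throw Exception); the
follower itself would use the inter-broker ListOffsets RPC:

    import java.util.Map;
    import java.util.Optional;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.ListOffsetsResult;
    import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
    import org.apache.kafka.clients.admin.OffsetSpec;
    import org.apache.kafka.common.TopicPartition;

    // Hypothetical sentinel, mirroring earliest-local (-4) and latest-tiered (-5).
    final long EARLIEST_PENDING_UPLOAD_TIMESTAMP = -6L;

    // `admin` and `partition` are assumed to exist. A single round trip returns
    // both the offset and its leader epoch, which is what makes option 1
    // attractive: no second call is needed for the epoch.
    ListOffsetsResult result = admin.listOffsets(Map.of(
            partition, OffsetSpec.forTimestamp(EARLIEST_PENDING_UPLOAD_TIMESTAMP)));
    ListOffsetsResultInfo info = result.partitionResult(partition).get();
    long startOffset = info.offset();
    Optional<Integer> leaderEpoch = info.leaderEpoch();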


> 4. "Check if the follower replica is empty and if the feature to use
> last-tiered-offset is enabled."
> Why do we need to check if the follower replica is empty?
>
>
We want to limit this new behavior only to new replicas. Replicas that
fall out of ISR are excluded from this behavior change; those will
continue with the existing behavior.


> 5. It can be confirmed by checking if the leader's Log-Start-Offset is the
> same as the Leader's Local-Log-Start-Offset.
> How does the follower know Local-Log-Start-Offset?
>

I missed this detail. The follower will need to call the leader's ListOffsets
API (asking for the earliest-local timestamp, -4) to fetch the EarliestLocal
offset for this.


> Jun
>
> On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Christo,
> >
> > Thanks for reviewing the KIP.
> >
> > The follower needs the earliest-pending-upload-offset (and the
> > corresponding leader epoch) from the leader.
> > This is the first offset the follower will have locally.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > First of all, thank you very much for the proposal, you have explained
> > the
> > > problem you want solved very well - I think a faster bootstrap of an
> > empty
> > > replica is definitely an improvement!
> > >
> > > For my understanding, which concrete offset do you want the leader to
> > give
> > > back to a follower - earliest-pending-upload-offset or the
> > > latest-tiered-offset? If it is the second, then I believe KIP-1005
> ought
> > to
> > > already be exposing that offset as part of the ListOffsets API, no?
> > >
> > > Best,
> > > Christo
> > >
> > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have created KIP-1023 to introduce follower fetch from tiered
> offset.
> > > > This feature will be helpful in significantly reducing Kafka
> > > > rebalance/rebuild times when the cluster is enabled with tiered
> > storage.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2798

2024-04-08 Thread Apache Jenkins Server

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2797

2024-04-08 Thread Apache Jenkins Server

Re: [DISCUSS] KIP-1022 Formatting and Updating Features

2024-04-08 Thread Justine Olshan
Hey Jun,

That's a good question. I think maybe for simplicity, we can have a single
config?
If that makes sense, I will update the KIP.

Justine

On Mon, Apr 8, 2024 at 3:20 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the updated KIP.
>
> One more question related to KIP-1014. It introduced a new
> config unstable.metadata.versions.enable. Does each new feature need to
> have a corresponding config to enable the testing of unstable features or
> should we have a generic config enabling the testing of all unstable
> features?
>
> Jun
>
> On Thu, Apr 4, 2024 at 8:24 PM Justine Olshan  >
> wrote:
>
> > I'm hoping this covers the majority of comments. I will go ahead and open
> > the vote in the next day or so.
> >
> > Thanks,
> > Justine
> >
> > On Wed, Apr 3, 2024 at 3:31 PM Justine Olshan 
> > wrote:
> >
> > > Find and replace has failed me :(
> > >
> > > Group version seems a little vague, but we can update it. Hopefully
> find
> > > and replace won't fail me again, otherwise I will get another email on
> > this.
> > >
> > > Justine
> > >
> > > On Wed, Apr 3, 2024 at 12:15 PM David Jacot
>  > >
> > > wrote:
> > >
> > >> Thanks, Justine.
> > >>
> > >> * Should we also use `group.version` (GV) as I suggested in my
> previous
> > >> message in order to be consistent?
> > >> * Should we add both names to the `Public Interfaces` section?
> > >> * There is still at least one usage of `transaction.protocol.version`
> in
> > >> the KIP too.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Wed, Apr 3, 2024 at 6:29 PM Justine Olshan
> > >> 
> > >> wrote:
> > >>
> > >> > I had missed David's message yesterday about the naming for
> > >> transaction
> > >> > version vs transaction protocol version.
> > >> >
> > >> > After some offline discussion with Jun, Artem, and David, we agreed
> > that
> > >> > transaction version is simpler and conveys more than just protocol
> > >> changes
> > >> > (flexible records for example)
> > >> >
> > >> > I will update the KIP as well as KIP-890
> > >> >
> > >> > Thanks,
> > >> > Justine
> > >> >
> > >> > On Tue, Apr 2, 2024 at 2:50 PM Justine Olshan  >
> > >> > wrote:
> > >> >
> > >> > > Updated!
> > >> > >
> > >> > > Justine
> > >> > >
> > >> > > On Tue, Apr 2, 2024 at 2:40 PM Jun Rao 
> > >> wrote:
> > >> > >
> > >> > >> Hi, Justine,
> > >> > >>
> > >> > >> Thanks for the reply.
> > >> > >>
> > >> > >> 21. Sounds good. It would be useful to document that.
> > >> > >>
> > >> > >> 22. Should we add the IV in "metadata.version=17 has no
> > dependencies"
> > >> > too?
> > >> > >>
> > >> > >> Jun
> > >> > >>
> > >> > >>
> > >> > >> On Tue, Apr 2, 2024 at 11:31 AM Justine Olshan
> > >> > >> 
> > >> > >> wrote:
> > >> > >>
> > >> > >> > Jun,
> > >> > >> >
> > >> > >> > 21. Next producer ID field doesn't need to be populated for TV
> 1.
> > >> We
> > >> > >> don't
> > >> > >> > have the same need to retain this since it is written directly
> to
> > >> the
> > >> > >> > transaction log in InitProducerId. It is only needed for
> KIP-890
> > >> part
> > >> > 2
> > >> > >> /
> > >> > >> > TV 2.
> > >> > >> >
> > >> > >> > 22. We can do that.
> > >> > >> >
> > >> > >> > Justine
> > >> > >> >
> > >> > >> > On Tue, Apr 2, 2024 at 10:41 AM Jun Rao
>  > >
> > >> > >> wrote:
> > >> > >> >
> > >> > >> > > Hi, Justine,
> > >> > >> > >
> > >> > >> > > Thanks for the reply.
> > >> > >> > >
> > >> > >> > > 21. What about the new NextProducerId field? Will that be
> > >> populated
> > >> > >> with
> > >> > >> > TV
> > >> > >> > > 1?
> > >> > >> > >
> > >> > >> > > 22. In the dependencies output, should we show both IV and
> > level
> > >> for
> > >> > >> > > metadata.version too?
> > >> > >> > >
> > >> > >> > > Jun
> > >> > >> > >
> > >> > >> > > On Mon, Apr 1, 2024 at 4:43 PM Justine Olshan
> > >> > >> >  > >> > >> > > >
> > >> > >> > > wrote:
> > >> > >> > >
> > >> > >> > > > Hi Jun,
> > >> > >> > > >
> > >> > >> > > > 20. I can update the KIP.
> > >> > >> > > >
> > >> > >> > > > 21. This is used to complete some of the work with KIP-360.
> > (We
> > >> > use
> > >> > >> > > > previous producer ID there, but never persisted it which
> was
> > in
> > >> > the
> > >> > >> KIP
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820
> > >> > >> )
> > >> > >> > > > The KIP also mentions including previous epoch but we
> > >> explained in
> > >> > >> this
> > >> > >> > > KIP
> > >> > >> > > > how we can figure this out.
> > >> > >> > > >
> > >> > >> > > > Justine
> > >> > >> > > >
> > >> > >> > > >
> > >> > >> > > >
> > >> > >> > > > On Mon, Apr 1, 2024 at 3:56 PM Jun Rao
> > >> 
> > >> > >> > wrote:
> > >> > >> > > >
> > >> > >> > > > > Hi, Justine,
> > >> > >> > > > >
> > >> > >> > > > > Thanks for the updated KIP. A couple of more comments.
> > >> > >> > > > >
> > >> > >> > > > > 20. Could we show the output of version-mapping?
> > >> > >> > > > >
> > >> > >> > > > > 21. "Transaction version 1 wi

[jira] [Resolved] (KAFKA-16455) Check partition exists before send reassignments to server in ReassignPartitionsCommand

2024-04-08 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16455.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Check partition exists before send reassignments to server in 
> ReassignPartitionsCommand
> ---
>
> Key: KAFKA-16455
> URL: https://issues.apache.org/jira/browse/KAFKA-16455
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Minor
> Fix For: 3.8.0
>
>
> Currently, when executing {{kafka-reassign-partitions.sh}} with the 
> {{--execute}} option, if a partition number specified in the JSON file does 
> not exist, this check occurs only when submitting the reassignments to 
> {{alterPartitionReassignments}} on the server-side.
> We can perform this check in advance before submitting the reassignments to 
> the server side.
> For example, suppose we have three brokers with IDs 1001, 1002, and 1003, and 
> a topic named {{first_topic}} with only three partitions, and we execute:
> {code:bash}
> bin/kafka-reassign-partitions.sh 
>   --bootstrap-server 192.168.0.128:9092 
>   --reassignment-json-file reassignment.json 
>   --execute
> {code}
> Where reassignment.json contains
> {code:json}
> {
>   "version": 1,
>   "partitions": [
> {
>   "topic": "first_topic",
>   "partition": 20,
>   "replicas": [1002, 1001, 1003],
>   "log_dirs": ["any", "any", "any"]
> }
>   ]
> }
> {code}
> The console outputs
> {code:java}
> Current partition replica assignment
> {"version":1,"partitions":[]}
> Save this to use as the --reassignment-json-file option during rollback
> Error reassigning partition(s):
> first_topic-20: The partition does not exist.
> {code}
> Apart from the output {{\{"version":1,"partitions":[]\}}} which doesn't 
> provide much help, the error {{first_topic-20: The partition does not 
> exist.}} is reported back to the tool from the server-side, as mentioned 
> earlier. This check could be moved up, before sending reassignments to the
> server side.
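> A rough sketch (not the actual patch) of how the tool could pre-validate 
> partitions client-side, assuming an {{Admin}} client and the topic/partition 
> already parsed from the JSON file:
> {code:java}
> import java.util.Map;
> import java.util.Set;
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.TopicDescription;
> import org.apache.kafka.common.errors.UnknownTopicOrPartitionException;
> 
> // Fail fast if the requested partition exceeds the topic's partition count.
> static void validatePartition(Admin admin, String topic, int partition) throws Exception {
>     Map<String, TopicDescription> topics =
>         admin.describeTopics(Set.of(topic)).allTopicNames().get();
>     int partitionCount = topics.get(topic).partitions().size();
>     if (partition >= partitionCount) {
>         throw new UnknownTopicOrPartitionException(
>             topic + "-" + partition + ": The partition does not exist.");
>     }
> }
> {code}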



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-477: Add PATCH method for connector config in Connect REST API

2024-04-08 Thread Knowles Atchison Jr
+1 (non binding)

On Mon, Apr 8, 2024, 3:30 PM Chris Egerton  wrote:

> Thanks Ivan! +1 (binding) from me.
>
> On Mon, Apr 8, 2024, 06:59 Ivan Yurchenko  wrote:
>
> > Hello!
> >
> > I'd like to put the subj KIP[1] to a vote. Thank you.
> >
> > Best regards,
> > Ivan
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API
> >
>


Re: [DISCUSS] KIP-1022 Formatting and Updating Features

2024-04-08 Thread Jun Rao
Hi, Justine,

Thanks for the updated KIP.

One more question related to KIP-1014. It introduced a new
config unstable.metadata.versions.enable. Does each new feature need to
have a corresponding config to enable the testing of unstable features or
should we have a generic config enabling the testing of all unstable
features?

Jun

On Thu, Apr 4, 2024 at 8:24 PM Justine Olshan 
wrote:

> I'm hoping this covers the majority of comments. I will go ahead and open
> the vote in the next day or so.
>
> Thanks,
> Justine
>
> On Wed, Apr 3, 2024 at 3:31 PM Justine Olshan 
> wrote:
>
> > Find and replace has failed me :(
> >
> > Group version seems a little vague, but we can update it. Hopefully find
> > and replace won't fail me again, otherwise I will get another email on
> this.
> >
> > Justine
> >
> > On Wed, Apr 3, 2024 at 12:15 PM David Jacot  >
> > wrote:
> >
> >> Thanks, Justine.
> >>
> >> * Should we also use `group.version` (GV) as I suggested in my previous
> >> message in order to be consistent?
> >> * Should we add both names to the `Public Interfaces` section?
> >> * There is still at least one usage of `transaction.protocol.version` in
> >> the KIP too.
> >>
> >> Best,
> >> David
> >>
> >> On Wed, Apr 3, 2024 at 6:29 PM Justine Olshan
> >> 
> >> wrote:
> >>
> >> > I had missed David's message yesterday about the naming for
> >> transaction
> >> > version vs transaction protocol version.
> >> >
> >> > After some offline discussion with Jun, Artem, and David, we agreed
> that
> >> > transaction version is simpler and conveys more than just protocol
> >> changes
> >> > (flexible records for example)
> >> >
> >> > I will update the KIP as well as KIP-890
> >> >
> >> > Thanks,
> >> > Justine
> >> >
> >> > On Tue, Apr 2, 2024 at 2:50 PM Justine Olshan 
> >> > wrote:
> >> >
> >> > > Updated!
> >> > >
> >> > > Justine
> >> > >
> >> > > On Tue, Apr 2, 2024 at 2:40 PM Jun Rao 
> >> wrote:
> >> > >
> >> > >> Hi, Justine,
> >> > >>
> >> > >> Thanks for the reply.
> >> > >>
> >> > >> 21. Sounds good. It would be useful to document that.
> >> > >>
> >> > >> 22. Should we add the IV in "metadata.version=17 has no
> dependencies"
> >> > too?
> >> > >>
> >> > >> Jun
> >> > >>
> >> > >>
> >> > >> On Tue, Apr 2, 2024 at 11:31 AM Justine Olshan
> >> > >> 
> >> > >> wrote:
> >> > >>
> >> > >> > Jun,
> >> > >> >
> >> > >> > 21. Next producer ID field doesn't need to be populated for TV 1.
> >> We
> >> > >> don't
> >> > >> > have the same need to retain this since it is written directly to
> >> the
> >> > >> > transaction log in InitProducerId. It is only needed for KIP-890
> >> part
> >> > 2
> >> > >> /
> >> > >> > TV 2.
> >> > >> >
> >> > >> > 22. We can do that.
> >> > >> >
> >> > >> > Justine
> >> > >> >
> >> > >> > On Tue, Apr 2, 2024 at 10:41 AM Jun Rao  >
> >> > >> wrote:
> >> > >> >
> >> > >> > > Hi, Justine,
> >> > >> > >
> >> > >> > > Thanks for the reply.
> >> > >> > >
> >> > >> > > 21. What about the new NextProducerId field? Will that be
> >> populated
> >> > >> with
> >> > >> > TV
> >> > >> > > 1?
> >> > >> > >
> >> > >> > > 22. In the dependencies output, should we show both IV and
> level
> >> for
> >> > >> > > metadata.version too?
> >> > >> > >
> >> > >> > > Jun
> >> > >> > >
> >> > >> > > On Mon, Apr 1, 2024 at 4:43 PM Justine Olshan
> >> > >> >  >> > >> > > >
> >> > >> > > wrote:
> >> > >> > >
> >> > >> > > > Hi Jun,
> >> > >> > > >
> >> > >> > > > 20. I can update the KIP.
> >> > >> > > >
> >> > >> > > > 21. This is used to complete some of the work with KIP-360.
> (We
> >> > use
> >> > >> > > > previous producer ID there, but never persisted it which was
> in
> >> > the
> >> > >> KIP
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820
> >> > >> )
> >> > >> > > > The KIP also mentions including previous epoch but we
> >> explained in
> >> > >> this
> >> > >> > > KIP
> >> > >> > > > how we can figure this out.
> >> > >> > > >
> >> > >> > > > Justine
> >> > >> > > >
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > On Mon, Apr 1, 2024 at 3:56 PM Jun Rao
> >> 
> >> > >> > wrote:
> >> > >> > > >
> >> > >> > > > > Hi, Justine,
> >> > >> > > > >
> >> > >> > > > > Thanks for the updated KIP. A couple of more comments.
> >> > >> > > > >
> >> > >> > > > > 20. Could we show the output of version-mapping?
> >> > >> > > > >
> >> > >> > > > > 21. "Transaction version 1 will include the flexible fields
> >> in
> >> > the
> >> > >> > > > > transaction state log, and transaction version 2 will
> include
> >> > the
> >> > >> > > changes
> >> > >> > > > > to the transactional protocol as described by KIP-890
> (epoch
> >> > bumps
> >> > >> > and
> >> > >> > > > > implicit add partitions.)"
> >> > >> > > > >   So TV 1 enables the writing of new tagged fields like
> >> > >> > PrevProducerId?
> >> > >> > > > But
> >> > >> > > > > those fields are only usable after the epoch bump, right?
> >> What
> >> > >> > > > > functionality does T

[jira] [Resolved] (KAFKA-16477) Detect thread leaked client-metrics-reaper in tests

2024-04-08 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16477.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Detect thread leaked client-metrics-reaper in tests
> ---
>
> Key: KAFKA-16477
> URL: https://issues.apache.org/jira/browse/KAFKA-16477
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
> Fix For: 3.8.0
>
>
> After profiling the Kafka tests, we found tons of `client-metrics-reaper` 
> threads that are not cleaned up after BrokerServer shutdown.
> The thread {{client-metrics-reaper}} comes from 
> [ClientMetricsManager#expirationTimer|https://github.com/apache/kafka/blob/a2ee0855ee5e73f3a74555d52294bb4acfd28945/server/src/main/java/org/apache/kafka/server/ClientMetricsManager.java#L115],
>  and BrokerServer#shutdown doesn't close ClientMetricsManager, which leaves 
> the timer thread running in the background.
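> A minimal sketch of the kind of leak check the tests could apply after 
> BrokerServer shutdown (the assertion style is illustrative, not the actual 
> test utility):
> {code:java}
> import java.util.Set;
> import java.util.stream.Collectors;
> 
> // Collect live threads whose names match the leaked reaper's prefix.
> Set<String> leaked = Thread.getAllStackTraces().keySet().stream()
>     .map(Thread::getName)
>     .filter(name -> name.startsWith("client-metrics-reaper"))
>     .collect(Collectors.toSet());
> if (!leaked.isEmpty()) {
>     throw new AssertionError("Leaked threads after shutdown: " + leaked);
> }
> {code}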



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2796

2024-04-08 Thread Apache Jenkins Server

Re: [VOTE] KIP-477: Add PATCH method for connector config in Connect REST API

2024-04-08 Thread Chris Egerton
Thanks Ivan! +1 (binding) from me.

On Mon, Apr 8, 2024, 06:59 Ivan Yurchenko  wrote:

> Hello!
>
> I'd like to put the subj KIP[1] to a vote. Thank you.
>
> Best regards,
> Ivan
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API
>


RE: Re: [DISCUSS] KIP-899: Allow clients to rebootstrap

2024-04-08 Thread Ivan Yurchenko
Hello!

I changed the KIP a bit, specifying that the main benefit goes to consumers 
not participating in a group, but that other clients can benefit as well in 
certain situations.

You can see the changes in the history [1]

Thank you!

Ivan

[1] 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=240881396&originalVersion=10&revisedVersion=11
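
For reference, a minimal sketch of how a client would opt in, assuming the
configuration name proposed in the KIP (metadata.recovery.strategy, default
"none"); the remaining keys are ordinary consumer settings:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "broker-1:9092,broker-2:9092");
    props.put("group.id", "example-group");
    props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
    // When no known node is reachable during a metadata update, fall back to
    // the bootstrap list as if the client had just been initialized.
    props.put("metadata.recovery.strategy", "rebootstrap");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);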

On 2023/07/15 16:37:52 Ivan Yurchenko wrote:
> Hello!
> 
> I've made several changes to the KIP based on the comments:
> 
> 1. Reduced the scope to producer and consumer clients only.
> 2. Added more details to the description of the rebootstrap process.
> 3. Documented the role of low values of reconnect.backoff.max.ms in
> preventing rebootstrapping.
> 4. Some wording changes.
> 
> You can see the changes in the history [1]
> 
> I'm planning to put the KIP to a vote in some days if there are no new
> comments.
> 
> Thank you!
> 
> Ivan
> 
> [1]
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=240881396&selectedPageVersions=9&selectedPageVersions=5
> 
> On Tue, 30 May 2023 at 08:23, Ivan Yurchenko 
> wrote:
> 
> > Hi Chris and all,
> >
> > > I believe the logic you've linked is only applicable for the producer and
> > > consumer clients; the admin client does something different (see [1]).
> >
> > I see, thank you for the pointer. It seems the admin client is fairly
> > different from the producer and consumer. Probably it makes sense to reduce
> > the scope of the KIP to the producer and consumer clients only.
> >
> > > it'd be nice to have a definition of when re-bootstrapping
> > > would occur that doesn't rely on internal implementation details. What
> > > user-visible phenomena can we identify that would lead to a
> > > re-bootstrapping?
> >
> > Let's put it this way: "Re-bootstrapping means that the client forgets
> > about nodes it knows about and falls back on the bootstrap nodes as if it
> > had just been initialized. Re-bootstrapping happens when, during a metadata
> > update (which may be scheduled by `metadata.max.age.ms` or caused by
> > certain error responses like NOT_LEADER_OR_FOLLOWER, REPLICA_NOT_AVAILABLE,
> > etc.), the client doesn't have a node with an established connection or
> > establishable connection."
> > Does this sound good?
> >
> > > I also believe that if someone has "
> > > reconnect.backoff.max.ms" set to a low-enough value,
> > > NetworkClient::leastLoadedNode may never return null. In that case,
> > > shouldn't we still attempt a re-bootstrap at some point (if the user has
> > > enabled this feature)?
> >
> > Yes, you're right. Particularly `canConnect` here [1] can always be
> > returning `true` if `reconnect.backoff.max.ms` is low enough.
> > It seems pretty difficult to find a good criterion for when re-bootstrapping
> > should be forced in this case, so it'd be difficult to configure and reason
> > about. I think it's worth mentioning in the KIP and later in the
> > documentation, but we should not try to do anything special here.
> >
> > > Would it make sense to re-bootstrap only after "
> > > metadata.max.age.ms" has elapsed since the last metadata update, and
> > when
> > > at least one request has been made to contact each known server and been
> > > met with failure?
> >
> > The first condition is satisfied by the check in the beginning of
> > `maybeUpdate` [2].
> > It seems to me, the second one is also satisfied by `leastLoadedNode`.
> > Admittedly, it's more relaxed than you propose: it tracks unavailability of
> > nodes that was detected by all types of requests, not only by metadata
> > requests.
> > What do you think, would this be enough?
> >
> > [1]
> > https://github.com/apache/kafka/blob/c9a42c85e2c903329b3550181d230527e90e3646/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L698
> > [2]
> > https://github.com/apache/kafka/blob/c9a42c85e2c903329b3550181d230527e90e3646/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1034-L1041
> >
> > Best,
> > Ivan
> >
> >
> > On Tue, 21 Feb 2023 at 20:07, Chris Egerton 
> > wrote:
> >
> >> Hi Ivan,
> >>
> >> I believe the logic you've linked is only applicable for the producer and
> >> consumer clients; the admin client does something different (see [1]).
> >>
> >> Either way, it'd be nice to have a definition of when re-bootstrapping
> >> would occur that doesn't rely on internal implementation details. What
> >> user-visible phenomena can we identify that would lead to a
> >> re-bootstrapping? I also believe that if someone has "
> >> reconnect.backoff.max.ms" set to a low-enough value,
> >> NetworkClient::leastLoadedNode may never return null. In that case,
> >> shouldn't we still attempt a re-bootstrap at some point (if the user has
> >> enabled this feature)? Would it make sense to re-bootstrap only after "
> >> metadata.max.age.ms" has elapsed since the last metadata update, and when
> >> at least one request has been made to contact each known server and been
> >> met with failure?

Re: [VOTE] KIP-1022 Formatting and Updating Features

2024-04-08 Thread Andrew Schofield
Hi Justine,
Thanks for the KIP.

+1 (non-binding)

Thanks,
Andrew

> On 8 Apr 2024, at 18:07, Justine Olshan  wrote:
>
> Hello all,
> I would like to start a vote for KIP-1022 Formatting and Updating Features
> 
>
> Please take a look and cast your vote.
>
> Thanks,
> Justine



[VOTE] KIP-1022 Formatting and Updating Features

2024-04-08 Thread Justine Olshan
Hello all,
I would like to start a vote for KIP-1022 Formatting and Updating Features


Please take a look and cast your vote.

Thanks,
Justine


[VOTE] KIP-477: Add PATCH method for connector config in Connect REST API

2024-04-08 Thread Ivan Yurchenko
Hello!

I'd like to put the subj KIP[1] to a vote. Thank you.

Best regards,
Ivan

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API


Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-08 Thread Andrew Schofield
Hi David,
Thanks for your questions.

70. The Group Coordinator communicates with the Share Coordinator over RPCs.
In the general case, it’s an inter-broker call. It is probably possible to 
optimise
for the situation in which the appropriate GC and SC shards are co-located, but 
the
KIP does not delve that deep into potential performance optimisations.

71. Avoiding collisions would be a good idea, but I do worry about 
retrospectively
introducing a naming convention for groups. I feel that naming conventions will
typically be the responsibility of the cluster administrators based on 
organizational
factors, such as the name of an application.

72. Personally, I don’t like INVALID_GROUP_ID because the group ID is correct 
but
the group is the wrong type. The nearest existing error code that gets that 
across
is INCONSISTENT_GROUP_PROTOCOL. Perhaps this is really showing that a new
error code would be better.

73. The Metadata fields are not used. I have removed them.

74. The metadata is re-evaluated on every change, but only a subset is relevant
for rebalancing. A check is done against the names of the subscribed topics to
see if any relevant changes may have occurred. Then the changes which trigger
a rebalance are topic creation, deletion, change in partitions, or rack IDs for 
the
replicas. I have updated the KIP to make this more clear.

75. The assignment is not persisted because it is much less important that the
assignment survives a GC change. There’s no need to transfer partitions safely 
from
member to member in the way that is required for consumer groups, so as an
optimisation, the assignments for a share group are not persisted. It wouldn’t 
do any
harm, but it just seems unnecessary.

76. In the event that a consumer tries to acknowledge a record that it no longer
has the right to acknowledge, the INVALID_RECORD_STATE error code is used.

If the application uses the KafkaShareConsumer.commitSync method, it will
see an InvalidRecordState exception returned. Alternatively, the application can
register an acknowledgement commit callback which will be called with the status
of the acknowledgements that have succeeded or failed.
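
For illustration, a rough sketch of registering that callback, assuming the
KafkaShareConsumer API as currently proposed in the KIP (none of these
interfaces are released yet and the names may still change):

    // `consumer` is a KafkaShareConsumer<String, String>. The callback receives
    // the per-partition offsets whose acknowledgement completed, plus an
    // exception if any of them failed.
    consumer.setAcknowledgementCommitCallback((offsets, exception) -> {
        if (exception != null) {
            // e.g. InvalidRecordStateException: this consumer no longer had
            // the right to acknowledge some of these records.
            System.err.println("Acknowledgement failed for " + offsets
                    + ": " + exception);
        }
    });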

77. I have tried to tread a careful path with the durable share-partition state 
in this
KIP. The significant choices I made are that:
* Topics are used so that the state is replicated between brokers.
* Log compaction is used to keep a lid on the storage.
* Only one topic is required.

Log compaction as it stands is not ideal for this kind of data, as evidenced by
the DeltaIndex technique I employed.

I can think of a few relatively simple ways to improve upon it.

a) We could use a customised version of the log compactor for this topic that
understands the rules for ShareCheckpoint and ShareDelta records. Essentially,
for each share-partition, the latest ShareCheckpoint and any subsequent 
ShareDelta
records must not be cleaned. Anything else can be cleaned. We could then be sure
that multiple ShareDelta records with the same key would survive cleaning and 
we could
abandon the DeltaIndex technique.

b) Actually what is required is a log per share-partition. Let’s imagine that 
we had
a share-state topic per topic being consumed in a share group, with the same 
number
of partitions as the topic being consumed. We could write many more deltas 
between
checkpoints, and just take periodic checkpoints to keep control of the storage 
used.
Once a checkpoint has been taken, we could use Admin#deleteRecords() to
prune all of the older records.
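
A minimal sketch of that pruning step, assuming `admin`, the share-state
partition, and the latest checkpoint offset are already at hand:

    import java.util.Map;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.RecordsToDelete;
    import org.apache.kafka.common.TopicPartition;

    // Delete everything below the latest checkpoint; newer deltas are kept.
    admin.deleteRecords(Map.of(
                shareStatePartition,
                RecordsToDelete.beforeOffset(checkpointOffset)))
         .all()
         .get();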

The share-state topics would be more numerous, but we are talking one per topic
per share group that it’s being consumed in. These topics would not be 
compacted.

As you’ll see in the KIP, the Persister interface is intended to be pluggable 
one day.
I know the scheme in the KIP is not ideal. It seems likely to me that future 
KIPs will
improve upon it.

If I can get buy-in for option (b), I’m happy to change this KIP. While option 
(a) is
probably workable, it does seem a bit of a hack to have a customised log 
compactor
just for this topic.

78. How about DeliveryState? I agree that State is overloaded.

79. See (77). 

Thanks,
Andrew


> On 5 Apr 2024, at 05:07, David Arthur  
> wrote:
> 
> Andrew, thanks for the KIP! This is a pretty exciting effort.
> 
> I've finally made it through the KIP, still trying to grok the whole thing.
> Sorry if some of my questions are basic :)
> 
> 
> Concepts:
> 
> 70. Does the Group Coordinator communicate with the Share Coordinator over
> RPC or directly in-process?
> 
> 71. For preventing name collisions with regular consumer groups, could we
> define a reserved share group prefix? E.g., the operator defines "sg_" as a
> prefix for share groups only, and if a regular consumer group tries to use
> that name it fails.
> 
> 72. When a consumer tries to use a share group, or a share consumer tries
> to use a regular group, would INVALID_GROUP_ID make more sense
> than INCONSISTENT_GROUP_PROTOCOL?

[jira] [Resolved] (KAFKA-16478) Links for Kafka 3.5.2 release are broken

2024-04-08 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-16478.

Resolution: Fixed

> Links for Kafka 3.5.2 release are broken
> 
>
> Key: KAFKA-16478
> URL: https://issues.apache.org/jira/browse/KAFKA-16478
> Project: Kafka
>  Issue Type: Bug
>  Components: website
>Affects Versions: 3.5.2
>Reporter: Philipp Trulson
>Assignee: Mickael Maison
>Priority: Major
>
> While trying to update our setup, I noticed that the download links for the 
> 3.5.2 release are broken. They all point to a different host and also contain 
> an additional `/kafka` in their URL. Compare:
> not working:
> [https://downloads.apache.org/kafka/kafka/3.5.2/RELEASE_NOTES.html]
> working:
> [https://archive.apache.org/dist/kafka/3.5.1/RELEASE_NOTES.html]
> [https://downloads.apache.org/kafka/3.6.2/RELEASE_NOTES.html]
> This goes for all links in the release - archives, checksums, signatures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] Fix links for Kafka 3.5.2 [kafka-site]

2024-04-08 Thread via GitHub


mimaison merged PR #595:
URL: https://github.com/apache/kafka-site/pull/595


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



RE: [EXT] Re: [DISCUSS] KIP-1033: Add Kafka Streams exception handler for exceptions occuring during processing

2024-04-08 Thread Sebastien Viale
Thanks for your review!

 All the points make sense to us!



We updated the KIP for points 1 and 4.



2/ We followed the DeserializationExceptionHandler interface signature; it had 
not occurred to us that the record could be forwarded with the ProcessorContext.

   The ProcessingContext is sufficient; we do expect that most people would 
need to access the RecordMetadata.



3/ The use of Record is required, as the error could occur 
in the middle of a processor where records could be non-serializable objects.

 As it is a global error-catching mechanism, the user may need only minimal 
information about the faulty record.

 Assuming that users want to apply some specific treatment to the record, they 
can add a try/catch block in the topology.

 It is up to users to cast the record value and key in the implementation of the 
ProcessorExceptionHandler.
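
 To illustrate point 3, a rough sketch of what an implementation might look 
like, assuming the interface and response names still under discussion in 
KIP-1033 (they are illustrative, not final; MyEvent is a hypothetical user type):

    // Hypothetical handler: logs the failure and tells Streams to continue.
    public class LogAndContinueProcessorExceptionHandler
            implements ProcessorExceptionHandler {
        @Override
        public ProcessorExceptionHandlerResponse handle(ProcessingContext context,
                                                        Record<Object, Object> record,
                                                        Exception exception) {
            // Key and value are plain objects here; casting is the user's job.
            if (record.value() instanceof MyEvent event) {
                System.err.println("Skipping event " + event + " in task "
                        + context.taskId() + ": " + exception);
            }
            return ProcessorExceptionHandlerResponse.CONTINUE;
        }
    }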



Cheers

Loïc, Damien and Sébastien


From: Sophie Blee-Goldman 
Sent: Saturday, April 6, 2024 01:08
To: dev@kafka.apache.org 
Subject: [EXT] Re: [DISCUSS] KIP-1033: Add Kafka Streams exception handler for 
exceptions occuring during processing


Hi Damien,

First off thanks for the KIP, this is definitely a much needed feature. On
the
whole it seems pretty straightforward and I am in favor of the proposal.
Just
a few questions and suggestions here and there:

1. One of the #handle method's parameters is "ProcessorNode node", but
ProcessorNode is an internal class (and would expose a lot of internals
that we probably don't want to pass in to an exception handler). Would it
be sufficient to just make this a String and pass in the processor name?

2. Another of the parameters in the ProcessorContext. This would enable
the handler to potentially forward records, which imo should not be done
from the handler since it could only ever call #forward but not direct where
the record is actually forwarded to, and could cause confusion if users
aren't aware that the handler is effectively calling from the context of the
processor that threw the exception.
2a. If you don't explicitly want the ability to forward records, I would
suggest changing the type of this parameter to ProcessingContext, which
has all the metadata and useful info of the ProcessorContext but without
the
forwarding APIs. This would also lets us sidestep the following issue:
2b. If you *do* want the ability to forward records, setting aside whether
that
in of itself makes sense to do, we would need to pass in either a regular
ProcessorContext or a FixedKeyProcessorContext, depending on what kind
of processor it is. I'm not quite sure how we could design a clean API here,
so I'll hold off until you clarify whether you even want forwarding or not.
We would also need to split the input record into a Record vs FixedKeyRecord

3. One notable difference between this handler and the existing ones you
pointed out, the Deserialization/ProductionExceptionHandler, is that the
records passed in to those are in serialized bytes, whereas the record
here would be POJOs. You account for this by making the parameter
type a Record, but I just wonder how users would be
able to read the key/value and figure out what type it should be. For
example, would they need to maintain a map from processor name to
input record types?

If you could provide an example of this new feature in the KIP, it would be
very helpful in understanding whether we could do something to make it
easier for users to use, or if it would be fine as-is.

4. We should include all the relevant info for a new metric, such as the
metric
group and recording level. You can look at other metrics KIPs like KIP-444
and KIP-613 for an example. I suspect you intend for this to be in the
processor group and at the INFO level?

Hope that all makes sense! Thanks again for the KIP

-Sophie

On Fri, Mar 29, 2024 at 6:16 AM Damien Gasparina 
wrote:

> Hi everyone,
>
> After writing quite a few Kafka Streams applications, me and my colleagues
> just created KIP-1033 to introduce a new Exception Handler in Kafka Streams
> to simplify error handling.
> This feature would allow defining an exception handler to automatically
> catch exceptions occurring during the processing of a message.
>
> KIP link:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1033%3A+Add+Kafka+Streams+exception+handler+for+exceptions+occuring+during+processing
>
> Feedbacks and suggestions are welcome,
>
> Cheers,
> Damien, Sebastien and Loic
>

This email was screened for spam and malicious content but exercise caution 
anyway.




[jira] [Created] (KAFKA-16486) Integrate metric measurability changes in metrics collector

2024-04-08 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-16486:
-

 Summary: Integrate metric measurability changes in metrics 
collector
 Key: KAFKA-16486
 URL: https://issues.apache.org/jira/browse/KAFKA-16486
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16485) Fix broker metrics to follow kebab/hyphen case

2024-04-08 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-16485:
-

 Summary: Fix broker metrics to follow kebab/hyphen case
 Key: KAFKA-16485
 URL: https://issues.apache.org/jira/browse/KAFKA-16485
 Project: Kafka
  Issue Type: Improvement
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1031: Control offset translation in MirrorSourceConnector

2024-04-08 Thread Omnia Ibrahim
Hi Chris, 
Validation method is a good call. I updated the KIP to state that the 
checkpoint connector will fail if the configs aren't correct, and updated the 
description of the new config to explain its impact on the checkpoint 
connector as well.
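
For reference, a rough sketch of what that preflight check could look like,
assuming the new config name from the KIP (emit.offset-syncs.enabled) and the
existing checkpoint configs; the actual patch may differ:

    // Inside MirrorCheckpointConnector. Connector#validate returns a Config
    // whose ConfigValues can carry per-property error messages, so invalid
    // combinations are caught during preflight validation.
    @Override
    public Config validate(Map<String, String> props) {
        Config config = super.validate(props);
        boolean offsetSyncs = !"false".equals(props.get("emit.offset-syncs.enabled"));
        boolean checkpoints = !"false".equals(props.get("emit.checkpoints.enabled"));
        boolean groupOffsets = "true".equals(props.get("sync.group.offsets.enabled"));
        if (!offsetSyncs && (checkpoints || groupOffsets)) {
            config.configValues().stream()
                    .filter(v -> v.name().equals("emit.offset-syncs.enabled"))
                    .forEach(v -> v.addErrorMessage(
                            "Checkpoints require offset syncs to be enabled"));
        }
        return config;
    }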

If there is no other feedback from anyone, I would like to start the voting 
thread in a few days. 
Thanks 
Omnia

> On 8 Apr 2024, at 06:31, Chris Egerton  wrote:
> 
> Hi Omnia,
> 
> Ah, good catch. I think failing to start the checkpoint connector if offset
> syncs are disabled is fine. We'd probably want to do that via the
> Connector::validate [1] method in order to be able to catch invalid configs
> during preflight validation, but it's not necessary to get that specific in
> the KIP (especially since we may add other checks as well).
> 
> [1] -
> https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/connector/Connector.html#validate(java.util.Map)
> 
> Cheers,
> 
> Chris
> 
> On Thu, Apr 4, 2024 at 8:07 PM Omnia Ibrahim 
> wrote:
> 
>> Thanks Chris for the feedback
>>> 1. It'd be nice to mention that increasing the max offset lag to INT_MAX
>>> could work as a partial workaround for users on existing versions (though
>>> of course this wouldn't prevent creation of the syncs topic).
>> I updated the KIP
>> 
>>> 2. Will it be illegal to disable offset syncs if other features that rely
>>> on offset syncs are explicitly enabled in the connector config? If
>> they're
>>> not explicitly enabled then it should probably be fine to silently
>> disable
>>> them, but I'd be interested in your thoughts.
>> The rest of the features that rely on this are controlled by
>> emit.checkpoints.enabled (enabled by default) and
>> sync.group.offsets.enabled (disabled by default), which are part of the
>> MirrorCheckpointConnector config, not MirrorSourceConnector. I was thinking
>> that MirrorCheckpointConnector should fail to start if
>> emit.offset-syncs.enabled is disabled while emit.checkpoints.enabled and/or
>> sync.group.offsets.enabled are enabled, as there is no point in creating this
>> connector if the main part is disabled. WDYT?
>> 
>> Thanks
>> Omnia
>> 
>>> On 3 Apr 2024, at 12:45, Chris Egerton  wrote:
>>> 
>>> Hi Omnia,
>>> 
>>> Thanks for the KIP! Two small things come to mind:
>>> 
>>> 1. It'd be nice to mention that increasing the max offset lag to INT_MAX
>>> could work as a partial workaround for users on existing versions (though
>>> of course this wouldn't prevent creation of the syncs topic).
>>> 
>>> 2. Will it be illegal to disable offset syncs if other features that rely
>>> on offset syncs are explicitly enabled in the connector config? If
>> they're
>>> not explicitly enabled then it should probably be fine to silently
>> disable
>>> them, but I'd be interested in your thoughts.
>>> 
>>> Cheers,
>>> 
>>> Chris
>>> 
>>> On Wed, Apr 3, 2024, 20:41 Luke Chen  wrote:
>>> 
 Hi Omnia,
 
 Thanks for the KIP!
 It LGTM!
 But I'm not an expert of MM2, it would be good to see if there is any
>> other
 comment from MM2 experts.
 
 Thanks.
 Luke
 
 On Thu, Mar 14, 2024 at 6:08 PM Omnia Ibrahim 
 wrote:
 
> Hi everyone, I would like to start a discussion thread for KIP-1031:
> Control offset translation in MirrorSourceConnector
> 
> 
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1031%3A+Control+offset+translation+in+MirrorSourceConnector
> 
> Thanks
> Omnia
> 
 
>> 
>> 



Re: [VOTE] KIP-1020 Move `window.size.ms` and `windowed.inner.serde.class` from `StreamsConfig` to TimeWindowedDe/Serializer class

2024-04-08 Thread Lucas Brutschy
+1 (binding)

Thanks, Lucia!

On Wed, Apr 3, 2024 at 11:35 PM Sophie Blee-Goldman
 wrote:
>
> +1 (binding)
>
> Thanks Lucia!
>
> On Tue, Apr 2, 2024 at 12:23 AM Matthias J. Sax  wrote:
>
> > +1 (binding)
> >
> >
> > -Matthias
> >
> > On 4/1/24 7:44 PM, Lucia Cerchie wrote:
> > > Hello everyone,
> > >
> > > I'd like to call a vote on KIP-1020
> > > <
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=290982804
> > >.
> > > It has been under discussion since Feb 15, and has received edits to the
> > > KIP and approval by discussion participants.
> > >
> > > Best,
> > > Lucia Cerchie
> > >
> >


Re: [DISCUSS] KIP-1028: Docker Official Image for Apache Kafka

2024-04-08 Thread Krish Vora
Hi Manikumar and Luke.
Thanks for the questions.

1. No, the Docker inventory files and configurations will not be the same
for Open Source Software (OSS) Images and Docker Official Images (DOI).

For OSS images, the Dockerfile located in docker/jvm/dockerfile is
utilized. This process is integrated with the existing release pipeline as
outlined in KIP-975,
where the Kafka URL is provided as a build argument. This method allows for
building, testing, and releasing OSS images dynamically. The OSS images
will continue to be released under the standard release process.

In contrast, the release process for DOIs requires providing the Docker Hub
team with a specific directory for each version release that contains a
standalone Dockerfile. These Dockerfiles are designed to be
self-sufficient, hence require hardcoded values instead of relying on build
arguments. To accommodate this, in our proposed approach, a new directory
named docker_official_images has been created. This directory contains
version-specific directories, having Dockerfiles with hardcoded
configurations for each release, acting as the source of truth for DOI
releases. The hardcoded dockerfiles will be created using the
docker/jvm/dockerfile as a template. Thus, as part of the post-release
process, we will create a Dockerfile that will be reviewed by the Docker Hub
community and might need changes per their review. This approach ensures that DOIs
are built consistently and meet the specific requirements set by Docker Hub.

2. Yes Manikumar, transitioning the release of Docker Official Images (DOI)
to a post-release activity does address the concerns about complicating the
release process. Initially, we considered incorporating DOI release
directly into Kafka's release workflow. However, this approach
significantly increased the RM's workload due to the addition of numerous
steps, complicating the process. By designating the DOI release as a
post-release task, we maintain the original release process. This
adjustment allows for the DOI release to be done after the main release. We
have revised the KIP to reflect that DOI releases will now occur after the
main release phase. Please review the updated document and provide any
feedback you might have.

Thanks,
Krish.

On Wed, Apr 3, 2024 at 3:35 PM Luke Chen  wrote:

> Hi Krishna,
>
> I also have the same question as Manikumar raised:
> 1. Will the Docker inventory files/etc be the same for the OSS Image and
> Docker Official Images?
> If no, then why not? Could we make them identical so that we don't have to
> build 2 images for each release?
>
> Thank you.
> Luke
>
> On Wed, Apr 3, 2024 at 12:41 AM Manikumar 
> wrote:
>
> > Hi Krishna,
> >
> > Thanks for the KIP.
> >
> > I think Docker Official Images will be beneficial to the Kafka community.
> > Few queries below.
> >
> > 1. Will the Docker inventory files/etc be the same for the OSS Image and
> > Docker Official Images?
> > 2. I am a bit worried about the new steps to the release process. Maybe
> we
> > should consider Docker Official Images release as Post-Release activity.
> >
> > Thanks,
> >
> > On Fri, Mar 22, 2024 at 3:29 PM Krish Vora 
> wrote:
> >
> > > Hi Hector,
> > >
> > > Thanks for reaching out. This KIP builds on top of KIP-975
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka
> > > >
> > > and
> > > aims to introduce a JVM-based Docker Official Image (DOI) for Apache
> > > Kafka that will be visible under Docker Official Images. Once implemented
> > > for Apache Kafka, for each release, there will be one more JVM-based
> > Docker
> > > image available to users.
> > >
> > > Currently, we already have an OSS sponsored image, which was introduced
> > via
> > > KIP-975 (apache/kafka), which comes under The Apache Software Foundation
> > > <https://hub.docker.com/u/apache> in
> > > Docker Hub. The new Docker Image is the Docker Official Image (DOI),
> > which
> > > will be built and maintained by Docker Community.
> > >
> > > For example, for a release version like 3.8.0 we will have two JVM
> based
> > > docker images:-
> > >
> > >- apache/kafka:3.8.0 (OSS sponsored image)
> > >- kafka:3.8.0 (Docker Official image)
> > >
> > >
> > > I have added the same in the KIP too for everyone's reference.
> > > Thanks,
> > > Krish.
> > >
> > > On Fri, Mar 22, 2024 at 2:50 AM Hector Geraldino (BLOOMBERG/ 919 3RD
> A) <
> > > hgerald...@bloomberg.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > What is the difference between this KIP and KIP-975: Docker Image for
> > > > Apache Kafka?
> > > >
> > > > From: dev@kafka.apache.org At: 03/21/24 07:30:07 UTC-4:00 To:
> > > > dev@kafka.apache.org