Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2003

2023-07-14 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2002

2023-07-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 386898 lines...]

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterInner[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterOuter[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterOuter[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithRightVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithRightVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testLeftWithLeftVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testLeftWithLeftVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithLeftVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithLeftVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskAssignorIntegrationTest > shouldProperlyConfigureTheAssignor STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskAssignorIntegrationTest > shouldProperlyConfigureTheAssignor PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectEndOffsetInformation STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectEndOffsetInformation PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectCommittedOffsetInformation 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectCommittedOffsetInformation 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldReduce(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldReduce(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldGroupByKey(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldGroupByKey(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldReduceWindowed(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldReduceWindowed(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamKStreamIntegrationTest > shouldOuterJoin() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamKStreamIntegrationTest > shouldOuterJoin() PASSED

Gradle Test Run 

[jira] [Created] (KAFKA-15192) Network thread receives exception when updating request metrics

2023-07-14 Thread David Mao (Jira)
David Mao created KAFKA-15192:
-

 Summary: Network thread receives exception when updating request 
metrics
 Key: KAFKA-15192
 URL: https://issues.apache.org/jira/browse/KAFKA-15192
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Reporter: David Mao


We noticed an exception being thrown from the network threads when updating 
some of the request histograms. Example stack trace:

 
java.util.NoSuchElementException
at 
java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2064)
at 
com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:102)
at 
com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
at com.yammer.metrics.core.Histogram.update(Histogram.java:110)
 

Searching the error I found a similar ticket resolved in Cassandra by updating 
their dropwizard dependency to pull in 
[https://github.com/dropwizard/metrics/pull/1436]. 
https://issues.apache.org/jira/browse/CASSANDRA-15472

Kafka currently still uses yammer metrics, so we would need to take 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-510%3A+Metrics+library+upgrade]
 forward to upgrade to a dropwizard version that fixes this issue.
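
For illustration, the proximate cause visible in the stack trace is that ConcurrentSkipListMap.firstKey() throws NoSuchElementException when the map is empty. A minimal, self-contained Java sketch of that behaviour (the exact interleaving inside ExponentiallyDecayingSample, presumably a rescale racing with an update, is an assumption on my part):

    import java.util.concurrent.ConcurrentSkipListMap;

    public class FirstKeyOnEmptyMap {
        public static void main(String[] args) {
            ConcurrentSkipListMap<Double, Long> values = new ConcurrentSkipListMap<>();
            values.put(0.5, 42L);
            // A concurrent rescale in another thread can empty the map between a
            // non-empty check and the firstKey() call; clearing it here simulates that.
            values.clear();
            System.out.println(values.firstKey()); // throws java.util.NoSuchElementException
        }
    }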



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Philip Nee
Hey Mayank,

Thanks for the KIP. I think this is a great proposal, and I'm in favor
of this idea.  A few comments:

1. Claiming metadata refresh is done asynchronously is misleading.  The
metadata refresh requires Network Client to be physically polled, which is
done in a separate thread in Producer and Admin Client (IIRC!) but not
Consumer.
2. There are other API calls that might result in NOT_LEADER_OR_FOLLOWER
response, but it seems like this KIP only wants to update on fetch and
produce. Do we want to make the leader information available for other API
calls?
3. Do you know what would happen during a leader election? I'm not sure
about this process and I wonder if the current metadata response uses the
old leader or null as the leader isn't readily available yet.

Thanks,
P

On Fri, Jul 14, 2023 at 11:30 AM Kirk True  wrote:

> Hi Mayank,
>
> > On Jul 14, 2023, at 11:25 AM, Mayank Shekhar Narula <
> mayanks.nar...@gmail.com> wrote:
> >
> > Kirk
> >
> >
> >> Is the requested restructuring of the response “simply” to preserve
> bytes,
> >> or is it possible that the fetch response could/should/would return
> >> leadership changes for partitions that weren’t specifically requested?
> >>
> >
> > Moving endpoints to top-level fields would preserve bytes, otherwise the
> > endpoint-information would be duplicated if included with the
> > partition-data in the response. Right now, the top-level field will only
> be
> > set in case leader changes for any requested partitions. But it can be
> > re-used in the future, for which Jose has a use-case in mind shared earlier in
> > the thread. The KIP is now up to date with the endpoint info at the top level.
>
>
> I didn’t catch before that there was a separate section for the full node
> information, not just the ID and epoch.
>
> Thanks!
>
> >>> 3. In the future, I may use this information in the KRaft/Metadata
> >>> implementation of FETCH. In that implementation not all of the
> >>> replicas are brokers.
> >>
> >> Side point: any references to the change you’re referring to? The idea
> of
> >> non-brokers serving as replicas is blowing my mind a bit :)
> >>
> >>
> > Jose, I missed this as well, would love to know more about non-broker
> > serving as replica!
> > --
> > Regards,
> > Mayank Shekhar Narula
>
>


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2001

2023-07-14 Thread Apache Jenkins Server
See 




Re: KafkaConsumer refactor proposal

2023-07-14 Thread Kirk True
Hi Erik,

Regarding the consumer refactor project, we’re in the process of converting 
Philip’s design to a “proper” KIP here:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-945%3A+Update+threading+model+for+Consumer
 

It’s still very much a draft and not ready for a formal DISCUSS thread, but 
we’d welcome feedback.

That said, the callback issue being discussed here may be better served with a 
dedicated KIP so as not to entangle the fate of one with the other.

Thanks,
Kirk

> On Jul 13, 2023, at 11:44 AM, Erik van Oosten  
> wrote:
> 
> Hi Philip,
> 
> I have been scanning through 
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design
>  and KIP-848 and from this I understand that the kafka consumer API will not 
> change.
> 
> Perhaps the refactoring and/or KIP-848 is a good opportunity to improve the 
> API somewhat. In this email I explain why and also give a rough idea what 
> that could look like.
> 
> In the current API, the rebalance listener callback gives the user a chance 
> to commit all work in progress before a partition is actually revoked and 
> assigned to another consumer.
> 
> While the callback is doing all this, the main user thread is not able to 
> process new incoming data. So the rebalance listener affects latency and 
> throughput for non-revoked partitions during a rebalance.
> 
> In addition, I feel that doing a lot of stuff /in/ a callback is always quite 
> awkward. Better only use it to trigger some processing elsewhere.
> 
> Therefore, I would like to propose a new API that does not have these 
> problems and is easy to use (and I hope still easy to implement). In my ideal 
> world, poll is the only method that you need. Let's call it poll2 (to do: come
> up with a less crappy name). Poll2 returns more than just the polled records, 
> it will also contain newly assigned partitions, partitions that will be 
> revoked during the next call to poll2, partitions that were lost, and perhaps 
> it will even contain the offsets committed so far.
> 
> The most important idea here is that partitions are not revoked immediately, 
> but in the next call to poll2.
> 
> With this API, a user can commit offsets at their own pace during a 
> rebalance. Optionally, for the case that processing of data from the 
> to-be-revoked partition is still ongoing, we allow the user to postpone the
> actual revocation in the next poll, so that polling can continue for other 
> partitions.
> 
> Since we are no longer blocking the main user thread, partitions that are not 
> revoked can be processed at full speed.
> 
> Removal of the rebalance listener also makes the API safer; there is no more 
> need for the thread-id check (nor KIP-944) because concurrent invocations
> are simply no longer needed. (Of course, if backward compatibility is a goal, 
> not all of these things can be done.)
> 
> Curious to your thoughts and kind regards,
> Erik.
> 
> -- 
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
> 
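
For illustration only, the poll2 result described above might be shaped roughly like the following sketch; none of these type or method names exist in Kafka, they simply restate the idea in code:

    import java.util.Map;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Hypothetical result type for the proposed poll2 call; purely illustrative.
    public interface Poll2Result<K, V> {
        ConsumerRecords<K, V> records();                     // the polled records, as today
        Set<TopicPartition> newlyAssigned();                 // partitions assigned since the previous poll2
        Set<TopicPartition> toBeRevoked();                   // partitions revoked on the next call to poll2
        Set<TopicPartition> lost();                          // partitions lost without a chance to commit
        Map<TopicPartition, OffsetAndMetadata> committed();  // offsets committed so far (optional idea)
    }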



Re: KafkaConsumer refactor proposal

2023-07-14 Thread Kirk True
Hi Erik,

Thanks for your interest in the client world :)

I agree that the current implementation of how we handle callbacks is 
problematic in that it introduces race conditions and/or bottlenecks.

I don’t have as much experience with the callbacks from an application 
developer standpoint. Is it critical for the invocation of the callbacks to 
happen within the processing of the commit/rebalance/etc.? If callbacks could 
be redesigned to *not* be in the critical path, would that break user 
applications?

Thanks,
Kirk

> On Jul 13, 2023, at 11:44 AM, Erik van Oosten  
> wrote:
> 
> Hi Philip,
> 
> I have been scanning through 
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design
>  and KIP-848 and from this I understand that the kafka consumer API will not 
> change.
> 
> Perhaps the refactoring and/or KIP-848 is a good opportunity to improve the 
> API somewhat. In this email I explain why and also give a rough idea what 
> that could look like.
> 
> In the current API, the rebalance listener callback gives the user a chance 
> to commit all work in progress before a partition is actually revoked and 
> assigned to another consumer.
> 
> While the callback is doing all this, the main user thread is not able to 
> process new incoming data. So the rebalance listener affects latency and 
> throughput for non-revoked partitions during a rebalance.
> 
> In addition, I feel that doing a lot of stuff /in/ a callback is always quite 
> awkward. Better only use it to trigger some processing elsewhere.
> 
> Therefore, I would like to propose a new API that does not have these 
> problems and is easy to use (and I hope still easy to implement). In my ideal 
> world, poll is the only method that you need. Let's call it poll2 (to do: come
> up with a less crappy name). Poll2 returns more than just the polled records, 
> it will also contain newly assigned partitions, partitions that will be 
> revoked during the next call to poll2, partitions that were lost, and perhaps 
> it will even contain the offsets committed so far.
> 
> The most important idea here is that partitions are not revoked immediately, 
> but in the next call to poll2.
> 
> With this API, a user can commit offsets at their own pace during a 
> rebalance. Optionally, for the case that processing of data from the 
> to-be-revoked partition is still ongoing, we allow the user to postpone the
> actual revocation in the next poll, so that polling can continue for other 
> partitions.
> 
> Since we are no longer blocking the main user thread, partitions that are not 
> revoked can be processed at full speed.
> 
> Removal of the rebalance listener also makes the API safer; there is no more 
> need for the thread-id check (nor KIP-944) because concurrent invocations
> are simply no longer needed. (Of course, if backward compatibility is a goal, 
> not all of these things can be done.)
> 
> Curious to your thoughts and kind regards,
> Erik.
> 
> -- 
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
> 



Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-14 Thread Kirk True
Hi Erik,

Thanks for the KIP!

I empathize with your frustration over the radio silence. It gets like that 
sometimes, and I apologize for my lack of feedback.

I’d personally like to see this lively exchange move over to the DISCUSS thread 
you’d created before.

Thanks,
Kirk

> On Jul 14, 2023, at 1:33 AM, Erik van Oosten  
> wrote:
> 
> Hi Colin,
> 
> The way I understood Philip's message is that KIP-944 also plays nice with
> KIP-945. But I might be mistaken.
> 
> Regardless, KIP-945 does /not/ resolve the underlying problem (the need for 
> nested consumer invocations) because it has the explicit goal of not changing 
> the user facing API.
> 
> > ... KIP-945 but haven't posted a DISCUSS thread yet
> 
> There is a thread called 'KafkaConsumer refactor proposal', but indeed no 
> official discussion yet.
> 
> > I really don't want to be debugging complex interactions between Java 
> > thread-local variables and green threads.
> 
> In that email thread, I proposed an API change in which callbacks are no 
> longer needed. The proposal completely removes the need for such complex 
> interactions. In addition, it gives clients the ability to process at full 
> speed even while a cooperative rebalance is ongoing.
> 
> Regards,
> Erik.
> 
> Op 14-07-2023 om 00:36 schreef Colin McCabe:
>> HI Philip & Erik,
>> 
>> Hmm... if we agree that KIP-945 addresses this use case, I think it would be 
>> better to just focus on that KIP. Fundamentally it's a better and cleaner 
>> model than a complex scheme involving thread-local variables. I really don't 
>> want to be debugging complex interactions between Java thread-local 
>> variables and green threads.
>> 
>> It also generally helps to have some use-cases in mind when writing these 
>> things. If we get feedback about what would be useful for async runtimes, 
>> that would probably help improve and focus KIP-945. By the way, I can see 
>> you have a draft on the wiki for KIP-945 but haven't posted a DISCUSS thread 
>> yet, so I assume it's not ready for review yet ;)
>> 
>> best,
>> Colin
>> 
>> 
>> On Tue, Jul 11, 2023, at 12:24, Philip Nee wrote:
>>> Hey Erik - Another thing I want to add to my comment: we are in the process
>>> of rewriting the KafkaConsumer, and I think your proposal would work in
>>> the new consumer because we are going to separate the user thread and the
>>> background thread.  Here is the 1-pager, and we are in the process of
>>> converting this into KIP-945.
>>> 
>>> Thanks,
>>> P
>>> 
>>> On Tue, Jul 11, 2023 at 10:33 AM Philip Nee  wrote:
>>> 
 Hey Erik,
 
 Sorry for holding up this email for a few days since Colin's response
 includes some of my concerns.  I'm in favor of this KIP, and I think your
 approach seems safe.  Of course, I probably missed something, so I
 think this KIP needs to cover different use cases to demonstrate it doesn't
 cause any unsafe access. I think this can be demonstrated via diagrams and
 some code in the KIP.
 
 Thanks,
 P
 
 On Sat, Jul 8, 2023 at 12:28 PM Erik van Oosten
  wrote:
 
> Hello Colin,
> 
>  >> In KIP-944, the callback thread can only delegate to another thread
> after reading from and writing to a threadlocal variable, providing the
> barriers right there.
> 
>  > I don't see any documentation that accessing thread local variables
> provides a total store or load barrier. Do you have such documentation?
> It seems like if this were the case, we could eliminate volatile
> variables from most of the code base.
> 
> Now I was imprecise. The thread-locals are only somewhat involved. In
> the KIP proposal the callback thread reads an access key from a
> thread-local variable. It then needs to pass that access key to another
> thread, which then can set it on its own thread-local variable. The act
> of passing a value from one thread to another implies that a memory
> barrier needs to be passed. However, this is all not so relevant since
> there is no need to pass the access key back when the other thread is
> done.
> 
> But now I think about it a bit more, the locking mechanism runs in a
> synchronized block. If I remember correctly this should be enough to
> pass read and write barriers.
> 
>  >> In the current implementation the consumer is also invoked from
> random threads. If it works now, it should continue to work.
>  > I'm not sure what you're referring to. Can you expand on this?
> 
> Any invocation of the consumer (e.g. method poll) is not from a thread
> managed by the consumer. This is what I was assuming you meant with the
> term 'random thread'.
> 
>  > Hmm, not sure what you mean by "cooperate with blocking code." If you
> have 10 green threads you're multiplexing on to one CPU thread, and that
> CPU thread gets blocked because of what one green thread is doing, the
> other 

Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Kirk True
Hi Mayank,

> On Jul 14, 2023, at 11:25 AM, Mayank Shekhar Narula 
>  wrote:
> 
> Kirk
> 
> 
>> Is the requested restructuring of the response “simply” to preserve bytes,
>> or is it possible that the fetch response could/should/would return
>> leadership changes for partitions that weren’t specifically requested?
>> 
> 
> Moving endpoints to top-level fields would preserve bytes, otherwise the
> endpoint-information would be duplicated if included with the
> partition-data in the response. Right now, the top-level field will only be
> set in case leader changes for any requested partitions. But it can be
> re-used in the future, for which Jose has a use-case in mind shared earlier in
> the thread. The KIP is now up to date with the endpoint info at the top level.


I didn’t catch before that there was a separate section for the full node 
information, not just the ID and epoch.

Thanks!

>>> 3. In the future, I may use this information in the KRaft/Metadata
>>> implementation of FETCH. In that implementation not all of the
>>> replicas are brokers.
>> 
>> Side point: any references to the change you’re referring to? The idea of
>> non-brokers serving as replicas is blowing my mind a bit :)
>> 
>> 
> Jose, I missed this as well, would love to know more about non-broker
> serving as replica!
> -- 
> Regards,
> Mayank Shekhar Narula



Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Kirk True
Hi Andrew,

Good point.

Sorry to be a dim bulb, but I’m still not sure I understand the downsides of
bumping the version. The broker and all client implementations would have to
change to fully support this feature anyway, but down-version clients are
already handled by brokers.

Thanks,
Kirk

> On Jul 13, 2023, at 10:30 AM, Andrew Schofield 
>  wrote:
> 
> Hi Mayank,
> If we bump the version, the broker can tell whether it’s worth providing the 
> leader
> endpoint information to the client when the leader has changed. That’s my 
> reasoning.
> 
> Thanks,
> Andrew
> 
>> On 13 Jul 2023, at 18:02, Mayank Shekhar Narula  
>> wrote:
>> 
>> Thanks both for looking into this.
>> 
>> Jose,
>> 
>> 1/2 & 4 (changes for PRODUCE) & 5 make sense, will follow
>> 
>> 3. If I understood this correctly, certain replicas "aren't" brokers, what
>> are they then?
>> 
>> Also, how about replacing "Replica" with "Leader"? This is more readable on
>> the client. So, how about this?
>>   { "name": "LeaderEndpoints", "type": "[]Leader", "versions": "15+",
>> "taggedVersions": "15+", "tag": 3,
>> "about": "Endpoints for all current leaders enumerated in
>> PartitionData.", "fields": [
>> { "name": "NodeId", "type": "int32", "versions": "15+",
>>   "mapKey": true, "entityType": "brokerId", "about": "The ID of the
>> associated leader"},
>> { "name": "Host", "type": "string", "versions": "15+",
>>   "about": "The leader's hostname." },
>> { "name": "Port", "type": "int32", "versions": "15+",
>>   "about": "The leader's port." },
>> { "name": "Rack", "type": "string", "versions": "15+", "ignorable":
>> true, "default": "null",
>>   "about": "The rack of the leader, or null if it has not been
>> assigned to a rack." }
>>   ]}
>> 
>> Andrew
>> 
>> 6. I wonder if non-Kafka clients might benefit from not bumping the
>> version. If versions are bumped, say for FetchResponse to 16, I believe
>> a client would have to support all versions up to 16 to fully utilise
>> this feature. Whereas, if not bumped, they can simply support up to version
>> 12 (will change to version 12 for the tagged fields), and non-AK clients can
>> then implement this feature. What do you think? I am inclined not to bump.
>> 
>> On Thu, Jul 13, 2023 at 5:21 PM Andrew Schofield <
>> andrew_schofield_j...@outlook.com> wrote:
>> 
>>> Hi José,
>>> Thanks. Sounds good.
>>> 
>>> Andrew
>>> 
 On 13 Jul 2023, at 16:45, José Armando García Sancio
>>>  wrote:
 
 Hi Andrew,
 
 On Thu, Jul 13, 2023 at 8:35 AM Andrew Schofield
  wrote:
> I have a question about José’s comment (2). I can see that it’s
>>> possible for multiple
> partitions to change leadership to the same broker/node and it’s
>>> wasteful to repeat
> all of the connection information for each topic-partition. But, I
>>> think it’s important to
> know which partitions are now lead by which node. That information at
>>> least needs to be
> per-partition I think. I may have misunderstood, but it sounded like
>>> your comment
> suggestion lost that relationship.
 
 Each partition in both the FETCH response and the PRODUCE response
 will have the CurrentLeader, the tuple leader id and leader epoch.
 Clients can use this information to update their partition to leader
 id and leader epoch mapping.
 
 They can also use the NodeEndpoints to update their mapping from
 replica id to the tuple host, port and rack so that they can connect
 to the correct node for future FETCH requests and PRODUCE requests.
 
 Thanks,
 --
 -José
>>> 
>>> 
>> 
>> -- 
>> Regards,
>> Mayank Shekhar Narula
> 



Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Mayank Shekhar Narula
Kirk


> Is the requested restructuring of the response “simply” to preserve bytes,
> or is it possible that the fetch response could/should/would return
> leadership changes for partitions that weren’t specifically requested?
>

Moving endpoints to top-level fields would preserve bytes, otherwise the
endpoint-information would be duplicated if included with the
partition-data in the response. Right now, the top-level field will only be
set in case leader changes for any requested partitions. But it can be
re-used in the future, for which Jose has a use-case in mind shared earlier in
the thread. The KIP is now up to date with the endpoint info at the top level.


> > 3. In the future, I may use this information in the KRaft/Metadata
> > implementation of FETCH. In that implementation not all of the
> > replicas are brokers.
>
> Side point: any references to the change you’re referring to? The idea of
> non-brokers serving as replicas is blowing my mind a bit :)
>
>
Jose, I missed this as well, would love to know more about non-broker
serving as replica!
-- 
Regards,
Mayank Shekhar Narula


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Kirk True
Hi Mayank,

Thanks for the KIP!

Questions:

1. From the standpoint of the client, does it matter if the cluster is running 
in KRaft mode vs. ZooKeeper? Will the behavior somehow be subtly different
given that metadata propagation is handled differently between the two?

2. Is there anything we need to do to handle the case where the new leader 
information appears after retries? Suppose the first two attempts to send a 
produce fail, in which case we hit the backoff logic. On the third attempt, the 
broker/node returns new leader information. Would the fourth attempt (with the 
new leader) still be performed without any delay? To be honest, I’m not sure 
that case is valid, but I would assume it would retry immediately, right?

Thanks,
Kirk

> On Jul 13, 2023, at 7:15 AM, Mayank Shekhar Narula  
> wrote:
> 
> Hi everyone
> 
> Following KIP is up for discussion. Thanks for your feedback.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client
> 
> -- 
> Regards,
> Mayank Shekhar Narula



Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Mayank Shekhar Narula
Jose,

Using the term "node" makes sense since it follows convention, and then
"NodeEndpoints" can be reused in the future. Updated the KIP with the protocol
message changes.



On Fri, Jul 14, 2023 at 5:53 PM José Armando García Sancio
 wrote:

> Hi Mayank,
>
> On Thu, Jul 13, 2023 at 10:03 AM Mayank Shekhar Narula
>  wrote:
> > 3. If I understood this correctly, certain replicas "aren't" brokers,
> what
> > are they then?
>
> In a Kafka KRaft cluster they can be either brokers, controllers or
> both. The term we use is node. A Kafka node can be either a broker,
> controller or both. For example, we have the following configuration
> documentation:
>
> node.id
> The node ID associated with the roles this process is playing when
> `process.roles` is non-empty. This is required configuration when
> running in KRaft mode.
>
> From https://kafka.apache.org/documentation/#brokerconfigs_node.id
>
> Also, how about replacing "Replica" with "Leader"? This is more readable on
> the client. So, how about this?
>
> I vote for "Node". Note that this array is a mapping of replica/node
> id to their endpoint. For example, in the future we may use it to send
> endpoint information when sending the replica id to fetch from a
> follower.
>
> Thanks,
> --
> -José
>


-- 
Regards,
Mayank Shekhar Narula


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Kirk True
Hi Mayank/José,

> On Jul 13, 2023, at 8:20 AM, José Armando García Sancio 
>  wrote:
> 
> Hi Mayank, thanks for the KIP. I look forward to this improvement for
> new clients.
> 
> Some comments below.
> 
> On Thu, Jul 13, 2023 at 7:15 AM Mayank Shekhar Narula
>  wrote:
>> Following KIP is up for discussion. Thanks for your feedback
> 
> Regarding the FETCH response changes:
> 1. Tagged field 2 already exists. Looks like you can use tagged field 3.
> 
> 2. The CurrentLeaderBroker should not be within PartitionData. It is
> possible to have multiple partitions that would change leadership to
> the same broker/node. We should instead move that information to a
> top-level field that is an array of (replica id, host, port, rack).

Is the requested restructuring of the response “simply” to preserve bytes, or 
is it possible that the fetch response could/should/would return leadership 
changes for partitions that weren’t specifically requested?

> 3. In the future, I may use this information in the KRaft/Metadata
> implementation of FETCH. In that implementation not all of the
> replicas are brokers.

Side point: any references to the change you’re referring to? The idea of 
non-brokers serving as replicas is blowing my mind a bit :) 

> Do you mind removing all references to the word
> broker in the description and field name. Maybe you can use the word
> replica instead. How about something like this:
>{ "name": "NodeEndpoints", "type": "[]NodeEndpoint", "versions":
> "15+", "taggedVersions": "15+", "tag": 3,
>  "about": "Endpoint information for all current leaders
> enumerated in PartitionData.", "fields": [
>  { "name": "ReplicaId", "type": "int32", "versions": "15+",
> "mapKey": true, "entityType": "brokerId",
>"about": "The ID of the associated replica"},
>  { "name": "Host", "type": "string", "versions": "15+",
>"about": "The replica's hostname." },
>  { "name": "Port", "type": "int32", "versions": "15+",
>"about": "The replica's port." },
>  { "name": "Rack", "type": "string", "versions": "15+",
> "ignorable": true, "default": "null",
>"about": "The rack of the replica, or null if it has not
> been assigned to a rack." }
>  ]},
> 
> Regarding the PRODUCE response changes:
> 4. Can we make similar changes as the ones mentioned in bullet points
> 2. and 3 above?.
> 
> 5. If you make the changes enumerated in bullet point 4., you'll
> probably want to change the tag so that NodeEpoint has tag 0 while
> CurrentLeader has tag 1.
> 
> Thanks!
> -- 
> -José

Thanks!
Kirk
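
For illustration, a client receiving the per-partition CurrentLeader plus the top-level NodeEndpoints sketched above might refresh its cache roughly as follows; the nested record types are stand-ins for generated protocol classes that do not exist yet:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical client-side bookkeeping; the nested records stand in for the
    // proposed response fields (per-partition CurrentLeader, top-level NodeEndpoints).
    final class LeaderCache {
        record Endpoint(String host, int port, String rack) {}
        record CurrentLeader(int leaderId, int leaderEpoch) {}

        private final Map<Integer, Endpoint> endpointsByNodeId = new HashMap<>();
        private final Map<String, CurrentLeader> leaderByPartition = new HashMap<>();

        // Called when a FETCH or PRODUCE response carries new-leader information.
        void update(Map<Integer, Endpoint> nodeEndpoints,         // node id -> host/port/rack
                    Map<String, CurrentLeader> changedLeaders) {  // "topic-partition" -> leader id/epoch
            endpointsByNodeId.putAll(nodeEndpoints);
            leaderByPartition.putAll(changedLeaders);
        }

        // Where the next request for this partition should go, if the cache knows.
        Endpoint endpointFor(String topicPartition) {
            CurrentLeader leader = leaderByPartition.get(topicPartition);
            return leader == null ? null : endpointsByNodeId.get(leader.leaderId());
        }
    }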



Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread Mayank Shekhar Narula
Hi Ismael

Right now, the effort is to get the baseline vs. KIP-proposed-changes numbers.

After that, I will look at the alternative numbers.

On Fri, Jul 14, 2023 at 12:38 AM Ismael Juma  wrote:

> Hi Mayank,
>
> See my answer below.
>
> On Thu, Jul 13, 2023 at 10:24 AM Mayank Shekhar Narula <
> mayanks.nar...@gmail.com> wrote:
>
> > re. 2: On some busy clusters a single metadata call has been observed to
> > take on the order of ~100 milliseconds (I think it's mentioned somewhere in the
> > motivation). So retrying immediately on the Produce path won't make sense,
> > as metadata would still be stale. Hence the current proposal optimises for
> > fetching specific metadata about the new leader, in the produce & fetch
> > response, outside of the metadata refresh. What do you think?
>
>
> This is a bit vague and difficult to evaluate. If it's reasonably simple to
> evaluate this alternative, can we provide comparative numbers too?
>
> Ismael
>


-- 
Regards,
Mayank Shekhar Narula


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-14 Thread José Armando García Sancio
Hi Mayank,

On Thu, Jul 13, 2023 at 10:03 AM Mayank Shekhar Narula
 wrote:
> 3. If I understood this correctly, certain replicas "aren't" brokers, what
> are they then?

In a Kafka KRaft cluster they can be either brokers, controllers or
both. The term we use is node. A Kafka node can be either a broker,
controller or both. For example, we have the following configuration
documentation:

node.id
The node ID associated with the roles this process is playing when
`process.roles` is non-empty. This is required configuration when
running in KRaft mode.

From https://kafka.apache.org/documentation/#brokerconfigs_node.id

> Also, how about replacing "Replica" with "Leader"? This is more readable on
> the client. So, how about this?

I vote for "Node". Note that this array is a mapping of replica/node
id to their endpoint. For example, in the future we may use it to send
endpoint information when sending the replica id to fetch from a
follower.

Thanks,
-- 
-José


[jira] [Created] (KAFKA-15191) Add support for Micrometer Observation

2023-07-14 Thread Marcin Grzejszczak (Jira)
Marcin Grzejszczak created KAFKA-15191:
--

 Summary: Add support for Micrometer Observation
 Key: KAFKA-15191
 URL: https://issues.apache.org/jira/browse/KAFKA-15191
 Project: Kafka
  Issue Type: New Feature
Reporter: Marcin Grzejszczak


I'm a co-maintainer of Spring Cloud Sleuth and Micrometer projects (together 
with Tommy Ludwig and Jonatan Ivanov).

The idea of [Micrometer Observation|https://micrometer.io/docs/observation] is
that you instrument code once but get multiple benefits out of it (e.g.
you can get tracing, metrics, logging or whatever you see fit).

I was curious if there's interest in adding Micrometer Observation support so
that metrics and spans could be created automatically and tracing context
propagation could happen too. In other words, metrics and tracing of this
project could be created, and if there are Micrometer Observation compatible
projects, then they will join the whole graph (e.g. Spring Framework 6, Apache
Dubbo, Resilience4j, Apache Camel, etc.).

If there's interest in adding that feature, I can provide a PR.

Regardless of whether there's interest in adding this directly to Kafka, I would
like to discuss what would be the best way to add instrumentation to Kafka.
Adding instrumentation means that before the message is sent to Kafka I would like
to access its headers and be able to mutate them, and before the message is
received from Kafka I would like to access the headers and retrieve their
key-values to create e.g. a span.
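
For illustration, one existing hook that already allows this kind of header access on the producer side is a ProducerInterceptor; a minimal sketch (the header name and trace-context lookup are placeholders, not an actual Micrometer integration):

    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class TracingProducerInterceptor<K, V> implements ProducerInterceptor<K, V> {

        @Override
        public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
            // Mutate headers before the record is serialized and sent; the header
            // name and value here are placeholders for a propagated trace context.
            record.headers().add("traceparent", currentTraceContext());
            return record;
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
            // A span covering the send could be closed/reported here.
        }

        @Override
        public void close() {}

        @Override
        public void configure(Map<String, ?> configs) {}

        private byte[] currentTraceContext() {
            return new byte[0]; // placeholder; a real value would come from the tracer
        }
    }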




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14462) New Group Coordinator State Machine

2023-07-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14462.
-
Fix Version/s: 3.6.0
   Resolution: Fixed

> New Group Coordinator State Machine
> ---
>
> Key: KAFKA-14462
> URL: https://issues.apache.org/jira/browse/KAFKA-14462
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.5.1 RC0

2023-07-14 Thread Divij Vaidya
Thank you both for testing this.

Fede, that's a good catch. My bad on not catching it earlier during release
prep. I will create a new RC containing the fix -
https://github.com/apache/kafka/pull/14016

--
Divij Vaidya



On Fri, Jul 14, 2023 at 9:29 AM Jakub Scholz  wrote:

> +1 (non-binding). I used the staged Scala 2.13 binaries and the Maven
> artifacts for my tests and all seems to work fine. Thanks.
>
> Jakub
>
> On Wed, Jul 12, 2023 at 12:03 PM Divij Vaidya 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 3.5.1.
> >
> > This release is a security patch release. It upgrades the dependency,
> > snappy-java, to a version which is not vulnerable to CVE-2023-34455. You
> > can find more information about the CVE at Kafka CVE list
> > .
> >
> > Additionally, this release fixes a regression introduced in 3.3.0, which
> > caused security.protocol configuration values to be restricted to upper
> > case only. With this release, security.protocol values are
> > case insensitive. See KAFKA-15053
> >  for details.
> >
> > Release notes for the 3.5.1 release:
> > https://home.apache.org/~divijv/kafka-3.5.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Tuesday, July 18, 9am PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~divijv/kafka-3.5.1-rc0/
> >
> > Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > Javadoc:
> > https://home.apache.org/~divijv/kafka-3.5.1-rc0/javadoc/
> >
> > Tag to be voted upon (off 3.5 branch) is the 3.5.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.5.1-rc0
> >
> > Documentation:
> > https://kafka.apache.org/35/documentation.html
> > Please note that documentation will be updated with upgrade notes (
> >
> >
> https://github.com/apache/kafka/commit/4c78fd64454e25e3536e8c7ed5725d3fbe944a49
> > )
> > after the release is complete.
> >
> > Protocol:
> > https://kafka.apache.org/35/protocol.html
> >
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/35/ (9
> failures).
> > I am running another couple of runs to ensure that there are no
> > consistently failing tests. I have verified that unit/integration tests
> on
> > my local machine successfully pass.
> >
> > System tests:
> > Not planning to run system tests since this is a patch release.
> >
> > Thank you.
> >
> > --
> > Divij Vaidya
> > Release Manager for Apache Kafka 3.5.1
> >
>


Re: [DISCUSS] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-14 Thread Federico Valeri
Hi Omnia and Chris, I agree with setting
"replication.policy.internal.topic.separator.enabled=true" by default,
but I was wondering if we should also deprecate it and remove it in Kafka 4.
If there is agreement on having the same behavior for internal and
non-internal topics, then it should be fine, and we won't need to keep
an additional configuration around. Wdyt?

Cheers
Fede

On Fri, Jul 14, 2023 at 1:51 PM Omnia Ibrahim  wrote:
>
> Hi Chris, I added a section for backport plan here 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy#KIP949:AddflagtoenabletheusageoftopicseparatorinMM2DefaultReplicationPolicy-Backportingplan
>
> Cheers,
> Omnia
>
> > On 13 Jul 2023, at 19:33, Chris Egerton  wrote:
> >
> > Hi Omnia,
> >
> > Yes, I think we ought to state the backport plan in the KIP, since it's
> > highly unusual for KIP changes or new configuration properties to be
> > backported and we should get the approval of the community (including
> > binding and non-binding voters) before implementing it.
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, Jul 13, 2023 at 7:13 AM Omnia Ibrahim 
> > wrote:
> >
> >> Hi Chris,
> >> The implementation should be very small so backporting this to 3.1 and 3.2
> >> would be perfect for this case if you or any other committer are okay with
> >> approving the backporting. Do we need to state this in KIP as well or not?
> >>
> >> Also, I’ll open a vote for the KIP today and prepare the pr for it so we
> >> can merge it as soon as we can.
> >>
> >> Thanks,
> >>
> >> Omnia
> >>
> >> On Wed, Jul 12, 2023 at 4:31 PM Chris Egerton 
> >> wrote:
> >>
> >>> Hi Omnia,
> >>>
> >>> Thanks for changing the default, LGTM 
> >>>
> >>> As far as backporting goes, we probably won't be doing another release
> >> for
> >>> 3.1, and possibly not for 3.2 either; however, if we can make the
> >>> implementation focused enough (which I don't think would be too
> >> difficult,
> >>> but correct me if I'm wrong), then we can still backport through 3.1.
> >> Even
> >>> if we don't do another release it can make life easier for people who are
> >>> maintaining parallel forks. Obviously this shouldn't be taken as a
> >> blanket
> >>> precedent but in this case it seems like the benefits may outweigh the
> >>> costs. What are your thoughts?
> >>>
> >>> Cheers,
> >>>
> >>> Chris
> >>>
> >>> On Wed, Jul 12, 2023 at 9:05 AM Omnia Ibrahim 
> >>> wrote:
> >>>
>  Hi Chris, thanks for the feedback.
>  1. regarding the default value I had the same conflict of which version
> >>> to
>  break the backward compatibility with. We can just say that this KIP
> >>> gives
>  the release Pre KIP-690 the ability to keep the old behaviour with one
>  config and keep the backwards compatibility from post-KIP-690 the same
> >> so
>  we don't break at least the last 3 versions. I will update the KIP to
>  switch the default value to true.
>  2. For the backporting, which versions can we backport these to?
> >> Usually,
>  Kafka supports bugfix releases as needed for the last 3 releases. Now
> >> we
> >>> @
>  3.5 so the last 3 are 3.4, 3.3 and 3.2 is this correct?
>  3. I'll add a Jira for updating the docs for this KIP so we don't
> >> forget
>  about it.
> 
>  Thanks
>  Omnia
> 
> 
>  On Mon, Jul 10, 2023 at 5:33 PM Chris Egerton  >>>
>  wrote:
> 
> > Hi Omnia,
> >
> > Thanks for taking this on! I have some thoughts but the general
> >>> approach
> > looks good.
> >
> > 1. Default value
> >
> > One thing I'm wrestling with is what the default value of the new
>  property
> > should be. I know on the Jira ticket I proposed that it should be
> >>> false,
> > but I'm having second thoughts. Technically we'd preserve backward
> > compatibility with pre-KIP-690 releases by defaulting to false, but
> >> at
>  the
> > same time, we'd break compatibility with post-KIP-690 releases. And
> >> if
> >>> we
> > default to true, the opposite would be true: compatibility would be
>  broken
> > with pre-KIP-690 releases, but preserved with post-KIP-690 releases.
> >
> > One argument against defaulting to false (which, again, would
> >> preserve
>  the
> > behavior of MM2 before we accidentally broke compatibility with
> >>> KIP-690)
>  is
> > that this change could possibly cause a single MM2 setup to break
> > twice--once when upgrading from a pre-KIP-690 release to an existing
> > release, and again when upgrading from that existing release to a
> >>> version
> > that reverted (by default) to pre-KIP-690 behavior. On the other
> >> hand,
> >>> if
> > we default to true (which would preserve the existing behavior that
>  breaks
> > compatibility with pre-KIP-690 releases), then any given setup will
> >>> only
>  be
> > broken once.
> >
> > In 

Re: [DISCUSS] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-14 Thread Omnia Ibrahim
Hi Chris, I added a section for backport plan here 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy#KIP949:AddflagtoenabletheusageoftopicseparatorinMM2DefaultReplicationPolicy-Backportingplan

Cheers, 
Omnia 

> On 13 Jul 2023, at 19:33, Chris Egerton  wrote:
> 
> Hi Omnia,
> 
> Yes, I think we ought to state the backport plan in the KIP, since it's
> highly unusual for KIP changes or new configuration properties to be
> backported and we should get the approval of the community (including
> binding and non-binding voters) before implementing it.
> 
> Cheers,
> 
> Chris
> 
> On Thu, Jul 13, 2023 at 7:13 AM Omnia Ibrahim 
> wrote:
> 
>> Hi Chris,
>> The implementation should be very small so backporting this to 3.1 and 3.2
>> would be perfect for this case if you or any other committer are okay with
>> approving the backporting. Do we need to state this in KIP as well or not?
>> 
>> Also, I’ll open a vote for the KIP today and prepare the pr for it so we
>> can merge it as soon as we can.
>> 
>> Thanks,
>> 
>> Omnia
>> 
>> On Wed, Jul 12, 2023 at 4:31 PM Chris Egerton 
>> wrote:
>> 
>>> Hi Omnia,
>>> 
>>> Thanks for changing the default, LGTM 
>>> 
>>> As far as backporting goes, we probably won't be doing another release
>> for
>>> 3.1, and possibly not for 3.2 either; however, if we can make the
>>> implementation focused enough (which I don't think would be too
>> difficult,
>>> but correct me if I'm wrong), then we can still backport through 3.1.
>> Even
>>> if we don't do another release it can make life easier for people who are
>>> maintaining parallel forks. Obviously this shouldn't be taken as a
>> blanket
>>> precedent but in this case it seems like the benefits may outweigh the
>>> costs. What are your thoughts?
>>> 
>>> Cheers,
>>> 
>>> Chris
>>> 
>>> On Wed, Jul 12, 2023 at 9:05 AM Omnia Ibrahim 
>>> wrote:
>>> 
 Hi Chris, thanks for the feedback.
 1. regarding the default value I had the same conflict of which version
>>> to
 break the backward compatibility with. We can just say that this KIP
>>> gives
 the release Pre KIP-690 the ability to keep the old behaviour with one
 config and keep the backwards compatibility from post-KIP-690 the same
>> so
 we don't break at least the last 3 versions. I will update the KIP to
 switch the default value to true.
 2. For the backporting, which versions can we backport these to?
>> Usually,
 Kafka supports bugfix releases as needed for the last 3 releases. Now
>> we
>>> @
 3.5 so the last 3 are 3.4, 3.3 and 3.2 is this correct?
 3. I'll add a Jira for updating the docs for this KIP so we don't
>> forget
 about it.
 
 Thanks
 Omnia
 
 
 On Mon, Jul 10, 2023 at 5:33 PM Chris Egerton >> 
 wrote:
 
> Hi Omnia,
> 
> Thanks for taking this on! I have some thoughts but the general
>>> approach
> looks good.
> 
> 1. Default value
> 
> One thing I'm wrestling with is what the default value of the new
 property
> should be. I know on the Jira ticket I proposed that it should be
>>> false,
> but I'm having second thoughts. Technically we'd preserve backward
> compatibility with pre-KIP-690 releases by defaulting to false, but
>> at
 the
> same time, we'd break compatibility with post-KIP-690 releases. And
>> if
>>> we
> default to true, the opposite would be true: compatibility would be
 broken
> with pre-KIP-690 releases, but preserved with post-KIP-690 releases.
> 
> One argument against defaulting to false (which, again, would
>> preserve
 the
> behavior of MM2 before we accidentally broke compatibility with
>>> KIP-690)
 is
> that this change could possibly cause a single MM2 setup to break
> twice--once when upgrading from a pre-KIP-690 release to an existing
> release, and again when upgrading from that existing release to a
>>> version
> that reverted (by default) to pre-KIP-690 behavior. On the other
>> hand,
>>> if
> we default to true (which would preserve the existing behavior that
 breaks
> compatibility with pre-KIP-690 releases), then any given setup will
>>> only
 be
> broken once.
> 
> In addition, if we default to true right now, then we don't have to
>>> worry
> about changing that default in 4.0 to a more intuitive value (I hope
>> we
 can
> all agree that, for new clusters, it makes sense to set this property
>>> to
> true and not to distinguish between internal and non-internal
>> topics).
> 
> With that in mind, I'm now leaning more towards defaulting to true,
>> but
> would be interested in your thoughts.
> 
> 
> 2. Backport?
> 
> It's highly unlikely to backport changes for a KIP, but given the
>>> impact
 of
> the compatibility break that we're trying to address here, and the
> 

Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-14 Thread Divij Vaidya
Thank you for spotting that Luke. I have fixed the snippet now.

*Satish*, I am waiting for your review on this one since you provided some
comments earlier in this discussion. Please check the KIP once again when
you get a chance and vote at
https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk if you
don't have any further concerns. Thank you!!

--
Divij Vaidya
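
For readers following the quoted thread below, the API shape under discussion is roughly a single size lookup instead of listing all segment metadata; the sketch uses a standalone interface and an assumed parameter list rather than the real RemoteLogMetadataManager:

    import org.apache.kafka.common.TopicIdPartition;
    import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

    // Sketch: one call returning the total size (in bytes) of remote segments for a
    // partition's leader-epoch chain, instead of listing every segment's metadata and
    // summing the sizes on each archival check.
    public interface RemoteLogSizeView {
        long remoteLogSize(TopicIdPartition topicIdPartition, int leaderEpoch) throws RemoteStorageException;
    }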



On Thu, Jul 13, 2023 at 2:39 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Divij.
>
> I was confusing with the metric tags used by clients that are based on
> topic and partition. Ideally partition label could be at a DEBUG recording
> level, but that's outside the scope of this KIP.
>
> Looks good to me, thanks again!
>
> Jorge.
>
> On Wed, 12 Jul 2023 at 15:55, Divij Vaidya 
> wrote:
>
> > Jorge,
> > About API name: Good point. I have changed it to remoteLogSize instead of
> > getRemoteLogSize
> >
> > About partition tag in the metric: We don't use partition tag across any
> of
> > the RemoteStorage metrics and I would like to keep this metric aligned
> with
> > the rest. I will change the metric though to type=BrokerTopicMetrics
> > instead of type=RemoteLogManager, since this is topic level information
> and
> > not specific to RemoteLogManager.
> >
> >
> > Satish,
> > Ah yes! Updated from "This would increase the broker start-up time." to
> > "This would increase the bootstrap time for the remote storage thread
> pool
> > before the first eligible segment is archived."
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Mon, Jul 3, 2023 at 2:07 PM Satish Duggana 
> > wrote:
> >
> > > Thanks Divij for taking the feedback and updating the motivation
> > > section in the KIP.
> > >
> > > One more comment on Alternative solution-3, The con is not valid as
> > > that will not affect the broker restart times as discussed in the
> > > earlier email in this thread. You may want to update that.
> > >
> > > ~Satish.
> > >
> > > On Sun, 2 Jul 2023 at 01:03, Divij Vaidya 
> > wrote:
> > > >
> > > > Thank you folks for reviewing this KIP.
> > > >
> > > > Satish, I have modified the motivation to make it more clear. Now it
> > > says,
> > > > "Since the main feature of tiered storage is storing a large amount
> of
> > > > data, we expect num_remote_segments to be large. A frequent linear
> scan
> > > > (i.e. listing all segment metadata) could be expensive/slower because
> > of
> > > > the underlying storage used by RemoteLogMetadataManager. This
> slowness
> > to
> > > > list all segment metadata could result in the loss of
> availability"
> > > >
> > > > Jun, Kamal, Satish, if you don't have any further concerns, I would
> > > > appreciate a vote for this KIP in the voting thread -
> > > > https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> > > > kamal.chandraprak...@gmail.com> wrote:
> > > >
> > > > > Hi Divij,
> > > > >
> > > > > Thanks for the explanation. LGTM.
> > > > >
> > > > > --
> > > > > Kamal
> > > > >
> > > > > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > > I am fine with having an API to compute the size as I mentioned
> in
> > my
> > > > > > earlier reply in this mail thread. But I have the below comment
> for
> > > > > > the motivation for this KIP.
> > > > > >
> > > > > > As you discussed offline, the main issue here is listing calls
> for
> > > > > > remote log segment metadata is slower because of the storage used
> > for
> > > > > > RLMM. These can be avoided with this new API.
> > > > > >
> > > > > > Please add this in the motivation section as it is one of the
> main
> > > > > > motivations for the KIP.
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Sat, 1 Jul 2023 at 01:43, Jun Rao 
> > > wrote:
> > > > > > >
> > > > > > > Hi, Divij,
> > > > > > >
> > > > > > > Sorry for the late reply.
> > > > > > >
> > > > > > > Given your explanation, the new API sounds reasonable to me. Is
> > > that
> > > > > > enough
> > > > > > > to build the external metadata layer for the remote segments or
> > do
> > > you
> > > > > > need
> > > > > > > some additional API changes?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya <
> > > divijvaidy...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thank you for looking into this Kamal.
> > > > > > > >
> > > > > > > > You are right in saying that a cold start (i.e. leadership
> > > failover
> > > > > or
> > > > > > > > broker startup) does not impact the broker startup duration.
> > But
> > > it
> > > > > > does
> > > > > > > > have the following impact:
> > > > > > > > 1. It leads to a burst of full-scan requests to RLMM in case
> > > multiple
> > > > > > > > leadership failovers occur at the same 

[jira] [Created] (KAFKA-15190) Allow configuring a streams process ID

2023-07-14 Thread Joe Wreschnig (Jira)
Joe Wreschnig created KAFKA-15190:
-

 Summary: Allow configuring a streams process ID
 Key: KAFKA-15190
 URL: https://issues.apache.org/jira/browse/KAFKA-15190
 Project: Kafka
  Issue Type: Wish
  Components: streams
Reporter: Joe Wreschnig


We run our Kafka Streams applications in containers with no persistent storage,
and therefore the mitigation from KAFKA-10716 of persisting the process ID in the
state directory does not help us avoid shuffling lots of tasks during restarts.

However, we do have a persistent container ID (from a Kubernetes StatefulSet). 
Would it be possible to expose a configuration option to let us set the streams 
process ID ourselves?

We are already using this ID as our group.instance.id - would it make sense to 
have the process ID be automatically derived from this (plus application/client 
IDs) if it's set? The two IDs seem to have overlapping goals of identifying 
"this consumer" across restarts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15189) Do not initialize RemoteStorage related metrics when disabled at cluster

2023-07-14 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-15189:


 Summary: Do not initialize RemoteStorage related metrics when 
disabled at cluster
 Key: KAFKA-15189
 URL: https://issues.apache.org/jira/browse/KAFKA-15189
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Divij Vaidya


context: https://github.com/apache/kafka/pull/13944#discussion_r1251099820



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-14 Thread Erik van Oosten

Hi Colin,

The way I understood Philip's message is that KIP-944 also plays nice
with KIP-945. But I might be mistaken.


Regardless, KIP-945 does /not/ resolve the underlying problem (the need 
for nested consumer invocations) because it has the explicit goal of not 
changing the user facing API.


> ... KIP-945 but haven't posted a DISCUSS thread yet

There is a thread called 'KafkaConsumer refactor proposal', but indeed 
no official discussion yet.


> I really don't want to be debugging complex interactions between Java 
thread-local variables and green threads.


In that email thread, I proposed an API change in which callbacks are no 
longer needed. The proposal completely removes the need for such complex 
interactions. In addition, it gives clients the ability to process at 
full speed even while a cooperative rebalance is ongoing.


Regards,
    Erik.

Op 14-07-2023 om 00:36 schreef Colin McCabe:

HI Philip & Erik,

Hmm... if we agree that KIP-945 addresses this use case, I think it would be 
better to just focus on that KIP. Fundamentally it's a better and cleaner model 
than a complex scheme involving thread-local variables. I really don't want to 
be debugging complex interactions between Java thread-local variables and green 
threads.

It also generally helps to have some use-cases in mind when writing these 
things. If we get feedback about what would be useful for async runtimes, that 
would probably help improve and focus KIP-945. By the way, I can see you have a 
draft on the wiki for KIP-945 but haven't posted a DISCUSS thread yet, so I 
assume it's not ready for review yet ;)

best,
Colin


On Tue, Jul 11, 2023, at 12:24, Philip Nee wrote:

Hey Erik - Another thing I want to add to my comment: we are in the process
of re-writing the KafkaConsumer, and I think your proposal would work in
the new consumer because we are going to separate the user thread and the
background thread.  Here is the 1-pager, and we are in the process of
converting it into KIP-945.

Thanks,
P

On Tue, Jul 11, 2023 at 10:33 AM Philip Nee  wrote:


Hey Erik,

Sorry for sitting on this email for a few days; Colin's response includes
some of my concerns.  I'm in favor of this KIP, and your approach seems
safe.  Of course, I probably missed something, so I think this KIP needs
to cover different use cases to demonstrate that it doesn't cause any unsafe
access. This could be demonstrated via diagrams and some code in the KIP.

Thanks,
P

On Sat, Jul 8, 2023 at 12:28 PM Erik van Oosten
 wrote:


Hello Colin,

  >> In KIP-944, the callback thread can only delegate to another thread
after reading from and writing to a threadlocal variable, providing the
barriers right there.

  > I don't see any documentation that accessing thread local variables
provides a total store or load barrier. Do you have such documentation?
It seems like if this were the case, we could eliminate volatile
variables from most of the code base.

Admittedly, I was imprecise. The thread-locals are only somewhat involved. In
the KIP proposal the callback thread reads an access key from a
thread-local variable. It then needs to pass that access key to another
thread, which can then set it on its own thread-local variable. The act
of passing a value from one thread to another implies that a memory
barrier must be crossed. However, this is all not so relevant since
there is no need to pass the access key back when the other thread is
done.

Now that I think about it a bit more, the locking mechanism runs in a
synchronized block. If I remember correctly, that should be enough to
provide the read and write barriers.
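
A minimal, self-contained sketch of that handoff, with a plain String standing 
in for the access key; the synchronized read/write pair provides the memory 
barriers discussed above (the latch only sequences the example):

import java.util.concurrent.CountDownLatch;

public class AccessKeyHandoffSketch {
    // Each thread sees only the access key installed for it.
    private static final ThreadLocal<String> ACCESS_KEY = new ThreadLocal<>();
    private static final Object LOCK = new Object();
    private static String handedOff; // guarded by LOCK

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(1);

        Thread delegate = new Thread(() -> {
            try {
                ready.await();
                String key;
                synchronized (LOCK) {   // read barrier: sees the callback thread's write
                    key = handedOff;
                }
                ACCESS_KEY.set(key);    // install in the delegate's own thread-local
                System.out.println("delegate thread uses key " + ACCESS_KEY.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        delegate.start();

        // "Callback thread": reads its own thread-local and hands the key over.
        ACCESS_KEY.set("key-42");
        synchronized (LOCK) {           // write barrier: publishes the key
            handedOff = ACCESS_KEY.get();
        }
        ready.countDown();
        delegate.join();
    }
}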

  >> In the current implementation the consumer is also invoked from
random threads. If it works now, it should continue to work.
  > I'm not sure what you're referring to. Can you expand on this?

Any invocation of the consumer (e.g. method poll) is not from a thread
managed by the consumer. This is what I was assuming you meant with the
term 'random thread'.

  > Hmm, not sure what you mean by "cooperate with blocking code." If you
have 10 green threads you're multiplexing on to one CPU thread, and that
CPU thread gets blocked because of what one green thread is doing, the
other 9 green threads are blocked too, right? I guess it's "just" a
performance problem, but it still seems like it could be a serious one.

There are several ways to deal with this. All async runtimes I know
(Akka, ZIO, Cats Effect) support this by letting you mark a task as
blocking. The runtime will then either schedule it to another
thread pool, or grow the thread pool to accommodate it. In any case
'the other 9 green threads' will simply be scheduled onto another real
thread. In addition, some of these runtimes detect long-running tasks
and will reschedule waiting tasks to another thread. This is all a bit
off topic though.
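
In plain Java the rough equivalent is shifting work that is marked as blocking 
onto a separate executor so the compute threads stay free; a sketch of the 
general shape only, not of how Akka, ZIO or Cats Effect actually implement it:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingOffloadSketch {
    public static void main(String[] args) {
        // Small pool standing in for the async runtime's compute threads.
        ExecutorService compute = Executors.newFixedThreadPool(2);
        // Unbounded pool standing in for the runtime's blocking pool.
        ExecutorService blocking = Executors.newCachedThreadPool();

        CompletableFuture<String> result = CompletableFuture
                .supplyAsync(() -> {
                    // Work marked as "blocking" runs on the blocking pool,
                    // so the two compute threads keep running other tasks.
                    try {
                        Thread.sleep(1_000); // stands in for a blocking consumer call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return "done";
                }, blocking)
                .thenApplyAsync(String::toUpperCase, compute);

        System.out.println(result.join());
        compute.shutdown();
        blocking.shutdown();
    }
}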

  > I don't see why this has to be "inherently multi-threaded." Why can't
we have the other threads report back what messages they've 

RE: Testing FixedKeyProcessor implementation using unit tests

2023-07-14 Thread EXT . Zlatibor . Veljkovic
Hi Matthias,

Here's the repro of the project that has these issues 
https://github.com/zveljkovic/kafka-repro.

Please look at the:
Topology definition: 
https://github.com/zveljkovic/kafka-repro/blob/master/src/main/java/com/example/demo/DemoApplication.java
FixedKeyProcessor: 
https://github.com/zveljkovic/kafka-repro/blob/master/src/main/java/com/example/demo/MyFixedKeyProcessor.java
Test of FixedKeyProcessor: 
https://github.com/zveljkovic/kafka-repro/blob/master/src/test/java/com/example/demo/MyFixedKeyProcessorTest.java

Test is where I am having issues.

Thanks,
Zed


-Original Message-
From: Matthias J. Sax 
Sent: Tuesday, July 11, 2023 1:13 AM
To: dev@kafka.apache.org
Subject: Re: Testing FixedKeyProcessor implementation using unit tests

External email: Be careful with links and attachments


Not sure right now, but could be a bug.

Can you maybe share the full stack trace and the test program?

-Matthias

On 7/10/23 3:47 AM, EXT.Zlatibor.Veljkovic wrote:
> Hi, I am using kafka-streams-test-utils and have a problem with testing 
> FixedKeyProcessor [KIP-820 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API#KIP820:ExtendKStreamprocesswithnewProcessorAPI-InfrastructureforFixedKeyRecords].
>
> Using mock processor context to get the forwarded message doesn't work.
>
> class org.apache.kafka.streams.processor.api.MockProcessorContext cannot be 
> cast to class org.apache.kafka.streams.processor.api.FixedKeyProcessorContext
>
> Anything I can do to get forwarded records?
>
> Thanks,
> Zed
>
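
Until the test-utils provide a mock FixedKeyProcessorContext, one workaround is 
to drive the processor through TopologyTestDriver instead of 
MockProcessorContext. A minimal sketch, with a stand-in upper-casing processor 
where MyFixedKeyProcessor would go:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.processor.api.FixedKeyProcessor;
import org.apache.kafka.streams.processor.api.FixedKeyProcessorContext;
import org.apache.kafka.streams.processor.api.FixedKeyRecord;

public class FixedKeyProcessorDriverTest {

    // Stand-in for MyFixedKeyProcessor: upper-cases the value, key stays fixed.
    static class UpperCaseProcessor implements FixedKeyProcessor<String, String, String> {
        private FixedKeyProcessorContext<String, String> context;

        @Override
        public void init(FixedKeyProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(FixedKeyRecord<String, String> record) {
            context.forward(record.withValue(record.value().toUpperCase()));
        }
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
               .processValues(() -> new UpperCaseProcessor())
               .to("output", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fixed-key-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

        try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            TestInputTopic<String, String> in =
                    driver.createInputTopic("input", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                    driver.createOutputTopic("output", new StringDeserializer(), new StringDeserializer());

            in.pipeInput("k", "hello");
            KeyValue<String, String> forwarded = out.readKeyValue();
            System.out.println(forwarded); // KeyValue(k, HELLO)
        }
    }
}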


Re: [VOTE] 3.5.1 RC0

2023-07-14 Thread Jakub Scholz
+1 (non-binding). I used the staged Scala 2.13 binaries and the Maven
artifacts for my tests and all seems to work fine. Thanks.

Jakub

On Wed, Jul 12, 2023 at 12:03 PM Divij Vaidya 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 3.5.1.
>
> This release is a security patch release. It upgrades the dependency,
> snappy-java, to a version which is not vulnerable to CVE-2023-34455. You
> can find more information about the CVE at Kafka CVE list
> .
>
> Additionally, this release fixes a regression introduced in 3.3.0, which
> caused security.protocol configuration values to be restricted to upper
> case only. With this release, security.protocol values are
> case insensitive. See KAFKA-15053
>  for details.
>
> Release notes for the 3.5.1 release:
> https://home.apache.org/~divijv/kafka-3.5.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, July 18, 9am PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~divijv/kafka-3.5.1-rc0/
>
> Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> Javadoc:
> https://home.apache.org/~divijv/kafka-3.5.1-rc0/javadoc/
>
> Tag to be voted upon (off 3.5 branch) is the 3.5.1 tag:
> https://github.com/apache/kafka/releases/tag/3.5.1-rc0
>
> Documentation:
> https://kafka.apache.org/35/documentation.html
> Please note that documentation will be updated with upgrade notes (
>
> https://github.com/apache/kafka/commit/4c78fd64454e25e3536e8c7ed5725d3fbe944a49
> )
> after the release is complete.
>
> Protocol:
> https://kafka.apache.org/35/protocol.html
>
> Unit/integration tests:
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/35/ (9 failures).
> I am running another couple of runs to ensure that there are no
> consistently failing tests. I have verified that unit/integration tests on
> my local machine successfully pass.
>
> System tests:
> Not planning to run system tests since this is a patch release.
>
> Thank you.
>
> --
> Divij Vaidya
> Release Manager for Apache Kafka 3.5.1
>
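
A quick sanity check of the KAFKA-15053 fix against the staged artifacts, 
assuming an RC broker with a PLAINTEXT listener on localhost:9092; the 
lowercase security.protocol value should now be accepted:

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class LowercaseSecurityProtocolCheck {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Between 3.3.0 and 3.5.0 this lowercase value was rejected; 3.5.1 accepts it.
        props.put(AdminClientConfig.SECURITY_PROTOCOL_CONFIG, "plaintext");

        try (Admin admin = Admin.create(props)) {
            System.out.println("cluster id: " + admin.describeCluster().clusterId().get());
        }
    }
}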


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1999

2023-07-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 379456 lines...]
> Task :connect:api:compileTestJava UP-TO-DATE
> Task :connect:api:testClasses UP-TO-DATE
> Task :connect:json:generateMetadataFileForMavenJavaPublication
> Task :connect:json:testSrcJar
> Task :connect:api:testJar
> Task :connect:api:testSrcJar
> Task :clients:generateMetadataFileForMavenJavaPublication
> Task :connect:api:publishMavenJavaPublicationToMavenLocal
> Task :connect:api:publishToMavenLocal
> Task :connect:json:publishMavenJavaPublicationToMavenLocal
> Task :connect:json:publishToMavenLocal
> Task :storage:api:compileTestJava
> Task :storage:api:testClasses
> Task :server-common:compileTestJava
> Task :server-common:testClasses
> Task :raft:compileTestJava
> Task :raft:testClasses
> Task :group-coordinator:compileTestJava
> Task :group-coordinator:testClasses

> Task :clients:javadoc
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java:81:
 warning - Tag @link:illegal character: "60" in "#define(String, Type, 
Importance, String, String, int, Width, String, List)"
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java:81:
 warning - Tag @link:illegal character: "62" in "#define(String, Type, 
Importance, String, String, int, Width, String, List)"
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java:81:
 warning - Tag @link: can't find define(String, Type, Importance, String, 
String, int, Width, String, List) in 
org.apache.kafka.common.config.ConfigDef
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/clients/src/main/java/org/apache/kafka/clients/admin/ScramMechanism.java:32:
 warning - Tag @see: missing final '>': "https://cwiki.apache.org/confluence/display/KAFKA/KIP-554%3A+Add+Broker-side+SCRAM+Config+API;>KIP-554:
 Add Broker-side SCRAM Config API

 This code is duplicated in 
org.apache.kafka.common.security.scram.internals.ScramMechanism.
 The type field in both files must match and must not change. The type field
 is used both for passing ScramCredentialUpsertion and for the internal
 UserScramCredentialRecord. Do not change the type field."
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/secured/package-info.java:21:
 warning - Tag @link: reference not found: 
org.apache.kafka.common.security.oauthbearer

> Task :metadata:compileTestJava
> Task :metadata:testClasses

> Task :clients:javadoc
5 warnings

> Task :clients:javadocJar
> Task :core:compileScala
> Task :clients:srcJar
> Task :clients:testJar
> Task :clients:testSrcJar
> Task :clients:publishMavenJavaPublicationToMavenLocal
> Task :clients:publishToMavenLocal
> Task :core:classes
> Task :core:compileTestJava NO-SOURCE
> Task :core:compileTestScala
> Task :core:testClasses
> Task :streams:compileTestJava
> Task :streams:testClasses
> Task :streams:testJar
> Task :streams:testSrcJar
> Task :streams:publishMavenJavaPublicationToMavenLocal
> Task :streams:publishToMavenLocal

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

See 
https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 5m
89 actionable tasks: 41 executed, 48 up-to-date

Publishing build scan...
https://ge.apache.org/s/k64yyn6ckkkog

[Pipeline] sh
+ grep ^version= gradle.properties
+ cut -d= -f 2
[Pipeline] dir
Running in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/streams/quickstart
[Pipeline] {
[Pipeline] sh
+ mvn clean install -Dgpg.skip
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Kafka Streams :: Quickstart[pom]
[INFO] streams-quickstart-java[maven-archetype]
[INFO] 
[INFO] < org.apache.kafka:streams-quickstart >-
[INFO] Building Kafka Streams :: Quickstart 3.6.0-SNAPSHOT[1/2]
[INFO]   from pom.xml
[INFO] [ pom ]-
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ streams-quickstart ---
[INFO] 
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
streams-quickstart ---
[INFO] 
[INFO] --- site:3.5.1:attach-descriptor (attach-descriptor) @ 
streams-quickstart ---
[INFO] 
[INFO] --- gpg:1.6:sign (sign-artifacts) @ streams-quickstart ---
[INFO] 
[INFO] --- install:2.5.2:install (default-install) @ streams-quickstart ---
[INFO]