Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1998

2023-07-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2594 lines...]
> Task :connect:transforms:compileTestJava
> Task :connect:transforms:testClasses
> Task :connect:transforms:spotbugsTest SKIPPED
> Task :streams:examples:checkstyleTest
> Task :streams:examples:check
> Task :streams:test-utils:checkstyleMain
> Task :raft:checkstyleTest
> Task :streams:test-utils:compileTestJava
> Task :streams:test-utils:testClasses
> Task :streams:test-utils:spotbugsTest SKIPPED
> Task :raft:check
> Task :group-coordinator:checkstyleTest
> Task :connect:transforms:checkstyleTest
> Task :connect:transforms:check
> Task :group-coordinator:check

> Task :core:compileScala
[Warn] /home/jenkins/.gradle/workers/warning:[options] bootstrap class path not 
set in conjunction with -source 8
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java:80:
  [removal] AccessController in java.security has been deprecated and marked 
for removal
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java:196:
  [removal] AccessController in java.security has been deprecated and marked 
for removal

> Task :core:classes
> Task :core:checkstyleMain
> Task :shell:compileJava
> Task :shell:classes
> Task :shell:compileTestJava
> Task :shell:testClasses
> Task :shell:spotbugsTest SKIPPED
> Task :shell:checkstyleMain
> Task :shell:checkstyleTest
> Task :streams:test-utils:checkstyleTest

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':metadata:compileTestJava'.
> Compilation failed; see the compiler error output for details.

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

See 
https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 5m 42s
242 actionable tasks: 194 executed, 48 up-to-date

Publishing build scan...
https://ge.apache.org/s/zeh2qs6kdtpeo


See the profiling report at: 
file:///home/jenkins/workspace/Kafka_kafka_trunk/build/reports/profile/profile-2023-07-14-03-15-10.html
A fine-grained performance profile is available: use the --scan option.

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':metadata:compileTestJava'.
> Compilation failed; see the compiler error output for details.

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

See 
https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 5m 50s
239 actionable tasks: 190 executed, 49 up-to-date

Publishing build scan...
https://ge.apache.org/s/2fytogqecwmz2


See the profiling report at: 
file:///home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/build/reports/profile/profile-2023-07-14-03-15-04.html
A fine-grained performance profile is available: use the --scan option.
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
Failed in branch JDK 20 and Scala 2.13
> Task :shell:spotbugsMain
> Task :shell:check
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
Failed in branch JDK 8 and Scala 2.12
> Task :clients:check
> Task :streams:test-utils:spotbugsMain
> Task :core:spotbugsMain
> Task :streams:streams-scala:classes
> Task :streams:streams-scala:checkstyleMain NO-SOURCE
> Task :streams:test-utils:check
> Task :streams:streams-scala:spotbugsMain

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':metadata:compileTestJava'.
> Compilation failed; see the compiler error output for details.

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 

[jira] [Resolved] (KAFKA-15185) Consumers using the latest strategy may lose data after the topic adds partitions

2023-07-13 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-15185.
---
Resolution: Duplicate

> Consumers using the latest strategy may lose data after the topic adds 
> partitions
> -
>
> Key: KAFKA-15185
> URL: https://issues.apache.org/jira/browse/KAFKA-15185
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 3.4.1
>Reporter: RivenSun
>Assignee: Luke Chen
>Priority: Major
>
> h2. condition:
> 1. A business topic has partitions added.
> 2. The metadata.max.age.ms configuration of both producers and consumers is set 
> to five minutes, but the producer discovered the new partition before the 
> consumer did and produced 100 messages to it.
> 3. The consumer parameter auto.offset.reset is set to *latest*.
> h2. result:
> Consumers will lose these 100 messages.
> First of all, we cannot simply set auto.offset.reset to {*}earliest{*}, because 
> the user's requirement is that a newly subscribed group discards all old 
> messages of the topic.
> However, after the group has subscribed, messages produced to the expanded 
> partition {*}must be guaranteed not to be lost{*}, similar to starting 
> consumption from the earliest offset.
> h2. suggestion:
> We have set the consumer's metadata.max.age.ms to 1/2 or 1/3 of the 
> producer's metadata.max.age.ms configuration.
> But this still cannot solve the problem, because in many cases the producer 
> may force a metadata refresh.
> Secondly, a smaller metadata.max.age.ms value causes more metadata refresh 
> requests, which increases the load on the broker.
> So, can we add a parameter that controls whether the consumer starts 
> consumption from the earliest or the latest offset for a newly added partition?
> Perhaps during the rebalance the leader consumer needs to mark which 
> partitions are newly added when calculating the assignment.
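
For illustration only, a minimal consumer configuration that reproduces the scenario
described above; the bootstrap address, group id and topic name are assumptions, and
the shortened metadata.max.age.ms corresponds to the workaround in the suggestion,
not to a fix.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LatestResetScenario {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "newly-subscribed-group");       // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // The setting at the heart of the report: partitions the group has never
        // committed for start from the log end, so records produced before the
        // consumer discovers a new partition are skipped.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // The workaround from the suggestion: refresh metadata roughly three times
        // as often as the producer's five minutes (this still does not close the race).
        props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "100000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("business-topic"));       // assumption
            consumer.poll(Duration.ofSeconds(1));
        }
    }
}

Even with the shorter metadata age, the race described above remains possible whenever
the producer refreshes its metadata first.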



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Ismael Juma
Hi Mayank,

See my answer below.

On Thu, Jul 13, 2023 at 10:24 AM Mayank Shekhar Narula <
mayanks.nar...@gmail.com> wrote:

> re. 2 On some busy clusters a single metadata call has been observed to
> take on the order of ~100 milliseconds (I think it's mentioned somewhere in this
> motivation). So retrying immediately on the Produce path won't make sense,
> as metadata would still be stale. Hence the current proposal optimises for
> fetching specific-metadata about the new leader, in the produce & fetch
> response, outside of the metadata refresh. What do you think?


This is a bit vague and difficult to evaluate. If it's reasonably simple to
evaluate this alternative, can we provide comparative numbers too?

Ismael


Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-13 Thread Colin McCabe
HI Philip & Erik,

Hmm... if we agree that KIP-945 addresses this use case, I think it would be 
better to just focus on that KIP. Fundamentally it's a better and cleaner model 
than a complex scheme involving thread-local variables. I really don't want to 
be debugging complex interactions between Java thread-local variables and green 
threads.

It also generally helps to have some use-cases in mind when writing these 
things. If we get feedback about what would be useful for async runtimes, that 
would probably help improve and focus KIP-945. By the way, I can see you have a 
draft on the wiki for KIP-945 but haven't posted a DISCUSS thread yet, so I 
assume it's not ready for review yet ;)

best,
Colin


On Tue, Jul 11, 2023, at 12:24, Philip Nee wrote:
> Hey Erik - Another thing I want to add to my comment: we are in the process
> of re-writing the KafkaConsumer, and I think your proposal would work in
> the new consumer because we are going to separate the user thread and the
> background thread.  Here is the 1-pager, and we are in the process of
> converting it into KIP-945.
>
> Thanks,
> P
>
> On Tue, Jul 11, 2023 at 10:33 AM Philip Nee  wrote:
>
>> Hey Erik,
>>
>> Sorry for holding up this email for a few days since Colin's response
>> includes some of my concerns.  I'm in favor of this KIP, and I think your
>> approach seems safe.  Of course, I probably missed something therefore I
>> think this KIP needs to cover different use cases to demonstrate it doesn't
>> cause any unsafe access. I think this can be demonstrated via diagrams and
>> some code in the KIP.
>>
>> Thanks,
>> P
>>
>> On Sat, Jul 8, 2023 at 12:28 PM Erik van Oosten
>>  wrote:
>>
>>> Hello Colin,
>>>
>>>  >> In KIP-944, the callback thread can only delegate to another thread
>>> after reading from and writing to a threadlocal variable, providing the
>>> barriers right there.
>>>
>>>  > I don't see any documentation that accessing thread local variables
>>> provides a total store or load barrier. Do you have such documentation?
>>> It seems like if this were the case, we could eliminate volatile
>>> variables from most of the code base.
>>>
>>> Now I was imprecise. The thread-locals are only somewhat involved. In
>>> the KIP proposal the callback thread reads an access key from a
>>> thread-local variable. It then needs to pass that access key to another
>>> thread, which then can set it on its own thread-local variable. The act
>>> of passing a value from one thread to another implies that a memory
>>> barrier needs to be passed. However, this is all not so relevant since
>>> there is no need to pass the access key back when the other thread is
>>> done.
>>>
>>> But now I think about it a bit more, the locking mechanism runs in a
>>> synchronized block. If I remember correctly this should be enough to
>>> pass read and write barriers.
>>>
>>>  >> In the current implementation the consumer is also invoked from
>>> random threads. If it works now, it should continue to work.
>>>  > I'm not sure what you're referring to. Can you expand on this?
>>>
>>> Any invocation of the consumer (e.g. method poll) is not from a thread
>>> managed by the consumer. This is what I was assuming you meant with the
>>> term 'random thread'.
>>>
>>>  > Hmm, not sure what you mean by "cooperate with blocking code." If you
>>> have 10 green threads you're multiplexing on to one CPU thread, and that
>>> CPU thread gets blocked because of what one green thread is doing, the
>>> other 9 green threads are blocked too, right? I guess it's "just" a
>>> performance problem, but it still seems like it could be a serious one.
>>>
>>> There are several ways to deal with this. All async runtimes I know
>>> (Akka, Zio, Cats-effects) support this by letting you mark a task as
>>> blocking. The runtime will then either schedule it to another
>>> thread-pool, or it will grow the thread-pool to accommodate. In any case
>>> 'the other 9 green threads' will simply be scheduled to another real
>>> thread. In addition, some of these runtimes detect long running tasks
>>> and will reschedule waiting tasks to another thread. This is all a bit
>>> off topic though.
>>>
>>>  > I don't see why this has to be "inherently multi-threaded." Why can't
>>> we have the other threads report back what messages they've processed to
>>> the worker thread. Then it will be able to handle these callbacks
>>> without involving the other threads.
>>>
>>> Please consider the context which is that we are running inside the
>>> callback of the rebalance listener. The only way to execute something
>>> and also have a timeout on it is to run the something on another thread.
>>>
>>> Kind regards,
>>>  Erik.
>>>
>>>
>>> Op 08-07-2023 om 19:17 schreef Colin McCabe:
>>> > On Sat, Jul 8, 2023, at 02:41, Erik van Oosten wrote:
>>> >> Hi Colin,
>>> >>
>>> >> Thanks for your thoughts and taking the time to reply.
>>> >>
>>> >> Let me take away your concerns. None of your worries are an issue with
>>> >> 

[jira] [Created] (KAFKA-15188) Implement more of the remaining PrototypeAsyncConsumer APIs

2023-07-13 Thread Kirk True (Jira)
Kirk True created KAFKA-15188:
-

 Summary: Implement more of the remaining PrototypeAsyncConsumer 
APIs
 Key: KAFKA-15188
 URL: https://issues.apache.org/jira/browse/KAFKA-15188
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True


There are several Consumer APIs that only touch the ConsumerMetadata and/or 
SubscriptionState classes; they do not perform network I/O or otherwise block.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1997

2023-07-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2641 lines...]
* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

See 
https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 6m 29s
239 actionable tasks: 190 executed, 49 up-to-date

Publishing build scan...
https://ge.apache.org/s/bqm6pp2id6jps


See the profiling report at: 
file:///home/jenkins/workspace/Kafka_kafka_trunk/build/reports/profile/profile-2023-07-13-20-32-51.html
A fine-grained performance profile is available: use the --scan option.
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
Failed in branch JDK 8 and Scala 2.12

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':metadata:compileTestJava'.
> Compilation failed; see the compiler error output for details.

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

See 
https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 6m 40s
242 actionable tasks: 194 executed, 48 up-to-date

Publishing build scan...
https://ge.apache.org/s/rw25f7mck7hb4


See the profiling report at: 
file:///home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/build/reports/profile/profile-2023-07-13-20-33-05.html
A fine-grained performance profile is available: use the --scan option.
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
Failed in branch JDK 11 and Scala 2.13

> Task :core:compileScala
Unexpected javac output: warning: [options] bootstrap class path not set in 
conjunction with -source 8
warning: [options] source value 8 is obsolete and will be removed in a future 
release
warning: [options] target value 8 is obsolete and will be removed in a future 
release
warning: [options] To suppress warnings about obsolete options, use 
-Xlint:-options.
/home/jenkins/workspace/Kafka_kafka_trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java:80:
 warning: [removal] AccessController in java.security has been deprecated and 
marked for removal
import java.security.AccessController;
^
/home/jenkins/workspace/Kafka_kafka_trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java:196:
 warning: [removal] AccessController in java.security has been deprecated and 
marked for removal
return AccessController.doPrivileged(new 
PrivilegedAction() {
   ^
/home/jenkins/workspace/Kafka_kafka_trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java:218:
 warning: [removal] AccessController in java.security has been deprecated and 
marked for removal
return AccessController.doPrivileged(new 
PrivilegedAction() {
   ^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: 
/home/jenkins/workspace/Kafka_kafka_trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java
 uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
7 warnings.

> Task :core:classes
> Task :core:checkstyleMain
> Task :shell:compileJava
> Task :shell:classes
> Task :shell:compileTestJava
> Task :shell:testClasses
> Task :shell:spotbugsTest SKIPPED
> Task :shell:checkstyleMain
> Task :shell:checkstyleTest
> Task :clients:check
> Task :shell:spotbugsMain
> Task :shell:check
> Task :core:spotbugsMain

> Task :core:compileScala
[Warn] /home/jenkins/.gradle/workers/warning:[options] bootstrap class path not 
set in conjunction with -source 8
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/core/src/main/java/kafka/log/remote/RemoteLogManager.java:80:
  [removal] AccessController in java.security has been deprecated and marked 
for removal
[Warn] 

Re: [DISCUSS] KIP-943: Add independent "offset.storage.segment.bytes" for connect-distributed.properties

2023-07-13 Thread Greg Harris
Hey hudeqi,

Thanks for the KIP! I did not know about the existing segment.bytes
default value, and it does seem rather high in the context of the
Connect internal topics.
If I think about the segment.size as a "minimum per-partition data
transfer on startup", 1GB is certainly not appropriate for even the
single-partition config topic.

1. I have a concern about changing the topic configuration on startup.

In the existing codebase, the *.storage.* worker configurations appear
to only have an effect for newly created topics. If the topics were
manually created before a Connect cluster starts, or were created by a
previous Connect instance, then the Connect worker configuration could
have arbitrary contents that have no effect. Updating the topic
configurations after creation would be a new capability.
Consider the situation where someone notices this log.segment
problem; a natural response would be to reconfigure the topic directly,
causing the two configurations to diverge. If the worker can change the
topic configuration after creation, it has the potential to roll
back topic configurations that are managed externally.
Do you think that changing the default for new Connect clusters, and
emitting a startup warning for excessive segment.bytes is reasonable?
We have other startup assertions that fail the startup of a worker
based on partition and compaction requirements, and this would be
similar in that it alerts the user to reconfigure the internal topics,
but with a lesser severity.

2. I'm also interested to know what a reasonable value for this
configuration would be. I did find the __consumer_offsets topic uses
104857600 (100 MiB) as defined in OffsetConfig.scala, so there is
precedent for having a smaller segment.size for internal topics.

3. I believe there's a potential bug where compaction can happen
before a worker reads a tombstone, leading the KafkaBasedLog to
produce inconsistent in-memory states across multiple workers. Since
the segment.size is so large, it makes me think that compaction has
been wholly ineffective so far, and has prevented this bug from
manifesting. By lowering the segment.size, we're increasing the
likelihood of this failure, so it may need to finally be addressed.

Thanks,
Greg
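
For reference, the "reconfigure the topic" response mentioned in point 1 could look
roughly like the following Admin client sketch; the topic name ("connect-offsets") and
the 100 MiB target are assumptions for illustration and should be matched to the
actual cluster.

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class LowerOffsetsTopicSegmentBytes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        try (Admin admin = Admin.create(props)) {
            ConfigResource offsetsTopic =
                new ConfigResource(ConfigResource.Type.TOPIC, "connect-offsets");  // assumption
            AlterConfigOp setSegmentBytes = new AlterConfigOp(
                new ConfigEntry("segment.bytes", String.valueOf(100 * 1024 * 1024)),
                AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                Collections.singletonMap(offsetsTopic, Collections.singletonList(setSegmentBytes));
            // Apply the smaller segment size to the existing internal topic.
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}

As noted above, doing this manually diverges the topic configuration from the worker
configuration, which is exactly the state a worker-driven update could later overwrite.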




On Thu, Jul 6, 2023 at 5:39 AM Yash Mayya  wrote:
>
> Also, just adding to the above point - we don't necessarily need to
> explicitly add a new worker configuration right? Instead, we could
> potentially just use the new proposed default value internally which can be
> overridden by users through setting a value for
> "offset.storage.segment.bytes" (via the existing KIP-605 based mechanism).
>
> On Thu, Jul 6, 2023 at 6:04 PM Yash Mayya  wrote:
>
> > Hi hudeqi,
> >
> > Thanks for the KIP! Just to clarify - since KIP-605 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-605%3A+Expand+Connect+Worker+Internal+Topic+Settings)
> > already allows configuring "segment.bytes" for the Connect cluster's
> > offsets topic via a worker configuration ("offset.storage.segment.bytes",
> > same as what is being proposed in this KIP), the primary motivation of this
> > KIP is essentially to override the default value for that topic
> > configuration to a smaller value and decouple it from the backing Kafka
> > cluster's "log.segment.bytes" configuration? Also, I'm curious about how
> > the new default value of 50 MB was chosen (if there were any experiments
> > that were run etc.)?
> >
> > Thanks,
> > Yash
> >
> > On Mon, Jul 3, 2023 at 6:08 PM hudeqi <16120...@bjtu.edu.cn> wrote:
> >
> >> Is anyone following this KIP? Bump this thread.
> >
> >


Re: [VOTE] 3.5.1 RC0

2023-07-13 Thread Federico Valeri
Hi Divij, thanks for running the release.

I found a couple of issues with licenses. We include
plexus-utils-3.3.1, but the LICENSE file lists plexus-utils-3.3.0. The
other issue is that the LICENSE file still lists classgraph-4.8.138, but we
haven't included this library for a few releases (this is non-blocking
but it would be good to fix).

Br
Fede




On Wed, Jul 12, 2023 at 12:14 PM Divij Vaidya  wrote:
>
> + kafka-clie...@googlegroups.com
>
> --
> Divij Vaidya
>
>
>
> On Wed, Jul 12, 2023 at 12:03 PM Divij Vaidya 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 3.5.1.
> >
> > This release is a security patch release. It upgrades the dependency,
> > snappy-java, to a version which is not vulnerable to CVE-2023-34455. You
> > can find more information about the CVE at Kafka CVE list
> > .
> >
> > Additionally, this release fixes a regression introduced in 3.3.0, which
> > caused security.protocol configuration values to be restricted to upper
> > case only. With this release, security.protocol values are
> > case insensitive. See KAFKA-15053
> >  for details.
> >
> > Release notes for the 3.5.1 release:
> > https://home.apache.org/~divijv/kafka-3.5.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Tuesday, July 18, 9am PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~divijv/kafka-3.5.1-rc0/
> >
> > Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > Javadoc:
> > https://home.apache.org/~divijv/kafka-3.5.1-rc0/javadoc/
> >
> > Tag to be voted upon (off 3.5 branch) is the 3.5.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.5.1-rc0
> >
> > Documentation:
> > https://kafka.apache.org/35/documentation.html
> > Please note that documentation will be updated with upgrade notes (
> > https://github.com/apache/kafka/commit/4c78fd64454e25e3536e8c7ed5725d3fbe944a49)
> > after the release is complete.
> >
> > Protocol:
> > https://kafka.apache.org/35/protocol.html
> >
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/35/ (9
> > failures). I am running another couple of runs to ensure that there are no
> > consistently failing tests. I have verified that unit/integration tests on
> > my local machine successfully pass.
> >
> > System tests:
> > Not planning to run system tests since this is a patch release.
> >
> > Thank you.
> >
> > --
> > Divij Vaidya
> > Release Manager for Apache Kafka 3.5.1
> >
> >


Re: KafkaConsumer refactor proposal

2023-07-13 Thread Erik van Oosten

Hi Philip,

I have been scanning through 
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design 
and KIP-848 and from this I understand that the kafka consumer API will 
not change.


Perhaps the refactoring and/or KIP-848 is a good opportunity to improve 
the API somewhat. In this email I explain why and also give a rough idea 
what that could look like.


In the current API, the rebalance listener callback gives the user a 
chance to commit all work in progress before a partition is actually 
revoked and assigned to another consumer.


While the callback is doing all this, the main user thread is not able 
to process new incoming data. So the rebalance listener affects latency 
and throughput for non-revoked partitions during a rebalance.


In addition, I feel that doing a lot of stuff /in/ a callback is always 
quite awkward. Better only use it to trigger some processing elsewhere.


Therefore, I would like to propose a new API that does not have these 
problems and is easy to use (and I hope still easy to implement). In my 
ideal world, poll is the only method that you need. Let's call it poll2 
(to do: come up with a less crappy name). Poll2 returns more than just 
the polled records: it will also contain newly assigned partitions, 
partitions that will be revoked during the next call to poll2, 
partitions that were lost, and perhaps it will even contain the offsets 
committed so far.
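
A purely hypothetical sketch of what such a result object could look like; none of
these names exist in Kafka, they are only meant to make the idea above concrete.

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Hypothetical result type for the "poll2" idea sketched above; nothing here exists in Kafka.
public interface PollResult<K, V> {
    ConsumerRecords<K, V> records();                           // the polled records, as today
    Collection<TopicPartition> newlyAssignedPartitions();      // assigned since the previous poll
    Collection<TopicPartition> partitionsPendingRevocation();  // will be revoked on the next poll
    Collection<TopicPartition> lostPartitions();               // already lost, no clean hand-off
    Map<TopicPartition, OffsetAndMetadata> committedOffsets(); // offsets committed so far (optional)
}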


The most important idea here is that partitions are not revoked 
immediately, but in the next call to poll2.


With this API, a user can commit offsets at their own pace during a 
rebalance. Optionally, for the case that processing of data from the 
to-be-revoked partition is still ongoing, we allow the user to postpone 
the actual revocation in the next poll, so that polling can continue for 
other partitions.


Since we are no longer blocking the main user thread, partitions that 
are not revoked can be processed at full speed.


Removal of the rebalance listener also makes the API safer; there is no 
more need for the thread-id check (nor KIP-944) because, concurrent 
invocations are simply no longer needed. (Of course, if backward 
compatibility is a goal, not all of these things can be done.)


Curious to your thoughts and kind regards,
    Erik.

--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [DISCUSS] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-13 Thread Chris Egerton
Hi Omnia,

Yes, I think we ought to state the backport plan in the KIP, since it's
highly unusual for KIP changes or new configuration properties to be
backported and we should get the approval of the community (including
binding and non-binding voters) before implementing it.

Cheers,

Chris

On Thu, Jul 13, 2023 at 7:13 AM Omnia Ibrahim 
wrote:

> Hi Chris,
> The implementation should be very small so backporting this to 3.1 and 3.2
> would be perfect for this case if you or any other committer are okay with
> approving the backporting. Do we need to state this in the KIP as well or not?
>
> Also, I’ll open a vote for the KIP today and prepare the pr for it so we
> can merge it as soon as we can.
>
> Thanks,
>
>  Omnia
>
> On Wed, Jul 12, 2023 at 4:31 PM Chris Egerton 
> wrote:
>
> > Hi Omnia,
> >
> > Thanks for changing the default, LGTM 
> >
> > As far as backporting goes, we probably won't be doing another release
> for
> > 3.1, and possibly not for 3.2 either; however, if we can make the
> > implementation focused enough (which I don't think would be too
> difficult,
> > but correct me if I'm wrong), then we can still backport through 3.1.
> Even
> > if we don't do another release it can make life easier for people who are
> > maintaining parallel forks. Obviously this shouldn't be taken as a
> blanket
> > precedent but in this case it seems like the benefits may outweigh the
> > costs. What are your thoughts?
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Jul 12, 2023 at 9:05 AM Omnia Ibrahim 
> > wrote:
> >
> > > Hi Chris, thanks for the feedback.
> > > 1. Regarding the default value, I had the same conflict over which version to
> > > break the backward compatibility with. We can just say that this KIP gives
> > > pre-KIP-690 releases the ability to keep the old behaviour with one
> > > config and keeps the backwards compatibility from post-KIP-690 the same, so
> > > we don't break at least the last 3 versions. I will update the KIP to
> > > switch the default value to true.
> > > 2. For the backporting, which versions can we backport these to? Usually,
> > > Kafka supports bugfix releases as needed for the last 3 releases. Now we
> > > are at 3.5, so the last 3 are 3.4, 3.3 and 3.2; is this correct?
> > > 3. I'll add a Jira for updating the docs for this KIP so we don't forget
> > > about it.
> > >
> > > Thanks
> > > Omnia
> > >
> > >
> > > On Mon, Jul 10, 2023 at 5:33 PM Chris Egerton  >
> > > wrote:
> > >
> > > > Hi Omnia,
> > > >
> > > > Thanks for taking this on! I have some thoughts but the general
> > approach
> > > > looks good.
> > > >
> > > > 1. Default value
> > > >
> > > > One thing I'm wrestling with is what the default value of the new
> > > property
> > > > should be. I know on the Jira ticket I proposed that it should be
> > false,
> > > > but I'm having second thoughts. Technically we'd preserve backward
> > > > compatibility with pre-KIP-690 releases by defaulting to false, but
> at
> > > the
> > > > same time, we'd break compatibility with post-KIP-690 releases. And
> if
> > we
> > > > default to true, the opposite would be true: compatibility would be
> > > broken
> > > > with pre-KIP-690 releases, but preserved with post-KIP-690 releases.
> > > >
> > > > One argument against defaulting to false (which, again, would
> preserve
> > > the
> > > > behavior of MM2 before we accidentally broke compatibility with
> > KIP-690)
> > > is
> > > > that this change could possibly cause a single MM2 setup to break
> > > > twice--once when upgrading from a pre-KIP-690 release to an existing
> > > > release, and again when upgrading from that existing release to a
> > version
> > > > that reverted (by default) to pre-KIP-690 behavior. On the other
> hand,
> > if
> > > > we default to true (which would preserve the existing behavior that
> > > breaks
> > > > compatibility with pre-KIP-690 releases), then any given setup will
> > only
> > > be
> > > > broken once.
> > > >
> > > > In addition, if we default to true right now, then we don't have to
> > worry
> > > > about changing that default in 4.0 to a more intuitive value (I hope
> we
> > > can
> > > > all agree that, for new clusters, it makes sense to set this property
> > to
> > > > true and not to distinguish between internal and non-internal
> topics).
> > > >
> > > > With that in mind, I'm now leaning more towards defaulting to true,
> but
> > > > would be interested in your thoughts.
> > > >
> > > >
> > > > 2. Backport?
> > > >
> > > > It's highly unlikely to backport changes for a KIP, but given the
> > impact
> > > of
> > > > the compatibility break that we're trying to address here, and the
> > > > extremely low risk of the proposed changes, I think we should
> consider
> > > > backporting the proposed fix to all affected release branches (i.e.,
> > 3.1
> > > > through 3.5).
> > > >
> > > >
> > > > 3. Extra steps
> > > >
> > > > I also think we can take these additional steps to try to help
> prevent
> > > > users from being 

Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Andrew Schofield
Hi Mayank,
If we bump the version, the broker can tell whether it’s worth providing the 
leader
endpoint information to the client when the leader has changed. That’s my 
reasoning.

Thanks,
Andrew

> On 13 Jul 2023, at 18:02, Mayank Shekhar Narula  
> wrote:
> 
> Thanks both for looking into this.
> 
> Jose,
> 
> 1/2 & 4(changes for PRODUCE) & 5 makes sense, will follow
> 
> 3. If I understood this correctly, certain replicas "aren't" brokers; what
> are they then?
> 
> Also, how about replacing "Replica" with "Leader"? This is more readable on
> the client. So, how about this?
>{ "name": "LeaderEndpoints", "type": "[]Leader", "versions": "15+",
> "taggedVersions": "15+", "tag": 3,
>  "about": "Endpoints for all current leaders enumerated in
> PartitionData.", "fields": [
>  { "name": "NodeId", "type": "int32", "versions": "15+",
>"mapKey": true, "entityType": "brokerId", "about": "The ID of the
> associated leader"},
>  { "name": "Host", "type": "string", "versions": "15+",
>"about": "The leader's hostname." },
>  { "name": "Port", "type": "int32", "versions": "15+",
>"about": "The leader's port." },
>  { "name": "Rack", "type": "string", "versions": "15+", "ignorable":
> true, "default": "null",
>"about": "The rack of the leader, or null if it has not been
> assigned to a rack." }
>]}
> 
> Andrew
> 
> 6. I wonder if non-Kafka clients might benefit from not bumping the
> version. If versions are bumped, say for FetchResponse to 16, I believe
> that client would have to support all versions until 16 to fully utilise
> this feature. Whereas, if not bumped, they can simply support until version
> 12 (will change to version:12 for tagged fields), and non-AK clients can
> then implement this feature. What do you think? I am inclined to not bump.
> 
> On Thu, Jul 13, 2023 at 5:21 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
> 
>> Hi José,
>> Thanks. Sounds good.
>> 
>> Andrew
>> 
>>> On 13 Jul 2023, at 16:45, José Armando García Sancio
>>  wrote:
>>> 
>>> Hi Andrew,
>>> 
>>> On Thu, Jul 13, 2023 at 8:35 AM Andrew Schofield
>>>  wrote:
>>>> I have a question about José’s comment (2). I can see that it’s
>>>> possible for multiple partitions to change leadership to the same
>>>> broker/node and it’s wasteful to repeat all of the connection
>>>> information for each topic-partition. But, I think it’s important to
>>>> know which partitions are now lead by which node. That information at
>>>> least needs to be per-partition I think. I may have misunderstood, but
>>>> it sounded like your comment suggestion lost that relationship.
>>> 
>>> Each partition in both the FETCH response and the PRODUCE response
>>> will have the CurrentLeader, the tuple leader id and leader epoch.
>>> Clients can use this information to update their partition to leader
>>> id and leader epoch mapping.
>>> 
>>> They can also use the NodeEndpoints to update their mapping from
>>> replica id to the tuple host, port and rack so that they can connect
>>> to the correct node for future FETCH requests and PRODUCE requests.
>>> 
>>> Thanks,
>>> --
>>> -José
>> 
>> 
> 
> -- 
> Regards,
> Mayank Shekhar Narula



Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Mayank Shekhar Narula
Ismael

Thanks for the feedback. 1/3 are in the pipeline.

re. 2 On some busy clusters a single metadata call has been observed to
take on the order of ~100 milliseconds (I think it's mentioned somewhere in this
motivation). So retrying immediately on the Produce path won't make sense,
as metadata would still be stale. Hence the current proposal optimises for
fetching specific-metadata about the new leader, in the produce & fetch
response, outside of the metadata refresh. What do you think?

On Thu, Jul 13, 2023 at 5:49 PM Ismael Juma  wrote:

> Thanks for the KIP. A couple of high level points and a more specific one:
>
> 1. Given the motivation, the performance results are essential for proper
> evaluation, so I am looking forward to those.
> 2. The reasoning given for the rejected alternative seems weak to me. It
> would be useful to have numbers for that rejected alternative too.
> 3. It's wasteful to repeat the leader node information for every partition
> - we should probably have a separate list or map with the node information.
> Looks like Jose suggested the same/similar.
>
> Ismael
>
> On Thu, Jul 13, 2023 at 7:16 AM Mayank Shekhar Narula <
> mayanks.nar...@gmail.com> wrote:
>
> > Hi everyone
> >
> > Following KIP is up for discussion. Thanks for your feedback.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client
> >
> > --
> > Regards,
> > Mayank Shekhar Narula
> >
>


-- 
Regards,
Mayank Shekhar Narula


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Mayank Shekhar Narula
Thanks both for looking into this.

Jose,

1/2 & 4(changes for PRODUCE) & 5 makes sense, will follow

3. If I understood this correctly, certain replicas "aren't" brokers; what
are they then?

Also, how about replacing "Replica" with "Leader"? This is more readable on
the client. So, how about this?
{ "name": "LeaderEndpoints", "type": "[]Leader", "versions": "15+",
"taggedVersions": "15+", "tag": 3,
  "about": "Endpoints for all current leaders enumerated in
PartitionData.", "fields": [
  { "name": "NodeId", "type": "int32", "versions": "15+",
"mapKey": true, "entityType": "brokerId", "about": "The ID of the
associated leader"},
  { "name": "Host", "type": "string", "versions": "15+",
"about": "The leader's hostname." },
  { "name": "Port", "type": "int32", "versions": "15+",
"about": "The leader's port." },
  { "name": "Rack", "type": "string", "versions": "15+", "ignorable":
true, "default": "null",
"about": "The rack of the leader, or null if it has not been
assigned to a rack." }
]}

Andrew

6. I wonder if non-Kafka clients might benefit from not bumping the
version. If versions are bumped, say for FetchResponse to 16, I believe
that client would have to support all versions until 16 to fully utilise
this feature. Whereas, if not bumped, they can simply support until version
12 (will change to version:12 for tagged fields), and non-AK clients can
then implement this feature. What do you think? I am inclined to not bump.

On Thu, Jul 13, 2023 at 5:21 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi José,
> Thanks. Sounds good.
>
> Andrew
>
> > On 13 Jul 2023, at 16:45, José Armando García Sancio
>  wrote:
> >
> > Hi Andrew,
> >
> > On Thu, Jul 13, 2023 at 8:35 AM Andrew Schofield
> >  wrote:
> >> I have a question about José’s comment (2). I can see that it’s
> possible for multiple
> >> partitions to change leadership to the same broker/node and it’s
> wasteful to repeat
> >> all of the connection information for each topic-partition. But, I
> think it’s important to
> >> know which partitions are now lead by which node. That information at
> least needs to be
> >> per-partition I think. I may have misunderstood, but it sounded like
> your comment
> >> suggestion lost that relationship.
> >
> > Each partition in both the FETCH response and the PRODUCE response
> > will have the CurrentLeader, the tuple leader id and leader epoch.
> > Clients can use this information to update their partition to leader
> > id and leader epoch mapping.
> >
> > They can also use the NodeEndpoints to update their mapping from
> > replica id to the tuple host, port and rack so that they can connect
> > to the correct node for future FETCH requests and PRODUCE requests.
> >
> > Thanks,
> > --
> > -José
>
>

-- 
Regards,
Mayank Shekhar Narula


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Ismael Juma
Thanks for the KIP. A couple of high level points and a more specific one:

1. Given the motivation, the performance results are essential for proper
evaluation, so I am looking forward to those.
2. The reasoning given for the rejected alternative seems weak to me. It
would be useful to have numbers for that rejected alternative too.
3. It's wasteful to repeat the leader node information for every partition
- we should probably have a separate list or map with the node information.
Looks like Jose suggested the same/similar.

Ismael

On Thu, Jul 13, 2023 at 7:16 AM Mayank Shekhar Narula <
mayanks.nar...@gmail.com> wrote:

> Hi everyone
>
> Following KIP is up for discussion. Thanks for your feedback.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client
>
> --
> Regards,
> Mayank Shekhar Narula
>


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1996

2023-07-13 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Andrew Schofield
Hi José,
Thanks. Sounds good.

Andrew

> On 13 Jul 2023, at 16:45, José Armando García Sancio 
>  wrote:
> 
> Hi Andrew,
> 
> On Thu, Jul 13, 2023 at 8:35 AM Andrew Schofield
>  wrote:
>> I have a question about José’s comment (2). I can see that it’s possible for 
>> multiple
>> partitions to change leadership to the same broker/node and it’s wasteful to 
>> repeat
>> all of the connection information for each topic-partition. But, I think 
>> it’s important to
>> know which partitions are now lead by which node. That information at least 
>> needs to be
>> per-partition I think. I may have misunderstood, but it sounded like your 
>> comment
>> suggestion lost that relationship.
> 
> Each partition in both the FETCH response and the PRODUCE response
> will have the CurrentLeader, the tuple leader id and leader epoch.
> Clients can use this information to update their partition to leader
> id and leader epoch mapping.
> 
> They can also use the NodeEndpoints to update their mapping from
> replica id to the tuple host, port and rack so that they can connect
> to the correct node for future FETCH requests and PRODUCE requests.
> 
> Thanks,
> -- 
> -José



Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread José Armando García Sancio
Hi Andrew,

On Thu, Jul 13, 2023 at 8:35 AM Andrew Schofield
 wrote:
> I have a question about José’s comment (2). I can see that it’s possible for 
> multiple
> partitions to change leadership to the same broker/node and it’s wasteful to 
> repeat
> all of the connection information for each topic-partition. But, I think it’s 
> important to
> know which partitions are now lead by which node. That information at least 
> needs to be
> per-partition I think. I may have misunderstood, but it sounded like your 
> comment
> suggestion lost that relationship.

Each partition in both the FETCH response and the PRODUCE response
will have the CurrentLeader, the tuple leader id and leader epoch.
Clients can use this information to update their partition to leader
id and leader epoch mapping.

They can also use the NodeEndpoints to update their mapping from
replica id to the tuple host, port and rack so that they can connect
to the correct node for future FETCH requests and PRODUCE requests.

Thanks,
-- 
-José
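
A rough sketch of the client-side bookkeeping described above, assuming hypothetical
cache classes; the real client internals and the final response schema may differ.

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;

// Hypothetical client-side caches: partition -> current leader (id + epoch), and
// node id -> endpoint (host/port/rack), as described in the message above.
public class LeaderCache {
    static final class LeaderAndEpoch {
        final int leaderId;
        final int leaderEpoch;
        LeaderAndEpoch(int leaderId, int leaderEpoch) {
            this.leaderId = leaderId;
            this.leaderEpoch = leaderEpoch;
        }
    }

    private final Map<TopicPartition, LeaderAndEpoch> currentLeaders = new HashMap<>();
    private final Map<Integer, Node> endpoints = new HashMap<>();

    // Called when a FETCH or PRODUCE response carries a CurrentLeader for a partition.
    public void updateLeader(TopicPartition tp, int leaderId, int leaderEpoch) {
        currentLeaders.merge(tp, new LeaderAndEpoch(leaderId, leaderEpoch),
            (old, fresh) -> fresh.leaderEpoch > old.leaderEpoch ? fresh : old);
    }

    // Called for each entry of the proposed top-level endpoint array.
    public void updateEndpoint(int nodeId, String host, int port, String rack) {
        endpoints.put(nodeId, new Node(nodeId, host, port, rack));
    }

    // Where should the next request for this partition go?
    public Node destinationFor(TopicPartition tp) {
        LeaderAndEpoch leader = currentLeaders.get(tp);
        return leader == null ? null : endpoints.get(leader.leaderId);
    }
}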


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Andrew Schofield
Hi,
Thanks for the KIP, Mayank.

I have a question about José’s comment (2). I can see that it’s possible for 
multiple
partitions to change leadership to the same broker/node and it’s wasteful to 
repeat
all of the connection information for each topic-partition. But, I think it’s 
important to
know which partitions are now lead by which node. That information at least 
needs to be
per-partition I think. I may have misunderstood, but it sounded like your 
comment
suggestion lost that relationship.

I also have a comment.

6. Sometimes I believe that the version number of an RPC has been incremented
to reflect a change in behaviour even when there’s not been a change in 
structure.
Given that this is likely to be in a new version of Kafka and it’s new 
behaviour to
return this optional information, I wonder whether it’s worthwhile to increment 
the
version of the Fetch RPC anyway.

Thanks,
Andrew

> On 13 Jul 2023, at 16:20, José Armando García Sancio 
>  wrote:
> 
> Hi Mayank, thanks for the KIP. I look forward to this improvement for
> new clients.
> 
> Some comments below.
> 
> On Thu, Jul 13, 2023 at 7:15 AM Mayank Shekhar Narula
>  wrote:
>> Following KIP is up for discussion. Thanks for your feedback
> 
> Regarding the FETCH response changes:
> 1. Tagged field 2 already exists. Looks like you can use tagged field 3.
> 
> 2. The CurrentLeaderBroker should not be within PartitionData. It is
> possible to have multiple partitions that would change leadership to
> the same broker/node. We should instead move that information to a
> top-level field that is an array of (replica id, host, port, rack).
> 
> 3. In the future, I may use this information in the KRaft/Metadata
> implementation of FETCH. In that implementation not all of the
> replicas are brokers. Do you mind removing all references to the word
> broker in the description and field name. Maybe you can use the word
> replica instead. How about something like this:
>{ "name": "NodeEndpoints", "type": "[]NodeEndpoint", "versions":
> "15+", "taggedVersions": "15+", "tag": 3,
>  "about": "Endpoint information for all current leaders
> enumerated in PartitionData.", "fields": [
>  { "name": "ReplicaId", "type": "int32", "versions": "15+",
> "mapKey": true, "entityType": "brokerId",
>"about": "The ID of the associated replica"},
>  { "name": "Host", "type": "string", "versions": "15+",
>"about": "The replica's hostname." },
>  { "name": "Port", "type": "int32", "versions": "15+",
>"about": "The replica's port." },
>  { "name": "Rack", "type": "string", "versions": "15+",
> "ignorable": true, "default": "null",
>"about": "The rack of the replica, or null if it has not
> been assigned to a rack." }
>  ]},
> 
> Regarding the PRODUCE response changes:
> 4. Can we make similar changes as the ones mentioned in bullet points
> 2. and 3 above?.
> 
> 5. If you make the changes enumerated in bullet point 4., you'll
> probably want to change the tag so that NodeEpoint has tag 0 while
> CurrentLeader has tag 1.
> 
> Thanks!
> -- 
> -José



Re: [VOTE] KIP-941 Range queries to accept null lower and upper bounds

2023-07-13 Thread Lucia Cerchie
Hi all,

Thanks to everyone who participated in the voting and the discussion. I'll
close it since it has been open for over 72 hours, and we have
sufficient votes. KIP-941 has been accepted with the following +1
votes (and no +0 or -1 votes):

- Sophie Blee-Goldman (binding)
- Matthias J. Sax (binding)
- Bill Bejeck (binding)

Thanks, Lucia
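
For context, a brief sketch of the behaviour KIP-941 enables: passing a null bound to
RangeQuery.withRange to leave that side of the range open. The store name, key/value
types and the running KafkaStreams instance are assumptions for illustration.

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.query.RangeQuery;
import org.apache.kafka.streams.query.StateQueryRequest;
import org.apache.kafka.streams.query.StateQueryResult;
import org.apache.kafka.streams.state.KeyValueIterator;

public class RangeQueryNullBoundsExample {
    // Prints every key up to upperKey from a key-value store named "counts-store"
    // (hypothetical), using a null lower bound to leave the range unbounded below.
    public static void printUpTo(KafkaStreams streams, String upperKey) {
        RangeQuery<String, Long> query = RangeQuery.withRange(null, upperKey);
        StateQueryRequest<KeyValueIterator<String, Long>> request =
            StateQueryRequest.inStore("counts-store").withQuery(query);
        StateQueryResult<KeyValueIterator<String, Long>> result = streams.query(request);
        result.getPartitionResults().forEach((partition, queryResult) -> {
            if (queryResult.isSuccess()) {
                try (KeyValueIterator<String, Long> iter = queryResult.getResult()) {
                    iter.forEachRemaining(kv -> System.out.println(kv.key + " -> " + kv.value));
                }
            }
        });
    }
}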


On Mon, Jul 10, 2023 at 5:40 PM Sophie Blee-Goldman 
wrote:

> Thanks for the KIP!
>
> +1 (binding)
>
> On Mon, Jul 10, 2023 at 4:30 PM Matthias J. Sax  wrote:
>
> > +1 (binding)
> >
> > On 7/10/23 12:13 PM, Bill Bejeck wrote:
> > > Hi Lucia,
> > >
> > > Thanks for the KIP! It will be a welcomed improvement.
> > >
> > > +1(binding)
> > >
> > > -Bill
> > >
> > > On Mon, Jul 10, 2023 at 2:40 PM Lucia Cerchie
> > 
> > > wrote:
> > >
> > >> Hello everyone,
> > >>
> > >> I'd like to call a vote on KIP-941
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds
> > >>> .
> > >> It has been under discussion since June 26, and has received edits to
> > the
> > >> KIP and approval by discussion participants.
> > >>
> > >> Best,
> > >> Lucia
> > >>
> > >> --
> > >>
> > >> Lucia Cerchie
> > >> Developer Advocate
> > >>
> > >
> >
>


-- 

Lucia Cerchie
Developer Advocate


Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread José Armando García Sancio
Hi Mayank, thanks for the KIP. I look forward to this improvement for
new clients.

Some comments below.

On Thu, Jul 13, 2023 at 7:15 AM Mayank Shekhar Narula
 wrote:
> Following KIP is up for discussion. Thanks for your feedback

Regarding the FETCH response changes:
1. Tagged field 2 already exists. Looks like you can use tagged field 3.

2. The CurrentLeaderBroker should not be within PartitionData. It is
possible to have multiple partitions that would change leadership to
the same broker/node. We should instead move that information to a
top-level field that is an array of (replica id, host, port, rack).

3. In the future, I may use this information in the KRaft/Metadata
implementation of FETCH. In that implementation not all of the
replicas are brokers. Do you mind removing all references to the word
broker in the description and field name. Maybe you can use the word
replica instead. How about something like this:
{ "name": "NodeEndpoints", "type": "[]NodeEndpoint", "versions":
"15+", "taggedVersions": "15+", "tag": 3,
  "about": "Endpoint information for all current leaders
enumerated in PartitionData.", "fields": [
  { "name": "ReplicaId", "type": "int32", "versions": "15+",
"mapKey": true, "entityType": "brokerId",
"about": "The ID of the associated replica"},
  { "name": "Host", "type": "string", "versions": "15+",
"about": "The replica's hostname." },
  { "name": "Port", "type": "int32", "versions": "15+",
"about": "The replica's port." },
  { "name": "Rack", "type": "string", "versions": "15+",
"ignorable": true, "default": "null",
"about": "The rack of the replica, or null if it has not
been assigned to a rack." }
  ]},

Regarding the PRODUCE response changes:
4. Can we make similar changes as the ones mentioned in bullet points
2. and 3 above?.

5. If you make the changes enumerated in bullet point 4., you'll
probably want to change the tag so that NodeEpoint has tag 0 while
CurrentLeader has tag 1.

Thanks!
-- 
-José


Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

2023-07-13 Thread Viktor Somogyi-Vass
Mickael, have you had some time to review this by any chance?

On Tue, Jun 20, 2023 at 5:23 PM Viktor Somogyi-Vass <
viktor.somo...@cloudera.com> wrote:

> Hey all,
>
> I'd like to revive this discussion. I've created
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-879%3A+Multi-level+Rack+Awareness
> last November, and it seems that there is a nice overlap between the
> two that would be good to merge. Should we revive KIP-660 and merge the two
> KIPs?
> If you don't have time for this Mickael currently, I'm happy to take it
> over from you and merge the two interfaces, it seems like they're somewhat
> similar (and also with the current internal interface).
>
> Best,
> Viktor
>
> On Tue, May 31, 2022 at 3:57 PM Mickael Maison 
> wrote:
>
>> Hi Vikas,
>>
>> You make some very good points and most importantly I agree that being
>> able to prevent putting new partitions on a broker should be part of
>> Kafka itself and not require a plugin.
>>
>> This feature would addresses 2 out of the 3 scenarios mentioned in the
>> motivation section. The last one "When adding brokers to a cluster,
>> Kafka currently does not necessarily place new partitions on new
>> brokers" is clearly less important.
>>
>> So I think I'll retire this KIP and I'll follow up with a new KIP to
>> focus on that feature.
>>
>> Thanks,
>> Mickael
>>
>>
>> On Mon, May 9, 2022 at 8:11 PM Vikas Singh 
>> wrote:
>> >
>> > Hi Mickael,
>> >
>> > It's a nice proposal. It's appealing to have a pluggable way to override
>> > default kafka placement decisions, and the motivation section lists
>> some of
>> > them. Here are few comments:
>> >
>> > * The motivation section has "When adding brokers to a cluster, Kafka
>> > currently does not necessarily place new partitions on new brokers". I
>> am
>> > not sure how valuable doing this will be. A newly created kafka topic
>> takes
>> > time to reach the same usage level as existing topics, say because the
>> > topic created by a new workload that is getting onboarded, or the
>> expansion
>> > was done to relieve disk pressure on existing nodes etc. While new
>> topics
>> > catch up to existing workload, the new brokers are not sharing equal
>> load
>> > in the cluster, which probably defeats the purpose of adding new
>> brokers.
>> > In addition to that clustering new topics like this on new brokers have
>> > implications from fault domain perspective. A reasonable way to
>> approach it
>> > is to indeed use CruiseControl to move things around so that the newly
>> > added nodes become immediately involved and share cluster load.
>> > * Regarding "When administrators want to remove brokers from a cluster,
>> > there is no way to prevent Kafka from placing partitions on them", this
>> is
>> > indeed an issue. I would argue that this is needed by everyone and
>> should
>> > be part of Kafka, instead of being implemented as part of a plugin
>> > interface by multiple teams.
>> > * For "When some brokers are near their storage/throughput limit, Kafka
>> > could avoid putting new partitions on them", while this can help relieve
>> > short term overload I think again the correct solution here is something
>> > like CruiseControl where the system is monitored and things moved
>> around to
>> > maintain a balanced cluster. A new topic will not take any disk space,
>> so
>> > placing them anywhere normally isn't going to add to the storage
>> overload.
>> > Similar to the previous case, maybe a mechanism in Kafka to put nodes
>> in a
>> > quarantine state is a better way to approach this.
>> >
>> > In terms of the proposed api, I have a couple of comments:
>> >
>> > * It is not clear if the proposal applies to partitions of new topics or
>> > addition on partitions to an existing topic. Explicitly stating that
>> will
>> > be helpful.
>> > * Regarding part "To address the use cases identified in the motivation
>> > section, some knowledge about the current state of the cluster is
>> > necessary. Details whether a new broker has just been added or is being
>> > decommissioned are not part of the cluster metadata. Therefore such
>> > knowledge has to be provided via an external means to the ReplicaPlacer,
>> > for example via the configuration". It's not clear how this will be
>> done.
>> > If I have to implement this interface, it will be helpful to have clear
>> > guidance/examples here which hopefully ties to the use cases in the
>> > motivation section. It also allows us to figure out if the proposed
>> > interface is complete and helps future implementers of the interface.
>> >
>> > Couple of minor comments:
>> > * The KIP is not listed in the main KIP page (
>> >
>> https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> ).
>> > Can you please add it there.
>> > * The page has "This is especially true for the 4 scenarios listed in
>> the
>> > Motivation section", but there are only 3 scenarios listed.
>> >
>> > Regards,
>> > Vikas
>> >
>> >
>> > On Tue, May 3, 2022 at 
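
For illustration of the discussion above, a purely hypothetical sketch of what a
pluggable placer interface could look like; the names below are invented here and do
not reflect the actual KIP-660 or KIP-879 proposals.

import java.util.List;
import java.util.Optional;

// Purely hypothetical interface, only to illustrate the idea of a pluggable placer.
public interface ReplicaPlacer {
    /**
     * Choose broker ids for each new partition.
     *
     * @param partitionCount    number of partitions to place
     * @param replicationFactor replicas per partition
     * @param brokers           the live brokers (id and optional rack)
     * @return one list of broker ids per partition, leader first
     */
    List<List<Integer>> place(int partitionCount, short replicationFactor, List<BrokerDescription> brokers);

    final class BrokerDescription {
        public final int id;
        public final Optional<String> rack;

        public BrokerDescription(int id, Optional<String> rack) {
            this.id = id;
            this.rack = rack;
        }
    }
}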

[DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-13 Thread Mayank Shekhar Narula
Hi everyone

Following KIP is up for discussion. Thanks for your feedback.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

-- 
Regards,
Mayank Shekhar Narula


[Vote] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-13 Thread Omnia Ibrahim
Hi Everyone! I would like to start the vote on KIP-949 details is here  
https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy

Thanks
Omnia

[jira] [Created] (KAFKA-15187) Add headers to partition method.

2023-07-13 Thread Jacob Tomy (Jira)
Jacob Tomy created KAFKA-15187:
--

 Summary: Add headers to partition method.
 Key: KAFKA-15187
 URL: https://issues.apache.org/jira/browse/KAFKA-15187
 Project: Kafka
  Issue Type: New Feature
Reporter: Jacob Tomy


Add headers to the partition method.

This will enable selecting partitions based on header values.
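
Until such a change lands, one way to approximate header-based routing today
is a ProducerInterceptor that reads a header and pins the record to a
partition before the partitioner runs. Below is a minimal sketch; the
"route-key" header name and the "header.router.num.partitions" config key are
invented for illustration.

import java.util.Arrays;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.header.Header;

// Workaround sketch: route records by a "route-key" header by setting the
// partition explicitly, since the Partitioner interface does not currently
// receive headers.
public class HeaderRoutingInterceptor implements ProducerInterceptor<String, String> {

    private int numPartitions;

    @Override
    public void configure(Map<String, ?> configs) {
        // Hypothetical config key; the partition count has to be supplied
        // because interceptors do not see cluster metadata.
        numPartitions = Integer.parseInt(String.valueOf(configs.get("header.router.num.partitions")));
    }

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        Header route = record.headers().lastHeader("route-key"); // hypothetical header name
        if (route == null || record.partition() != null) {
            return record; // leave routing to the regular partitioner
        }
        int partition = Math.floorMod(Arrays.hashCode(route.value()), numPartitions);
        return new ProducerRecord<>(record.topic(), partition, record.timestamp(),
                record.key(), record.value(), record.headers());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

    @Override
    public void close() { }
}

The interceptor is wired in via the producer's interceptor.classes config;
the cleaner fix is of course the header-aware partition method this ticket
asks for.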



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-852: Optimize calculation of size for log in remote tier

2023-07-13 Thread Jorge Esteban Quilcate Otoya
+1 (non-binding)

Thanks for the KIP!

Jorge.

On Thu, 13 Jul 2023 at 12:26, Luke Chen  wrote:

> +1 (binding) from me.
>
> Thanks for the KIP!
>
> Luke
>
> On Sun, Jul 2, 2023 at 11:49 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > +1 (non-binding). Thanks for the KIP!
> >
> > —
> > Kamal
> >
> > On Mon, 7 Nov 2022 at 2:20 AM, John Roesler  wrote:
> >
> > > Hi Divij,
> > >
> > > Thanks for the KIP!
> > >
> > > I’ve read through your write-up, and it sounds reasonable to me.
> > >
> > > I’m +1 (binding)
> > >
> > > Thanks,
> > > John
> > >
> > > On Tue, Nov 1, 2022, at 05:03, Divij Vaidya wrote:
> > > > Hey folks
> > > >
> > > > The discuss thread for this KIP has been open for a few months with
> no
> > > > concerns being surfaced. I would like to start a vote for the
> > > > implementation of this KIP.
> > > >
> > > > The KIP is available at
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > >
> > > >
> > > > Regards
> > > > Divij Vaidya
> > >
> >
>


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-13 Thread Jorge Esteban Quilcate Otoya
Thanks Divij.

I was confusing this with the metric tags used by clients, which are based on
topic and partition. Ideally the partition label could be at a DEBUG recording
level, but that's outside the scope of this KIP.

Looks good to me, thanks again!

Jorge.

On Wed, 12 Jul 2023 at 15:55, Divij Vaidya  wrote:

> Jorge,
> About API name: Good point. I have changed it to remoteLogSize instead of
> getRemoteLogSize
>
> About partition tag in the metric: We don't use partition tag across any of
> the RemoteStorage metrics and I would like to keep this metric aligned with
> the rest. I will change the metric though to type=BrokerTopicMetrics
> instead of type=RemoteLogManager, since this is topic level information and
> not specific to RemoteLogManager.
>
>
> Satish,
> Ah yes! Updated from "This would increase the broker start-up time." to
> "This would increase the bootstrap time for the remote storage thread pool
> before the first eligible segment is archived."
>
> --
> Divij Vaidya
>
>
>
> On Mon, Jul 3, 2023 at 2:07 PM Satish Duggana 
> wrote:
>
> > Thanks Divij for taking the feedback and updating the motivation
> > section in the KIP.
> >
> > One more comment on Alternative solution-3, The con is not valid as
> > that will not affect the broker restart times as discussed in the
> > earlier email in this thread. You may want to update that.
> >
> > ~Satish.
> >
> > On Sun, 2 Jul 2023 at 01:03, Divij Vaidya 
> wrote:
> > >
> > > Thank you folks for reviewing this KIP.
> > >
> > > Satish, I have modified the motivation to make it more clear. Now it
> > says,
> > > "Since the main feature of tiered storage is storing a large amount of
> > > data, we expect num_remote_segments to be large. A frequent linear scan
> > > (i.e. listing all segment metadata) could be expensive/slower because
> of
> > > the underlying storage used by RemoteLogMetadataManager. This slowness
> to
> > > list all segment metadata could result in the loss of availability"
> > >
> > > Jun, Kamal, Satish, if you don't have any further concerns, I would
> > > appreciate a vote for this KIP in the voting thread -
> > > https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > Thanks for the explanation. LGTM.
> > > >
> > > > --
> > > > Kamal
> > > >
> > > > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Divij,
> > > > > I am fine with having an API to compute the size as I mentioned in
> my
> > > > > earlier reply in this mail thread. But I have the below comment for
> > > > > the motivation for this KIP.
> > > > >
> > > > > As you discussed offline, the main issue here is listing calls for
> > > > > remote log segment metadata is slower because of the storage used
> for
> > > > > RLMM. These can be avoided with this new API.
> > > > >
> > > > > Please add this in the motivation section as it is one of the main
> > > > > motivations for the KIP.
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > > > On Sat, 1 Jul 2023 at 01:43, Jun Rao 
> > wrote:
> > > > > >
> > > > > > Hi, Divij,
> > > > > >
> > > > > > Sorry for the late reply.
> > > > > >
> > > > > > Given your explanation, the new API sounds reasonable to me. Is
> > that
> > > > > enough
> > > > > > to build the external metadata layer for the remote segments or
> do
> > you
> > > > > need
> > > > > > some additional API changes?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya <
> > divijvaidy...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thank you for looking into this Kamal.
> > > > > > >
> > > > > > > You are right in saying that a cold start (i.e. leadership
> > failover
> > > > or
> > > > > > > broker startup) does not impact the broker startup duration.
> But
> > it
> > > > > does
> > > > > > > have the following impact:
> > > > > > > 1. It leads to a burst of full-scan requests to RLMM in case
> > multiple
> > > > > > > leadership failovers occur at the same time. Even if the RLMM
> > > > > > > implementation has the capability to serve the total size from
> an
> > > > index
> > > > > > > (and hence handle this burst), we wouldn't be able to use it
> > since
> > > > the
> > > > > > > current API necessarily calls for a full scan.
> > > > > > > 2. The archival (copying of data to tiered storage) process
> will
> > > > have a
> > > > > > > delayed start. The delayed start of archival could lead to
> local
> > > > build
> > > > > up
> > > > > > > of data which may lead to disk full.
> > > > > > >
> > > > > > > The disadvantage of adding this new API is that every provider
> > will
> > > > > have to
> > > > > > > implement it, agreed. But I believe that this tradeoff is
> > worthwhile
> > > > > since
> > > > > > > the 

Re: [DISCUSS] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-13 Thread Omnia Ibrahim
Hi Chris,
The implementation should be very small, so backporting this to 3.1 and 3.2
would be perfect for this case if you or any other committer are okay with
approving the backport. Do we need to state this in the KIP as well?

Also, I’ll open a vote for the KIP today and prepare the PR for it so we
can merge it as soon as possible.

Thanks,

 Omnia

On Wed, Jul 12, 2023 at 4:31 PM Chris Egerton 
wrote:

> Hi Omnia,
>
> Thanks for changing the default, LGTM 
>
> As far as backporting goes, we probably won't be doing another release for
> 3.1, and possibly not for 3.2 either; however, if we can make the
> implementation focused enough (which I don't think would be too difficult,
> but correct me if I'm wrong), then we can still backport through 3.1. Even
> if we don't do another release it can make life easier for people who are
> maintaining parallel forks. Obviously this shouldn't be taken as a blanket
> precedent but in this case it seems like the benefits may outweigh the
> costs. What are your thoughts?
>
> Cheers,
>
> Chris
>
> On Wed, Jul 12, 2023 at 9:05 AM Omnia Ibrahim 
> wrote:
>
> > Hi Chris, thanks for the feedback.
> > 1. regarding the default value I had the same conflict of which version
> to
> > break the backward compatibility with. We can just say that this KIP
> gives
> > the release Pre KIP-690 the ability to keep the old behaviour with one
> > config and keep the backwards compatibility from post-KIP-690 the same so
> > we don't break at least the last 3 versions. I will update the KIP to
> > switch the default value to true.
> > 2. For the backporting, which versions can we backport these to? Usually,
> > Kafka supports bugfix releases as needed for the last 3 releases. Now we
> @
> > 3.5 so the last 3 are 3.4, 3.3 and 3.2 is this correct?
> > 3. I'll add a Jira for updating the docs for this KIP so we don't forget
> > about it.
> >
> > Thanks
> > Omnia
> >
> >
> > On Mon, Jul 10, 2023 at 5:33 PM Chris Egerton 
> > wrote:
> >
> > > Hi Omnia,
> > >
> > > Thanks for taking this on! I have some thoughts but the general
> approach
> > > looks good.
> > >
> > > 1. Default value
> > >
> > > One thing I'm wrestling with is what the default value of the new
> > property
> > > should be. I know on the Jira ticket I proposed that it should be
> false,
> > > but I'm having second thoughts. Technically we'd preserve backward
> > > compatibility with pre-KIP-690 releases by defaulting to false, but at
> > the
> > > same time, we'd break compatibility with post-KIP-690 releases. And if
> we
> > > default to true, the opposite would be true: compatibility would be
> > broken
> > > with pre-KIP-690 releases, but preserved with post-KIP-690 releases.
> > >
> > > One argument against defaulting to false (which, again, would preserve
> > the
> > > behavior of MM2 before we accidentally broke compatibility with
> KIP-690)
> > is
> > > that this change could possibly cause a single MM2 setup to break
> > > twice--once when upgrading from a pre-KIP-690 release to an existing
> > > release, and again when upgrading from that existing release to a
> version
> > > that reverted (by default) to pre-KIP-690 behavior. On the other hand,
> if
> > > we default to true (which would preserve the existing behavior that
> > breaks
> > > compatibility with pre-KIP-690 releases), then any given setup will
> only
> > be
> > > broken once.
> > >
> > > In addition, if we default to true right now, then we don't have to
> worry
> > > about changing that default in 4.0 to a more intuitive value (I hope we
> > can
> > > all agree that, for new clusters, it makes sense to set this property
> to
> > > true and not to distinguish between internal and non-internal topics).
> > >
> > > With that in mind, I'm now leaning more towards defaulting to true, but
> > > would be interested in your thoughts.
> > >
> > >
> > > 2. Backport?
> > >
> > > It's highly unlikely to backport changes for a KIP, but given the
> impact
> > of
> > > the compatibility break that we're trying to address here, and the
> > > extremely low risk of the proposed changes, I think we should consider
> > > backporting the proposed fix to all affected release branches (i.e.,
> 3.1
> > > through 3.5).
> > >
> > >
> > > 3. Extra steps
> > >
> > > I also think we can take these additional steps to try to help prevent
> > > users from being bitten by this change:
> > >
> > > - Add a note to our upgrade instructions [1] for all affected versions
> > that
> > > instructs users on how to safely upgrade to a post-KIP-690 release, for
> > > versions that both do and do not include the changes from this KIP
> > > - Log a warning message on MM2 startup if the config contains an
> explicit
> > > value for "replication.policy.separator" but does not contain an
> explicit
> > > value for "replication.policy.internal.topic.separator.enabled"
> > >
> > > These details don't necessarily have to be codified in the KIP, but
> > they're
> > > worth taking into account when 
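
For readers following the thread, the two properties under discussion would
sit next to each other in the MM2 configuration roughly as below. The default
value and exact semantics are whatever the KIP finally settles on, so treat
this only as a sketch.

# mm2.properties (illustrative excerpt)
# Custom separator used when building replicated topic names.
replication.policy.separator = _
# Proposed by KIP-949: set to false to keep the pre-KIP-690 naming for the
# MM2 internal topics (heartbeats, checkpoints, offset-syncs) while still
# using the custom separator above for replicated data topics.
replication.policy.internal.topic.separator.enabled = false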

[jira] [Created] (KAFKA-15186) AppInfo metrics don't contain the client-id

2023-07-13 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-15186:
--

 Summary: AppInfo metrics don't contain the client-id
 Key: KAFKA-15186
 URL: https://issues.apache.org/jira/browse/KAFKA-15186
 Project: Kafka
  Issue Type: Task
  Components: metrics
Reporter: Mickael Maison


All Kafka components register AppInfo metrics to track the application start
time and commit id.

The AppInfoParser class registers a JMX MBean with the provided client-id, but
when it adds metrics to the Metrics registry the client-id is not included.

This means that if you use a custom MetricsReporter, the metrics you get don't
have the client-id tag.
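
A quick way to observe the gap is a reporter that just prints each metric's
group/name and whether a client-id tag is attached; the class name and output
format below are only illustrative.

import java.util.List;
import java.util.Map;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

// Illustrative reporter: prints each metric's group/name and whether a
// client-id tag is attached. Per this ticket, the AppInfo metrics show up
// without one.
public class ClientIdTagAuditReporter implements MetricsReporter {

    @Override
    public void init(List<KafkaMetric> metrics) {
        metrics.forEach(this::metricChange);
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        Map<String, String> tags = metric.metricName().tags();
        System.out.printf("%s/%s client-id=%s%n",
                metric.metricName().group(),
                metric.metricName().name(),
                tags.getOrDefault("client-id", "<absent>"));
    }

    @Override
    public void metricRemoval(KafkaMetric metric) { }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

Registering it through the client's metric.reporters config shows the AppInfo
entries arriving without the tag.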



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15185) Consumers using the latest strategy may lose data after the topic adds partitions

2023-07-13 Thread RivenSun (Jira)
RivenSun created KAFKA-15185:


 Summary: Consumers using the latest strategy may lose data after 
the topic adds partitions
 Key: KAFKA-15185
 URL: https://issues.apache.org/jira/browse/KAFKA-15185
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 3.4.1
Reporter: RivenSun
Assignee: Luke Chen


h2. condition:

1. A business topic has partitions added.
2. metadata.max.age.ms is set to five minutes for both producers and
consumers, but the producer discovers the new partition before the consumer
does and produces 100 messages to it.
3. The consumer parameter auto.offset.reset is set to latest.
h2. result:

Consumers will lose these 100 messages.


First of all, we cannot simply set auto.offset.reset to *earliest*, because
the user's requirement is that a newly subscribed group discards all old
messages of the topic.
However, once the group has subscribed, messages produced to the newly added
partitions must not be lost, i.e. they should be consumed as if consumption
started from the earliest offset.
h2. suggestion:

We have set the consumer's metadata.max.age.ms to 1/2 or 1/3 of the
producer's value, but this still doesn't solve the problem, because in many
cases the producer may force a metadata refresh. In addition, a smaller
metadata.max.age.ms value causes more metadata refresh requests, which
increases the load on the brokers.

So can we add a parameter to control whether the consumer starts consumption
from the earliest or the latest offset for newly added partitions? Perhaps
during the rebalance, the leader consumer could mark which partitions are
newly added when calculating the assignment.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-852: Optimize calculation of size for log in remote tier

2023-07-13 Thread Luke Chen
+1 (binding) from me.

Thanks for the KIP!

Luke

On Sun, Jul 2, 2023 at 11:49 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> +1 (non-binding). Thanks for the KIP!
>
> —
> Kamal
>
> On Mon, 7 Nov 2022 at 2:20 AM, John Roesler  wrote:
>
> > Hi Divij,
> >
> > Thanks for the KIP!
> >
> > I’ve read through your write-up, and it sounds reasonable to me.
> >
> > I’m +1 (binding)
> >
> > Thanks,
> > John
> >
> > On Tue, Nov 1, 2022, at 05:03, Divij Vaidya wrote:
> > > Hey folks
> > >
> > > The discuss thread for this KIP has been open for a few months with no
> > > concerns being surfaced. I would like to start a vote for the
> > > implementation of this KIP.
> > >
> > > The KIP is available at
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > >
> > >
> > > Regards
> > > Divij Vaidya
> >
>


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-13 Thread Luke Chen
Hi Divij,

One minor comment:
remoteLogSize takes 2 parameters, but in the code snippet, you only provide
1 parameter.

Otherwise, LGTM

Thank you.
Luke
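
For context on the two parameters mentioned above (the topic-id-partition and
the leader epoch), a naive fallback could compute the size from the existing
listing API, which is the very scan the KIP wants efficient RLMM
implementations to avoid. The package and method names below reflect the
current remote-storage interfaces as I understand them, so treat this only as
a sketch and check it against the KIP.

import java.util.Iterator;
import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

// Naive size computation via the existing listing API; efficient RLMM
// implementations would serve this from an index instead of scanning.
final class RemoteLogSizeExample {

    static long remoteLogSize(RemoteLogMetadataManager rlmm,
                              TopicIdPartition topicIdPartition,
                              int leaderEpoch) throws RemoteStorageException {
        long total = 0;
        Iterator<RemoteLogSegmentMetadata> segments =
                rlmm.listRemoteLogSegments(topicIdPartition, leaderEpoch);
        while (segments.hasNext()) {
            total += segments.next().segmentSizeInBytes();
        }
        return total;
    }
}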

On Wed, Jul 12, 2023 at 8:56 PM Divij Vaidya 
wrote:

> Jorge,
> About API name: Good point. I have changed it to remoteLogSize instead of
> getRemoteLogSize
>
> About partition tag in the metric: We don't use partition tag across any of
> the RemoteStorage metrics and I would like to keep this metric aligned with
> the rest. I will change the metric though to type=BrokerTopicMetrics
> instead of type=RemoteLogManager, since this is topic level information and
> not specific to RemoteLogManager.
>
>
> Satish,
> Ah yes! Updated from "This would increase the broker start-up time." to
> "This would increase the bootstrap time for the remote storage thread pool
> before the first eligible segment is archived."
>
> --
> Divij Vaidya
>
>
>
> On Mon, Jul 3, 2023 at 2:07 PM Satish Duggana 
> wrote:
>
> > Thanks Divij for taking the feedback and updating the motivation
> > section in the KIP.
> >
> > One more comment on Alternative solution-3, The con is not valid as
> > that will not affect the broker restart times as discussed in the
> > earlier email in this thread. You may want to update that.
> >
> > ~Satish.
> >
> > On Sun, 2 Jul 2023 at 01:03, Divij Vaidya 
> wrote:
> > >
> > > Thank you folks for reviewing this KIP.
> > >
> > > Satish, I have modified the motivation to make it more clear. Now it
> > says,
> > > "Since the main feature of tiered storage is storing a large amount of
> > > data, we expect num_remote_segments to be large. A frequent linear scan
> > > (i.e. listing all segment metadata) could be expensive/slower because
> of
> > > the underlying storage used by RemoteLogMetadataManager. This slowness
> to
> > > list all segment metadata could result in the loss of availability"
> > >
> > > Jun, Kamal, Satish, if you don't have any further concerns, I would
> > > appreciate a vote for this KIP in the voting thread -
> > > https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > Thanks for the explanation. LGTM.
> > > >
> > > > --
> > > > Kamal
> > > >
> > > > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Divij,
> > > > > I am fine with having an API to compute the size as I mentioned in
> my
> > > > > earlier reply in this mail thread. But I have the below comment for
> > > > > the motivation for this KIP.
> > > > >
> > > > > As you discussed offline, the main issue here is listing calls for
> > > > > remote log segment metadata is slower because of the storage used
> for
> > > > > RLMM. These can be avoided with this new API.
> > > > >
> > > > > Please add this in the motivation section as it is one of the main
> > > > > motivations for the KIP.
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > > > On Sat, 1 Jul 2023 at 01:43, Jun Rao 
> > wrote:
> > > > > >
> > > > > > Hi, Divij,
> > > > > >
> > > > > > Sorry for the late reply.
> > > > > >
> > > > > > Given your explanation, the new API sounds reasonable to me. Is
> > that
> > > > > enough
> > > > > > to build the external metadata layer for the remote segments or
> do
> > you
> > > > > need
> > > > > > some additional API changes?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya <
> > divijvaidy...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thank you for looking into this Kamal.
> > > > > > >
> > > > > > > You are right in saying that a cold start (i.e. leadership
> > failover
> > > > or
> > > > > > > broker startup) does not impact the broker startup duration.
> But
> > it
> > > > > does
> > > > > > > have the following impact:
> > > > > > > 1. It leads to a burst of full-scan requests to RLMM in case
> > multiple
> > > > > > > leadership failovers occur at the same time. Even if the RLMM
> > > > > > > implementation has the capability to serve the total size from
> an
> > > > index
> > > > > > > (and hence handle this burst), we wouldn't be able to use it
> > since
> > > > the
> > > > > > > current API necessarily calls for a full scan.
> > > > > > > 2. The archival (copying of data to tiered storage) process
> will
> > > > have a
> > > > > > > delayed start. The delayed start of archival could lead to
> local
> > > > build
> > > > > up
> > > > > > > of data which may lead to disk full.
> > > > > > >
> > > > > > > The disadvantage of adding this new API is that every provider
> > will
> > > > > have to
> > > > > > > implement it, agreed. But I believe that this tradeoff is
> > worthwhile
> > > > > since
> > > > > > > the default implementation could be the same as you mentioned,
> > i.e.
> > > > > keeping
> > > > > >