[jira] [Created] (KAFKA-16946) Utils.getHost/getPort cannot parse SASL_PLAINTEXT://host:port

2024-06-12 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16946:
-

 Summary: Utils.getHost/getPort cannot parse 
SASL_PLAINTEXT://host:port
 Key: KAFKA-16946
 URL: https://issues.apache.org/jira/browse/KAFKA-16946
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.8.0
Reporter: Luke Chen


In KAFKA-16824, we tried to improve the regex for Utils.getHost/getPort, but it 
now fails to parse SASL_PLAINTEXT://host:port. This needs to be fixed.
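
For illustration, a minimal, self-contained sketch of a parsing pattern that accepts an optional 
security-protocol prefix (e.g. SASL_PLAINTEXT://) while still rejecting malformed inputs such as 
the KAFKA-16824 examples. This is an assumption-level sketch, not the actual Utils.getHost/getPort 
implementation:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostPortParseSketch {
    // Optional listener/security-protocol prefix, then host (or bracketed IPv6), then a numeric port.
    private static final Pattern HOST_PORT = Pattern.compile(
        "^(?:[A-Za-z0-9_-]+://)?(\\[?[0-9A-Za-z.\\-:]+\\]?):([0-9]+)$");

    public static void main(String[] args) {
        String[] addresses = {"SASL_PLAINTEXT://host:9092", "host:9092", "[::1]:9092", "ho(st:9092", "host:-92"};
        for (String address : addresses) {
            Matcher m = HOST_PORT.matcher(address);
            System.out.println(address + " -> " +
                (m.matches() ? "host=" + m.group(1) + ", port=" + m.group(2) : "unparseable"));
        }
    }
}
{code}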



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-12 Thread Luke Chen
Hi Josep

For KIP-966, I think Calvin mentioned it won't be completed in v3.8.0.
https://lists.apache.org/thread/fsnr8wy5fznzfso7jgk90skgyo277fmw

For unclean leader election, all we need is this PR:
https://github.com/apache/kafka/pull/16284
I think this PR needs one more week to be completed.

Thanks.
Luke

On Wed, Jun 12, 2024 at 4:51 PM Josep Prat 
wrote:

> Hi all,
>
> We are now really close to the planned code freeze deadline (today EOD).
> According to KIP-1012 [1], we agreed to stay in the 3.x branch until we
> achieve feature parity between ZooKeeper and KRaft. The two main KIPs
> identified that would achieve this are KIP-853 [2] and KIP-966 [3].
> At the time of writing this email, neither KIP is completed. My
> question to the people driving both KIPs is: how much more time do
> you think is needed to bring these KIPs to completion?
>
> - If the time needed is short, we could still include these 2 KIPs in
> the release.
> - However, if the time needed is on the scale of weeks, we should
> continue with the release plan for 3.8 and afterwards start working on the 3.9
> release.
>
> What are your thoughts?
>
>
> [1]:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1012%3A+The+need+for+a+Kafka+3.8.x+release
> [2]:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes
> [3]:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
>
> On Wed, Jun 12, 2024 at 10:40 AM Josep Prat  wrote:
>
> > Hi Rajini,
> > Yes, we could backport this one to the 3.8 branch. Would you be able to
> do
> > this once you merge this PR?
> >
> > Thanks
> >
> > On Tue, Jun 11, 2024 at 10:53 PM Rajini Sivaram  >
> > wrote:
> >
> >> Hi Josep,
> >>
> >> The PR https://github.com/apache/kafka/pull/13277 for KIP-899 looks ready
> >> to be merged (waiting for the PR build). The PR changes several files, but
> >> is relatively straightforward and not risky. Also, the changes are under a
> >> config that is not enabled by default. Since the KIP was approved before
> >> KIP freeze, would it be OK to include it in 3.8.0?
> >>
> >> Thank you,
> >>
> >> Rajini
> >>
> >>
> >> On Tue, Jun 11, 2024 at 9:35 AM Josep Prat  >
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I just want to remind everybody that the code freeze deadline is
> >> > approaching. June 12th EOD is the deadline.
> >> >
> >> > Please do not automatically backport any commit to the 3.8 branch
> after
> >> > June 12th EOD. Ping me if you think the commit should be backported
> >> (fixes
> >> > failures in the branch or critical bug fixes).
> >> >
> >> > Thanks all!
> >> >
> >> > On Sat, Jun 1, 2024 at 8:43 PM José Armando García Sancio
> >> >  wrote:
> >> >
> >> > > Hi Josep,
> >> > >
> >> > > See my comments below.
> >> > >
> >> > > On Wed, May 29, 2024 at 10:52 AM Josep Prat
> >>  >> > >
> >> > > wrote:
> >> > > > So I would propose to leave the deadlines as they are and manually
> >> > cherry
> >> > > > pick the commits related to KIP-853 and KIP-966.
> >> > >
> >> > > Your proposal sounds good to me. I suspect that we will be doing feature
> >> > > development for KIP-853 past the feature freeze and code freeze dates.
> >> > > Maybe feature development will be finished around the end of June.
> >> > >
> >> > > I'll make sure to cherry pick commits for KIP-853 to the 3.8 branch
> >> > > once we have one.
> >> > >
> >> > > Thanks,
> >> > > --
> >> > > -José
> >> > >
> >> >
> >> >
> >> > --
> >> > *Josep Prat*
> >> > Open Source Engineering Director, *Aiven*
> >> > josep.p...@aiven.io   |   +491715557497
> >> > aiven.io   |   https://www.facebook.com/aivencloud   |   https://twitter.com/aiven_io
> >> > *Aiven Deutschland GmbH*
> >> > Alexanderufer 3-7, 10117 Berlin
> >> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >> > Amtsgericht Charlottenburg, HRB 209739 B
> >> >
> >>
> >
> >
> > --
> > *Josep Prat*
> > Open Source Engineering Director, *Aiven*
> > josep.p...@aiven.io   |   +491715557497
> > aiven.io   |   https://twitter.com/aiven_io
> > *Aiven Deutschland GmbH*
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
>
>
> --
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io   |   https://twitter.com/aiven_io
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B

[jira] [Resolved] (KAFKA-16924) No log output when running kafka

2024-06-10 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16924.
---
Resolution: Fixed

> No log output when running kafka 
> -
>
> Key: KAFKA-16924
> URL: https://issues.apache.org/jira/browse/KAFKA-16924
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
> Fix For: 4.0.0
>
>
> In [https://github.com/apache/kafka/pull/12148], we removed the log4jAppender 
> dependency and added a testImplementation dependency for the `slf4jlog4j` lib. 
> However, we need this as a runtime dependency in the tools module to output logs 
> ([ref|https://stackoverflow.com/a/21787813]). We should add this dependency back.
>  
> Note: The {{slf4jlog4j}} lib was pulled in by the {{log4j-appender}} dependency. 
> Since that is removed, we need to declare it explicitly.
>  
> The current output looks like this:
> {code:java}
> > ./gradlew clean jar
> > bin/kafka-server-start.sh config/kraft/controller.properties
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16924) No log output when running kafka

2024-06-09 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16924:
-

 Summary: No log output when running kafka 
 Key: KAFKA-16924
 URL: https://issues.apache.org/jira/browse/KAFKA-16924
 Project: Kafka
  Issue Type: Bug
Reporter: Luke Chen
Assignee: Luke Chen
 Fix For: 4.0.0


In [https://github.com/apache/kafka/pull/12148], we removed the log4jAppender 
dependency and chose to "Add {{compileOnly}} dependency from {{tools}} to {{log4j}} 
(same approach as {{{}core{}}})". However, we need this as a runtime dependency in 
the tools module to output logs. We should add this dependency back.

 

The current output looks like this:
{code:java}
> bin/kafka-server-start.sh config/kraft/controller.properties
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16918) TestUtils#assertFutureThrows should use future.get with timeout

2024-06-08 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16918:
-

 Summary: TestUtils#assertFutureThrows should use future.get with 
timeout
 Key: KAFKA-16918
 URL: https://issues.apache.org/jira/browse/KAFKA-16918
 Project: Kafka
  Issue Type: Test
Reporter: Luke Chen


In KAFKA-16916, we had a test running forever. To avoid this issue happening 
again, we can use future.get with a timeout in TestUtils#assertFutureThrows.
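
A minimal sketch of the idea, assuming JUnit 5; the helper name and signature below are 
illustrative, not the actual TestUtils code:

{code:java}
import static org.junit.jupiter.api.Assertions.assertInstanceOf;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FutureAssertSketch {
    // Bound the wait: a future that never completes now fails fast with a TimeoutException
    // (and therefore a failed assertion) instead of hanging the build.
    public static <T extends Throwable> T assertFutureThrowsWithTimeout(
            Future<?> future, Class<T> expectedCause, long timeoutMs) {
        ExecutionException e = assertThrows(ExecutionException.class,
                () -> future.get(timeoutMs, TimeUnit.MILLISECONDS));
        return assertInstanceOf(expectedCause, e.getCause());
    }
}
{code}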



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16916) ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run forever

2024-06-08 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16916.
---
Resolution: Fixed

> ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will 
> run forever
> --
>
> Key: KAFKA-16916
> URL: https://issues.apache.org/jira/browse/KAFKA-16916
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Assignee: Apoorv Mittal
>Priority: Blocker
> Fix For: 3.8.0
>
>
> ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will 
> run forever



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Build hanging

2024-06-07 Thread Luke Chen
> Let's disable it for now to unblock builds, and revert later if we can't
> solve it before code freeze?

That's exactly what I meant to do.
I've opened KAFKA-16916 <https://issues.apache.org/jira/browse/KAFKA-16916>
for this issue and assigned it to you.
Feel free to unassign yourself if you don't have time to fix the
AdminClient behavior change issue.
But let's disable it first.

Thanks.
Luke
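
(For reference, disabling a JUnit 5 test typically looks like the sketch below; the class and
annotation text are illustrative, not the actual PR.)

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

public class ClientAuthenticationFailureTestSketch {
    // Temporarily skipped so PR builds can complete; re-enable once the
    // AdminClient behavior change is fixed (tracked in KAFKA-16916).
    @Disabled("KAFKA-16916: hangs after AdminClient behavior change")
    @Test
    public void testAdminClientWithInvalidCredentials() {
        // original test body
    }
}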


On Sat, Jun 8, 2024 at 8:55 AM Haruki Okada  wrote:

> Hi Luke,
>
> I see, but since this is likely due to AdminClient's behavior change, we
> need to fix it anyway before the 3.8 release, not just disable the test.
> Let's disable it for now to unblock builds, and revert later if we can't
> solve it before code freeze?
>
> 2024年6月8日(土) 9:31 Luke Chen :
>
> > Hi Haruki,
> >
> > Thanks for identifying this blocking test.
> > Could you help quickly open a PR to disable this test to unblock the CI
> > build?
> >
> > Thanks.
> > Luke
> >
> > On Sat, Jun 8, 2024 at 8:20 AM Haruki Okada  wrote:
> >
> > > Hi
> > >
> > > I found that the hanging can be reproduced locally.
> > > The blocking test is
> > >
> > >
> >
> "org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
> > > It started to block after this commit (
> > >
> > >
> >
> https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
> > > )
> > >
> > > Thanks,
> > >
> > > 2024年6月8日(土) 8:30 Sophie Blee-Goldman :
> > >
> > > > Seems like the build is currently broken -- specifically, a test is
> > > hanging
> > > > and causing it to abort after 7+ hours. There are many examples in
> the
> > > > current PRs, such as
> > > >
> > > > Timed out after almost 8 hours:
> > > > 1.
> > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16238/1/pipeline/
> > > > 2.
> > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16201/15/pipeline
> > > >
> > > > Still running after 6+ hours:
> > > > 1.
> > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16236/3/pipeline/
> > > >
> > > > It's pretty difficult to tell which test is hanging but it seems like
> > one
> > > > of the commits in the last 1-2 days is the likely culprit. If anyone
> > has
> > > an
> > > > idea of what may have caused this or is actively investigating,
> please
> > > let
> > > > everyone know.
> > > >
> > > > Needless to say, this is rather urgent given the upcoming 3.8 code
> > > freeze.
> > > >
> > > > Thanks,
> > > > Sophie
> > > >
> > >
> > >
> > > --
> > > 
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > 
> > >
> >
>
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 
>


[jira] [Created] (KAFKA-16916) ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run forever

2024-06-07 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16916:
-

 Summary: 
ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run 
forever
 Key: KAFKA-16916
 URL: https://issues.apache.org/jira/browse/KAFKA-16916
 Project: Kafka
  Issue Type: Bug
Reporter: Luke Chen
Assignee: Haruki Okada


ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run 
forever



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Build hanging

2024-06-07 Thread Luke Chen
Hi Haruki,

Thanks for identifying this blocking test.
Could you help quickly open a PR to disable this test to unblock the CI
build?

Thanks.
Luke

On Sat, Jun 8, 2024 at 8:20 AM Haruki Okada  wrote:

> Hi
>
> I found that the hanging can be reproduced locally.
> The blocking test is
>
> "org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
> It started to block after this commit (
>
> https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
> )
>
> Thanks,
>
> 2024年6月8日(土) 8:30 Sophie Blee-Goldman :
>
> > Seems like the build is currently broken -- specifically, a test is
> hanging
> > and causing it to abort after 7+ hours. There are many examples in the
> > current PRs, such as
> >
> > Timed out after almost 8 hours:
> > 1.
> >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16238/1/pipeline/
> > 2.
> >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16201/15/pipeline
> >
> > Still running after 6+ hours:
> > 1.
> >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16236/3/pipeline/
> >
> > It's pretty difficult to tell which test is hanging but it seems like one
> > of the commits in the last 1-2 days is the likely culprit. If anyone has
> an
> > idea of what may have caused this or is actively investigating, please
> let
> > everyone know.
> >
> > Needless to say, this is rather urgent given the upcoming 3.8 code
> freeze.
> >
> > Thanks,
> > Sophie
> >
>
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 
>


[jira] [Resolved] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-06-05 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16662.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Duplicate

> UnwritableMetadataException: Metadata has been lost
> ---
>
> Key: KAFKA-16662
> URL: https://issues.apache.org/jira/browse/KAFKA-16662
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
> Environment: Docker Image (bitnami/kafka:3.7.0)
> via Docker Compose
>Reporter: Tobias Bohn
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
> Attachments: log.txt
>
>
> Hello,
> First of all: I am new to this Jira and apologize if anything is set or 
> specified incorrectly. Feel free to advise me.
> We currently have an error in our test system which unfortunately I can't 
> solve, because I couldn't find anything related to it. No solution could be 
> found via the mailing list either.
> The error occurs when we want to start up a node. The node runs using KRaft 
> and is both a controller and a broker. The following error message appears at 
> startup:
> {code:java}
> kafka  | [2024-04-16 06:18:13,707] ERROR Encountered fatal fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> kafka  | org.apache.kafka.image.writer.UnwritableMetadataException: Metadata 
> has been lost because the following could not be represented in metadata 
> version 3.5-IV2: the directory assignment state of one or more replicas
> kafka  |        at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
> kafka  |        at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
> kafka  |        at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
> kafka  |        at 
> org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
> kafka  |        at 
> org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> kafka  |        at java.base/java.lang.Thread.run(Thread.java:840)
> kafka exited with code 0 {code}
> We use Docker to operate the cluster. The error occurred while we were trying 
> to restart a node. All other nodes in the cluster are still running correctly.
> If you need further information, please let us know. The complete log is 
> attached to this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16814) KRaft broker cannot startup when `partition.metadata` is missing

2024-06-04 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16814.
---
Resolution: Fixed

> KRaft broker cannot startup when `partition.metadata` is missing
> 
>
> Key: KAFKA-16814
> URL: https://issues.apache.org/jira/browse/KAFKA-16814
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Assignee: Kuan Po Tseng
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> When starting up the Kafka LogManager, we check for stray replicas to avoid some 
> corner cases. But this check might prevent the broker from starting up if 
> `partition.metadata` is missing, because when Kafka starts up, we load the log from 
> file, and the topicId of the log comes from the `partition.metadata` file. 
> So, if `partition.metadata` is missing, the topicId will be None, and 
> `LogManager#isStrayKraftReplica` will fail with a "no topicID" error.
> The missing `partition.metadata` could be due to a storage failure; another 
> possible path is an unclean shutdown after the topic is created on the replica, but 
> before data is flushed into the `partition.metadata` file. This is possible 
> because we do the flush in an async way 
> [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/core/src/main/scala/kafka/log/UnifiedLog.scala#L229].
>  
>  
> {code:java}
> ERROR Encountered fatal fault: Error starting LogManager 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> java.lang.RuntimeException: The log dir 
> Log(dir=/tmp/kraft-broker-logs/quickstart-events-0, topic=quickstart-events, 
> partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, 
> logEndOffset=0) does not have a topic ID, which is not allowed when running 
> in KRaft mode.
>     at 
> kafka.log.LogManager$.$anonfun$isStrayKraftReplica$1(LogManager.scala:1609)
>     at scala.Option.getOrElse(Option.scala:201)
>     at kafka.log.LogManager$.isStrayKraftReplica(LogManager.scala:1608)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1(BrokerMetadataPublisher.scala:294)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1$adapted(BrokerMetadataPublisher.scala:294)
>     at kafka.log.LogManager.loadLog(LogManager.scala:359)
>     at kafka.log.LogManager.$anonfun$loadLogs$15(LogManager.scala:493)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>     at java.base/java.lang.Thread.run(Thread.java:1623) {code}
>  
> Because if we don't do the isStrayKraftReplica check, the topicID and the 
> `partition.metadata` will get recovered after receiving a topic partition update 
> and becoming leader or follower later. I'm proposing we skip the 
> `isStrayKraftReplica` check if the topicID is None (see below), instead of 
> throwing an exception to terminate Kafka. 
>  
>  
> === update ===
> After checking KAFKA-14616 and KAFKA-15605, our purpose in finding stray replicas and 
> deleting them is that the replica should have been deleted but was left in the log 
> dir. So, if we have a replica that doesn't have a topicID (because 
> `partition.metadata` is missing), then we cannot identify whether this is a stray 
> replica or not. In this case, we can either:
>  # Delete it
>  # Ignore it
> For (1), the impact is: if this is not a stray replica and the 
> replication factor is only 1, then the data might be moved to another 
> "xxx-stray" dir, and the partition becomes empty.
> For (2), the impact is: if this is a stray replica and we didn't delete it, 
> the partition dir might not be created, as in KAFKA-15605 or KAFKA-14616.
> Per the investigation above, this missing `partition.metadata` issue is mostly 
> caused by the async `partition.metadata` flush when creating a topic. Later, before 
> any data is appended to the log, we must make sure the partition metadata file is 
> written to the log dir 
> [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/core/src/main/scala/kafka/log/UnifiedLog.scala#L772-L774].
>  So, it should be fine to delete it since the topic should still be empty.
> In short, when we find a log without a topicID, we should treat it as a stray 
> log and delete it.
>  
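
To illustrate the conclusion above, a hedged sketch of the decision (a hypothetical helper, not 
the actual LogManager code):

{code:java}
import java.util.Optional;
import java.util.UUID;

public class StrayLogCheckSketch {
    // A log whose partition.metadata is missing has no topic ID, so it cannot be matched
    // against the metadata image. Per the conclusion above, treat it as a stray log and
    // delete it (the topic should still be empty) instead of terminating the broker.
    static boolean treatAsStray(Optional<UUID> topicIdFromPartitionMetadata, boolean knownInMetadataImage) {
        if (!topicIdFromPartitionMetadata.isPresent()) {
            return true; // no topic ID: treat as stray rather than fail startup
        }
        return !knownInMetadataImage; // simplified stand-in for the normal stray check
    }

    public static void main(String[] args) {
        System.out.println(treatAsStray(Optional.empty(), true));               // true
        System.out.println(treatAsStray(Optional.of(UUID.randomUUID()), true)); // false
    }
}
{code}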



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16887) document busy metrics value when remoteLogManager throttling

2024-06-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16887:
-

 Summary: document busy metrics value when remoteLogManager 
throttling
 Key: KAFKA-16887
 URL: https://issues.apache.org/jira/browse/KAFKA-16887
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen


Context: https://github.com/apache/kafka/pull/15820#discussion_r1625304008



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16886) KRaft partition reassignment failed after upgrade to 3.7.0

2024-06-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16886:
-

 Summary: KRaft partition reassignment failed after upgrade to 
3.7.0 
 Key: KAFKA-16886
 URL: https://issues.apache.org/jira/browse/KAFKA-16886
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


Before the upgrade, the topic image doesn't have a dirID for the assignment. After 
the upgrade, the assignment has the dirID. So in 
{{{}ReplicaManager#applyDelta{}}}, we'll have directoryId changes in 
{{{}localChanges{}}}, which will invoke {{AssignmentEvent}} 
[here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2748].
 With that, we'll get the unexpected {{NOT_LEADER_OR_FOLLOWER}} error.

Reproduce steps:
 # Launch a 3.6.0 controller and a 3.6.0 broker(BrokerA) in Kraft mode;
 # Create a topic with 1 partition;
 # Upgrade Broker A, B, Controllers to 3.7.0
 # Upgrade MV to 3.7: ./bin/kafka-features.sh --bootstrap-server localhost:9092 
upgrade --metadata 3.7
 # reassign the step 2 partition to Broker B

 

The logs in broker B:
[2024-05-31 15:33:25,763] INFO [ReplicaFetcherManager on broker 2] Removed 
fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
[2024-05-31 15:33:25,837] INFO [ReplicaFetcherManager on broker 2] Removed 
fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
[2024-05-31 15:33:25,837] INFO [ReplicaAlterLogDirsManager on broker 2] Removed 
fetcher for partitions Set(t1-0) (kafka.server.ReplicaAlterLogDirsManager)
[2024-05-31 15:33:25,853] INFO Log for partition t1-0 is renamed to 
/tmp/kraft-broker-logs/t1-0.3e6d8bebc1c04f3186ad6cf63145b6fd-delete and is 
scheduled for deletion (kafka.log.LogManager)
[2024-05-31 15:33:26,279] ERROR Controller returned error 
NOT_LEADER_OR_FOLLOWER for assignment of partition 
PartitionData(partitionIndex=0, errorCode=6) into directory 
oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
[2024-05-31 15:33:26,280] WARN Re-queueing assignments: 
[Assignment\{timestampNs=26022187148625, partition=t1:0, 
dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
(org.apache.kafka.server.AssignmentsManager)
[2024-05-31 15:33:26,786] ERROR Controller returned error 
NOT_LEADER_OR_FOLLOWER for assignment of partition 
PartitionData(partitionIndex=0, errorCode=6) into directory 
oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
[2024-05-31 15:33:27,296] WARN Re-queueing assignments: 
[Assignment\{timestampNs=26022187148625, partition=t1:0, 
dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
(org.apache.kafka.server.AssignmentsManager)
...
 
Logs in controller:
[2024-05-31 15:33:25,727] INFO [QuorumController id=1] Successfully altered 1 
out of 1 partition reassignment(s). 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,727] INFO [QuorumController id=1] Replayed partition 
assignment change PartitionChangeRecord(partitionId=0, 
topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=null, leader=-2, replicas=[6, 2], 
removingReplicas=[2], addingReplicas=[6], leaderRecoveryState=-1, 
directories=[RuDIAGGJrTG2NU6tEOkbHw, AA], 
eligibleLeaderReplicas=null, lastKnownElr=null) for topic t1 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,802] INFO [QuorumController id=1] AlterPartition request 
from node 2 for t1-0 completed the ongoing partition reassignment and triggered 
a leadership change. Returning NEW_LEADER_ELECTED. 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,802] INFO [QuorumController id=1] UNCLEAN partition change 
for t1-0 with topic ID tMiJOQznTLKtOZ8rLqdgqw: replicas: [6, 2] -> [6], 
directories: [RuDIAGGJrTG2NU6tEOkbHw, AA] -> 
[RuDIAGGJrTG2NU6tEOkbHw], isr: [2] -> [6], removingReplicas: [2] -> [], 
addingReplicas: [6] -> [], leader: 2 -> 6, leaderEpoch: 3 -> 4, partitionEpoch: 
5 -> 6 (org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,802] INFO [QuorumController id=1] Replayed partition 
assignment change PartitionChangeRecord(partitionId=0, 
topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=[6], leader=6, replicas=[6], 
removingReplicas=[], addingReplicas=[], leaderRecoveryState=-1, 
directories=[RuDIAGGJrTG2NU6tEOkbHw], eligibleLeaderReplicas=null, 
lastKnownElr=null) for topic t1 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:26,277] WARN [QuorumController id=1] 
AssignReplicasToDirsRequest from broker 2 references non assigned partition 
t1-0 (org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:26,785] WARN [QuorumController id=1] 
AssignReplicasToDirsRequest from broker 2 references non assigned partition 
t1-0 (org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:27,293] WARN [QuorumController id=1] 
AssignReplicasToD

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-05-31 Thread Luke Chen
Hi Justine,

In the KIP-1012 discussion thread, our
conclusion was to have "automatic" unclean leader election in
KRaft, even if KIP-966 cannot be completed in time.

> We should specify in KIP-1012 that we need to have some way to configure
the system to automatically do unclean leader election. If we run out of
time implementing KIP-966, this could be something quite simple, like
honoring the static unclean.leader.election = true configuration.

I think we still need to include this in v3.8.0, to honor the static
unclean.leader.election = true configuration.

Thanks.
Luke



On Fri, May 31, 2024 at 1:55 AM Justine Olshan 
wrote:

> My understanding is that on KRaft, automatic unclean leader election is
> disabled, but it can be manually triggered.
>
> See this note from Colin on another email thread:
> > We do have the concept of unclean leader election in KRaft, but it has to
> be triggered by the leader election tool currently. We've been talking
> about adding configuration-based unclean leader election as part of the
> KIP-966 work.
>
> Just wanted to add this clarification.
>
> Justine
>
> On Thu, May 30, 2024 at 9:38 AM Calvin Liu 
> wrote:
>
> > Hi Mickael,
> > Part 1 adds the ELR and enables the leader election improvements related
> to
> > ELR. It does not change unclean leader election behavior which I think is
> > hard-coded to be disabled.
> > Part 2 should replace the current unclean leader election with the
> unclean
> > recovery. Colin McCabe will help with part 2 as the Kraft controller
> > expert. Thanks Colin!
> >
> >
> >
> >
> > On Thu, May 30, 2024 at 2:43 AM Mickael Maison  >
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > What's not clear from your reply is whether "KIP-966 Part 1" contains
> > > the ability to perform unclean leader elections with KRaft?
> > > Hopefully we have committers already looking at these. If you need
> > > additional help, please shout (well ping!)
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Thu, May 30, 2024 at 6:05 AM Ismael Juma  wrote:
> > > >
> > > > Sounds good, thanks Josep!
> > > >
> > > > Ismael
> > > >
> > > > On Wed, May 29, 2024 at 7:51 AM Josep Prat
>  > >
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > I think your proposal makes more sense than mine. The end goal is
> to
> > > try to
> > > > > get these 2 KIPs in 3.8.0 if possible. I think we can also achieve
> > > this by
> > > > > not delaying the general feature freeze, but rather by cherry
> picking
> > > the
> > > > > future commits on these features to the 3.8 branch.
> > > > >
> > > > > So I would propose to leave the deadlines as they are and manually
> > > cherry
> > > > > pick the commits related to KIP-853 and KIP-966.
> > > > >
> > > > > Best,
> > > > >
> > > > > On Wed, May 29, 2024 at 3:48 PM Ismael Juma 
> > wrote:
> > > > >
> > > > > > Hi Josep,
> > > > > >
> > > > > > It's generally a bad idea to push these dates because the scope
> > keeps
> > > > > > increasing then. If there are features that need more time and we
> > > believe
> > > > > > they are essential for 3.8 due to its special nature as the last
> > > release
> > > > > > before 4.0, we should allow them to be cherry-picked to the
> release
> > > > > branch
> > > > > > versus delaying the feature freeze and code freeze for
> everything.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, May 29, 2024 at 2:38 AM Josep Prat
> > > 
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Kafka developers,
> > > > > > >
> > > > > > > Given the fact we have a couple of KIPs that are halfway
> through
> > > their
> > > > > > > implementation and it seems it's a matter of days (1 or 2
> weeks)
> > to
> > > > > have
> > > > > > > them completed. What would you think if we delay feature freeze
> > and
> > > > > code
> > > > > > > freeze by 2 weeks? Let me know your thoughts.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > On Tue, May 28, 2024 at 8:47 AM Josep Prat <
> josep.p...@aiven.io>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Kafka developers,
> > > > > > > >
> > > > > > > > This is a reminder about the upcoming deadlines:
> > > > > > > > - Feature freeze is on May 29th
> > > > > > > > - Code freeze is June 12th
> > > > > > > >
> > > > > > > > I'll cut the new branch during morning hours (CEST) on May
> > 30th.
> > > > > > > >
> > > > > > > > Thanks all!
> > > > > > > >
> > > > > > > > On Thu, May 16, 2024 at 8:34 AM Josep Prat <
> > josep.p...@aiven.io>
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi all,
> > > > > > > >>
> > > > > > > >> We are now officially past the KIP freeze deadline. KIPs
> that
> > > are
> > > > > > > >> approved after this point in time shouldn't be adopted in
> the
> > > 3.8.x
> > > > > > > release
> > > > > > > >> (except the 2 already mentioned KIPS 989 and 1028 assuming
> no
> > > vetoes
> > > > > > > occur).
> > > > > > > >>
> > > > > > > >> Reminder of the upcoming 

[jira] [Resolved] (KAFKA-16824) Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports

2024-05-31 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16824.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports
> 
>
> Key: KAFKA-16824
> URL: https://issues.apache.org/jira/browse/KAFKA-16824
> Project: Kafka
>  Issue Type: Bug
>Reporter: José Armando García Sancio
>Assignee: TengYao Chi
>Priority: Major
> Fix For: 3.8.0
>
>
> For example, it is not able to detect at least these malformed hosts and ports:
>  # ho(st:9092
>  # host:-92



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16790) Calls to RemoteLogManager are made before it is configured

2024-05-29 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16790.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Fixed

> Calls to RemoteLogManager are made before it is configured
> --
>
> Key: KAFKA-16790
> URL: https://issues.apache.org/jira/browse/KAFKA-16790
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.8.0
>Reporter: Christo Lolov
>Assignee: Muralidhar Basani
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> BrokerMetadataPublisher#onMetadataUpdate calls ReplicaManager#applyDelta (1) 
> which in turn calls RemoteLogManager#onLeadershipChange (2), however, the 
> RemoteLogManager is configured after the BrokerMetadataPublisher starts 
> running (3, 4). This is incorrect, we either need to initialise the 
> RemoteLogManager before we start the BrokerMetadataPublisher or we need to 
> skip calls to onLeadershipChange if the RemoteLogManager is not initialised.
> (1) 
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala#L151]
> (2) 
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2737]
> (3) 
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerServer.scala#L432]
> (4) 
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerServer.scala#L515]
> The way to reproduce the problem is by looking at the following changes
> {code:java}
>  config/kraft/broker.properties                         | 10 ++
>  .../main/java/kafka/log/remote/RemoteLogManager.java   |  8 +++-
>  core/src/main/scala/kafka/server/ReplicaManager.scala  |  6 +-
>  3 files changed, 22 insertions(+), 2 deletions(-)diff --git 
> a/config/kraft/broker.properties b/config/kraft/broker.properties
> index 2d15997f28..39d126cf87 100644
> --- a/config/kraft/broker.properties
> +++ b/config/kraft/broker.properties
> @@ -127,3 +127,13 @@ log.segment.bytes=1073741824
>  # The interval at which log segments are checked to see if they can be 
> deleted according
>  # to the retention policies
>  log.retention.check.interval.ms=30
> +
> +remote.log.storage.system.enable=true
> +remote.log.metadata.manager.listener.name=PLAINTEXT
> +remote.log.storage.manager.class.name=org.apache.kafka.server.log.remote.storage.LocalTieredStorage
> +remote.log.storage.manager.class.path=/home/ec2-user/kafka/storage/build/libs/kafka-storage-3.8.0-SNAPSHOT-test.jar
> +remote.log.storage.manager.impl.prefix=rsm.config.
> +remote.log.metadata.manager.impl.prefix=rlmm.config.
> +rsm.config.dir=/tmp/kafka-remote-storage
> +rlmm.config.remote.log.metadata.topic.replication.factor=1
> +log.retention.check.interval.ms=1000
> diff --git a/core/src/main/java/kafka/log/remote/RemoteLogManager.java 
> b/core/src/main/java/kafka/log/remote/RemoteLogManager.java
> index 6555b7c0cd..e84a072abc 100644
> --- a/core/src/main/java/kafka/log/remote/RemoteLogManager.java
> +++ b/core/src/main/java/kafka/log/remote/RemoteLogManager.java
> @@ -164,6 +164,7 @@ public class RemoteLogManager implements Closeable {
>      // The endpoint for remote log metadata manager to connect to
>      private Optional endpoint = Optional.empty();
>      private boolean closed = false;
> +    private boolean up = false;
>  
>      /**
>       * Creates RemoteLogManager instance with the given arguments.
> @@ -298,6 +299,7 @@ public class RemoteLogManager implements Closeable {
>          // in connecting to the brokers or remote storages.
>          configureRSM();
>          configureRLMM();
> +        up = true;
>      }
>  
>      public RemoteStorageManager storageManager() {
> @@ -329,7 +331,11 @@ public class RemoteLogManager implements Closeable {
>      public void onLeadershipChange(Set partitionsBecomeLeader,
>                                     Set partitionsBecomeFollower,
>                                     Map topicIds) {
> -        LOGGER.debug("Received leadership changes for leaders: {} and 
> followers: {}", partitionsBecomeLeader, partitionsBecomeFollower);
> +        if (!up) {
> +            LOGGER.error("NullPointerException");
> +            return;
> +        }
> +        LOGGER.error("Received leadership changes for leaders: {} and 
> followers: {}", partitionsBecomeLeader, partitionsBecomeFollower);
>  
>          Map leaderPartitionsWithLeaderEpoch =

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-05-29 Thread Luke Chen
Hi Josep,

Thanks for raising this.
I'm +1 for delaying some time to have the features completed.

But I think we need to make it clear what the updated feature
freeze and code freeze dates are.
Is this correct?
- Feature freeze is on June 12th
- Code freeze is June 26th


Thanks.
Luke

On Wed, May 29, 2024 at 5:38 PM Josep Prat 
wrote:

> Hi Kafka developers,
>
> Given that we have a couple of KIPs that are halfway through their
> implementation, and it seems it's a matter of days (1 or 2 weeks) to have
> them completed, what would you think if we delay the feature freeze and code
> freeze by 2 weeks? Let me know your thoughts.
>
> Best,
>
> On Tue, May 28, 2024 at 8:47 AM Josep Prat  wrote:
>
> > Hi Kafka developers,
> >
> > This is a reminder about the upcoming deadlines:
> > - Feature freeze is on May 29th
> > - Code freeze is June 12th
> >
> > I'll cut the new branch during morning hours (CEST) on May 30th.
> >
> > Thanks all!
> >
> > On Thu, May 16, 2024 at 8:34 AM Josep Prat  wrote:
> >
> >> Hi all,
> >>
> >> We are now officially past the KIP freeze deadline. KIPs that are
> >> approved after this point in time shouldn't be adopted in the 3.8.x
> release
> >> (except the 2 already mentioned KIPS 989 and 1028 assuming no vetoes
> occur).
> >>
> >> Reminder of the upcoming deadlines:
> >> - Feature freeze is on May 29th
> >> - Code freeze is June 12th
> >>
> >> If you have an approved KIP that you know already you won't be able to
> >> complete before the feature freeze deadline, please update the Release
> >> column in the
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >> page.
> >>
> >> Thanks all,
> >>
> >> On Wed, May 15, 2024 at 8:53 PM Josep Prat  wrote:
> >>
> >>> Hi Nick,
> >>>
> >>> If nobody comes up with concerns or pushback until the time of closing
> >>> the vote, I think we can take it for 3.8.
> >>>
> >>> Best,
> >>>
> >>> -
> >>>
> >>> Josep Prat
> >>> Open Source Engineering Director, Aiven
> >>> josep.p...@aiven.io   |   +491715557497   |   aiven.io
> >>> Aiven Deutschland GmbH
> >>> Alexanderufer 3-7, 10117 Berlin
> >>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>
> >>> On Wed, May 15, 2024, 20:48 Nick Telford 
> wrote:
> >>>
>  Hi Josep,
> 
>  Would it be possible to sneak KIP-989 into 3.8? Just as with 1028,
> it's
>  currently being voted on and has already received the requisite votes.
>  The
>  only thing holding it back is the 72 hour voting window.
> 
>  Vote thread here:
>  https://lists.apache.org/thread/nhr65h4784z49jbsyt5nx8ys81q90k6s
> 
>  Regards,
> 
>  Nick
> 
>  On Wed, 15 May 2024 at 17:47, Josep Prat  >
>  wrote:
> 
>  > And my maths are wrong! I added 24 hours more to all the numbers in there.
>  > If after 72 hours no vetoes appear, I have no objection to adding this
>  > specific KIP as it shouldn't have a big blast radius.
>  >
>  > Best,
>  >
>  > On Wed, May 15, 2024 at 6:44 PM Josep Prat 
>  wrote:
>  >
>  > > Ah, I see Chris was faster writing this than me.
>  > >
>  > > On Wed, May 15, 2024 at 6:43 PM Josep Prat 
>  wrote:
>  > >
>  > >> Hi all,
>  > >> You still have the full day of today (independently for the
>  timezone) to
>  > >> get KIPs approved. Tomorrow morning (CEST timezone) I'll send
>  another
>  > email
>  > >> asking developers to assign future approved KIPs to another
> version
>  > that is
>  > >> not 3.8.
>  > >>
>  > >> So, the only problem I see with KIP-1028 is that it hasn't been
>  open for
>  > >> a vote for 72 hours (48 hours as of now). If there is no negative
>  > voting on
>  > >> the KIP I think we can let that one in, given it would only miss
>  the
>  > >> deadline by less than 12 hours (if my timezone maths add up).
>  > >>
>  > >> Best,
>  > >>
>  > >> On Wed, May 15, 2024 at 6:35 PM Ismael Juma 
>  wrote:
>  > >>
>  > >>> The KIP freeze is just about having the KIP accepted. Not sure
>  why we
>  > >>> would
>  > >>> need an exception for that.
>  > >>>
>  > >>> Ismael
>  > >>>
>  > >>> On Wed, May 15, 2024 at 9:20 AM Chris Egerton <
>  fearthecel...@gmail.com
>  > >
>  > >>> wrote:
>  > >>>
>  > >>> > FWIW I think that the low blast radius for KIP-1028 should
>  allow it
>  > to
>  > >>> > proceed without adhering to the usual KIP and feature freeze
>  dates.
>  > >>> Code
>  > >>> > freeze is probably worth still  respecting, at least if
> changes
>  are
>  > >>> > required to the docker/jvm/Dockerfile. But I defer to Josep's
>  > >>> judgement as
>  > >>> > the release manager.
>  > >>> >
>  > >>> > On Wed, May 15, 2024, 06:59 Vedarth Sharma <
>  

[jira] [Resolved] (KAFKA-16399) Add JBOD support in tiered storage

2024-05-29 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16399.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add JBOD support in tiered storage
> --
>
> Key: KAFKA-16399
> URL: https://issues.apache.org/jira/browse/KAFKA-16399
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
> Fix For: 3.8.0
>
>
> Add JBOD support in tiered storage
> Currently, when JBOD is configured, the Tiered Storage feature is forced to 
> be disabled. This Jira is to fix that gap. Why is that important? Because 
> it doesn't make sense that in order to use the Tiered Storage feature, users cannot 
> use JBOD storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Action requested: Changes to CI for JDK 11 & 17 builds on Pull Requests

2024-05-28 Thread Luke Chen
Wow! I haven't seen this beautiful green on Jenkins in years!
Thanks Greg!!

Luke

On Tue, May 28, 2024 at 4:12 PM Chia-Ping Tsai  wrote:

> Please take a look at following QA. ALL PASS!!!
>
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15889/25/pipeline
>
> I almost cried, and BIG thanks to Harris!!!
>
> On 2024/05/28 03:20:01 Greg Harris wrote:
> > Hello Apache Kafka Developers,
> >
> > In order to better utilize scarce CI resources shared with other Apache
> > projects, the Kafka project will no longer be running full test suites
> for
> > the JDK 11 & 17 components of PR builds.
> >
> > *Action requested: If you have an active pull request, please merge or
> > rebase the latest trunk into your branch* before continuing development
> as
> > normal. You may wait to push the resulting branch until you make another
> > commit, or push the result immediately.
> >
> > What to expect with this change:
> > * Trunk (and release branch) builds will not be affected.
> > * JDK 8 and 21 builds will not be affected.
> > * Compilation will not be affected.
> > * Static analysis (spotbugs, checkstyle, etc) will not be affected.
> > * Overall build execution time should be similar or slightly better than
> > before.
> > * You can expect fewer tests to be run on your PRs (~6 instead of
> > ~12).
> > * Test flakiness should be similar or slightly better than before.
> >
> > And as a reminder, build failures (red indicators in CloudBees) are
> always
> > blockers for merging. Starting now, the 11 and 17 builds should always
> pass
> > (green indicators in CloudBees) before merging, as failed tests (yellow
> > indicators in CloudBees) should no longer be present.
> >
> > Thanks everyone,
> > Greg Harris
> >
>


[jira] [Created] (KAFKA-16848) Reverting KRaft migration for "Migrating brokers to KRaft" state is wrong

2024-05-28 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16848:
-

 Summary: Reverting KRaft migration for "Migrating brokers to 
KRaft" state is wrong
 Key: KAFKA-16848
 URL: https://issues.apache.org/jira/browse/KAFKA-16848
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


Hello,

 

I would like to report a mistake in the {_}Kafka 3.7 Documentation -> 6.10 
KRaft -> ZooKeeper to KRaft Migration -> Reverting to ZooKeeper mode During the 
Migration{_}.

 

While migrating my Kafka + ZooKeeper cluster to KRaft and testing rollbacks at 
different migration stages, I noticed that the "{_}Directions for 
reverting{_}" provided for "{_}Migrating brokers to KRaft{_}" are wrong.



Following the first step provided in the documentation, you are supposed to: _On each 
broker, remove the process.roles configuration, and restore the 
zookeeper.connect configuration to its previous value. If your cluster requires 
other ZooKeeper configurations for brokers, such as zookeeper.ssl.protocol, 
re-add those configurations as well. Then perform a rolling restart._


In that case, if you remove the _process.roles_ configuration and restore 
_zookeeper.connect_ as well as the other _ZooKeeper_ configurations (if your cluster 
requires them), you will receive an error that looks like this:
[2024-05-28 08:09:49,396] lvl=ERROR Exiting Kafka due to fatal exception 
logger=kafka.Kafka$

java.lang.IllegalArgumentException: requirement failed: 
controller.listener.names must be empty when not running in KRaft mode: 
[CONTROLLER]

    at scala.Predef$.require(Predef.scala:337)

    at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:2441)

    at kafka.server.KafkaConfig.(KafkaConfig.scala:2290)

    at kafka.server.KafkaConfig.(KafkaConfig.scala:1639)

    at kafka.Kafka$.buildServer(Kafka.scala:71)

    at kafka.Kafka$.main(Kafka.scala:90)

    at kafka.Kafka.main(Kafka.scala)

 

However, I was able to perform the rollback successfully by performing additional 
steps:
 * Restore the _zookeeper.metadata.migration.enable=true_ line in the broker 
configuration;
 * We are using {_}authorizer.class.name{_}, so 
it also had to be reverted: 
_org.apache.kafka.metadata.authorizer.StandardAuthorizer_ -> 
{_}kafka.security.authorizer.AclAuthorizer{_};

 

I believe that should be mentioned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Action requested: Changes to CI for JDK 11 & 17 builds on Pull Requests

2024-05-28 Thread Luke Chen
> I did not see a build failure that happens on 11 and 17 but not on 8 or 21,
> and also it can save more CI resources and make our CI thinner.
Same here. I've never seen a build pass on JDK 21 but fail on 11 or 17.
And even if that happened, it would be rare. I think we are just making a trade-off
to make CI more reliable and faster.

Thanks.
Luke

On Tue, May 28, 2024 at 2:22 PM Chia-Ping Tsai  wrote:

> Dear all,
>
> I do love Harris's patch, as I believe no one loves slow CI. Separately, I just
> filed https://issues.apache.org/jira/browse/KAFKA-16847 to revise
> our README about JDK versions. I'd like to raise more discussion here.
>
> > Note that compilation with Java 11/17 doesn't add any value over
> compiling
> > with Java 21 with the appropriate --release config (which we set). So,
> this
> > part of the build process is wasteful.
>
> I did not see a build failure that happens on 11 and 17 but not on 8 or 21,
> and also it can save more CI resources and make our CI thinner. Hence,
> I'm +1 to drop 11 and 17 totally.
>
> Best,
> Chia-Ping
>
>
> On 2024/05/28 04:40:48 Ismael Juma wrote:
> > Hi Greg,
> >
> > Thanks for making this change.
> >
> > Note that compilation with Java 11/17 doesn't add any value over
> compiling
> > with Java 21 with the appropriate --release config (which we set). So,
> this
> > part of the build process is wasteful. Running the tests does add some
> > value (and hence why we originally had it), but the return on investment
> is
> > not good enough given our CI issues (and hence why the change is good).
> >
> > Ismael
> >
> > On Mon, May 27, 2024, 8:20 PM Greg Harris 
> > wrote:
> >
> > > Hello Apache Kafka Developers,
> > >
> > > In order to better utilize scarce CI resources shared with other Apache
> > > projects, the Kafka project will no longer be running full test suites
> for
> > > the JDK 11 & 17 components of PR builds.
> > >
> > > *Action requested: If you have an active pull request, please merge or
> > > rebase the latest trunk into your branch* before continuing
> development as
> > > normal. You may wait to push the resulting branch until you make
> another
> > > commit, or push the result immediately.
> > >
> > > What to expect with this change:
> > > * Trunk (and release branch) builds will not be affected.
> > > * JDK 8 and 21 builds will not be affected.
> > > * Compilation will not be affected.
> > > * Static analysis (spotbugs, checkstyle, etc) will not be affected.
> > > * Overall build execution time should be similar or slightly better
> than
> > > before.
> > > * You can expect fewer tests to be run on your PRs (~6 instead of
> > > ~12).
> > > * Test flakiness should be similar or slightly better than before.
> > >
> > > And as a reminder, build failures (red indicators in CloudBees) are
> always
> > > blockers for merging. Starting now, the 11 and 17 builds should always
> pass
> > > (green indicators in CloudBees) before merging, as failed tests (yellow
> > > indicators in CloudBees) should no longer be present.
> > >
> > > Thanks everyone,
> > > Greg Harris
> > >
> >
>


[jira] [Resolved] (KAFKA-16709) alter logDir within broker might cause log cleanup hanging

2024-05-27 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16709.
---
Resolution: Fixed

> alter logDir within broker might cause log cleanup hanging
> --
>
> Key: KAFKA-16709
> URL: https://issues.apache.org/jira/browse/KAFKA-16709
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>    Assignee: Luke Chen
>Priority: Major
> Fix For: 3.8.0
>
>
> When doing alter replica logDirs, we create a future log and pause log 
> cleaning for the partition 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L1200]).
>  Log cleaning is resumed after the alter replica logDirs 
> completes 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogManager.scala#L1254]).
>  When resuming log cleaning, we decrement the 
> LogCleaningPaused count by 1. Only once the count reaches 0 does cleaning really 
> resume 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L310]).
>  For more explanation about the logCleaningPaused state, see 
> [here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L55].
>  
> But there's still one more factor that can increase the LogCleaningPaused 
> count: a leadership change 
> ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L2126]).
>  When there's a leadership change, we check whether there's a future log for 
> this partition; if so, we create the future log and pause cleaning 
> (LogCleaningPaused count + 1). So, during an alter replica logDirs:
>  # alter replica logDirs for tp0 is triggered (LogCleaningPaused count = 1)
>  # tp0 leadership changes (LogCleaningPaused count = 2)
>  # alter replica logDirs completes, resuming log cleaning (LogCleaningPaused 
> count = 1)
>  # Log cleaning stays paused because the count is always > 0
>  
> Log cleaning is not just about compacting logs; it also affects 
> normal log retention processing, which means log retention for these 
> paused partitions stays pending. The issue is only resolved when the broker is 
> restarted.
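
To make the counting problem concrete, a minimal self-contained sketch (not the actual 
LogCleanerManager code) of how the pause count gets stuck above zero:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class LogCleaningPauseSketch {
    // Cleaning stays paused while the counter is > 0; every pause must be matched by a resume.
    private final AtomicInteger pausedCount = new AtomicInteger();

    void pauseCleaning()  { pausedCount.incrementAndGet(); }
    void resumeCleaning() { pausedCount.decrementAndGet(); }
    boolean cleaningPaused() { return pausedCount.get() > 0; }

    public static void main(String[] args) {
        LogCleaningPauseSketch tp0 = new LogCleaningPauseSketch();
        tp0.pauseCleaning();   // 1. alter replica logDirs starts
        tp0.pauseCleaning();   // 2. leadership change sees the future log and pauses again
        tp0.resumeCleaning();  // 3. alter replica logDirs completes and resumes only once
        System.out.println("cleaning still paused: " + tp0.cleaningPaused()); // true -> retention stalls
    }
}
{code}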



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Kafka jenkins is unable to run and view old builds.

2024-05-26 Thread Luke Chen
Hi all,

Currently, Kafka Jenkins is unable to run builds or view old builds.
I've filed INFRA-25824 
with the Infra team.

Thanks.
Luke


Re: [VOTE] KIP 1047 - Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder

2024-05-24 Thread Luke Chen
+1 (binding)
Thanks Frank!

Luke

On Fri, May 24, 2024 at 5:21 PM Josep Prat 
wrote:

> Hi Frank,
>
> thanks for the KIP.
> +1 (binding)
>
> Best,
>
> On Fri, May 24, 2024 at 11:11 AM Kuan Po Tseng 
> wrote:
>
> > +1 (non-binding)
> >
> > On 2024/05/23 16:26:42 Frank Yang wrote:
> > > Hi all,
> > >
> > > I would like to start a vote on KIP-1047: Introduce new
> > > org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder.
> > >
> > > KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1047+Introduce+new+org.apache.kafka.tools.api.Decoder+to+replace+kafka.serializer.Decoder
> > >
> > > Discussion thread:
> > https://lists.apache.org/thread/n3k6vb4vddl1s5nopcyglnddtvzp4j63
> > >
> > > Thanks and regards,
> > > PoAn
> >
>
>
> --
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io   |   https://twitter.com/aiven_io
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


Re: Inquire about a bug issue

2024-05-23 Thread Luke Chen
Hi Jianbin,

Thanks for asking.
I'll review the PR this week or next week.
Let's target this bug fix for v3.7.1 and v3.8.0.

Thanks.
Luke

On Fri, May 24, 2024 at 11:20 AM Jianbin Chen  wrote:

> I would like to inquire if anyone is paying attention to this issue
> https://issues.apache.org/jira/browse/KAFKA-16583. When the broker
> allocates partitions and then restarts, there is a chance that this problem
> will occur, causing the broker to fail to start. This is a bug that greatly
> affects the stability of production services. Why has it not been dealt
> with after more than a month? I believe it is necessary to include it in
> version 3.7.1 and release it as soon as possible to prevent more users from
> being affected.
>
> Jianbin Chen, githubId: funky-eyes
>


[jira] [Created] (KAFKA-16828) RackAwareTaskAssignorTest failed

2024-05-23 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16828:
-

 Summary: RackAwareTaskAssignorTest failed
 Key: KAFKA-16828
 URL: https://issues.apache.org/jira/browse/KAFKA-16828
 Project: Kafka
  Issue Type: Test
Reporter: Luke Chen


Found in the latest trunk build.

Many tests in the `RackAwareTaskAssignorTest` suite are failing.

 

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15951/7/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP 1047 - Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder

2024-05-23 Thread Luke Chen
LGTM!
Thanks for raising this improvement.

Luke
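
(For readers following the quoted discussion below, a minimal sketch of the kind of tools-side
decoder being discussed; the name, the package-free form, and the optional configure hook are
assumptions, not the final KIP-1047 API.)

import java.util.Map;

public interface ToolsDecoderSketch<T> {
    // Optional hook so DumpLogSegments could pass custom configs one day
    // (see the KAFKA-12311 discussion below); illustrative only.
    default void configure(Map<String, ?> configs) { }

    // Decode the raw record bytes read from a log segment.
    T fromBytes(byte[] value);
}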

On Thu, May 23, 2024 at 12:52 AM Chia-Ping Tsai  wrote:

> Thanks for Josep's response
>
> > We can add this to 3.8.0, but keep in mind the KIP is not voted yet (as
> far
> as I can see), so I would highly encourage to start the vote thread ASAP
> and start with the implementation right after.
>
> sure. We will file a draft PR at the same time!
>
> Josep Prat  於 2024年5月23日 週四 上午12:31寫道:
>
> > Hi all,
> >
> > We can add this to 3.8.0, but keep in mind the KIP is not voted yet (as
> far
> > as I can see), so I would highly encourage to start the vote thread ASAP
> > and start with the implementation right after.
> >
> > Best,
> >
> > -
> > Josep Prat
> > Open Source Engineering Director, aivenjosep.p...@aiven.io   |
> > +491715557497 | aiven.io
> > Aiven Deutschland GmbH
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > On Wed, May 22, 2024, 17:06 Chia-Ping Tsai  wrote:
> >
> > > > One issue I also noted is that some of the existing Decoder
> > > implementations (StringDecoder for example) can accept configurations
> > > but currently DumpLogSegments does not provide a way to pass any
> > > configurations, it creates an empty VerifiableProperties object each
> > > time it instantiates a Decoder instance. If we were to use
> > > Deserializer we would also need a way to provide configurations.
> > >
> > > BTW, if the known bug gets fixed, we have to make the new interface extend
> > > `Configurable`.
> > >
> > > Or we can just ignore the known issue, as `DumpLogSegments` has no
> > > options to take custom configs for `Decoder`. That keeps the `Decoder`
> > > simpler.
> > >
> > >
> > > Chia-Ping Tsai  於 2024年5月22日 週三 下午10:58寫道:
> > >
> > > >
> > > > Thanks for Mickael response!
> > > >
> > > > >I'm wondering whether we need to introduce a new Decoder interface
> and
> > > > instead if we could reuse Deserializer. We could deprecate the
> > > > key-decoder-class and value-decoder-class flags and introduce new
> > > > flags like key-deserializer-class and value-deserializer-class. One
> > > > benefit is that we already have many existing deserializer
> > > > implementations. WDYT?
> > > >
> > > > I prefer to use a different interface, since using the same interface
> > > > (Deserializer) may prevent us from enhancing the interface used only by
> > > > DumpLogSegments in the future.
> > > >
> > > > > One issue I also noted is that some of the existing Decoder
> > > > implementations (StringDecoder for example) can accept configurations
> > > > but currently DumpLogSegments does not provide a way to pass any
> > > > configurations, it creates an empty VerifiableProperties object each
> > > > time it instantiates a Decoder instance. If we were to use
> > > > Deserializer we would also need a way to provide configurations.
> > > >
> > > > yep, that is a known issue:
> > > > https://issues.apache.org/jira/browse/KAFKA-12311
> > > >
> > > > We will file PR to fix it
> > > >
> > > > Mickael Maison  於 2024年5月22日 週三 下午10:51寫道:
> > > >
> > > >> Hi,
> > > >>
> > > >> Thanks for the KIP. Sorting this out in 3.8.0 would be really nice
> as
> > > >> it would allow us to migrate this tool in 4.0.0. We're unfortunately
> > > >> past the KIP deadline but maybe this is small enough to have an
> > > >> exception.
> > > >>
> > > >> I'm wondering whether we need to introduce a new Decoder interface
> and
> > > >> instead if we could reuse Deserializer. We could deprecate the
> > > >> key-decoder-class and value-decoder-class flags and introduce new
> > > >> flags like key-deserializer-class and value-deserializer-class. One
> > > >> benefit is that we already have many existing deserializer
> > > >> implementations. WDYT?
> > > >>
> > > >> One issue I also noted is that some of the existing Decoder
> > > >> implementations (StringDecoder for example) can accept
> configurations
> > > >> but currently DumpLogSegments does not provide a way to pass any
> > > >> configurations, it creates an empty VerifiableProperties object each
> > > >> time it instantiates a Decoder instance. If we were to use
> > > >> Deserializer we would also need a way to provide configurations.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >> On Wed, May 22, 2024 at 4:12 PM Chia-Ping Tsai  >
> > > >> wrote:
> > > >> >
> > > >> > Dear all,
> > > >> >
> > > >> > We know that  3.8.0 KIP is already frozen, but this is a small KIP
> > and
> > > >> we need to ship it to 3.8.0 so as to remove the deprecated scala
> > > interface
> > > >> from 4.0.
> > > >> >
> > > >> > Best,
> > > >> > Chia-Ping
> > > >> >
> > > >> > On 2024/05/22 14:05:16 Frank Yang wrote:
> > > >> > > Hi team,
> > > >> > >
> > > >> > > Chia-Ping Tsai and I would like to propose KIP-1047 to migrate
> > > >> kafka.serializer.Decoder from core module (scala) to tools module
> > > (java).
> > > >> > >
> > > >> > > Feedback and comments 

[jira] [Created] (KAFKA-16814) KRaft broker cannot startup when `partition.metadata` is missing

2024-05-22 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16814:
-

 Summary: KRaft broker cannot startup when `partition.metadata` is 
missing
 Key: KAFKA-16814
 URL: https://issues.apache.org/jira/browse/KAFKA-16814
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


When starting up the kafka logManager, we check stray replicas to avoid some 
corner cases. But this check might prevent the broker from starting up if 
`partition.metadata` is missing, because when Kafka starts up, we load the log 
from file, and the topicId of the log comes from the `partition.metadata` file. So, 
if `partition.metadata` is missing, the topicId will be None, and 
`LogManager#isStrayKraftReplica` will fail with a "no topicID" error.

The missing `partition.metadata` could be caused by a storage failure, or another 
possible path is an unclean shutdown after the topic is created on the replica, but 
before data is flushed into the `partition.metadata` file. This is possible because 
the flush is done asynchronously 
[here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/core/src/main/scala/kafka/log/UnifiedLog.scala#L229].

 

 
{code:java}
ERROR Encountered fatal fault: Error starting LogManager 
(org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
java.lang.RuntimeException: The log dir 
Log(dir=/tmp/kraft-broker-logs/quickstart-events-0, topic=quickstart-events, 
partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, 
logEndOffset=0) does not have a topic ID, which is not allowed when running in 
KRaft mode.
    at 
kafka.log.LogManager$.$anonfun$isStrayKraftReplica$1(LogManager.scala:1609)
    at scala.Option.getOrElse(Option.scala:201)
    at kafka.log.LogManager$.isStrayKraftReplica(LogManager.scala:1608)
    at 
kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1(BrokerMetadataPublisher.scala:294)
    at 
kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1$adapted(BrokerMetadataPublisher.scala:294)
    at kafka.log.LogManager.loadLog(LogManager.scala:359)
    at kafka.log.LogManager.$anonfun$loadLogs$15(LogManager.scala:493)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1623) {code}
 

Because if we don't do the isStrayKraftReplica check, the topicID and the 
`partition.metadata` will get recovered after the replica receives a topic 
partition update and becomes leader or follower later. I'm proposing we skip the 
`isStrayKraftReplica` check if the topicID is None, instead of throwing an 
exception to terminate Kafka. The `isStrayKraftReplica` check is just for a corner 
case, so it should be fine IMO.
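
A minimal sketch of the proposed behavior (illustrative Java model only; the real 
check lives in LogManager.isStrayKraftReplica in LogManager.scala, and the class 
names below are made up):
{code:java}
import java.util.Optional;

// Toy model of the proposed fix: skip the stray-replica check instead of
// terminating the broker when the topic ID is missing.
public class StrayReplicaCheckSketch {

    // Stand-in for a loaded log whose topic ID comes from partition.metadata.
    static class LoadedLog {
        final String topicPartition;
        final Optional<String> topicId;
        LoadedLog(String topicPartition, Optional<String> topicId) {
            this.topicPartition = topicPartition;
            this.topicId = topicId;
        }
    }

    // Returns true only if the replica can be proven to be stray.
    static boolean isStrayKraftReplica(LoadedLog log) {
        if (log.topicId.isEmpty()) {
            // partition.metadata is missing: instead of throwing and killing the
            // broker, skip the check; the topic ID will be recovered later when
            // the broker becomes leader or follower for the partition.
            System.out.println("Skipping stray check for " + log.topicPartition
                + " because it has no topic ID");
            return false;
        }
        // ... the real implementation compares the topic ID against current metadata ...
        return false;
    }

    public static void main(String[] args) {
        LoadedLog log = new LoadedLog("quickstart-events-0", Optional.empty());
        System.out.println(isStrayKraftReplica(log));
    }
}
{code}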

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16783) Migrate RemoteLogMetadataManagerTest to new test infra

2024-05-21 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16783.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Migrate RemoteLogMetadataManagerTest to new test infra
> --
>
> Key: KAFKA-16783
> URL: https://issues.apache.org/jira/browse/KAFKA-16783
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: storage_test
> Fix For: 3.8.0
>
>
> as title
> `TopicBasedRemoteLogMetadataManagerWrapperWithHarness` could be replaced by 
> `RemoteLogMetadataManagerTestUtils#builder`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16760) alterReplicaLogDirs failed even if responded with none error

2024-05-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16760.
---
Resolution: Not A Problem

> alterReplicaLogDirs failed even if responded with none error
> 
>
> Key: KAFKA-16760
> URL: https://issues.apache.org/jira/browse/KAFKA-16760
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Priority: Major
>
> When firing an alterLogDirRequest, it gets a NONE error result. But actually, 
> the alterLogDir never happened, and these errors appear in the logs:
> {code:java}
> [2024-05-14 16:48:50,796] INFO [ReplicaAlterLogDirsThread-1]: Partition 
> topicB-0 has an older epoch (0) than the current leader. Will await the new 
> LeaderAndIsr state before resuming fetching. 
> (kafka.server.ReplicaAlterLogDirsThread:66)
> [2024-05-14 16:48:50,796] WARN [ReplicaAlterLogDirsThread-1]: Partition 
> topicB-0 marked as failed (kafka.server.ReplicaAlterLogDirsThread:70)
> {code}
> Note: This is under KRaft mode, so the log message mentioning LeaderAndIsr is 
> wrong. It can be reproduced with this 
> [branch|https://github.com/showuon/kafka/tree/alterLogDirTest] by running 
> this test:
> {code:java}
> ./gradlew cleanTest storage:test --tests 
> org.apache.kafka.tiered.storage.integration.AlterLogDirTest
> {code}
> The complete logs can be found here: 
> https://gist.github.com/showuon/b16cdb05a125a7c445cc6e377a2b7923



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-16 Thread Luke Chen
Thanks Chia-Ping!
Since ZK is going to be removed, I agree the KRaft part has higher priority.
But if Christo or the community contributor has spare time, it's good to
have ZK part, too!

Thanks.
Luke

On Thu, May 16, 2024 at 5:45 PM Chia-Ping Tsai  wrote:

> +1 but I prefer to ship it to KRaft only.
>
> I do concern that community have enough time to accept more feature in 3.8
> :(
>
> Best,
> Chia-Ping
>
> On 2024/05/14 15:20:50 Christo Lolov wrote:
> > Heya!
> >
> > I would like to start a vote on KIP-950: Tiered Storage Disablement in
> > order to catch the last Kafka release targeting Zookeeper -
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> >
> > Best,
> > Christo
> >
>


[jira] [Created] (KAFKA-16781) Expose advertised.listeners in controller node

2024-05-16 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16781:
-

 Summary: Expose advertised.listeners in controller node
 Key: KAFKA-16781
 URL: https://issues.apache.org/jira/browse/KAFKA-16781
 Project: Kafka
  Issue Type: Improvement
Reporter: Luke Chen


After 
[KIP-919|https://cwiki.apache.org/confluence/display/KAFKA/KIP-919%3A+Allow+AdminClient+to+Talk+Directly+with+the+KRaft+Controller+Quorum+and+add+Controller+Registration],
 we allow clients to talk to the KRaft controller node directly. But unlike a 
broker node, we don't allow users to configure advertised.listeners for clients to 
connect to. Without this config, the client cannot connect to the controller 
node if the controller is sitting behind a NAT network while the client is in an 
external network.
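
For context, a broker can already publish a client-reachable address while a 
controller cannot. The snippet below is only an illustration of the gap: the broker 
settings are real, but the commented-out controller advertised.listeners line does 
not exist today and is exactly what this ticket asks for.
{code}
# Broker (supported today): publish a reachable address for clients.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://broker1.public.example.com:9092

# Controller (hypothetical): there is currently no way to publish a
# client-reachable address like the one below.
controller.listener.names=CONTROLLER
listeners=CONTROLLER://0.0.0.0:9093
# advertised.listeners=CONTROLLER://controller1.public.example.com:9093  <- not supported yet
{code}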



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.7.1 release

2024-05-15 Thread Luke Chen
Hi Igor,

Thanks for volunteering!
+1

Luke

On Wed, May 15, 2024 at 11:15 PM Mickael Maison 
wrote:

> Hi Igor,
>
> Thanks for volunteering, +1
>
> Mickael
>
> On Thu, Apr 25, 2024 at 11:09 AM Igor Soarez  wrote:
> >
> > Hi everyone,
> >
> > I'd like to volunteer to be the release manager for a 3.7.1 release.
> >
> > Please keep in mind, this would be my first release, so I might have
> some questions,
> > and it might also take me a bit longer to work through the release
> process.
> > So I'm thinking a good target would be toward the end of May.
> >
> > Please let me know your thoughts and if there are any objections or
> concerns.
> >
> > Thanks,
> >
> > --
> > Igor
>


Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-15 Thread Luke Chen
Hi Christo,

In addition to the minor comments left in the discussion thread, it LGTM.
+1 from me.

Thank you.
Luke


On Tue, May 14, 2024 at 11:21 PM Christo Lolov 
wrote:

> Heya!
>
> I would like to start a vote on KIP-950: Tiered Storage Disablement in
> order to catch the last Kafka release targeting Zookeeper -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
>
> Best,
> Christo
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-15 Thread Luke Chen
Hi Christo,

Thanks for the explanation.
I think it would be good if you could add that into the KIP.

Otherwise, LGTM.

Thank you.
Luke

On Mon, May 13, 2024 at 11:55 PM Christo Lolov 
wrote:

> Heya!
>
> re Kamal - Okay, I believe I understand what you mean and I agree. I have
> made the following change
>
> ```
>
> During tiered storage disablement, when RemoteLogManager#stopPartition() is
> called:
>
>- Tasks scheduled for the topic-partitions in the
>RemoteStorageCopierThreadPool will be canceled.
>- If the disablement policy is retain, scheduled tasks for the
>topic-partitions in the RemoteDataExpirationThreadPool will remain
>unchanged.
>- If the disablement policy is delete, we will first advance the log
>start offset and let the tasks scheduled for the topic-partitions in
>the RemoteDataExpirationThreadPool delete all remote segments before
>the log start offset and then unregister themselves.
>
> ```
>
> re Luke - I checked once again. As far as I understand when a broker goes
> down all replicas it hosts go to OfflineReplica state in the state machine
> the controller maintains. The moment the broker comes back up again the
> state machine resends StopReplica based on
> ```
>
> * OfflineReplica -> ReplicaDeletionStarted
> * --send StopReplicaRequest to the replica (with deletion)
>
> ```
> from ReplicaStateMachine.scala. So I was wrong and you are right, we do not
> appear to be sending constant requests today. I believe it is safe for us
> to follow a similar approach i.e. if a replica comes online again we resend
> the StopReplica.
>
> If you don't notice any more problems I will aim to start a VOTE tomorrow
> so we can get at least part of this KIP in 3.8.
>
> Best,
> Christo
>
> On Fri, 10 May 2024 at 11:11, Luke Chen  wrote:
>
> > Hi Christo,
> >
> > > 1. I am not certain I follow the question. From DISABLED you can only
> go
> > to
> > ENABLED regardless of whether your cluster is backed by Zookeeper or
> KRaft.
> > Am I misunderstanding your point?
> >
> > Yes, you're right.
> >
> > > 4. I was thinking that if there is a mismatch we will just fail
> accepting
> > the request for disablement. This should be the same in both Zookeeper
> and
> > KRaft. Or am I misunderstanding your question?
> >
> > OK, sounds good.
> >
> > > 6. I think my current train of thought is that there will be unlimited
> > retries until all brokers respond in a similar way to how deletion of a
> > topic works today in ZK. In the meantime the state will continue to be
> > DISABLING. Do you have a better suggestion?
> >
> > I don't think infinite retries is a good idea since if a broker is down
> > forever, this request will never complete.
> > You mentioned the existing topic deletion is using the similar pattern,
> how
> > does it handle this issue?
> >
> > Thanks.
> > Luke
> >
> > On Thu, May 9, 2024 at 9:21 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > re: Luke
> > >
> > > 1. I am not certain I follow the question. From DISABLED you can only
> go
> > to
> > > ENABLED regardless of whether your cluster is backed by Zookeeper or
> > KRaft.
> > > Am I misunderstanding your point?
> > >
> > > 2. Apologies, this was a leftover from previous versions. I have
> updated
> > > the Zookeeper section. The steps ought to be: controller receives
> change,
> > > commits necessary data to Zookeeper, enqueues disablement and starts
> > > sending StopReplicas request to brokers; brokers receive StopReplicas
> and
> > > propagate them all the way to RemoteLogManager#stopPartitions which
> takes
> > > care of the rest.
> > >
> > > 3. Correct, it should say DISABLED - this should now be corrected.
> > >
> > > 4. I was thinking that if there is a mismatch we will just fail
> accepting
> > > the request for disablement. This should be the same in both Zookeeper
> > and
> > > KRaft. Or am I misunderstanding your question?
> > >
> > > 5. Yeah. I am now doing a second pass on all diagrams and will update
> > them
> > > by the end of the day!
> > >
> > > 6. I think my current train of thought is that there will be unlimited
> > > retries until all brokers respond in a similar way to how deletion of a
> > > topic works today in ZK. In the meantime the state will continue to be
> > > DISABLING. Do you have a better suggestion?
>

[jira] [Created] (KAFKA-16760) alterReplicaLogDirs failed even if responded with none error

2024-05-14 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16760:
-

 Summary: alterReplicaLogDirs failed even if responded with none 
error
 Key: KAFKA-16760
 URL: https://issues.apache.org/jira/browse/KAFKA-16760
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


When firing an alterLogDirRequest, it gets a NONE error result. But actually, the 
alterLogDir never happened, and these errors appear in the logs:
{code:java}
[2024-05-14 16:48:50,796] INFO [ReplicaAlterLogDirsThread-1]: Partition 
topicB-0 has an older epoch (0) than the current leader. Will await the new 
LeaderAndIsr state before resuming fetching. 
(kafka.server.ReplicaAlterLogDirsThread:66)
[2024-05-14 16:48:50,796] WARN [ReplicaAlterLogDirsThread-1]: Partition 
topicB-0 marked as failed (kafka.server.ReplicaAlterLogDirsThread:70)
{code}
Note: This is under KRaft mode. It can be reproduced with this 
[branch|https://github.com/showuon/kafka/tree/alterLogDirTest] by running this 
test:


{code:java}
./gradlew cleanTest storage:test --tests 
org.apache.kafka.tiered.storage.integration.AlterLogDirTest
{code}

The complete logs can be found here: 
https://gist.github.com/showuon/b16cdb05a125a7c445cc6e377a2b7923
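
For reference, the kind of request involved can be fired from the Java Admin client 
roughly as below (a minimal sketch; the broker id, topic, and target directory are 
made-up values):
{code:java}
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.TopicPartitionReplica;

public class AlterLogDirSketch {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.<String, Object>of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // Move topicB-0 on broker 1 to another log dir.
            TopicPartitionReplica replica = new TopicPartitionReplica("topicB", 0, 1);
            admin.alterReplicaLogDirs(Map.of(replica, "/tmp/kraft-broker-logs-2"))
                 .all()
                 .get(); // per this report, completes without error even though the move never happened
        }
    }
}
{code}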



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16711) highestOffsetInRemoteStorage is not updated after logDir altering within broker

2024-05-13 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16711:
-

 Summary: highestOffsetInRemoteStorage is not updated after logDir 
altering within broker
 Key: KAFKA-16711
 URL: https://issues.apache.org/jira/browse/KAFKA-16711
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 3.7.0
Reporter: Luke Chen
Assignee: Luke Chen


We use topicIdPartition as the key for each RLM task. This can leave the 
highestOffsetInRemoteStorage in the log not updated after a logDir alter completes. 
The reproducing flow is like this (see the sketch after the steps):

 
 # tp-0 locating in dirA is the leader of the partition
 # tp-0 is altering logDir to dirB
 # tp-0 is copying segments to remote storage (note: the log in dirA)
 # The logDir altering for tp-0 is completed
 # remoteLogManager#onLeadershipChange is invoked, copiedOffsetOption is reset 
to Optional.empty()
 # The copying segments to remote storage in step 3 for tp-0 is completed, 
updating copiedOffsetOption to new offset, as well as the 
log#highestOffsetInRemoteStorage. (note: the log in dirA)
 # In the next run of RLMTask, the log will be the one in target dir (dirB), 
and the log#highestOffsetInRemoteStorage (dirB) will be the default value (-1), 
which will block the log cleanup operation.
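
A stripped-down model of the problem (toy Java code with made-up names; the real 
classes are RemoteLogManager/RLMTask and UnifiedLog): the copy task keeps updating 
the offset on the log object it captured from the old dir, while the partition's 
current log in the new dir stays at -1.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy model of the stale-reference problem described above (names are assumptions).
public class StaleHighestOffsetSketch {

    static class Log {
        final String dir;
        long highestOffsetInRemoteStorage = -1L;
        Log(String dir) { this.dir = dir; }
    }

    public static void main(String[] args) {
        Map<String, Log> currentLogByPartition = new HashMap<>();
        Log logInDirA = new Log("dirA");
        currentLogByPartition.put("tp-0", logInDirA);               // step 1: leader log lives in dirA

        Log capturedByCopyTask = currentLogByPartition.get("tp-0"); // step 3: copy task captures the dirA log

        Log logInDirB = new Log("dirB");
        currentLogByPartition.put("tp-0", logInDirB);               // step 4: logDir alter completes

        capturedByCopyTask.highestOffsetInRemoteStorage = 100L;     // step 6: copy finishes, updates the dirA log

        // step 7: the next RLMTask run sees the dirB log, still at -1, so cleanup is blocked.
        System.out.println(currentLogByPartition.get("tp-0").highestOffsetInRemoteStorage); // -1
    }
}
{code}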

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16709) move logDir within broker might cause log cleanup hanging

2024-05-12 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16709:
-

 Summary: move logDir within broker might cause log cleanup hanging
 Key: KAFKA-16709
 URL: https://issues.apache.org/jira/browse/KAFKA-16709
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen
Assignee: Luke Chen


When doing alter replica logDirs, we'll create a future log and pause log 
cleaning for the partition 
([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L1200]).
 This log cleaning pause is resumed after alter replica logDirs 
completes 
([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogManager.scala#L1254]).
 When resuming log cleaning, we decrement the LogCleaningPaused count by 1. Once 
the count reaches 0, cleaning actually resumes 
([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L310]).
 For more explanation about the logCleaningPaused state, check 
[here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L55].

 

But there's still one factor that could increase the LogCleaningPaused count: 
leadership change 
([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L2126]).
 When there's a leadership change, we'll check if there's a future log for this 
partition; if so, we'll create the future log and pause cleaning (LogCleaningPaused 
count + 1). So, during the alter replica logDirs:
 # alter replica logDirs for tp0 triggered (LogCleaningPaused count = 1)
 # tp0 leadership changed (LogCleaningPaused count = 2)
 # alter replica logDirs completes, resuming logCleaning (LogCleaningPaused 
count = 1)
 # LogCleaning stays paused because the count is always > 0

 

The log cleaning is not just related to compacting logs; it also affects the normal 
log retention processing, which means the log retention for these paused partitions 
will be pending. The issue only clears up when the broker is restarted.
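
The imbalance can be seen with a plain counter model (illustration only; the real 
bookkeeping lives in LogCleanerManager):
{code:java}
// Toy model of the LogCleaningPaused counter imbalance described above.
public class LogCleaningPausedSketch {
    private int pausedCount = 0;

    void pauseCleaning()  { pausedCount++; }
    void resumeCleaning() { if (pausedCount > 0) pausedCount--; }
    boolean cleaningPaused() { return pausedCount > 0; }

    public static void main(String[] args) {
        LogCleaningPausedSketch tp0 = new LogCleaningPausedSketch();
        tp0.pauseCleaning();   // 1. alter replica logDirs triggered     -> count = 1
        tp0.pauseCleaning();   // 2. leadership change sees a future log -> count = 2
        tp0.resumeCleaning();  // 3. alter replica logDirs completes     -> count = 1
        // 4. cleaning (and retention) stays paused because the count never reaches 0.
        System.out.println("still paused: " + tp0.cleaningPaused()); // true
    }
}
{code}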



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-10 Thread Luke Chen
e can be
> bound
> > > to happen when the number of remote log segments to
> > > delete is huge.
> > >
> > >
> > > On Mon, May 6, 2024, 18:12 Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > >> Hi Christo,
> > >>
> > >> Thanks for the update!
> > >>
> > >> 1. In the ZK mode, how will the transition from DISABLING to DISABLED
> > >> state happen?
> > >> For the "retain" policy, until we delete all the remote-log segments,
> > the
> > >> state will be
> > >> DISABLING and the deletion can happen only when they breach either the
> > >> retention
> > >> time (or) size.
> > >>
> > >> How does the controller monitor that all the remote log segments are
> > >> deleted for all
> > >> the partitions of the topic before transitioning the state to
> DISABLED?
> > >>
> > >> 2. In Kraft, we have only ENABLED -> DISABLED state. How are we
> > >> supporting the case
> > >> "retain" -> "enable"?
> > >>
> > >> If the remote storage is degraded, we want to avoid uploading the
> > >> segments temporarily
> > >> and resume back once the remote storage is healthy. Is the case
> > supported?
> > >>
> > >>
> > >>
> > >> On Fri, May 3, 2024 at 12:12 PM Luke Chen  wrote:
> > >>
> > >>> Also, I think using `stopReplicas` request is a good idea because it
> > >>> won't cause any problems while migrating to KRaft mode.
> > >>> The stopReplicas request is one of the request that KRaft controller
> > >>> will send to ZK brokers during migration.
> > >>>
> > >>> Thanks.
> > >>> Luke
> > >>>
> > >>> On Fri, May 3, 2024 at 11:48 AM Luke Chen  wrote:
> > >>>
> > >>>> Hi Christo,
> > >>>>
> > >>>> Thanks for the update.
> > >>>>
> > >>>> Questions:
> > >>>> 1. For this
> > >>>> "The possible state transition from DISABLED state is to the
> ENABLED."
> > >>>> I think it only applies for KRaft mode. In ZK mode, the possible
> state
> > >>>> is "DISABLING", right?
> > >>>>
> > >>>> 2. For this:
> > >>>> "If the cluster is using Zookeeper as the control plane, enabling
> > >>>> remote storage for a topic triggers the controller to send this
> > information
> > >>>> to Zookeeper. Each broker listens for changes in Zookeeper, and
> when a
> > >>>> change is detected, the broker triggers
> > >>>> RemoteLogManager#onLeadershipChange()."
> > >>>>
> > >>>> I think the way ZK brokers knows the leadership change is by getting
> > >>>> the LeaderAndISRRequeset from the controller, not listening for
> > changes in
> > >>>> ZK.
> > >>>>
> > >>>> 3. In the KRaft handler steps, you said:
> > >>>> "The controller also updates the Topic metadata to increment the
> > >>>> tiered_epoch and update the tiered_stateto DISABLING state."
> > >>>>
> > >>>> Should it be "DISABLED" state since it's KRaft mode?
> > >>>>
> > >>>> 4. I was thinking how we handle the tiered_epoch not match error.
> > >>>> For ZK, I think the controller won't write any data into ZK Znode,
> > >>>> For KRaft, either configRecord or updateTopicMetadata records won't
> be
> > >>>> written.
> > >>>> Is that right? Because the current workflow makes me think there
> will
> > >>>> be partial data updated in ZK/KRaft when tiered_epoch error.
> > >>>>
> > >>>> 5. Since we changed to use stopReplicas (V5) request now, the
> diagram
> > >>>> for ZK workflow might also need to update.
> > >>>>
> > >>>> 6. In ZK mode, what will the controller do if the "stopReplicas"
> > >>>> responses not received from all brokers? Reverting the changes?
> > >>>> This won't happen in KRaft mode because it's broker's responsibility
> > to
> > >>>> fetch metadata update from c

Re: [VOTE] KIP-1018: Introduce max remote fetch timeout config

2024-05-09 Thread Luke Chen
Hi Kamal,

Thanks for the KIP!
+1 from me.

Thanks.
Luke

On Mon, May 6, 2024 at 5:03 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi all,
>
> We would like to start a voting thread for KIP-1018: Introduce
> max remote fetch timeout config for DelayedRemoteFetch requests.
>
> The KIP is available on
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
>
> If you have any suggestions, feel free to participate in the discussion
> thread:
> https://lists.apache.org/thread/9x21hzpxzmrt7xo4vozl17d70fkg3chk
>
> --
> Kamal
>


Re: request permissions to contribute to Kafka

2024-05-06 Thread Luke Chen
Hi Zhisheng,

I've granted your permission.

Thank you.
Luke

On Tue, May 7, 2024 at 10:25 AM Zhisheng Zhang <31791909...@gmail.com>
wrote:

> Hi
>
> I'd like to request permissions to contribute to Kafka to propose a KIP
>
> Wiki ID:zhangzhisheng
> Jira ID:zhangzhisheng
>
> Thank you
>


Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-05-03 Thread Luke Chen
Hi Kamal,

Thanks for the KIP!
Sorry for the late review.

Overall LGTM! Just 1 question:

If one fetch request contains 2 partitions: [p1, p2]
fetch.max.wait.ms: 500, remote.fetch.max.wait.ms: 1000

And now, p1 fetch offset is the log end offset and has no new data coming,
and p2 fetch offset is to fetch from remote storage.
And suppose the fetch from remote storage takes 1000ms.
So, question:
Will this fetch request return in 500ms or 1000ms?
And what will be returned?

I think before this change, it'll return within 500ms, right?
But it's not clear what behavior it will be after this KIP.

Thank you.
Luke


On Fri, May 3, 2024 at 1:56 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Christo,
>
> We have localTimeMs, remoteTimeMs, and totalTimeMs as part of the
> FetchConsumer request metric.
>
>
> kafka.network:type=RequestMetrics,name={LocalTimeMs|RemoteTimeMs|TotalTimeMs},request={Produce|FetchConsumer|FetchFollower}
>
> RemoteTimeMs refers to the amount of time spent in the purgatory for normal
> fetch requests
> and amount of time spent in reading the remote data for remote-fetch
> requests. Do we want
> to have a separate `TieredStorageTimeMs` to capture the time spent in
> remote-read requests?
>
> With per-broker level timer metrics combined with the request level
> metrics, the user will have
> sufficient information.
>
> Metric name =
>
> kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs
>
> --
> Kamal
>
> On Mon, Apr 29, 2024 at 1:38 PM Christo Lolov 
> wrote:
>
> > Heya!
> >
> > Is it difficult to instead add the metric at
> > kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other
> > name=*)? Alternatively, if it is difficult to add it there, is it
> possible
> > to add 2 metrics, one at the RequestMetrics level (even if it is
> > total-time-ms - (all other times)) and one at what you are proposing? As
> an
> > operator I would find it strange to not see the metric in the
> > RequestMetrics.
> >
> > Your thoughts?
> >
> > Best,
> > Christo
> >
> > On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Christo,
> > >
> > > Updated the KIP with the remote fetch latency metric. Please take
> another
> > > look!
> > >
> > > --
> > > Kamal
> > >
> > > On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Federico,
> > > >
> > > > Thanks for the suggestion! Updated the config name to "
> > > > remote.fetch.max.wait.ms".
> > > >
> > > > Christo,
> > > >
> > > > Good point. We don't have the remote-read latency metrics to measure
> > the
> > > > performance of the remote read requests. I'll update the KIP to emit
> > this
> > > > metric.
> > > >
> > > > --
> > > > Kamal
> > > >
> > > >
> > > > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri <
> fedeval...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Kamal, it looks like all TS configurations starts with "remote."
> > > >> prefix, so I was wondering if we should name it
> > > >> "remote.fetch.max.wait.ms".
> > > >>
> > > >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash
> > > >>  wrote:
> > > >> >
> > > >> > Hi all,
> > > >> >
> > > >> > If there are no more comments, I'll start a vote thread by
> tomorrow.
> > > >> > Please review the KIP.
> > > >> >
> > > >> > Thanks,
> > > >> > Kamal
> > > >> >
> > > >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash <
> > > >> > kamal.chandraprak...@gmail.com> wrote:
> > > >> >
> > > >> > > Hi all,
> > > >> > >
> > > >> > > Bumping the thread. Please review this KIP. Thanks!
> > > >> > >
> > > >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash <
> > > >> > > kamal.chandraprak...@gmail.com> wrote:
> > > >> > >
> > > >> > >> Hi Jorge,
> > > >> > >>
> > > >> > >> Thanks for the review! Added your suggestions to the KIP. PTAL.
> > > >> > >>
> > > >> > >> The `fetch.max.wait.ms` config will be also applicable for
> > topics
> > > >> > >> enabled with remote storage.
> > > >> > >> Updated the description to:
> > > >> > >>
> > > >> > >> ```
> > > >> > >> The maximum amount of time the server will block before
> answering
> > > the
> > > >> > >> fetch request
> > > >> > >> when it is reading near to the tail of the partition
> > > >> (high-watermark) and
> > > >> > >> there isn't
> > > >> > >> sufficient data to immediately satisfy the requirement given by
> > > >> > >> fetch.min.bytes.
> > > >> > >> ```
> > > >> > >>
> > > >> > >> --
> > > >> > >> Kamal
> > > >> > >>
> > > >> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate Otoya <
> > > >> > >> quilcate.jo...@gmail.com> wrote:
> > > >> > >>
> > > >> > >>> Hi Kamal,
> > > >> > >>>
> > > >> > >>> Thanks for this KIP! It should help to solve one of the main
> > > issues
> > > >> with
> > > >> > >>> tiered storage at the moment that is dealing with individual
> > > >> consumer
> > > >> > >>> configurations to avoid flooding logs with interrupted
> > exceptions.
> > 

[jira] [Created] (KAFKA-16661) add a lower `log.initial.task.delay.ms` value to integration test framework

2024-05-03 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16661:
-

 Summary: add a lower `log.initial.task.delay.ms` value to 
integration test framework
 Key: KAFKA-16661
 URL: https://issues.apache.org/jira/browse/KAFKA-16661
 Project: Kafka
  Issue Type: Test
Reporter: Luke Chen


After KAFKA-16552, we created an internal config `log.initial.task.delay.ms` to 
control the initial task delay in the log manager. This ticket follows up on it, to 
set a low default value (100ms or 500ms, maybe?) for it in the integration test 
framework, to speed up the tests.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-03 Thread Luke Chen
Also, I think using `stopReplicas` request is a good idea because it won't
cause any problems while migrating to KRaft mode.
The stopReplicas request is one of the request that KRaft controller will
send to ZK brokers during migration.

Thanks.
Luke

On Fri, May 3, 2024 at 11:48 AM Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the update.
>
> Questions:
> 1. For this
> "The possible state transition from DISABLED state is to the ENABLED."
> I think it only applies for KRaft mode. In ZK mode, the possible state is
> "DISABLING", right?
>
> 2. For this:
> "If the cluster is using Zookeeper as the control plane, enabling remote
> storage for a topic triggers the controller to send this information to
> Zookeeper. Each broker listens for changes in Zookeeper, and when a change
> is detected, the broker triggers RemoteLogManager#onLeadershipChange()."
>
> I think the way ZK brokers knows the leadership change is by getting the
> LeaderAndISRRequeset from the controller, not listening for changes in ZK.
>
> 3. In the KRaft handler steps, you said:
> "The controller also updates the Topic metadata to increment the
> tiered_epoch and update the tiered_stateto DISABLING state."
>
> Should it be "DISABLED" state since it's KRaft mode?
>
> 4. I was thinking how we handle the tiered_epoch not match error.
> For ZK, I think the controller won't write any data into ZK Znode,
> For KRaft, either configRecord or updateTopicMetadata records won't be
> written.
> Is that right? Because the current workflow makes me think there will be
> partial data updated in ZK/KRaft when tiered_epoch error.
>
> 5. Since we changed to use stopReplicas (V5) request now, the diagram for
> ZK workflow might also need to update.
>
> 6. In ZK mode, what will the controller do if the "stopReplicas" responses
> not received from all brokers? Reverting the changes?
> This won't happen in KRaft mode because it's broker's responsibility to
> fetch metadata update from controller.
>
>
> Thank you.
> Luke
>
>
> On Fri, Apr 19, 2024 at 10:23 PM Christo Lolov 
> wrote:
>
>> Heya all!
>>
>> I have updated KIP-950. A list of what I have updated is:
>>
>> * Explicitly state that Zookeeper-backed clusters will have ENABLED ->
>> DISABLING -> DISABLED while KRaft-backed clusters will only have ENABLED ->
>> DISABLED
>> * Added two configurations for the new thread pools and explained where
>> values will be picked-up mid Kafka version upgrade
>> * Explained how leftover remote partitions will be scheduled for deletion
>> * Updated the API to use StopReplica V5 rather than a whole new
>> controller-to-broker API
>> * Explained that the disablement procedure will be triggered by the
>> controller listening for an (Incremental)AlterConfig change
>> * Explained that we will first move log start offset and then issue a
>> deletion
>> * Went into more details that changing remote.log.disable.policy after
>> disablement won't do anything and that if a customer would like additional
>> data deleted they would have to use already existing methods
>>
>> Let me know if there are any new comments or I have missed something!
>>
>> Best,
>> Christo
>>
>> On Mon, 15 Apr 2024 at 12:40, Christo Lolov 
>> wrote:
>>
>>> Heya Doguscan,
>>>
>>> I believe that the state of the world after this KIP will be the
>>> following:
>>>
>>> For Zookeeper-backed clusters there will be 3 states: ENABLED, DISABLING
>>> and DISABLED. We want this because Zookeeper-backed clusters will await a
>>> confirmation from the brokers that they have indeed stopped tiered-related
>>> operations on the topic.
>>>
>>> For KRaft-backed clusters there will be only 2 states: ENABLED and
>>> DISABLED. KRaft takes a fire-and-forget approach for topic deletion. I
>>> believe the same approach ought to be taken for tiered topics. The
>>> mechanism which will ensure that leftover state in remote due to failures
>>> is cleaned up to me is the retention mechanism. In today's code, a leader
>>> deletes all segments it finds in remote with offsets below the log start
>>> offset. I believe this will be good enough for cleaning up leftover state
>>> in remote due to failures.
>>>
>>> I know that quite a few changes have been discussed so I will aim to put
>>> them on paper in the upcoming days and let everyone know!
>>>
>>> Best,
>>> Christo
>>>
>>> On Tue, 9 Apr 2024 at 14:49, Doğuşcan Namal 
>>> wrot

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-02 Thread Luke Chen
ge.
>>> b) on each restart, we should initiate the remote storage deletion
>>> because although we replayed a record with a DISABLED state, we can not be
>>> sure if the remote data is deleted or not.
>>>
>>> We could either consider keeping the remote topic in DISABLING state
>>> until all of the remote storage data is deleted, or we need an additional
>>> mechanism to handle the remote stray data.
>>>
>>> The existing topic deletion, for instance, handles stray logs on disk by
>>> detecting them on KafkaBroker startup and deleting before the
>>> ReplicaManager is started.
>>> Maybe we need a similar mechanism here as well if we don't want a
>>> DISABLING state. Otherwise, we need a callback from Brokers to validate
>>> that remote storage data is deleted and now we could move to the DISABLED
>>> state.
>>>
>>> Thanks.
>>>
>>> On Tue, 9 Apr 2024 at 12:45, Luke Chen  wrote:
>>>
>>>> Hi Christo,
>>>>
>>>> > I would then opt for moving information from DisableRemoteTopic
>>>> within the StopReplicas API which will then disappear in KRaft world as
>>>> it
>>>> is already scheduled for deprecation. What do you think?
>>>>
>>>> Sounds good to me.
>>>>
>>>> Thanks.
>>>> Luke
>>>>
>>>> On Tue, Apr 9, 2024 at 6:46 PM Christo Lolov 
>>>> wrote:
>>>>
>>>> > Heya Luke!
>>>> >
>>>> > I thought a bit more about it and I reached the same conclusion as
>>>> you for
>>>> > 2 as a follow-up from 1. In other words, in KRaft world I don't think
>>>> the
>>>> > controller needs to wait for acknowledgements for the brokers. All we
>>>> care
>>>> > about is that the leader (who is responsible for archiving/deleting
>>>> data in
>>>> > tiered storage) knows about the change and applies it properly. If
>>>> there is
>>>> > a leadership change halfway through the operation then the new leader
>>>> still
>>>> > needs to apply the message from the state topic and we know that a
>>>> > disable-message will be applied before a reenablement-message. I will
>>>> > change the KIP later today/tomorrow morning to reflect this reasoning.
>>>> >
>>>> > However, with this I believe that introducing a new API just for
>>>> > Zookeeper-based clusters (i.e. DisableRemoteTopic) becomes a bit of an
>>>> > overkill. I would then opt for moving information from
>>>> DisableRemoteTopic
>>>> > within the StopReplicas API which will then disappear in KRaft world
>>>> as it
>>>> > is already scheduled for deprecation. What do you think?
>>>> >
>>>> > Best,
>>>> > Christo
>>>> >
>>>> > On Wed, 3 Apr 2024 at 07:59, Luke Chen  wrote:
>>>> >
>>>> > > Hi Christo,
>>>> > >
>>>> > > 1. I agree with Doguscan that in KRaft mode, the controller won't
>>>> send
>>>> > RPCs
>>>> > > to the brokers (except in the migration path).
>>>> > > So, I think we could adopt the similar way we did to
>>>> > `AlterReplicaLogDirs`
>>>> > > (
>>>> > > KIP-858
>>>> > > <
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft#KIP858:HandleJBODbrokerdiskfailureinKRaft-Intra-brokerreplicamovement
>>>> > > >)
>>>> > > that let the broker notify controller any update, instead of
>>>> controller
>>>> > to
>>>> > > broker. And once the controller receives all the complete requests
>>>> from
>>>> > > brokers, it'll enter "Disabled" state. WDYT?
>>>> > >
>>>> > > 2. Why should we wait until all brokers to respond before moving to
>>>> > > "Disabled" state in "KRaft mode"?
>>>> > > Currently, only the leader node does the remote log upload/fetch
>>>> tasks,
>>>> > so
>>>> > > does that mean the controller only need to make sure the leader
>>>> completes
>>>> > > the stopPartition?
>>>> > > If during the lead

[jira] [Resolved] (KAFKA-16467) Add README to docs folder

2024-04-29 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16467.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add README to docs folder
> -
>
> Key: KAFKA-16467
> URL: https://issues.apache.org/jira/browse/KAFKA-16467
> Project: Kafka
>  Issue Type: Improvement
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Minor
> Fix For: 3.8.0
>
>
> We don't have a guide in the project root folder or docs folder to show how to 
> run the website locally. It would be good to provide a way to run the 
> documentation with the kafka-site repository.
>  
> Option 1: Add links to wiki page 
> [https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Website+Documentation+Changes]
>  and 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67634793]. 
> Option 2: Show how to run the documentation within a container. For example: moving 
> `site-docs` from the kafka to the kafka-site repository and running `./start-preview.sh`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16563) migration to KRaft hanging after MigrationClientException

2024-04-29 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16563.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Fixed

> migration to KRaft hanging after MigrationClientException
> -
>
> Key: KAFKA-16563
> URL: https://issues.apache.org/jira/browse/KAFKA-16563
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>    Assignee: Luke Chen
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> When running the ZK migration to KRaft process, we encountered an issue where the 
> migration hangs and the `ZkMigrationState` cannot move to the `MIGRATION` 
> state. After investigation, the root cause is that the pollEvent didn't 
> retry on the retriable `MigrationClientException` (i.e. ZK client retriable 
> errors) while it should. Because of this, the poll event will not poll 
> anymore, which means the KRaftMigrationDriver cannot work as expected.
>  
> {code:java}
> 2024-04-11 21:27:55,393 INFO [KRaftMigrationDriver id=5] Encountered 
> ZooKeeper error during event PollEvent. Will retry. 
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver) 
> [controller-5-migration-driver-event-handler]org.apache.zookeeper.KeeperException$NodeExistsException:
>  KeeperErrorCode = NodeExists for /migrationat 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:126)at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)at 
> kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:570)at 
> kafka.zk.KafkaZkClient.createInitialMigrationState(KafkaZkClient.scala:1701)  
>   at 
> kafka.zk.KafkaZkClient.getOrCreateMigrationState(KafkaZkClient.scala:1689)
> at 
> kafka.zk.ZkMigrationClient.$anonfun$getOrCreateMigrationRecoveryState$1(ZkMigrationClient.scala:109)
> at 
> kafka.zk.ZkMigrationClient.getOrCreateMigrationRecoveryState(ZkMigrationClient.scala:69)
> at 
> org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
> at 
> org.apache.kafka.metadata.migration.KRaftMigrationDriver.recoverMigrationStateFromZK(KRaftMigrationDriver.java:169)
> at 
> org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$1900(KRaftMigrationDriver.java:62)
> at 
> org.apache.kafka.metadata.migration.KRaftMigrationDriver$PollEvent.run(KRaftMigrationDriver.java:794)
> at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> at java.base/java.lang.Thread.run(Thread.java:840){code}
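>
> The shape of the fix is to keep retrying the poll on this class of error. A 
> schematic sketch (placeholder names only, and an arbitrary retry bound/backoff, 
> unlike the actual KRaftMigrationDriver event queue):
> {code:java}
> // Schematic retry-on-retriable-error loop; names are placeholders.
> public class PollRetrySketch {
>
>     static class RetriableMigrationClientException extends RuntimeException {
>         RetriableMigrationClientException(String msg) { super(msg); }
>     }
>
>     interface MigrationStep { void run(); }
>
>     static void pollWithRetry(MigrationStep step, int maxAttempts, long backoffMs)
>             throws InterruptedException {
>         for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>             try {
>                 step.run();
>                 return;
>             } catch (RetriableMigrationClientException e) {
>                 // The bug was that this class of error was not retried, so the
>                 // driver stopped polling; retrying keeps the migration alive.
>                 System.out.println("Retriable error on attempt " + attempt + ": " + e.getMessage());
>                 Thread.sleep(backoffMs);
>             }
>         }
>     }
>
>     public static void main(String[] args) throws InterruptedException {
>         pollWithRetry(() -> {
>             throw new RetriableMigrationClientException("NodeExists for /migration");
>         }, 3, 10L);
>     }
> }
> {code}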



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1023: Follower fetch from tiered offset

2024-04-26 Thread Luke Chen
Hi Abhijeet,

Thanks for the KIP.
+1 from me.

Thanks.
Luke

On Fri, Apr 26, 2024 at 5:41 PM Omnia Ibrahim 
wrote:

> Thanks for the KIP. +1 non-binding from me
>
> > On 26 Apr 2024, at 06:29, Abhijeet Kumar 
> wrote:
> >
> > Hi All,
> >
> > I would like to start the vote for KIP-1023 - Follower fetch from tiered
> > offset
> >
> > The KIP is here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> >
> > Regards.
> > Abhijeet.
>
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Luke Chen
Hi, Abhijeet,

Thanks for the update.

I have no more comments.

Luke

On Thu, Apr 25, 2024 at 4:21 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the updated KIP. It looks good to me.
>
> Jun
>
> On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> wrote:
>
> > Hi Jun,
> >
> > Please find my comments inline.
> >
> >
> > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the reply.
> > >
> > > 1. I am wondering if we could achieve the same result by just lowering
> > > local.retention.ms and local.retention.bytes. This also allows the
> newly
> > > started follower to build up the local data before serving the consumer
> > > traffic.
> > >
> >
> > I am not sure I fully followed this. Do you mean we could lower the
> > local.retention (by size and time)
> > so that there is little data on the leader's local storage so that the
> > follower can quickly catch up with the leader?
> >
> > In that case, we will need to set small local retention across brokers in
> > the cluster. It will have the undesired
> > effect where there will be increased remote log fetches for serving
> consume
> > requests, and this can cause
> > degradations. Also, this behaviour (of increased remote fetches) will
> > happen on all brokers at all times, whereas in
> > the KIP we are restricting the behavior only to the newly bootstrapped
> > brokers and only until the time it fully builds
> > the local logs as per retention defined at the cluster level.
> > (Deprioritization of the broker could help reduce the impact
> >  even further)
> >
> >
> > >
> > > 2. Have you updated the KIP?
> > >
> >
> > The KIP has been updated now.
> >
> >
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > +1 to Jun for adding the consumer fetching from a follower scenario
> > > > also to the existing section that talked about the drawback when a
> > > > node built with last-tiered-offset has become a leader. As Abhijeet
> > > > mentioned, we plan to have a follow-up KIP that will address these by
> > > > having a deprioritzation of these brokers. The deprioritization of
> > > > those brokers can be removed once they catchup until the local log
> > > > retention.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > > > >
> > > > > Hi Abhijeet,
> > > > >
> > > > > Thanks for the KIP to improve the tiered storage feature!
> > > > >
> > > > > Questions:
> > > > > 1. We could also get the "pending-upload-offset" and epoch via
> remote
> > > log
> > > > > metadata, instead of adding a new API to fetch from the leader.
> Could
> > > you
> > > > > explain why you choose the later approach, instead of the former?
> > > > > 2.
> > > > > > We plan to have a follow-up KIP that will address both the
> > > > > deprioritization
> > > > > of these brokers from leadership, as well as
> > > > > for consumption (when fetching from followers is allowed).
> > > > >
> > > > > I agree with Jun that we might need to make it clear all possible
> > > > drawbacks
> > > > > that could have. So, could we add the drawbacks that Jun mentioned
> > > about
> > > > > the performance issue when consumer fetch from follower?
> > > > >
> > > > > 3. Could we add "Rejected Alternatives" section to the end of the
> KIP
> > > to
> > > > > add some of them?
> > > > > Like the "ListOffsetRequest" approach VS
> > > "Earliest-Pending-Upload-Offset"
> > > > > approach, or getting the "Earliest-Pending-Upload-Offset" from
> remote
> > > log
> > > > > metadata... etc.
> > > > >
> > > > > Thanks.
> > > > > Luke
> > > > >
> > > > >
> > > > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> &g

Re: [Confluence] Request for an account

2024-04-24 Thread Luke Chen
Hi Arpit,

You should be able to check the status in
https://issues.apache.org/jira/browse/INFRA-25451 .
It's infra's issue, not ours.

Thanks.
Luke

On Thu, Apr 25, 2024 at 12:49 PM Arpit Goyal 
wrote:

> Hi All,
> Is this issue resolved ?
> Thanks and Regards
> Arpit Goyal
> 8861094754
>
>
> On Tue, Mar 26, 2024 at 9:11 AM Luke Chen  wrote:
>
> > Hi Johnny,
> >
> > Currently, there is an infra issue about this:
> > https://issues.apache.org/jira/browse/INFRA-25451 , and unfortunately
> it's
> > not fixed, yet.
> > I think alternatively, maybe you could put your proposal in a shared
> google
> > doc for discussion. (without comment enabled since we want to keep all
> the
> > discussion history in apache email threads).
> > After discussion is completed, committers can help you add the content
> into
> > confluence wiki.
> >
> > Thanks.
> > Luke
> >
> > On Mon, Mar 25, 2024 at 8:56 PM ChengHan Hsu 
> > wrote:
> >
> > > Hi all,
> > >
> > > I have sent a email to infrastruct...@apache.org for registering an
> > > account
> > > of Confluence, I am contributing to Kafka and would like to update some
> > > wiki.
> > > May I know if anyone can help with this?
> > >
> > > Thanks in advance!
> > >
> > > Best,
> > > Johnny
> > >
> >
>


[jira] [Created] (KAFKA-16617) Add KRaft info for the `advertised.listeners` doc description

2024-04-24 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16617:
-

 Summary: Add KRaft info for the `advertised.listeners` doc 
description
 Key: KAFKA-16617
 URL: https://issues.apache.org/jira/browse/KAFKA-16617
 Project: Kafka
  Issue Type: Improvement
Reporter: Luke Chen


Currently, the `advertised.listeners` doc description only covers the ZooKeeper 
case:

> Listeners to publish to ZooKeeper for clients to use, if different than the 
> listeners config property.

We should also add the KRaft info to the doc.

ref: https://kafka.apache.org/documentation/#brokerconfigs_advertised.listeners



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Igor Soarez

2024-04-24 Thread Luke Chen
Congrats, Igor!

On Thu, Apr 25, 2024 at 6:10 AM Matthias J. Sax  wrote:

> Congrats!
>
> On 4/24/24 2:29 PM, Bill Bejeck wrote:
> > Congrats Igor!
> >
> > -Bill
> >
> > On Wed, Apr 24, 2024 at 2:37 PM Tom Bentley  wrote:
> >
> >> Congratulations Igor!
> >>
> >> On Thu, 25 Apr 2024 at 6:27 AM, Chia-Ping Tsai 
> wrote:
> >>
> >>> Congratulations, Igor! you are one of the best Kafka developers!!!
> >>>
> >>> Mickael Maison  於 2024年4月25日 週四 上午2:16寫道:
> >>>
>  Congratulations Igor!
> 
>  On Wed, Apr 24, 2024 at 8:06 PM Colin McCabe 
> >> wrote:
> >
> > Hi all,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer,
>  Igor Soarez.
> >
> > Igor has been a Kafka contributor since 2019. In addition to being a
>  regular contributor and reviewer, he has made significant
> contributions
> >>> to
>  improving Kafka's JBOD support in KRaft mode. He has also contributed
> >> to
>  discussing and reviewing many KIPs such as KIP-690, KIP-554, KIP-866,
> >> and
>  KIP-938.
> >
> > Congratulations, Igor!
> >
> > Thanks,
> >
> > Colin (on behalf of the Apache Kafka PMC)
> 
> >>>
> >>
> >
>


Re: [VOTE] KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission

2024-04-24 Thread Luke Chen
Hi Nikhil,
Thanks for the KIP.

+1 from me.

Luke

On Mon, Apr 22, 2024 at 7:41 PM Andrew Schofield 
wrote:

> Hi Nikhil,
> Thanks for the KIP. Looks good to me.
>
> +1 (non-binding)
>
> Thanks,
> Andrew
>
> > On 22 Apr 2024, at 09:17, Christo Lolov  wrote:
> >
> > Heya Nikhil,
> >
> > Thanks for the proposal, as mentioned before it makes sense to me!
> >
> > +1 (binding)
> >
> > Best,
> > Christo
> >
> > On Sat, 20 Apr 2024 at 00:25, Justine Olshan
> 
> > wrote:
> >
> >> Hey Nikhil,
> >>
> >> I meant to comment on the discussion thread, but my draft took so long,
> you
> >> opened the vote.
> >>
> >> Regardless, I just wanted to say that it makes sense to me. +1 (binding)
> >>
> >> Justine
> >>
> >> On Fri, Apr 19, 2024 at 7:22 AM Nikhil Ramakrishnan <
> >> ramakrishnan.nik...@gmail.com> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> I would like to start a voting thread for KIP-1037: Allow
> >>> WriteTxnMarkers API with Alter Cluster Permission
> >>> (
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1037%3A+Allow+WriteTxnMarkers+API+with+Alter+Cluster+Permission
> >>> )
> >>> as there have been no objections on the discussion thread.
> >>>
> >>> For comments or feedback please check the discussion thread here:
> >>> https://lists.apache.org/thread/bbkyt8mrc8xp3jfyvhph7oqtjxl29xmn
> >>>
> >>> Thanks,
> >>> Nikhil
> >>>
> >>
>
>


[jira] [Resolved] (KAFKA-16424) truncated logs will be left undeleted after alter dir completion

2024-04-24 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16424.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> truncated logs will be left undeleted after alter dir completion
> 
>
> Key: KAFKA-16424
> URL: https://issues.apache.org/jira/browse/KAFKA-16424
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Assignee: PoAn Yang
>Priority: Major
> Fix For: 3.8.0
>
>
> When doing log dir movement, we'll create a temp future replica with the dir 
> named: topic-partition.uniqueId-future, ex: 
> t3-0.9af8e054dbe249cf9379a210ec449af8-future. After the log dir movement 
> completes, we'll rename the future log dir to the normal log dir; in the 
> above case, it'll be "t3" only.
> So, if there are some logs to be deleted during the log dir movement, we'll 
> hand the deletion off to a scheduler to do later 
> ([here|https://github.com/apache/kafka/blob/2d4abb85bf4a3afb1e3359a05786ab8f3fda127e/core/src/main/scala/kafka/log/LocalLog.scala#L926]).
>  However, once the log dir movement completes and the future log is renamed, the 
> async log deletion will fail with a "file does not exist" error:
>  
> {code:java}
> [2024-03-26 17:35:10,809] INFO [LocalLog partition=t3-0, 
> dir=/tmp/kraft-broker-logs] Deleting segment files LogSegment(baseOffset=0, 
> size=0, lastModifiedTime=0, largestRecordTimestamp=-1) (kafka.log.LocalLog$)
> [2024-03-26 17:35:10,810] INFO Failed to delete log 
> /tmp/kraft-broker-logs/t3-0.9af8e054dbe249cf9379a210ec449af8-future/.log.deleted
>  because it does not exist. 
> (org.apache.kafka.storage.internals.log.LogSegment)
> [2024-03-26 17:35:10,811] INFO Failed to delete offset index 
> /tmp/kraft-broker-logs/t3-0.9af8e054dbe249cf9379a210ec449af8-future/.index.deleted
>  because it does not exist. 
> (org.apache.kafka.storage.internals.log.LogSegment)
> [2024-03-26 17:35:10,811] INFO Failed to delete time index 
> /tmp/kraft-broker-logs/t3-0.9af8e054dbe249cf9379a210ec449af8-future/.timeindex.deleted
>  because it does not exist. 
> (org.apache.kafka.storage.internals.log.LogSegment) {code}
> I think we could consider falling back to the normal log dir if the files 
> cannot be found under the future log dir. That is, when a file cannot be found 
> under the "t3-0.9af8e054dbe249cf9379a210ec449af8-future" dir, try the "t3" 
> folder and delete the file there. Because the file already has the ".deleted" 
> suffix, it should be fine to delete it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16563) migration to KRaft hanging after KeeperException

2024-04-16 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16563:
-

 Summary: migration to KRaft hanging after KeeperException
 Key: KAFKA-16563
 URL: https://issues.apache.org/jira/browse/KAFKA-16563
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen
Assignee: Luke Chen


When running the ZooKeeper to KRaft migration process, we encountered an issue 
where the migration hangs and the `ZkMigrationState` cannot move to the 
`MIGRATION` state. After investigation, the root cause is that the PollEvent 
didn't retry on a retriable KeeperException when it should have.

 
{code:java}
2024-04-11 21:27:55,393 INFO [KRaftMigrationDriver id=5] Encountered ZooKeeper error during event PollEvent. Will retry. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-5-migration-driver-event-handler]
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /migration
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:570)
    at kafka.zk.KafkaZkClient.createInitialMigrationState(KafkaZkClient.scala:1701)
    at kafka.zk.KafkaZkClient.getOrCreateMigrationState(KafkaZkClient.scala:1689)
    at kafka.zk.ZkMigrationClient.$anonfun$getOrCreateMigrationRecoveryState$1(ZkMigrationClient.scala:109)
    at kafka.zk.ZkMigrationClient.getOrCreateMigrationRecoveryState(ZkMigrationClient.scala:69)
    at org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
    at org.apache.kafka.metadata.migration.KRaftMigrationDriver.recoverMigrationStateFromZK(KRaftMigrationDriver.java:169)
    at org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$1900(KRaftMigrationDriver.java:62)
    at org.apache.kafka.metadata.migration.KRaftMigrationDriver$PollEvent.run(KRaftMigrationDriver.java:794)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:840){code}
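
For reference, one way to inspect the /migration znode mentioned in the NodeExists 
error above (the ZooKeeper address is illustrative):
{code:bash}
# Shows the migration state stored under /migration, if the znode already exists
bin/zookeeper-shell.sh localhost:2181 get /migration
{code}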



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16552) Create an internal config to control InitialTaskDelayMs in LogManager to speed up tests

2024-04-15 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16552:
-

 Summary: Create an internal config to control InitialTaskDelayMs 
in LogManager to speed up tests
 Key: KAFKA-16552
 URL: https://issues.apache.org/jira/browse/KAFKA-16552
 Project: Kafka
  Issue Type: Improvement
Reporter: Luke Chen
Assignee: Luke Chen


When LogManager starts up, we create scheduled tasks such as the kafka-log-retention 
and kafka-recovery-point-checkpoint threads 
([here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L629]).
All of them have public configs to control their interval, like 
`log.retention.check.interval.ms`. But in addition to the scheduler interval, 
there's a hard-coded InitialTaskDelayMs (30 seconds) for all of them. That might 
not be a problem in a production environment, since it makes the Kafka server 
start up faster. In a test environment, however, the 30-second delay means that 
tests verifying behaviors like log retention take at least 30 seconds to complete.

To speed up tests, we should create an internal config (ex: 
"log.initial.task.delay.ms") to control InitialTaskDelayMs in LogManager. This is 
not intended to be used by normal users, only for speeding up tests.
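
As a sketch, a test broker config could then override the delay like this 
(assuming the example name above is the one that gets implemented; it would be an 
internal, test-only config):
{code}
# Hypothetical test-only overrides
log.retention.check.interval.ms=1000
log.initial.task.delay.ms=0
{code}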

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Request for Contributor Permission

2024-04-14 Thread Luke Chen
Hi Suprem,

I've granted you access.

Thanks.
Luke

On Mon, Apr 15, 2024 at 7:15 AM Suprem Vanam 
wrote:

> Hi, I'm interested in contributing to Kafka. I have access to the JIRA
> board but cannot assign issues to myself. Could you give contributor
> permissions to Jira ID: supremvanam.
>
> Thank you.
>


Re: [ANNOUNCE] New Kafka PMC Member: Greg Harris

2024-04-13 Thread Luke Chen
Congrats, Greg!

On Sun, Apr 14, 2024 at 7:05 AM Viktor Somogyi-Vass
 wrote:

> Congrats Greg! :)
>
> On Sun, Apr 14, 2024, 00:35 Bill Bejeck  wrote:
>
> > Congrats Greg!
> >
> > -Bill
> >
> > On Sat, Apr 13, 2024 at 4:25 PM Boudjelda Mohamed Said <
> bmsc...@gmail.com>
> > wrote:
> >
> > > Congratulations Greg
> > >
> > > On Sat 13 Apr 2024 at 20:42, Chris Egerton 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Greg Harris has been a Kafka committer since July 2023. He has
> remained
> > > > very active and instructive in the community since becoming a
> > committer.
> > > > It's my pleasure to announce that Greg is now a member of Kafka PMC.
> > > >
> > > > Congratulations, Greg!
> > > >
> > > > Chris, on behalf of the Apache Kafka PMC
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-04-09 Thread Luke Chen
Hi Christo,

> I would then opt for moving information from DisableRemoteTopic
within the StopReplicas API which will then disappear in KRaft world as it
is already scheduled for deprecation. What do you think?

Sounds good to me.

Thanks.
Luke

On Tue, Apr 9, 2024 at 6:46 PM Christo Lolov  wrote:

> Heya Luke!
>
> I thought a bit more about it and I reached the same conclusion as you for
> 2 as a follow-up from 1. In other words, in KRaft world I don't think the
> controller needs to wait for acknowledgements for the brokers. All we care
> about is that the leader (who is responsible for archiving/deleting data in
> tiered storage) knows about the change and applies it properly. If there is
> a leadership change halfway through the operation then the new leader still
> needs to apply the message from the state topic and we know that a
> disable-message will be applied before a reenablement-message. I will
> change the KIP later today/tomorrow morning to reflect this reasoning.
>
> However, with this I believe that introducing a new API just for
> Zookeeper-based clusters (i.e. DisableRemoteTopic) becomes a bit of an
> overkill. I would then opt for moving information from DisableRemoteTopic
> within the StopReplicas API which will then disappear in KRaft world as it
> is already scheduled for deprecation. What do you think?
>
> Best,
> Christo
>
> On Wed, 3 Apr 2024 at 07:59, Luke Chen  wrote:
>
> > Hi Christo,
> >
> > 1. I agree with Doguscan that in KRaft mode, the controller won't send
> RPCs
> > to the brokers (except in the migration path).
> > So, I think we could adopt the similar way we did to
> `AlterReplicaLogDirs`
> > (
> > KIP-858
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft#KIP858:HandleJBODbrokerdiskfailureinKRaft-Intra-brokerreplicamovement
> > >)
> > that let the broker notify controller any update, instead of controller
> to
> > broker. And once the controller receives all the complete requests from
> > brokers, it'll enter "Disabled" state. WDYT?
> >
> > 2. Why should we wait until all brokers to respond before moving to
> > "Disabled" state in "KRaft mode"?
> > Currently, only the leader node does the remote log upload/fetch tasks,
> so
> > does that mean the controller only need to make sure the leader completes
> > the stopPartition?
> > If during the leader node stopPartition process triggered leadership
> > change, then the new leader should receive and apply the configRecord
> > update before the leadership change record based on the KRaft design,
> which
> > means there will be no gap that the follower node becomes the leader and
> > starting doing unexpected upload/fetch tasks, right?
> > I agree we should make sure in ZK mode, all brokers are completed the
> > stopPartitions before moving to "Disabled" state because ZK node watcher
> is
> > working in a separate thread. But not sure about KRaft mode.
> >
> > Thanks.
> > Luke
> >
> >
> > On Fri, Mar 29, 2024 at 4:14 PM Christo Lolov 
> > wrote:
> >
> > > Heya everyone!
> > >
> > > re: Doguscan
> > >
> > > I believe the answer to 101 needs a bit more discussion. As far as I
> > know,
> > > tiered storage today has methods to update a metadata of a segment to
> say
> > > "hey, I would like this deleted", but actual deletion is left to plugin
> > > implementations (or any background cleaners). In other words, there is
> no
> > > "immediate" deletion. In this KIP, we would like to continue doing the
> > same
> > > if the retention policy is set to delete. So I believe the answer is
> > > actually that a) we will update the metadata of the segments to mark
> them
> > > as deleted and b) we will advance the log start offset. Any deletion of
> > > actual files will still be delegated to plugin implementations. I
> believe
> > > this is further supported by "*remote.log.disable.policy=delete:* Logs
> > that
> > > are archived in the remote storage will not be part of the contiguous
> > > "active" log and will be deleted asynchronously as part of the
> > disablement
> > > process"
> > >
> > > Following from the above, I believe for 102 it is fine to allow setting
> > of
> > > remote.log.disable.policy on a disabled topic in much the same way we
> > allow
> > > other remote-related configurations to be set on a topic (i.e.
> > > local.retenti

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Luke Chen
Hi Abhijeet,

Thanks for the KIP to improve the tiered storage feature!

Questions:
1. We could also get the "pending-upload-offset" and epoch via the remote log
metadata, instead of adding a new API to fetch them from the leader. Could you
explain why you chose the latter approach instead of the former?
2.
> We plan to have a follow-up KIP that will address both the
deprioritization
of these brokers from leadership, as well as
for consumption (when fetching from followers is allowed).

I agree with Jun that we might need to make clear all the possible drawbacks
this could have. So, could we add the drawbacks Jun mentioned about the
performance impact when consumers fetch from a follower?

3. Could we add a "Rejected Alternatives" section to the end of the KIP listing
some of them? For example, the "ListOffsetRequest" approach vs. the
"Earliest-Pending-Upload-Offset" approach, or getting the
"Earliest-Pending-Upload-Offset" from the remote log metadata, etc.

Thanks.
Luke


On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar 
wrote:

> Hi Christo,
>
> Please find my comments inline.
>
> On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
> wrote:
>
> > Hello Abhijeet and Jun,
> >
> > I have been mulling this KIP over a bit more in recent days!
> >
> > re: Jun
> >
> > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > retrospect it should have been fairly obvious. I would need to go an
> update
> > KIP-1005 myself then, thank you for giving the useful reference!
> >
> > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > which experienced a disk failure) to decrease the time a partition spends
> > in under-replicated state. In other words, a follower which has just
> fallen
> > out of ISR, but has local data will continue using today's Tiered Storage
> > replication protocol (i.e. fetching from earliest-local). I further
> believe
> > he has taken this approach so that local state of replicas which have
> just
> > fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> > Abhijeet, have I understood (and summarised) what you are proposing
> > correctly?
> >
> > Yes, your understanding is correct. We want to limit the behavior changes
> only to new replicas.
>
>
> > 5. I think in today's Tiered Storage we know the leader's
> log-start-offset
> > from the FetchResponse and we can learn its local-log-start-offset from
> the
> > ListOffsets by asking for earliest-local timestamp (-4). But granted,
> this
> > ought to be added as an additional API call in the KIP.
> >
> >
> Yes, I clarified this in my reply to Jun. I will add this missing detail in
> the KIP.
>
>
> > re: Abhijeet
> >
> > 101. I am still a bit confused as to why you want to include a new offset
> > (i.e. pending-upload-offset) when you yourself mention that you could use
> > an already existing offset (i.e. last-tiered-offset + 1). In essence, you
> > end your Motivation with "In this KIP, we will focus only on the follower
> > fetch protocol using the *last-tiered-offset*" and then in the following
> > sections you talk about pending-upload-offset. I understand this might be
> > classified as an implementation detail, but if you introduce a new offset
> > (i.e. pending-upload-offset) you have to make a change to the ListOffsets
> > API (i.e. introduce -6) and thus document it in this KIP as such.
> However,
> > the last-tiered-offset ought to already be exposed as part of KIP-1005
> > (under implementation). Am I misunderstanding something here?
> >
>
> I have tried to clarify this in my reply to Jun.
>
> > The follower needs to build the local data starting from the offset
> > Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> > corresponding leader-epoch.
> > There are two ways to do this:
> >1. We add support in ListOffsetRequest to be able to fetch this offset
> > (and leader epoch) from the leader.
> >2. Or, fetch the tiered-offset (which is already supported). From this
> > offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1
> to
> > the tiered-offset.
> >   However, we still need the leader epoch for offset, since there is
> > no guarantee that the leader epoch for Earliest-Pending-Upload-Offset
> will
> > be the same as the
> >   leader epoch for tiered-offset. We may need another API call to the
> > leader for this.
> > I prefer the first approach. The only problem with the first approach is
> > that it introduces one more offset. The second approach avoids this
> problem
> > but is a little complicated.
>
>
>
> >
> > Best,
> > Christo
> >
> > On Thu, 4 Apr 2024 at 19:37, Jun Rao  wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the KIP. Left a few comments.
> > >
> > > 1. "A drawback of using the last-tiered-offset is that this new
> follower
> > > would possess only a limited number of locally stored segments. Should
> it
> > > ascend to the role of leader, there is a risk of 

[jira] [Resolved] (KAFKA-16455) Check partition exists before send reassignments to server in ReassignPartitionsCommand

2024-04-08 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16455.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Check partition exists before send reassignments to server in 
> ReassignPartitionsCommand
> ---
>
> Key: KAFKA-16455
> URL: https://issues.apache.org/jira/browse/KAFKA-16455
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Minor
> Fix For: 3.8.0
>
>
> Currently, when executing {{kafka-reassign-partitions.sh}} with the 
> {{--execute}} option, if a partition number specified in the JSON file does 
> not exist, this check occurs only when submitting the reassignments to 
> {{alterPartitionReassignments}} on the server-side.
> We can perform this check in advance before submitting the reassignments to 
> the server side.
> For example, suppose we have three brokers with IDs 1001, 1002, and 1003, and 
> a topic named {{first_topic}} with only three partitions. And execute 
> {code:bash}
> bin/kafka-reassign-partitions.sh 
>   --bootstrap-server 192.168.0.128:9092 
>   --reassignment-json-file reassignment.json 
>   --execute
> {code}
> Where reassignment.json contains
> {code:json}
> {
>   "version": 1,
>   "partitions": [
> {
>   "topic": "first_topic",
>   "partition": 20,
>   "replicas": [1002, 1001, 1003],
>   "log_dirs": ["any", "any", "any"]
> }
>   ]
> }
> {code}
> The console outputs
> {code:java}
> Current partition replica assignment
> {"version":1,"partitions":[]}
> Save this to use as the --reassignment-json-file option during rollback
> Error reassigning partition(s):
> first_topic-20: The partition does not exist.
> {code}
> Apart from the output {{\{"version":1,"partitions":[]\}}} which doesn't 
> provide much help, the error {{first_topic-20: The partition does not 
> exist.}} is reported back to the tool from the server-side, as mentioned 
> earlier. This check could be moved earlier before sending reassignments to 
> server side



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16234) Log directory failure re-creates partitions in another logdir automatically

2024-04-06 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16234.
---
Resolution: Fixed

> Log directory failure re-creates partitions in another logdir automatically
> ---
>
> Key: KAFKA-16234
> URL: https://issues.apache.org/jira/browse/KAFKA-16234
> Project: Kafka
>  Issue Type: Sub-task
>  Components: jbod
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Critical
> Fix For: 3.8.0, 3.7.1
>
>
> With [KAFKA-16157|https://github.com/apache/kafka/pull/15263] we made changes 
> in {{HostedPartition.Offline}} enum variant to embed {{Partition}} object. 
> Further, {{ReplicaManager::getOrCreatePartition}} tries to compare the old 
> and new topicIds to decide if it needs to create a new log.
> The getter for {{Partition::topicId}} relies on retrieving the topicId from 
> {{log}} field or {{{}logManager.currentLogs{}}}. The former is set to 
> {{None}} when a partition is marked offline and the key for the partition is 
> removed from the latter by {{{}LogManager::handleLogDirFailure{}}}. 
> Therefore, topicId for a partitioned marked offline always returns {{None}} 
> and new logs for all partitions in a failed log directory are always created 
> on another disk.
> The broker will fail to restart after the failed disk is repaired because 
> same partitions will occur in two different directories. The error does 
> however inform the operator to remove the partitions from the disk that 
> failed which should help with broker startup.
> We can avoid this with KAFKA-16212 but in the short-term, an immediate 
> solution can be to have {{Partition}} object accept {{Option[TopicId]}} in 
> it's constructor and have it fallback to {{log}} or {{logManager}} if it's 
> unset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16359) kafka-clients-3.7.0.jar published to Maven Central is defective

2024-04-04 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16359.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Fixed

> kafka-clients-3.7.0.jar published to Maven Central is defective
> ---
>
> Key: KAFKA-16359
> URL: https://issues.apache.org/jira/browse/KAFKA-16359
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.7.0
>Reporter: Jeremy Norris
>Assignee: Apoorv Mittal
>Priority: Critical
> Fix For: 3.8.0, 3.7.1
>
>
> The {{kafka-clients-3.7.0.jar}} that has been published to Maven Central is 
> defective: it's {{META-INF/MANIFEST.MF}} bogusly include a {{Class-Path}} 
> element:
> {code}
> Manifest-Version: 1.0
> Class-Path: zstd-jni-1.5.5-6.jar lz4-java-1.8.0.jar snappy-java-1.1.10
> .5.jar slf4j-api-1.7.36.jar
> {code}
> This bogus {{Class-Path}} element leads to compiler warnings for projects 
> that utilize it as a dependency:
> {code}
> [WARNING] [path] bad path element 
> ".../.m2/repository/org/apache/kafka/kafka-clients/3.7.0/zstd-jni-1.5.5-6.jar":
>  no such file or directory
> [WARNING] [path] bad path element 
> ".../.m2/repository/org/apache/kafka/kafka-clients/3.7.0/lz4-java-1.8.0.jar": 
> no such file or directory
> [WARNING] [path] bad path element 
> ".../.m2/repository/org/apache/kafka/kafka-clients/3.7.0/snappy-java-1.1.10.5.jar":
>  no such file or directory
> [WARNING] [path] bad path element 
> ".../.m2/repository/org/apache/kafka/kafka-clients/3.7.0/slf4j-api-1.7.36.jar":
>  no such file or directory
> {code}
> Either the {{kafka-clients-3.7.0.jar}} needs to be respun and published 
> without the bogus {{Class-Path}} element in it's {{META-INF/MANIFEST.MF}} or 
> a new release should be published that corrects this defect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1026: Handling producer snapshot when upgrading from < v2.8.0 for Tiered Storage

2024-04-04 Thread Luke Chen
Hi Kamal,

Thanks for sharing! I didn't know about the change before v2.8.
If that's the case, then we have to reconsider the solution in this PR.
Is making it optional a good solution? Or should we recover the snapshot, if it
is not found, before uploading it?
But if the topic was created before v2.8.0 and the old log segments are
deleted, how could we recover all the producer snapshots for the old logs?

Thanks.
Luke


On Wed, Apr 3, 2024 at 11:59 PM Arpit Goyal 
wrote:

> Thanks @Kamal Chandraprakash   Greg Harris
> I currently do not have detailed understanding on the behaviour when empty
> producer snapshot  restored. I will try to test out the behaviour.Meanwhile
> I would request other community members if they can chime in and assist if
> they are already aware of the behaviour mentioned above.
> Thanks and Regards
> Arpit Goyal
> 8861094754
>
>
> On Tue, Mar 26, 2024 at 4:04 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi,
> >
> > Sorry for the late reply. Greg has raised some good points:
> >
> > > Does an empty producer snapshot have the same behavior as a
> > non-existent snapshot when restored?
> >
> > Producer snapshots maintain the status about the ongoing txn to either
> > COMMIT/ABORT the transaction. In the older version (<2.8), we maintain
> > the producer snapshots only for the recent segments. If such a topic gets
> > onboarded to tiered storage and the recently built replica becomes then
> > leader,
> > then it might break the producer.
> >
> > Assume there are 100 segments for a partition, and the producer snapshots
> > are available only for the recent 2 segments. Then, tiered storage is
> > enabled
> > for that topic, 90/100 segments are uploaded and local-log-segments are
> > cleared upto segment 50. If a new follower builds the state from remote,
> it
> > will
> > have the empty producer snapshots and start reading the data from the
> > leader
> > from segment-51. If a transaction is started at segment-40, then it will
> > break the
> > client.
> >
> > We also have to check the impact of expiring producer-ids before the
> > default
> > expiration time of 7 days: *transactional.id.expiration.ms
> > <http://transactional.id.expiration.ms>*
> >
> > > 2. Why were empty producer snapshots not backfilled for older data
> > when clusters were upgraded from 2.8?
> >
> > https://github.com/apache/kafka/pull/7929 -- It was not required at that
> > time.
> > With tiered storage, we need the snapshot file for each segment to
> reliably
> > build the follower state from remote storage.
> >
> > > 3. Do producer snapshots need to be available contiguously, or can
> > earlier snapshots be empty while later segments do not exist?
> >
> > I assume you refer to "while later segments do exist". Each snapshot file
> > will contain
> > the cumulative/complete data of all the previous segments. So, a recent
> > segment
> > snapshot is enough to build the producer state. We need to figure out a
> > solution to
> > build the complete producer state for replicas that built the state using
> > the remote.
> >
> > Arpit,
> > We have to deep dive into each of them to come up with the proper
> solution.
> >
> > --
> > Kamal
> >
> >
> > On Tue, Mar 26, 2024 at 3:55 AM Greg Harris  >
> > wrote:
> >
> > > Hi Arpit,
> > >
> > > I think creating empty producer snapshots would be
> > > backwards-compatible for the tiered storage plugins, but I'm not aware
> > > of what the other compatibility/design concerns might be. Maybe you or
> > > another reviewer can answer these questions:
> > > 1. Does an empty producer snapshot have the same behavior as a
> > > non-existent snapshot when restored?
> > > 2. Why were empty producer snapshots not backfilled for older data
> > > when clusters were upgraded from 2.8?
> > > 3. Do producer snapshots need to be available contiguously, or can
> > > earlier snapshots be empty while later segments do not exist?
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Sat, Mar 23, 2024 at 3:24 AM Arpit Goyal 
> > > wrote:
> > > >
> > > > Yes Luke,
> > > > I am also in favour of creating producer snapshot at run time if
> > > > foundEmpty  as this would only be required for topics migrated from <
> > 2.8
> > > > version. This will not break the existing contract with the plugin.
> > Yes,
> > > &

Re: [DISCUSS] KIP-1031: Control offset translation in MirrorSourceConnector

2024-04-03 Thread Luke Chen
Hi Omnia,

Thanks for the KIP!
It LGTM!
But I'm not an expert on MM2, so it would be good to see if there are any other
comments from MM2 experts.

Thanks.
Luke

On Thu, Mar 14, 2024 at 6:08 PM Omnia Ibrahim 
wrote:

> Hi everyone, I would like to start a discussion thread for KIP-1031:
> Control offset translation in MirrorSourceConnector
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1031%3A+Control+offset+translation+in+MirrorSourceConnector
>
> Thanks
> Omnia
>


Re: [DISCUSS] KIP-1028: Docker Official Image for Apache Kafka

2024-04-03 Thread Luke Chen
Hi Krishna,

I also have the same question that Manikumar raised:
1. Will the Docker inventory files etc. be the same for the OSS image and the
Docker Official Images?
If not, why not? Could we make them identical so that we don't have to
build 2 images for each release?
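
For illustration (assuming the image coordinates described further down in this
thread), the two JVM-based images for a release would be pulled as:

  docker pull apache/kafka:3.8.0   # OSS sponsored image
  docker pull kafka:3.8.0          # Docker Official Image, once published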

Thank you.
Luke

On Wed, Apr 3, 2024 at 12:41 AM Manikumar  wrote:

> Hi Krishna,
>
> Thanks for the KIP.
>
> I think Docker Official Images will be beneficial to the Kafka community.
> Few queries below.
>
> 1. Will the Docker inventory files/etc are the same for OSS Image and
> Docker Official Images
> 2. I am a bit worried about the new steps to the release process. Maybe we
> should consider Docker Official Images release as Post-Release activity.
>
> Thanks,
>
> On Fri, Mar 22, 2024 at 3:29 PM Krish Vora  wrote:
>
> > Hi Hector,
> >
> > Thanks for reaching out. This KIP builds on top of KIP-975
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka
> > >
> > and
> > aims to introduce a JVM-based Docker Official Image (DOI
> > ) for Apache
> > Kafka that will be visible under Docker Official Images
> > . Once
> implemented
> > for Apache Kafka, for each release, there will be one more JVM-based
> Docker
> > image available to users.
> >
> > Currently, we already have an OSS sponsored image, which was introduced
> via
> > KIP-975 (apache/kafka )
> which
> > comes under The Apache Software Foundation <
> > https://hub.docker.com/u/apache> in
> > Docker Hub. The new Docker Image is the Docker Official Image (DOI),
> which
> > will be built and maintained by Docker Community.
> >
> > For example, for a release version like 3.8.0 we will have two JVM based
> > docker images:-
> >
> >- apache/kafka:3.8.0 (OSS sponsored image)
> >- kafka:3.8.0 (Docker Official image)
> >
> >
> > I have added the same in the KIP too for everyone's reference.
> > Thanks,
> > Krish.
> >
> > On Fri, Mar 22, 2024 at 2:50 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) <
> > hgerald...@bloomberg.net> wrote:
> >
> > > Hi,
> > >
> > > What is the difference between this KIP and KIP-975: Docker Image for
> > > Apache Kafka?
> > >
> > > From: dev@kafka.apache.org At: 03/21/24 07:30:07 UTC-4:00To:
> > > dev@kafka.apache.org
> > > Subject: [DISCUSS] KIP-1028: Docker Official Image for Apache Kafka
> > >
> > > Hi everyone,
> > >
> > > I would like to start the discussion on
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Im
> > > age+for+Apache+Kafka
> > >  .
> > >
> > > This KIP aims to introduce JVM based Docker Official Image (DOI) for
> > Apache
> > > Kafka.
> > >
> > > Regards,
> > > Krish.
> > >
> > >
> > >
> >
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-04-03 Thread Luke Chen
 of the existing
> > RemoteLogManagerScheduledThreadPool.*
> >
> > *Will add the details.*
> >
> > 105. How is the behaviour with topic or partition deletion request
> > handled when tiered storage disablement request is still being
> > processed on a topic?
> >
> > *If the disablement policy is Delete then a successive topic deletion
> > request is going to be a NOOP because RemoteLogs is already being
> deleted.*
> > *If the disablement policy is Retain, then we only moved the
> LogStartOffset
> > and didn't touch RemoteLogs anyway, so the delete topic request will
> > result*
> >
> > *in the initiation of RemoteLog deletion.*
> >
> >
> > On Tue, 26 Mar 2024 at 18:21, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Thanks for the KIP! Overall the KIP looks good and covered most of the
> > > items.
> > >
> > > 1. Could you explain how the brokers will handle the DisableRemoteTopic
> > API
> > > request?
> > >
> > > 2. Who will initiate the controller interaction sequence? Does the
> > > controller listens for
> > > topic config updates and initiate the disablement?
> > >
> > > --
> > > Kamal
> > >
> > > On Tue, Mar 26, 2024 at 4:40 PM Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thanks Mehari, Divij, Christo etal for the KIP.
> > > >
> > > > I had an initial review of the KIP and left the below comments.
> > > >
> > > > 101. For remote.log.disable.policy=delete:
> > > > Does it delete the remote log data immediately and the data in remote
> > > > storage will not be taken into account by any replica? That means
> > > > log-start-offset is moved to the earlier local-log-start-offset.
> > > >
> > > > 102. Can we update the remote.log.disable.policy after tiered storage
> > > > is disabled on a topic?
> > > >
> > > > 103. Do we plan to add any metrics related to this feature?
> > > >
> > > > 104. Please add configuration details about copier thread pool,
> > > > expiration thread pool and the migration of the existing
> > > > RemoteLogManagerScheduledThreadPool.
> > > >
> > > > 105. How is the behaviour with topic or partition deletion request
> > > > handled when tiered storage disablement request is still being
> > > > processed on a topic?
> > > >
> > > > ~Satish.
> > > >
> > > > On Mon, 25 Mar 2024 at 13:34, Doğuşcan Namal <
> namal.dogus...@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > Hi Christo and Luke,
> > > > >
> > > > > I think the KRaft section of the KIP requires slight improvement.
> The
> > > > metadata propagation in KRaft is handled by the RAFT layer instead of
> > > > sending Controller -> Broker RPCs. In fact, KIP-631 deprecated these
> > > RPCs.
> > > > >
> > > > > I will come up with some recommendations on how we could improve
> that
> > > > one but until then, @Luke please feel free to review the KIP.
> > > > >
> > > > > @Satish, if we want this to make it to Kafka 3.8 I believe we need
> to
> > > > aim to get the KIP approved in the following weeks otherwise it will
> > slip
> > > > and we can not support it in Zookeeper mode.
> > > > >
> > > > > I also would like to better understand what is the community's
> stand
> > > for
> > > > adding a new feature for Zookeeper since it is marked as deprecated
> > > already.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 18 Mar 2024 at 13:42, Christo Lolov <
> christolo...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> Heya,
> > > > >>
> > > > >> I do have some time to put into this, but to be honest I am still
> > > after
> > > > >> reviews of the KIP itself :)
> > > > >>
> > > > >> After the latest changes it ought to be detailing both a Zookeeper
> > > > approach
> > > > >> and a KRaft approach.
> > > > >>
> > > > &

[jira] [Resolved] (KAFKA-15823) NodeToControllerChannelManager: authentication error prevents controller update

2024-03-31 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-15823.
---
Resolution: Fixed

> NodeToControllerChannelManager: authentication error prevents controller 
> update
> ---
>
> Key: KAFKA-15823
> URL: https://issues.apache.org/jira/browse/KAFKA-15823
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.6.0, 3.5.1
>Reporter: Gaurav Narula
>Priority: Major
> Fix For: 3.8.0
>
>
> NodeToControllerChannelManager caches the activeController address in an 
> AtomicReference which is updated when:
>  # activeController [has not been 
> set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
>  # networkClient [disconnnects from the 
> controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
>  # A node replies with 
> `[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`,
>  and
>  # When a controller changes from [Zk mode to Kraft 
> mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]
>  
> When running multiple Kafka clusters in a dynamic environment, there is a 
> chance that a controller's IP may get reassigned to another cluster's broker 
> when the controller is bounced. In this scenario, the requests from Node to 
> the Controller may fail with an AuthenticationException and are then retried 
> indefinitely. This causes the node to get stuck as the new controller's 
> information is never set.
>  
> A potential fix would be disconnect the network client and invoke 
> `updateControllerAddress(null)` as we do in the `Errors.NOT_CONTROLLER` case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16323) Failing test: fix testRemoteFetchExpiresPerSecMetric

2024-03-31 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16323.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Fixed

> Failing test: fix testRemoteFetchExpiresPerSecMetric 
> -
>
> Key: KAFKA-16323
> URL: https://issues.apache.org/jira/browse/KAFKA-16323
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Johnny Hsu
>Assignee: Johnny Hsu
>Priority: Major
>  Labels: test-failure
> Fix For: 3.8.0, 3.7.1
>
>
> Refer to 
> [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2685/testReport/junit/kafka.server/ReplicaManagerTest/Build___JDK_21_and_Scala_2_13___testRemoteFetchExpiresPerSecMetric__/]
> This test is failing, and this ticket aims to address this 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16447) Fix failed ReplicaManagerTest

2024-03-30 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16447.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Fix failed ReplicaManagerTest
> -
>
> Key: KAFKA-16447
> URL: https://issues.apache.org/jira/browse/KAFKA-16447
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Nikolay Izhikov
>Priority: Major
> Fix For: 3.8.0
>
>
> see comment: https://github.com/apache/kafka/pull/15373/files#r1544335647 for 
> root cause



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16391) Cleanup .lock file after server is down

2024-03-26 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16391.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Cleanup .lock file after server is down
> ---
>
> Key: KAFKA-16391
> URL: https://issues.apache.org/jira/browse/KAFKA-16391
> Project: Kafka
>  Issue Type: Improvement
>Reporter: PoAn Yang
>Assignee: PoAn Yang
>Priority: Minor
> Fix For: 3.8.0
>
>
> Currently, server adds a `.lock` file to each log folder. The file is useless 
> after server is down.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15949) Improve the KRaft metadata version related messages

2024-03-26 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-15949.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Improve the KRaft metadata version related messages
> ---
>
> Key: KAFKA-15949
> URL: https://issues.apache.org/jira/browse/KAFKA-15949
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>Reporter: Jakub Scholz
>Assignee: PoAn Yang
>Priority: Major
> Fix For: 3.8.0
>
>
> Various error messages related to KRaft seem to use very different style and 
> formatting. Just for example in the {{StorageTool}} Scala class, there are 
> two different examples:
>  * {{Must specify a valid KRaft metadata version of at least 3.0.}}
>  ** Refers to "metadata version"
>  ** Refers to the version as 3.0 (although strictly speaking 3.0-IV0 is not 
> valid for KRaft)
>  * {{SCRAM is only supported in metadataVersion IBP_3_5_IV2 or later.}}
>  ** Talks about "metadataVersion"
>  ** Refers to "IBP_3_5_IV2" instead of "3.5" or "3.5-IV2"
> Other pieces of Kafka code seem to also talk about "metadata.version" for 
> example.
> For users, it would be nice if the style and formats used were the same 
> everywhere. Would it be worth unifying messages like this? If yes, what would 
> be the preferred style to use?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[ANNOUNCE] New committer: Christo Lolov

2024-03-26 Thread Luke Chen
Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer:
Christo Lolov.

Christo has been a Kafka contributor since 2021. He has made over 50
commits. He authored KIP-902, KIP-963, and KIP-1005, as well as many tiered
storage related tasks. He also co-drives the migration from EasyMock to
Mockito and from Junit 4 to JUnit 5.

Congratulations, Christo!

Thanks,
Luke (on behalf of the Apache Kafka PMC)


[jira] [Created] (KAFKA-16425) wrong output when running log dir movement with kafka-reassign-partitions command

2024-03-26 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16425:
-

 Summary: wrong output when running log dir movement with 
kafka-reassign-partitions command 
 Key: KAFKA-16425
 URL: https://issues.apache.org/jira/browse/KAFKA-16425
 Project: Kafka
  Issue Type: Bug
Reporter: Luke Chen
Assignee: Cheng-Kai, Zhang


When running a log dir movement with the kafka-reassign-partitions command, the 
output looks like:

 
{code:java}
./bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file /tmp/mv4.json --execute
Current partition replica assignment
{"version":1,"partitions":[{"topic":"t3","partition":0,"replicas":[2],"log_dirs":["any"]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started partition reassignment for t3-0
Successfully started log directory move for: t3-0-2{code}
Here, I'm moving the log dir for t3-0 from dir a to dir b, but it outputs:

_Successfully started log directory move for: t3-0-2_

This should be improved.

 

reproduce step:
1. create a broker with 2 log dirs. Ex: 
log.dirs=/tmp/kraft-broker-logs,/tmp/kraft-broker-logs_jbod
2. create a topic with 1 partition, ex: "t3"
3. create a mv.json file to move t3-0 to kraft-broker-logs (or 
kraft-broker-logs_jbod):
{"version":1,
  
"partitions":[\{"topic":"t3","partition":0,"replicas":[2],"log_dirs":["/tmp/kraft-broker-logs"]}]
  }
4. check the script output
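
Put together, the reproduction looks roughly like this (a sketch; the broker 
address is illustrative and broker id 2 matches the replicas above):
{code:bash}
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t3 --partitions 1 --replication-factor 1
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file /tmp/mv.json --execute
{code}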



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16424) truncated logs will be left undeleted after alter dir completion

2024-03-26 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16424:
-

 Summary: truncated logs will be left undeleted after alter dir 
completion
 Key: KAFKA-16424
 URL: https://issues.apache.org/jira/browse/KAFKA-16424
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


When doing log dir movement, we'll create a temp future replica with the dir 
named: topic-partition.uniqueId-future, ex: 
t3-0.9af8e054dbe249cf9379a210ec449af8-future. After the log dir movement 
completed, we'll rename the future log dir to the normal log dir, in the above 
case, it'll be "t3" only.

So, if there are logs to be deleted during the log dir movement, we schedule 
the deletion to be done asynchronously later 
([here|https://github.com/apache/kafka/blob/2d4abb85bf4a3afb1e3359a05786ab8f3fda127e/core/src/main/scala/kafka/log/LocalLog.scala#L926]).
However, when the log dir movement completes and the future log is renamed, the 
async log deletion fails with a "does not exist" error:

 
{code:java}
[2024-03-26 17:35:10,809] INFO [LocalLog partition=t3-0, 
dir=/tmp/kraft-broker-logs] Deleting segment files LogSegment(baseOffset=0, 
size=0, lastModifiedTime=0, largestRecordTimestamp=-1) (kafka.log.LocalLog$)
[2024-03-26 17:35:10,810] INFO Failed to delete log 
/tmp/kraft-broker-logs/t3-0.9af8e054dbe249cf9379a210ec449af8-future/.log.deleted
 because it does not exist. (org.apache.kafka.storage.internals.log.LogSegment)
[2024-03-26 17:35:10,811] INFO Failed to delete offset index 
/tmp/kraft-broker-logs/t3-0.9af8e054dbe249cf9379a210ec449af8-future/.index.deleted
 because it does not exist. (org.apache.kafka.storage.internals.log.LogSegment)
[2024-03-26 17:35:10,811] INFO Failed to delete time index 
/tmp/kraft-broker-logs/t3-0.9af8e054dbe249cf9379a210ec449af8-future/.timeindex.deleted
 because it does not exist. (org.apache.kafka.storage.internals.log.LogSegment) 
{code}
I think we could consider falling back to the normal log dir if the files cannot 
be found under the future log dir. That is, when a file cannot be found under the 
"t3-0.9af8e054dbe249cf9379a210ec449af8-future" dir, try the "t3" folder and 
delete the file there. Because the file already has the ".deleted" suffix, it 
should be fine to delete it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Confluence] Request for an account

2024-03-25 Thread Luke Chen
Hi Johnny,

Currently, there is an infra issue about this:
https://issues.apache.org/jira/browse/INFRA-25451 , and unfortunately it's
not fixed yet.
Alternatively, maybe you could put your proposal in a shared Google doc for
discussion (with commenting disabled, since we want to keep all the
discussion history in Apache email threads).
After the discussion is completed, committers can help you add the content to
the Confluence wiki.

Thanks.
Luke

On Mon, Mar 25, 2024 at 8:56 PM ChengHan Hsu 
wrote:

> Hi all,
>
> I have sent a email to infrastruct...@apache.org for registering an
> account
> of Confluence, I am contributing to Kafka and would like to update some
> wiki.
> May I know if anyone can help with this?
>
> Thanks in advance!
>
> Best,
> Johnny
>


[jira] [Resolved] (KAFKA-16409) kafka-delete-records / DeleteRecordsCommand should use standard exception handling

2024-03-25 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16409.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> kafka-delete-records / DeleteRecordsCommand should use standard exception 
> handling
> --
>
> Key: KAFKA-16409
> URL: https://issues.apache.org/jira/browse/KAFKA-16409
> Project: Kafka
>  Issue Type: Task
>  Components: tools
>Affects Versions: 3.7.0
>Reporter: Greg Harris
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: newbie
> Fix For: 3.8.0
>
>
> When an exception is thrown in kafka-delete-records, it propagates through 
> `main` to the JVM, producing the following message:
> {noformat}
> bin/kafka-delete-records.sh --bootstrap-server localhost:9092 
> --offset-json-file /tmp/does-not-exist
> Exception in thread "main" java.io.IOException: Unable to read file 
> /tmp/does-not-exist
>         at 
> org.apache.kafka.common.utils.Utils.readFileAsString(Utils.java:787)
>         at 
> org.apache.kafka.tools.DeleteRecordsCommand.execute(DeleteRecordsCommand.java:105)
>         at 
> org.apache.kafka.tools.DeleteRecordsCommand.main(DeleteRecordsCommand.java:64)
> Caused by: java.nio.file.NoSuchFileException: /tmp/does-not-exist
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>         at java.nio.file.Files.newByteChannel(Files.java:361)
>         at java.nio.file.Files.newByteChannel(Files.java:407)
>         at java.nio.file.Files.readAllBytes(Files.java:3152)
>         at 
> org.apache.kafka.common.utils.Utils.readFileAsString(Utils.java:784)
>         ... 2 more{noformat}
> This is in contrast to the error handling used in other tools, such as the 
> kafka-log-dirs:
> {noformat}
> bin/kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe 
> --command-config /tmp/does-not-exist
> /tmp/does-not-exist
> java.nio.file.NoSuchFileException: /tmp/does-not-exist
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>         at java.nio.file.Files.newByteChannel(Files.java:361)
>         at java.nio.file.Files.newByteChannel(Files.java:407)
>         at 
> java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
>         at java.nio.file.Files.newInputStream(Files.java:152)
>         at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:686)
>         at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:673)
>         at 
> org.apache.kafka.tools.LogDirsCommand.createAdminClient(LogDirsCommand.java:149)
>         at 
> org.apache.kafka.tools.LogDirsCommand.execute(LogDirsCommand.java:68)
>         at 
> org.apache.kafka.tools.LogDirsCommand.mainNoExit(LogDirsCommand.java:54)
>         at 
> org.apache.kafka.tools.LogDirsCommand.main(LogDirsCommand.java:49){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Request to join

2024-03-25 Thread Luke Chen
Hi,

Please send an email to dev-subscr...@kafka.apache.org to subscribe to the
group.

Thanks.
Luke

On Mon, Mar 25, 2024 at 4:19 PM durairaj t  wrote:

>
>


Re: [DISCUSS] KIP-1026: Handling producer snapshot when upgrading from < v2.8.0 for Tiered Storage

2024-03-23 Thread Luke Chen
Hi Arpit,

I'm in favor of creating an empty producer snapshot since it's only needed for
topics from <= v2.8.
About the metric, I don't know what we expect users to learn from it.
I think we can implement the empty producer snapshot method without the metric,
and add it later if users request it.
WDYT?

Thank you.
Luke

On Sat, Mar 23, 2024 at 1:24 PM Arpit Goyal 
wrote:

> Hi Team,
> Any further comments or suggestions on the possible approaches discussed
> above.
>
> On Tue, Mar 19, 2024, 09:55 Arpit Goyal  wrote:
>
> > @Luke Chen @Kamal Chandraprakash   @Greg
> > Harris Any suggestion on the above two possible approaches.
> > On Sun, Mar 17, 2024, 13:36 Arpit Goyal 
> wrote:
> >
> >>
> >>>>  In summary , There are two possible solution to handle the above
> >> scenario when producer snapshot file found to be null
> >>
> >> 1. *Generate empty producer snapshot file at run time when copying
> >> LogSegment*
> >>
> >>
> >>- This will not require any backward compatibility dependencies with
> >>the plugin.
> >>- It preserves the contract i.e producerSnapshot files should be
> >>mandatory.
> >>- We could have a metric which helps us to assess how many times
> >>empty snapshot files have been created.
> >>
> >> 2.*  Make producerSnapshot files optional *
> >>
> >>- This would break the contract with the plugin and would require
> >>defining a set of approaches to handle it which is mentioned earlier
> in the
> >>thread.
> >>- If we make producer Snapshot optional , We would   not be  handling
> >>the error which @LukeChen mentioned when producerSnapshot
> >>accidentally deleted a given segment. But this holds true for
> >>TransactionalIndex.
> >>- The other question is do we really need to make the field optional.
> >>The only case where this problem can occur is only when the topic
> migrated
> >>from < 2.8 version.
> >>
> >>
>


[jira] [Resolved] (KAFKA-16318) Add javadoc to KafkaMetric

2024-03-21 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16318.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add javadoc to KafkaMetric
> --
>
> Key: KAFKA-16318
> URL: https://issues.apache.org/jira/browse/KAFKA-16318
> Project: Kafka
>  Issue Type: Bug
>  Components: docs
>Reporter: Mickael Maison
>Assignee: Johnny Hsu
>Priority: Major
> Fix For: 3.8.0
>
>
> KafkaMetric is part of the public API but it's missing javadoc describing the 
> class and several of its methods.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16399) Add JBOD support in tiered storage

2024-03-21 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16399:
-

 Summary: Add JBOD support in tiered storage
 Key: KAFKA-16399
 URL: https://issues.apache.org/jira/browse/KAFKA-16399
 Project: Kafka
  Issue Type: Improvement
Reporter: Luke Chen
Assignee: Luke Chen


Add JBOD support in tiered storage



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Different retention semantics for active segment rotation

2024-03-21 Thread Luke Chen
Hi Jorge,

You should check the JIRA: https://issues.apache.org/jira/browse/KAFKA-16385
where we had some discussion.
Feel free to provide your thoughts there.

Thanks.
Luke

On Thu, Mar 21, 2024 at 3:33 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi dev community,
>
> I'd like to share some findings on how rotation of active segment differ
> depending on whether topic retention is time- or size-based.
>
> I was (wrongly) under the assumption that active segments are only rotated
> when segment configs (segment.bytes (1GiB) or segment.ms (7d)) or global
> log configs (log.roll.ms) force it  -- regardless of the retention
> configuration.
> This seems to be different depending on how retention is defined:
>
> - If a topic has a retention based on time[1], the condition to rotate the
> active segment is based on the latest timestamp. If the difference with
> current time is largest than retention time, then segment (including
> active) should be deleted. Active segment is rotated, and in next round is
> deleted.
>
> - If a topic has retention based on size[2] though, the condition not only
> depends on the size of the segment itself but first on the total log size,
> forcing to always have at least a single (active) segment: first difference
> between total log size and retention is calculated, let's say a single
> segment of 5MB and retention is 1MB; then total difference is 4MB, then the
> condition to delete validates if the difference of the current segment and
> the total difference is higher than zero, then delete. As the segment size
> will always be higher than the total difference when there is a single
> segment, then there will always be at least 1 segment. In this case the
> only case where active segment is rotated it is when a new message arrives.
>
> Added steps to reproduce[3].
>
> Maybe I missing something obvious, but this seems inconsistent to me.
> Either both retention configs should rotate active segments, or none of
> them should and active segment should be only governed by segment bytes|ms
> configs or log.roll config.
>
> I believe it's a useful feature to "force" active segment rotation without
> changing segment of global log rotation given that features like Compaction
> and Tiered Storage can benefit from this; but would like to clarify this
> behavior and make it consistent for both retention options, and/or call it
> out explicitly in the documentation.
>
> Looking forward to your feedback!
>
> Jorge.
>
> [1]:
>
> https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566
> [2]:
>
> https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583
>
> [3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2
>


[jira] [Resolved] (KAFKA-16222) KRaft Migration: Incorrect default user-principal quota after migration

2024-03-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16222.
---
Resolution: Fixed

> KRaft Migration: Incorrect default user-principal quota after migration
> ---
>
> Key: KAFKA-16222
> URL: https://issues.apache.org/jira/browse/KAFKA-16222
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft, migration
>Affects Versions: 3.7.0, 3.6.1
>Reporter: Dominik
>Assignee: PoAn Yang
>Priority: Blocker
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>
> We observed that our default user quota seems not to be migrated correctly.
> Before Migration:
> bin/kafka-configs.sh --describe --all --entity-type users
> Quota configs for the *default user-principal* are 
> consumer_byte_rate=100.0, producer_byte_rate=100.0
> Quota configs for user-principal {color:#172b4d}'myuser{*}@{*}prod'{color} 
> are consumer_byte_rate=1.5E8, producer_byte_rate=1.5E8
> After Migration:
> bin/kafka-configs.sh --describe --all --entity-type users
> Quota configs for *user-principal ''* are consumer_byte_rate=100.0, 
> producer_byte_rate=100.0
> Quota configs for user-principal {color:#172b4d}'myuser{*}%40{*}prod'{color} 
> are consumer_byte_rate=1.5E8, producer_byte_rate=1.5E8
>  
> Additional finding: Our names contains a "@" which also lead to incorrect 
> after migration state.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached

2024-03-19 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16385:
-

 Summary: Segment is rolled before segment.ms or segment.bytes 
breached
 Key: KAFKA-16385
 URL: https://issues.apache.org/jira/browse/KAFKA-16385
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


Steps to reproduce:
1. Create a topic with the configs segment.ms=7days and retention.ms=1sec.
2. Send a record "aaa" to the topic
3. Wait for 1 second

Should this segment be rolled? I thought not.
But in my test, it does roll:

{code:java}
[2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, 
dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. 
(kafka.log.LocalLog)
[2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote 
producer snapshot at offset 1 with 1 producer ids in 1 ms. 
(org.apache.kafka.storage.internals.log.ProducerStateManager)
[2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, 
dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, 
lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to 
log retention time 1000ms breach based on the largest record timestamp in the 
segment (kafka.log.UnifiedLog)
{code}

The segment is rolled because the log retention time of 1000 ms was breached, 
which is unexpected.
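
For reference, a minimal Java sketch of the reproduction steps above (bootstrap server, topic name, partition count, and replication factor are assumptions, not part of the original report):

{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SegmentRollRepro {
    public static void main(String[] args) throws Exception {
        // 1. Create a topic with segment.ms=7days and retention.ms=1sec
        try (Admin admin = Admin.create(
                Map.<String, Object>of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            NewTopic topic = new NewTopic("t2", 2, (short) 1)
                    .configs(Map.of("segment.ms", "604800000", "retention.ms", "1000"));
            admin.createTopics(List.of(topic)).all().get();
        }

        // 2. Send a record "aaa" to the topic
        Map<String, Object> producerProps = Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName(),
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("t2", "aaa")).get();
        }

        // 3. Wait for 1 second, then check the broker log for the unexpected roll
        Thread.sleep(1000);
    }
}
{code}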



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16342) Fix compressed records

2024-03-16 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16342.
---
Fix Version/s: 3.6.2
   3.8.0
   3.7.1
   Resolution: Fixed

> Fix compressed records
> --
>
> Key: KAFKA-16342
> URL: https://issues.apache.org/jira/browse/KAFKA-16342
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1026: Handling producer snapshot when upgrading from < v2.8.0 for Tiered Storage

2024-03-15 Thread Luke Chen
Hi Arpit,

Thanks for the KIP!

I agree with Greg that we should make the backward compatibility story clear.
Since you don't have permission to edit the KIP, you could think it through
and reply in the email thread directly.

Questions:
1. Could you explain what will happen if a partition created after v2.8, which
should upload the producer snapshot file, somehow didn't upload this file to
remote storage (e.g. the file was accidentally deleted by a user)? Before this
KIP, we'd throw an exception when uploading the segment file. But what will
happen after this KIP?


Thanks.
Luke

On Fri, Mar 15, 2024 at 3:56 AM Greg Harris 
wrote:

> Hi Arpit,
>
> Thanks for the clarification. Replying here without updating the KIP
> is fine for now.
>
> I disagree with your evaluation of the backwards compatibility. If you
> change the return type of a method, that breaks both source and binary
> compatibility.
> After upgrading, plugin implementations using this method would face
> compilation errors. Implementations that were compiled against the old
> interface will not be able to be loaded when the new interface is
> present.
> I see that the interface is marked Evolving which permits breaking
> compatibility at minor releases, but that doesn't change the
> compatibility of the change itself.
>
> Thanks,
> Greg
>
> On Thu, Mar 14, 2024 at 8:55 AM Arpit Goyal 
> wrote:
> >
> > Hi Greg,
> > I do not have access to update the KIP , Divij is helping me to do it.
> > Meanwhile let me update your queries here.
> >
> > Backward compatibility:
> > These changes will not impact existing functionality, as the current
> > behaviour always expects a producer snapshot file to be present for a given
> > segment. Making the producer snapshot file optional covers both scenarios,
> > i.e. whether the producer snapshot file exists or not.
> >
> > The getter of the producer snapshot file would also change as described
> > below:
> >
> > Current
> >
> > /**
> >  * @return Producer snapshot file until this segment.
> >  */
> > public Path producerSnapshotIndex() {
> > return producerSnapshotIndex;
> > }
> >
> >
> > Proposed
> >
> > /**
> >  * @return Producer snapshot file until this segment.
> >  */
> > public Optional<Path> producerSnapshotIndex() {
> > return producerSnapshotIndex;
> > }
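> >
> > For context, a rough caller-side sketch of how code that uploads segments
> > could adapt to the Optional-returning getter (the LogSegmentData usage and
> > the uploadProducerSnapshot helper are assumptions for illustration, not the
> > actual RemoteLogManager code):
> >
> > import java.nio.file.Path;
> > import java.util.Optional;
> >
> > // Upload the producer snapshot only when it exists; pre-2.8 topics may
> > // simply have none, which previously surfaced as a NullPointerException.
> > void copySegment(LogSegmentData segmentData) {
> >     Optional<Path> snapshot = segmentData.producerSnapshotIndex();
> >     snapshot.ifPresent(this::uploadProducerSnapshot);
> >     // ... continue with the segment and the other index files
> > }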
> >
> >
> > Thanks and Regards
> > Arpit Goyal
> > 8861094754
> >
> >
> > On Wed, Mar 13, 2024 at 9:25 PM Greg Harris  >
> > wrote:
> >
> > > Hi Arpit,
> > >
> > > Thanks for the KIP!
> > >
> > > I am not familiar with the necessity of producer snapshots, but your
> > > explanation sounds like this should be made optional.
> > >
> > > Can you expand the KIP to include the changes that need to be made to
> > > the constructor and getter, and explain more about backwards
> > > compatibility? From the description I can't tell if this change is
> > > backwards-compatible or not.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Wed, Mar 13, 2024 at 6:48 AM Arpit Goyal 
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I just wanted to bump up this thread.
> > > >
> > > > The KIP introduces a really small change and it would not take much
> > > > time to review. This change would enable Kafka users to use tiered
> > > > storage features seamlessly for topics migrated from versions < 2.8,
> > > > which currently fail with a NullPointerException.
> > > >
> > > > I am waiting for this KIP to get approved and then start working on
> it.
> > > >
> > > > On Mon, Mar 11, 2024, 14:26 Arpit Goyal 
> > > wrote:
> > > >
> > > > > Hi All,
> > > > > Just a Reminder, KIP-1026  is open for discussion.
> > > > > Thanks and Regards
> > > > > Arpit Goyal
> > > > > 8861094754
> > > > >
> > > > >
> > > > > On Sat, Mar 9, 2024 at 9:27 AM Arpit Goyal <
> goyal.arpit...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi All,
> > > > >>
> > > > >> I have created KIP-1026 for handling producerSnapshot empty
> scenarios
> > > > >> when the topic is upgraded from the kafka  < 2.8 version.
> > > > >>
> > > > >>
> > > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1026%3A+Handling+producer+snapshot+when+upgrading+from+%3C+v2.8.0+for+Tiered+Storage
> > > > >>
> > > > >> Feedback and suggestions are welcome.
> > > > >>
> > > > >> Thanks and Regards
> > > > >> Arpit Goyal
> > > > >> 8861094754
> > > > >>
> > > > >
> > >
>


[jira] [Resolved] (KAFKA-15490) Invalid path provided to the log failure channel upon I/O error when writing broker metadata checkpoint

2024-03-15 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-15490.
---
Fix Version/s: 3.6.2
   Resolution: Fixed

> Invalid path provided to the log failure channel upon I/O error when writing 
> broker metadata checkpoint
> ---
>
> Key: KAFKA-15490
> URL: https://issues.apache.org/jira/browse/KAFKA-15490
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.4.0, 3.4.1, 3.5.1, 3.6.1
>Reporter: Alexandre Dupriez
>Assignee: Divij Vaidya
>Priority: Minor
> Fix For: 3.6.2
>
>
> There is a small bug/typo in the handling of I/O error when writing broker 
> metadata checkpoint in {{KafkaServer}}. The path provided to the log dir 
> failure channel is the full path of the checkpoint file whereas only the log 
> directory is expected 
> ([source|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/server/KafkaServer.scala#L958C8-L961C8]).
> {code:java}
> case e: IOException =>
>val dirPath = checkpoint.file.getAbsolutePath
>logDirFailureChannel.maybeAddOfflineLogDir(dirPath, s"Error while writing 
> meta.properties to $dirPath", e){code}
> As a result, after an {{IOException}} is captured and enqueued in the log dir 
> failure channel (the directory placeholder below stands for the actual path of 
> the log directory):
> {code:java}
> [2023-09-22 17:07:32,052] ERROR Error while writing meta.properties to 
> /meta.properties (kafka.server.LogDirFailureChannel) 
> java.io.IOException{code}
> The log dir failure handler cannot lookup the log directory:
> {code:java}
> [2023-09-22 17:07:32,053] ERROR [LogDirFailureHandler]: Error due to 
> (kafka.server.ReplicaManager$LogDirFailureHandler) 
> org.apache.kafka.common.errors.LogDirNotFoundException: Log dir 
> /meta.properties is not found in the config.{code}
> An immediate fix for this is to use the {{logDir}} provided to the 
> checkpointing method instead of the path of the metadata file.
> For brokers with only one log directory, this bug prevents the broker from 
> shutting down as expected.
> The {{LogDirNotFoundException}} then kills the log dir failure handler 
> thread, subsequent {{IOException}}s are not handled, and the broker never 
> stops.
> {code:java}
> [2024-02-27 02:13:13,564] INFO [LogDirFailureHandler]: Stopped 
> (kafka.server.ReplicaManager$LogDirFailureHandler){code}
> Another consideration here is whether the {{LogDirNotFoundException}} should 
> terminate the log dir failure handler thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-03-14 Thread Luke Chen
Hi Christo,

Any update on this KIP?
If you don't have time to complete it, I can collaborate with you to work
on it.

Thanks.
Luke

On Wed, Jan 17, 2024 at 11:38 PM Satish Duggana 
wrote:

> Hi Christo,
> Thanks for volunteering to contribute to the KIP discussion. I suggest
> considering this KIP for both ZK and KRaft as it will be helpful for
> this feature to be available in 3.8.0 running with ZK clusters.
>
> Thanks,
> Satish.
>
> On Wed, 17 Jan 2024 at 19:04, Christo Lolov 
> wrote:
> >
> > Hello!
> >
> > I volunteer to get this KIP moving forward and implemented in Apache
> Kafka
> > 3.8.
> >
> > I have caught up with Mehari offline and we have agreed that given Apache
> > Kafka 4.0 being around the corner we would like to propose this feature
> > only for KRaft clusters.
> >
> > Any and all reviews and comments are welcome!
> >
> > Best,
> > Christo
> >
> > On Tue, 9 Jan 2024 at 09:44, Doğuşcan Namal 
> > wrote:
> >
> > > Hi everyone, any progress on the status of this KIP? Overall looks
> good to
> > > me but I wonder whether we still need to support it for Zookeeper mode
> > > given that it will be deprecated in the next 3 months.
> > >
> > > On 2023/07/21 20:16:46 "Beyene, Mehari" wrote:
> > > > Hi everyone,
> > > > I would like to start a discussion on KIP-950: Tiered Storage
> Disablement
> > > (
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> > > ).
> > > >
> > > > This KIP proposes adding the ability to disable and re-enable tiered
> > > storage on a topic.
> > > >
> > > > Thanks,
> > > > Mehari
> > > >
> > >
>


Re: [VOTE] KIP-956: Tiered Storage Quotas

2024-03-14 Thread Luke Chen
Thanks for the KIP!
+1 from me.

Luke

On Sun, Mar 10, 2024 at 8:44 AM Satish Duggana 
wrote:

> Thanks Abhijeet for the KIP, +1 from me.
>
>
> On Sat, 9 Mar 2024 at 1:51 AM, Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > +1 (non-binding), Thanks for the KIP, Abhijeet!
> >
> > --
> > Kamal
> >
> > On Fri, Mar 8, 2024 at 11:02 PM Jun Rao 
> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the KIP. +1
> > >
> > > Jun
> > >
> > > On Fri, Mar 8, 2024 at 3:44 AM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start the vote for KIP-956 - Tiered Storage Quotas
> > > >
> > > > The KIP is here:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > >
> > > > Regards.
> > > > Abhijeet.
> > > >
> > >
> >
>


Re: [DISCUSS] Apache Kafka 3.6.2 release

2024-03-13 Thread Luke Chen
+1, Thanks Manikumar!

On Thu, Mar 14, 2024 at 3:40 AM Bruno Cadonna  wrote:

> Thanks Manikumar!
>
> +1
>
> Best,
> Bruno
>
> On 3/13/24 5:56 PM, Josep Prat wrote:
> > +1 thanks for volunteering!
> >
> > Best
> > ---
> >
> > Josep Prat
> > Open Source Engineering Director, aivenjosep.p...@aiven.io   |
> > +491715557497 | aiven.io
> > Aiven Deutschland GmbH
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > On Wed, Mar 13, 2024, 17:17 Divij Vaidya 
> wrote:
> >
> >> +1
> >>
> >> Thank you for volunteering.
> >>
> >> --
> >> Divij Vaidya
> >>
> >>
> >>
> >> On Wed, Mar 13, 2024 at 4:58 PM Justine Olshan
> >> 
> >> wrote:
> >>
> >>> Thanks Manikumar!
> >>> +1 from me
> >>>
> >>> Justine
> >>>
> >>> On Wed, Mar 13, 2024 at 8:52 AM Manikumar 
> >>> wrote:
> >>>
>  Hi,
> 
>  I'd like to volunteer to be the release manager for a bug fix release
> >> of
>  the 3.6 line.
>  If there are no objections, I'll send out the release plan soon.
> 
>  Thanks,
>  Manikumar
> 
> >>>
> >>
> >
>


Re: [DISCUSS] Minimum constraint for segment.ms

2024-03-13 Thread Luke Chen
Hi Divij,

Thanks for raising this.
The valid minimum value 1 for `segment.ms` is completely unreasonable.
Similarly for `segment.bytes`, `metadata.log.segment.ms`,
`metadata.log.segment.bytes`.

In addition to that, there are also some config default values we'd like to
propose changing in v4.0.
We can collect more comments from the community, and come out with a KIP
for them.

1. num.recovery.threads.per.data.dir:
The current default value is 1. But log recovery happens before the broker
reaches the ready state, which means we should use all available resources to
speed up log recovery and bring the broker to the ready state sooner. The
default value should be... maybe 4 (to be decided)?

2. Other configs whose defaults we might consider changing; open
for comments:
   2.1. num.replica.fetchers: the default is 1, but that's not enough when
there are multiple partitions in the cluster
   2.2. `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`:
Currently the default is 100 KB, but that's not enough for
high-speed networks.

Thank you.
Luke


On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya 
wrote:

> Hey folks
>
> Before I file a KIP to change this in 4.0, I wanted to understand the
> historical context for the value of the following setting.
>
> Currently, segment.ms minimum threshold is set to 1ms [1].
>
> Segments are expensive. Every segment uses multiple file descriptors and
> it's easy to run out of OS limits when creating a large number of segments.
> Large number of segments also delays log loading on startup because of
> expensive operations such as iterating through all directories &
> conditionally loading all producer state.
>
> I am currently not aware of a reason as to why someone might want to work
> with a segment.ms of less than ~10s (a number chosen arbitrarily that looks
> sane)
>
> What was the historical context of setting the minimum threshold to 1ms for
> this setting?
>
> [1] https://kafka.apache.org/documentation.html#topicconfigs_segment.ms
>
> --
> Divij Vaidya
>


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-06 Thread Luke Chen
> > > > > > > > > 10. remote.log.manager.write.quota.default:
> > > > > > > > > 10.1 For other configs, we
> > > > > > > > > use replica.alter.log.dirs.io.max.bytes.per.second. To be
> > > > > consistent,
> > > > > > > > > perhaps this can be sth like
> > > > > > > > remote.log.manager.write.max.bytes.per.second.
> > > > > > > > >
> > > > > > > >
> > > > > > > > This makes sense, we can rename the following configs to be
> > > > > consistent.
> > > > > > > >
> > > > > > > > Remote.log.manager.write.quota.default ->
> > > > > > > > remote.log.manager.write.max.bytes.per.second
> > > > > > > >
> > > > > > > > Remote.log.manager.read.quota.default ->
> > > > > > > > remote.log.manager.read.max.bytes.per.second.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > 10.2 Could we list the new metrics associated with the new
> > > quota.
> > > > > > > > >
> > > > > > > >
> > > > > > > > We will add the following metrics as mentioned in the other
> > > > response.
> > > > > > > > *RemoteFetchThrottleTime* - The amount of time needed to
> bring
> > > the
> > > > > > > observed
> > > > > > > > remote fetch rate within the read quota
> > > > > > > > *RemoteCopyThrottleTime *- The amount of time needed to bring
> > the
> > > > > > > observed
> > > > > > > > remote copy rate within the copy quota.
> > > > > > > >
> > > > > > > > 10.3 Is this dynamically configurable? If so, could we
> document
> > > the
> > > > > > > impact
> > > > > > > > > to tools like kafka-configs.sh and AdminClient?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, the quotas are dynamically configurable. We will add
> them
> > as
> > > > > > Dynamic
> > > > > > > > Broker Configs. Users will be able to change
> > > > > > > > the following configs using either kafka-configs.sh or
> > > AdminClient
> > > > by
> > > > > > > > specifying the config name and new value. For eg.
> > > > > > > >
> > > > > > > > Using kafka-configs.sh
> > > > > > > >
> > > > > > > > bin/kafka-configs.sh --bootstrap-server <bootstrap-server> 
> > > > > > --entity-type
> > > > > > > > brokers --entity-default --alter --add-config
> > > > > > > > remote.log.manager.write.max.bytes.per.second=52428800
> > > > > > > >
> > > > > > > > Using AdminClient
> > > > > > > >
> > > > > > > > ConfigEntry configEntry = new
> > > > > > > > ConfigEntry("remote.log.manager.write.max.bytes.per.second",
> > > > > > "5242800");
> > > > > > > > AlterConfigOp alterConfigOp = new AlterConfigOp(configEntry,
> > > > > > > > AlterConfigOp.OpType.SET);
> > > > > > > > List<AlterConfigOp> alterConfigOps =
> > > > > > > > Collections.singletonList(alterConfigOp);
> > > > > > > >
> > > > > > > > ConfigResource resource = new
> > > > > > ConfigResource(ConfigResource.Type.BROKER,
> > > > > > > > "");
> > > > > > > > Map<ConfigResource, Collection<AlterConfigOp>> updateConfig =
> > > > > > > > ImmutableMap.of(resource, alterConfigOps);
> > > > > > > > adminClient.incrementalAlterConfigs(updateConfig);
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Tue, Nov 28, 2023 at 2:19 AM Luke Chen <
> show...@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Abhijeet,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP!
> > > > > > > > > > This is an important feature for tiered storage.
> > > > > > > > > >
> > > > > > > > > > Some comments:
> > > > > > > > > > 1. Will we introduce new metrics for this tiered storage
> > > > quotas?
> > > > > > > > > > This is important because the admin can know the
> throttling
> > > > > status
> > > > > > by
> > > > > > > > > > checking the metrics while the remote write/read are
> slow,
> > > like
> > > > > the
> > > > > > > > rate
> > > > > > > > > of
> > > > > > > > > > uploading/reading byte rate, the throttled time for
> > > > > upload/read...
> > > > > > > etc.
> > > > > > > > > >
> > > > > > > > > > 2. Could you give some examples for the throttling
> > algorithm
> > > in
> > > > > the
> > > > > > > KIP
> > > > > > > > > to
> > > > > > > > > > explain it? That will make it much clearer.
> > > > > > > > > >
> > > > > > > > > > 3. To solve this problem, we can break down the RLMTask
> > into
> > > > two
> > > > > > > > smaller
> > > > > > > > > > tasks - one for segment upload and the other for handling
> > > > expired
> > > > > > > > > segments.
> > > > > > > > > > How do we handle the situation when a segment is still
> > > waiting
> > > > > for
> > > > > > > > > > offloading while this segment is expired and eligible to
> be
> > > > > > deleted?
> > > > > > > > > > Maybe it'll be easier to not block the RLMTask when quota
> > > > > exceeded,
> > > > > > > and
> > > > > > > > > > just check it each time the RLMTask runs?
> > > > > > > > > >
> > > > > > > > > > Thank you.
> > > > > > > > > > Luke
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar <
> > > > > > > > > abhijeet.cse@gmail.com
> > > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I have created KIP-956 for defining read and write
> quota
> > > for
> > > > > > tiered
> > > > > > > > > > > storage.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > > > > > > > > >
> > > > > > > > > > > Feedback and suggestions are welcome.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Abhijeet.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Abhijeet.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Abhijeet.
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Abhijeet.
> > > >
> > >
> >
> >
> > --
> > Abhijeet.
> >
>


[jira] [Resolved] (KAFKA-16209) fetchSnapshot might return null if topic is created before v2.8

2024-03-05 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16209.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Fixed

> fetchSnapshot might return null if topic is created before v2.8
> ---
>
> Key: KAFKA-16209
> URL: https://issues.apache.org/jira/browse/KAFKA-16209
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Luke Chen
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: newbie, newbie++
> Fix For: 3.8.0, 3.7.1
>
>
> The remote log manager will fetch the snapshot via ProducerStateManager 
> [here|https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerStateManager.java#L608],
>  but the snapshot map might return nothing if the topic has no snapshot created, 
> e.g. topics created before v2.8. We need to fix this to avoid an NPE.
> old PR: https://github.com/apache/kafka/pull/14615/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16341) Fix un-compressed records

2024-03-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16341:
-

 Summary: Fix un-compressed records
 Key: KAFKA-16341
 URL: https://issues.apache.org/jira/browse/KAFKA-16341
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16342) Fix compressed records

2024-03-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16342:
-

 Summary: Fix compressed records
 Key: KAFKA-16342
 URL: https://issues.apache.org/jira/browse/KAFKA-16342
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16071) NPE in testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2024-03-01 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16071.
---
Resolution: Fixed

> NPE in testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> 
>
> Key: KAFKA-16071
> URL: https://issues.apache.org/jira/browse/KAFKA-16071
> Project: Kafka
>  Issue Type: Test
>    Reporter: Luke Chen
>Priority: Major
>  Labels: newbie, newbie++
>
> Found in the CI build result.
>  
> h3. Error Message
> java.lang.NullPointerException
> h3. Stacktrace
> java.lang.NullPointerException at 
> org.apache.kafka.tools.TopicCommandIntegrationTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandIntegrationTest.java:800)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
>  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>  
>  
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15095/1/testReport/junit/org.apache.kafka.tools/TopicCommandIntegrationTest/Build___JDK_8_and_Scala_2_12___testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress_String__zk/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1021: Allow to get last stable offset (LSO) in kafka-get-offsets.sh

2024-02-28 Thread Luke Chen
Hi Ahmed,

Thanks for the KIP!

Comments:
1. If we all agree with the suggestion from Andrew, you could update the
KIP.

Otherwise, LGTM!


Thanks.
Luke

On Thu, Feb 29, 2024 at 1:32 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi Ahmed,
> Could do. Personally, I find the existing “--time -1” totally horrid
> anyway, which was why
> I suggested an alternative. I think your suggestion of a flag for
> isolation level is much
> better than -6.
>
> Maybe I should put in a KIP which adds:
> --latest (as a synonym for --time -1)
> --earliest (as a synonym for --time -2)
> --max-timestamp (as a synonym for --time -3)
>
> That’s really what I would prefer. If the user has a timestamp, use
> `--time`. If they want a
> specific special offset, use a separate flag.
>
> Thanks,
> Andrew
>
> > On 28 Feb 2024, at 09:22, Ahmed Sobeh 
> wrote:
> >
> > Hi Andrew,
> >
> > Thanks for the hint! That sounds reasonable, do you think adding a
> > conditional argument, something like "--time -1 --isolation -committed"
> and
> > "--time -1 --isolation -uncommitted" would make sense to keep the
> > consistency of getting the offset by time? or do you think having a
> special
> > argument for this case is better?
> >
> > On Tue, Feb 27, 2024 at 2:19 PM Andrew Schofield <
> > andrew_schofield_j...@outlook.com> wrote:
> >
> >> Hi Ahmed,
> >> Thanks for the KIP.  It looks like a useful addition.
> >>
> >> The use of negative timestamps, and in particular letting the user use
> >> `--time -1` or the equivalent `--time latest`
> >> is a bit peculiar in this tool already. The negative timestamps come
> from
> >> org.apache.kafka.common.requests.ListOffsetsRequest,
> >> but you’re not actually adding another value to that. As a result, I
> >> really wouldn’t recommend using -6 for the new
> >> flag. LSO is really a variant of -1 with read_committed isolation level.
> >>
> >> I think that perhaps it would be better to add `--last-stable` as an
> >> alternative to `--time`. Then you’ll get the LSO with
> >> cleaner syntax.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >>> On 27 Feb 2024, at 10:12, Ahmed Sobeh 
> >> wrote:
> >>>
> >>> Hi all,
> >>> I would like to start a discussion on KIP-1021, which would enable
> >> getting
> >>> LSO in kafka-get-offsets.sh:
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1021%3A+Allow+to+get+last+stable+offset+%28LSO%29+in+kafka-get-offsets.sh
> >>>
> >>> Best,
> >>> Ahmed
> >>
> >>
> >
> > --
> > *Ahmed Sobeh*
> > Engineering Manager OSPO, *Aiven*
> > ahmed.so...@aiven.io 
> > aiven.io  |  https://www.facebook.com/aivencloud  |  https://twitter.com/aiven_io
> > *Aiven Deutschland GmbH*
> > Immanuelkirchstraße 26, 10405 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
>
>
>


Re: [DISCUSS] KIP-853: KRaft Controller Membership Changes

2024-02-28 Thread Luke Chen
> 2. After "RemoveVoter", what is the role of the node?
> It looks like after the voter got removed from the voter set, it is not a
> voter anymore. But I think it can still fetch with the leader. So it
should
> be an observer, with a "process.role=controller"? And if the node was
> originally "process.role=controller,broker", it'll become a broker-only
> node?

> Kafka nodes need to allow for controllers that are not voters. I don't
expect too many issues from an implementation point of view. Most of
it may just be aggressive validation in KafkaConfig. I think the
easier way to explain this state is that there will be controllers
that will never become active controllers. If we want, we can have a
monitor that turns on (1) if a node is in this state. What do you
think?

I agree we should have a way for users to monitor the node state, like when
the controller has completed the voter removal (so that it is safe to shut
down), or when it has completed the voter addition (so that users can start
to add another controller), etc.

10. controller.quorum.voters:
This is an existing configuration. This configuration describes the state
of the quorum and will only be used if the kraft.version feature is 0.
From the discussion, it looks like even if the kraft.version is 1, we
still first check the `controller.quorum.voters` if
`controller.quorum.bootstrap.servers` is not set. Is that correct? If so,
maybe we need to update the description?

11. When a controller starts up before joining as a voter, it'll be an
observer. In this case, will it be shown in the observer field of
`kafka-metadata-quorum describe --status`? Same question to a controller
after getting removed.

12. What will happen if there's only 1 voter and user still tries to remove
the voter? Any error returned?

Thanks.
Luke



On Thu, Jan 25, 2024 at 7:50 AM José Armando García Sancio
 wrote:

> Thanks for the feedback Luke. See my comments below:
>
> On Wed, Jan 24, 2024 at 4:20 AM Luke Chen  wrote:
> > 1. About "VotersRecord":
> >
> > > When a KRaft voter becomes leader it will write a KRaftVersionRecord
> and
> > VotersRecord to the log if the log or the latest snapshot doesn't contain
> > any VotersRecord. This is done to make sure that the voter set in the
> > bootstrap snapshot gets replicated to all of the voters and to not rely
> on
> > all of the voters being configured with the same bootstrapped voter set.
> >
> > > This record will also be written to the log if it has never been
> written
> > to the log in the past. This semantic is nice to have to consistently
> > replicate the bootstrapping snapshot, at
> > -00.checkpoint, of the leader to all of the
> > voters.
> >
> > If the `VotersRecord` has written into
> > -00.checkpoint,
> > later, a new voter added. Will we write a new checkpoint to the file?
> > If so, does that mean the `metadata.log.max.snapshot.interval.ms` will
> be
> > ignored?
>
> KRaft (KafkaRaftClient) won't initiate the snapshot generation. The
> snapshot generation will be initiated by the state machine (controller
> or broker) using the RaftClient::createSnapshot method. When the state
> machine calls into RaftClient::createSnapshot the KafkaRaftClient will
> compute the set of voters at the provided offset and epoch, and write
> the VotersRecord after the SnapshotHeaderRecord. This does mean that
> the KafkaRaftClient needs to store in memory all of the voter set
> configurations between the RaftClient::latestSnapshotId and the LEO
> for the KRaft partition.
>
> > If not, then how could we make sure the voter set in the bootstrap
> snapshot
> > gets replicated to all of the voters and to not rely on all of the voters
> > being configured with the same bootstrapped voter set?
>
> I think my answer above should answer your question. VoterRecord-s
> will be in the log (log segments) and the snapshots so they will be
> replicated by Fetch and FetchSnapshot. When the voter set is changed
> or bootstrapped, the leader will write the VotersRecord to the log
> (active log segment). When the state machine (controller or broker)
> asks to create a snapshot, KRaft will write the VotersRecord at the
> start to the snapshot after the SnapshotHeaderRecord.
>
> > 2. After "RemoveVoter", what is the role of the node?
> > It looks like after the voter got removed from the voter set, it is not a
> > voter anymore. But I think it can still fetch with the leader. So it
> should
> > be an observer, with a "process.role=controller"? And if the node was
> > originally "process.role=controller,broker", it'll become a broker-onl

Re: [ANNOUNCE] Apache Kafka 3.7.0

2024-02-27 Thread Luke Chen
Thanks for running the release, Stanislav!

Luke

On Wed, Feb 28, 2024 at 4:06 AM Mickael Maison 
wrote:

> Thanks to all the contributors and thank you Stanislav for running the
> release!
>
>
> On Tue, Feb 27, 2024 at 7:03 PM Stanislav Kozlovski
>  wrote:
> >
> > The Apache Kafka community is pleased to announce the release of
> > Apache Kafka 3.7.0
> >
> > This is a minor release that includes new features, fixes, and
> > improvements from 296 JIRAs
> >
> > An overview of the release and its notable changes can be found in the
> > release blog post:
> > https://kafka.apache.org/blog#apache_kafka_370_release_announcement
> >
> > All of the changes in this release can be found in the release notes:
> > https://www.apache.org/dist/kafka/3.7.0/RELEASE_NOTES.html
> >
> > You can download the source and binary release (Scala 2.12, 2.13) from:
> > https://kafka.apache.org/downloads#3.7.0
> >
> >
> ---
> >
> >
> > Apache Kafka is a distributed streaming platform with four core APIs:
> >
> >
> > ** The Producer API allows an application to publish a stream of records
> to
> > one or more Kafka topics.
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output stream to one or more output topics, effectively transforming the
> > input streams to output streams.
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might
> > capture every change to a table.
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> > ** Building real-time streaming applications that transform or react
> > to the streams of data.
> >
> >
> > Apache Kafka is in use at large and small companies worldwide, including
> > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> >
> > A big thank you to the following 146 contributors to this release!
> > (Please report an unintended omission)
> >
> > Abhijeet Kumar, Akhilesh Chaganti, Alieh, Alieh Saeedi, Almog Gavra,
> > Alok Thatikunta, Alyssa Huang, Aman Singh, Andras Katona, Andrew
> > Schofield, Anna Sophie Blee-Goldman, Anton Agestam, Apoorv Mittal,
> > Arnout Engelen, Arpit Goyal, Artem Livshits, Ashwin Pankaj,
> > ashwinpankaj, atu-sharm, bachmanity1, Bob Barrett, Bruno Cadonna,
> > Calvin Liu, Cerchie, chern, Chris Egerton, Christo Lolov, Colin
> > Patrick McCabe, Colt McNealy, Crispin Bernier, David Arthur, David
> > Jacot, David Mao, Deqi Hu, Dimitar Dimitrov, Divij Vaidya, Dongnuo
> > Lyu, Eaugene Thomas, Eduwer Camacaro, Eike Thaden, Federico Valeri,
> > Florin Akermann, Gantigmaa Selenge, Gaurav Narula, gongzhongqiang,
> > Greg Harris, Guozhang Wang, Gyeongwon, Do, Hailey Ni, Hanyu Zheng, Hao
> > Li, Hector Geraldino, hudeqi, Ian McDonald, Iblis Lin, Igor Soarez,
> > iit2009060, Ismael Juma, Jakub Scholz, James Cheng, Jason Gustafson,
> > Jay Wang, Jeff Kim, Jim Galasyn, John Roesler, Jorge Esteban Quilcate
> > Otoya, Josep Prat, José Armando García Sancio, Jotaniya Jeel, Jouni
> > Tenhunen, Jun Rao, Justine Olshan, Kamal Chandraprakash, Kirk True,
> > kpatelatwork, kumarpritam863, Laglangyue, Levani Kokhreidze, Lianet
> > Magrans, Liu Zeyu, Lucas Brutschy, Lucia Cerchie, Luke Chen, maniekes,
> > Manikumar Reddy, mannoopj, Maros Orsak, Matthew de Detrich, Matthias
> > J. Sax, Max Riedel, Mayank Shekhar Narula, Mehari Beyene, Michael
> > Westerby, Mickael Maison, Nick Telford, Nikhil Ramakrishnan, Nikolay,
> > Okada Haruki, olalamichelle, Omnia G.H Ibrahim, Owen Leung, Paolo
> > Patierno, Philip Nee, Phuc-Hong-Tran, Proven Provenzano, Purshotam
> > Chauhan, Qichao Chu, Matthias J. Sax, Rajini Sivaram, Renaldo Baur
> > Filho, Ritika Reddy, Robert Wagner, Rohan, Ron Dagostino, Roon, runom,
> > Ruslan Krivoshein, rykovsi, Sagar Rao, Said Boudjelda, Satish Duggana,
> > shuoer86, Stanislav Kozlovski, Taher Ghaleb, Tang Yunzi, TapDang,
> >

Re: DISCUSS KIP-984 Add pluggable compression interface to Kafka

2024-02-26 Thread Luke Chen
Hi Assane,

I share the same concern as Greg, which is that the KIP is not
Kafka-ecosystem friendly.
It would also tightly couple the Kafka client and broker: once you use the
pluggable compression interface, the producer must be a Java client.
This seems to go against Kafka's original design.

If the proposal can support all kinds of clients, that would be great.

Thanks.
Luke

On Tue, Feb 27, 2024 at 7:44 AM Diop, Assane  wrote:

> Hi Greg,
>
> Thanks for taking the time to give some feedback. It was very insightful.
>
> I have some answers:
>
> 1. The current proposal is Java centric. We want to figure things out with Java
> first and then later incorporate other languages. We will get there.
>
> 2. The question of where the plugins would live is an important one. I
> would like to get the community's engagement on where a plugin would live.
>Officially supported plugins could be part of Kafka and others could
> live in a plugin repository. Is there currently a way to store plugins in
> Kafka and load them into the classpath? If such a space were allowed,
> it would provide a standard way of installing officially supported
> plugins.
>In OpenSearch, for example, there is a plugin utility that takes the jar
> and installs it across the cluster; privileges can be granted by an admin.
> Such a utility could be implemented in Kafka.
>
> 3. There are many ways to look at this. We could change the message format
> that uses the pluggable interface to be, for example, v3 and synchronize
> against that.
>In order to use the pluggable codec, you would have to be at message
> format version 3, for example.
>
> 4. Passing the class name as metadata is one way to have the producer talk
> to the broker about which plugin to use. However, there could be other
> implementations where everything needed is configured on the topic using
> topic-level compression. In that case, for example, a rule could be that in
> order to use the pluggable interface, you should use topic-level compression.
>
>  I would like to have your valuable input on this!
>
> Thanks in advance,
> Assane
>
> -Original Message-
> From: Greg Harris 
> Sent: Wednesday, February 14, 2024 2:36 PM
> To: dev@kafka.apache.org
> Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to Kafka
>
> Hi Assane,
>
> Thanks for the KIP!
> Looking back, it appears that the project has only ever added compression
> types twice: lz4 in 2014 and zstd in 2018, and perhaps Kafka has fallen
> behind the state-of-the-art compression algorithms.
> Thanks for working to fix that!
>
> I do have some concerns:
>
> 1. I think this is a very "java centric" proposal, and doesn't take
> non-java clients into enough consideration. librdkafka [1] is a great
> example of an implementation of the Kafka protocol which doesn't have the
> same classloading and plugin infrastructure that Java has, which would make
> implementing this feature much more difficult.
>
> 2. By making the interface pluggable, it puts the burden of maintaining
> individual compression codecs onto external developers, which may not be
> willing to maintain a codec for the service-lifetime of such a codec.
> An individual developer can easily implement a plugin to allow them to use
> a cutting-edge compression algorithm without consulting the Kafka project,
> but as soon as data is compressed using that algorithm, they are on the
> hook to support that plugin going forward by the organizations using their
> implementation.
> Part of the collective benefits of the Kafka project is to ensure the
> ongoing maintenance of such codecs, and provide a long deprecation window
> when a codec reaches EOL. I think the Kafka project is well-equipped to
> evaluate the maturity and properties of compression codecs and then
> maintain them going forward.
>
> 3. Also by making the interface pluggable, it reduces the scope of
> individual compression codecs. No longer is there a single lineage of Kafka
> protocols, where vN+1 of a protocol supports a codec that vN does not. Now
> there will be "flavors" of the protocol, and operators will need to ensure
> that their servers and their clients support the same "flavors" or else
> encounter errors.
> This is the sort of protocol forking which can be dangerous to the Kafka
> community going forward. If there is a single lineage of codecs such that
> the upstream Kafka vX.Y supports codec Z, it is much simpler for other
> implementations to check and specify "Kafka vX.Y compatible", than it is to
> check & specify "Kafka vX.Y & Z compatible".
>
> 4. The Java class namespace is distributed, as anyone can name their class
> anything. It achieves this by being very verbose, with long fully-qualified
> names for classes. This is in conflict with a binary protocol, where it is
> desirable for the overhead to be as small as possible.
> This may incentivise developers to keep their class names short, which
> also makes conflict more 
