[jira] [Created] (KAFKA-14719) Talk about a problem

2023-02-14 Thread shenxingwuying (Jira)
shenxingwuying created KAFKA-14719:
--

 Summary: Talk about a problem
 Key: KAFKA-14719
 URL: https://issues.apache.org/jira/browse/KAFKA-14719
 Project: Kafka
  Issue Type: Task
Reporter: shenxingwuying


I'm not sure whether it is a problem.

 

I write records to Kafka using the cppkafka library
([https://github.com/mfontanini/cppkafka.git]),
which wraps librdkafka
([https://github.com/confluentinc/librdkafka.git]).

I ran a fuzz test that keeps restarting the Kafka cluster while continuously
issuing write requests.

I found a return code that does not look right, but I'm not sure about that.

I think providing some code may help; perhaps someone can review it.
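For illustration only, a minimal Java-client analogue of the kind of snippet I mean
(my actual code uses cppkafka, so this is not the real repro; the topic name and
bootstrap address below are placeholders). It keeps producing while the cluster is
restarted and logs the exact error surfaced in the delivery callback, and whether
the client considers it retriable:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;

// Java-client analogue of the test loop: keep producing while the cluster is
// restarted, and log the exact error (and whether it is retriable) from the
// delivery callback. Topic name and bootstrap address are placeholders.
public class RestartFuzzProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("fuzz-topic", "key-" + i, "value-" + i),
                        (metadata, exception) -> {
                            if (exception != null) {
                                // The interesting detail for review: which error comes
                                // back during the broker restarts, and is it retriable?
                                System.err.printf("send failed: %s (retriable=%b)%n",
                                        exception, exception instanceof RetriableException);
                            }
                        });
            }
            producer.flush(); // make sure all callbacks fire before the producer closes
        }
    }
}
{code}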



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1586

2023-02-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 444611 lines...]
[2023-02-15T00:14:46.297Z] > Task :storage:api:compileTestJava UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :storage:api:testClasses UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :connect:json:compileTestJava UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :connect:json:testClasses UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :raft:compileTestJava UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :raft:testClasses UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :connect:json:testJar
[2023-02-15T00:14:46.297Z] > Task :group-coordinator:compileTestJava UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :group-coordinator:testClasses UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :connect:json:testSrcJar
[2023-02-15T00:14:46.297Z] > Task :metadata:compileTestJava UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task :metadata:testClasses UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task 
:clients:generateMetadataFileForMavenJavaPublication
[2023-02-15T00:14:46.297Z] > Task :streams:copyDependantLibs
[2023-02-15T00:14:46.297Z] > Task :streams:jar UP-TO-DATE
[2023-02-15T00:14:46.297Z] > Task 
:streams:generateMetadataFileForMavenJavaPublication
[2023-02-15T00:14:49.526Z] 
[2023-02-15T00:14:49.526Z] > Task :connect:api:javadoc
[2023-02-15T00:14:49.526Z] 
/home/jenkins/workspace/Kafka_kafka_trunk/connect/api/src/main/java/org/apache/kafka/connect/source/SourceRecord.java:44:
 warning - Tag @link: reference not found: org.apache.kafka.connect.data
[2023-02-15T00:14:50.548Z] 1 warning
[2023-02-15T00:14:51.567Z] 
[2023-02-15T00:14:51.567Z] > Task :connect:api:copyDependantLibs UP-TO-DATE
[2023-02-15T00:14:51.567Z] > Task :connect:api:jar UP-TO-DATE
[2023-02-15T00:14:51.567Z] > Task 
:connect:api:generateMetadataFileForMavenJavaPublication
[2023-02-15T00:14:51.567Z] > Task :connect:json:copyDependantLibs UP-TO-DATE
[2023-02-15T00:14:51.567Z] > Task :connect:json:jar UP-TO-DATE
[2023-02-15T00:14:51.567Z] > Task 
:connect:json:generateMetadataFileForMavenJavaPublication
[2023-02-15T00:14:51.567Z] > Task :connect:api:javadocJar
[2023-02-15T00:14:51.567Z] > Task 
:connect:json:publishMavenJavaPublicationToMavenLocal
[2023-02-15T00:14:51.567Z] > Task :connect:json:publishToMavenLocal
[2023-02-15T00:14:51.567Z] > Task :connect:api:compileTestJava UP-TO-DATE
[2023-02-15T00:14:51.567Z] > Task :connect:api:testClasses UP-TO-DATE
[2023-02-15T00:14:51.567Z] > Task :connect:api:testJar
[2023-02-15T00:14:51.567Z] > Task :connect:api:testSrcJar
[2023-02-15T00:14:51.567Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2023-02-15T00:14:51.567Z] > Task :connect:api:publishToMavenLocal
[2023-02-15T00:14:55.710Z] > Task :streams:javadoc
[2023-02-15T00:14:56.734Z] > Task :streams:javadocJar
[2023-02-15T00:14:57.754Z] 
[2023-02-15T00:14:57.754Z] > Task :clients:javadoc
[2023-02-15T00:14:57.754Z] 
/home/jenkins/workspace/Kafka_kafka_trunk/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/secured/package-info.java:21:
 warning - Tag @link: reference not found: 
org.apache.kafka.common.security.oauthbearer
[2023-02-15T00:14:58.899Z] 1 warning
[2023-02-15T00:14:59.797Z] 
[2023-02-15T00:14:59.797Z] > Task :clients:javadocJar
[2023-02-15T00:14:59.797Z] > Task :clients:testJar
[2023-02-15T00:15:00.817Z] > Task :clients:testSrcJar
[2023-02-15T00:15:00.817Z] > Task 
:clients:publishMavenJavaPublicationToMavenLocal
[2023-02-15T00:15:00.817Z] > Task :clients:publishToMavenLocal
[2023-02-15T00:15:17.189Z] > Task :core:compileScala
[2023-02-15T00:17:12.621Z] > Task :core:classes
[2023-02-15T00:17:12.621Z] > Task :core:compileTestJava NO-SOURCE
[2023-02-15T00:17:29.570Z] > Task :core:compileTestScala
[2023-02-15T00:19:06.943Z] > Task :core:testClasses
[2023-02-15T00:19:21.102Z] > Task :streams:compileTestJava
[2023-02-15T00:19:21.102Z] > Task :streams:testClasses
[2023-02-15T00:19:21.102Z] > Task :streams:testJar
[2023-02-15T00:19:22.120Z] > Task :streams:testSrcJar
[2023-02-15T00:19:22.120Z] > Task 
:streams:publishMavenJavaPublicationToMavenLocal
[2023-02-15T00:19:22.120Z] > Task :streams:publishToMavenLocal
[2023-02-15T00:19:22.120Z] 
[2023-02-15T00:19:22.120Z] Deprecated Gradle features were used in this build, 
making it incompatible with Gradle 8.0.
[2023-02-15T00:19:22.120Z] 
[2023-02-15T00:19:22.120Z] You can use '--warning-mode all' to show the 
individual deprecation warnings and determine if they come from your own 
scripts or plugins.
[2023-02-15T00:19:22.120Z] 
[2023-02-15T00:19:22.120Z] See 
https://docs.gradle.org/7.6/userguide/command_line_interface.html#sec:command_line_warnings
[2023-02-15T00:19:22.120Z] 
[2023-02-15T00:19:22.120Z] Execution optimizations have been disabled for 2 
invalid unit(s) of work during this build to ensure correctness.
[2023-02-15T00:19:22.120Z] Please consult deprecation warnings for more details.
[2023-02-15T00:19:22.120Z] 
[2023-02-15T0

Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.4 #82

2023-02-14 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Zookeeper deprecation

2023-02-14 Thread Divij Vaidya
Thank you for clarifying Ismael.


On Tue 14. Feb 2023 at 20:07, Ismael Juma  wrote:

> Hi Divij,
>
> I think we'll make that call a bit later depending on the progress we make
> on things like zk to kraft migration. The deprecation notice is meant to
> indicate that we recommend customers use kraft mode for new clusters and
> that zk mode will be going away in 4.0. It doesn't change the support we
> provide for the 3.x series.
>
> Ismael
>
> On Tue, Feb 14, 2023 at 9:10 AM Divij Vaidya 
> wrote:
>
> > Hi folks
> >
> > As we begin to prepare a release plan for 3.5, I was wondering if we plan
> > to deprecate Zookeeper support with this release? I am asking this
> because
> > some prior KIPs ([1] and [2]) mentioned 3.5 as the targeted version when
> we
> > would mark Zookeeper support as deprecated.
> >
> > Personally, I would prefer that we update the Zk version with this
> release
> > [3] and deprecate support for Zk in the future versions. Upgrading the Zk
> > version and deprecating future support in the same Apache Kafka version
> may
> > be confusing for the users.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready
> >
> > [3] https://issues.apache.org/jira/browse/KAFKA-14661
> >
> > Regards,
> > Divij Vaidya
> >
>
-- 
Divij Vaidya


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-02-14 Thread Ismael Juma
Thanks!

Ismael

On Tue, Feb 14, 2023 at 1:07 PM Mickael Maison 
wrote:

> Hi Ismael,
>
> Good call. I shifted all dates by 2 weeks and moved them to Wednesdays.
>
> Thanks,
> Mickael
>
> On Tue, Feb 14, 2023 at 6:01 PM Ismael Juma  wrote:
> >
> > Thanks Mickael. A couple of notes:
> >
> > 1. We typically choose a Wednesday for the various freeze dates - there
> are
> > often 1-2 day slips and it's better if that doesn't require people
> > working through the weekend.
> > 2. Looks like we're over a month later compared to the equivalent release
> > last year (
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.0). I
> > understand that some of it is due to 3.4.0 slipping, but I wonder if we
> > could perhaps aim for the KIP freeze to be one or two weeks earlier.
> >
> > Ismael
> >
> > On Tue, Feb 14, 2023 at 8:00 AM Mickael Maison  >
> > wrote:
> >
> > > Hi,
> > >
> > > I've created a release plan for 3.5.0 in the wiki:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
> > >
> > > Current dates are:
> > > 1) KIP Freeze: 07 Apr 2023
> > > 2) Feature Freeze: 27 Apr 2023
> > > 3) Code Freeze: 11 May 2023
> > >
> > > Please take a look at the plan. Let me know if there are other KIPs
> > > targeting 3.5.0.
> > > Also if you are the author of one of the KIPs that's missing a status
> > > (or the status is incorrect) please update it and let me know.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Thu, Feb 9, 2023 at 9:23 AM Bruno Cadonna 
> wrote:
> > > >
> > > > Thanks, Mickael!
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > > On 09.02.23 03:15, Luke Chen wrote:
> > > > > Hi Mickael,
> > > > > Thanks for volunteering!
> > > > >
> > > > > Luke
> > > > >
> > > > > On Thu, Feb 9, 2023 at 6:23 AM Chris Egerton <
> fearthecel...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks for volunteering, Mickael!
> > > > >>
> > > > >> On Wed, Feb 8, 2023 at 1:12 PM José Armando García Sancio
> > > > >>  wrote:
> > > > >>
> > > > >>> Thanks for volunteering Mickael.
> > > > >>>
> > > > >>> --
> > > > >>> -José
> > > > >>>
> > > > >>
> > > > >
> > >
>


[jira] [Resolved] (KAFKA-12893) MM2 fails to replicate if starting two+ nodes same time

2023-02-14 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-12893.
---
Resolution: Duplicate

> MM2 fails to replicate if starting two+ nodes same time
> ---
>
> Key: KAFKA-12893
> URL: https://issues.apache.org/jira/browse/KAFKA-12893
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.8.0
>Reporter: Tommi Vainikainen
>Priority: Major
>
> I've observed a situation where, when starting more than one MM2 node in parallel, 
> MM2 fails to start replication, i.e. the replication flow seems to be stuck without 
> action. When I used exactly the same mm2.properties file to start only one node at a 
> time, the replication flow proceeded smoothly.
> In my setup dc1 has topic "mytopic1" with a producer writing approx. 1 
> msg/sec, and I'm trying to replicate this to dc2. What I observed is that 
> dc1.mytopic1 is created when initially launching two parallel MM2 instances, 
> but no messages get written into the topic as I would expect. If I kill the MM2 
> instances and start only one MM2 node, then MM2 starts replicating the 
> messages in mytopic1.
> My mm2.properties:
> clusters=dc2, dc1
> dc1->dc2.emit.heartbeats.enabled=true
> dc1->dc2.enabled=true
> dc1->dc2.sync.group.offsets.enabled=false
> dc1->dc2.sync.group.offsets.interval.seconds=45
> dc1->dc2.topics=mytopic1
> dc1->dc2.topics.exclude=
> dc1.bootstrap.servers=tvainika-dc1-dev-sandbox.aivencloud.com:12693
> dc1.security.protocol=SSL
> dc1.ssl.keystore.type=PKCS12
> dc1.ssl.keystore.location=dc1/client.keystore.p12
> dc1.ssl.keystore.password=secret
> dc1.ssl.key.password=secret
> dc1.ssl.truststore.location=dc1/client.truststore.jks
> dc1.ssl.truststore.password=secret
> dc2.bootstrap.servers=tvainika-dc2-dev-sandbox.aivencloud.com:12693
> dc2.security.protocol=SSL
> dc2.ssl.keystore.type=PKCS12
> dc2.ssl.keystore.location=dc2/client.keystore.p12
> dc2.ssl.keystore.password=secret
> dc2.ssl.key.password=secret
> dc2.ssl.truststore.location=dc2/client.truststore.jks
> dc2.ssl.truststore.password=secret
> tasks.max=3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-12150) Consumer group refresh not working with clustered MM2 setup

2023-02-14 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-12150.
---
Resolution: Duplicate

> Consumer group refresh not working with clustered MM2 setup
> ---
>
> Key: KAFKA-12150
> URL: https://issues.apache.org/jira/browse/KAFKA-12150
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.7.0
>Reporter: Ara Zarifian
>Priority: Major
>
> I'm running MM2 with Kafka 2.7 with the following configuration:
> {code}
> clusters = eastus2, westus
> eastus2.bootstrap.servers = clusrter1.example.com:9092
> westus.bootstrap.servers = cluster2.example.com:9092
> eastus2->westus.enabled = true
> eastus2->westus.topics = .*
> westus->eastus2.enabled = true
> westus->eastus2.topics = .*
> refresh.topics.enabled = true
> refresh.topics.interval.seconds = 5
> refresh.groups.enabled = true
> refresh.groups.interval.seconds = 5
> sync.topic.configs.enabled = true
> sync.topic.configs.interval.seconds = 5 
> sync.topic.acls.enabled = false
> sync.topic.acls.interval.seconds = 5
> sync.group.offsets.enabled = true
> sync.group.offsets.interval.seconds = 5
> emit.checkpoints.enabled = true
> emit.checkpoints.interval.seconds = 5
> emit.heartbeats.enabled = true
> emit.heartbeats.interval.seconds = 5
> replication.factor = 3
> checkpoints.topic.replication.factor = 3
> heartbeats.topic.replication.factor = 3
> offset-syncs.topic.replication.factor = 3
> offset.storage.replication.factor = 3
> status.storage.replication.factor = 3
> config.storage.replication.factor = 3
> {code}
> More specifically, I'm running multiple instances of MM2 with the above 
> configuration within Kubernetes pods. I was testing the new automatic 
> consumer group offset translation functionality and noticed what appears to 
> be a problem when running more than 1 instance of MM2 in this fashion. 
> Based on [the 
> KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0],
>  I should be able to run multiple instances in this manner (see "Running a 
> dedicated MirrorMaker cluster"), however, I noticed that when enabling 
> replication using a 3-instance MM2 cluster, consumer groups were not 
> synchronizing across clusters at all.
> When running through my test case with a single MM2 instance, consumer group 
> synchronization appears to work as expected consistently. When running 
> through my 3-node test case, synchronization begins only once I scale the 
> number of replicas down to 1.
> Am I misinterpreting the manner in which the KIP describes MM2 clusters, or is 
> this interaction an unexpected one?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-10857) Mirror Maker 2 - replication not working when deploying multiple instances

2023-02-14 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-10857.
---
Resolution: Duplicate

> Mirror Maker 2 - replication not working when deploying multiple instances
> --
>
> Key: KAFKA-10857
> URL: https://issues.apache.org/jira/browse/KAFKA-10857
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, mirrormaker
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Athanasios Fanos
>Priority: Major
>
> We believe we are experiencing a bug when deploying Mirror Maker 2 in 
> distributed mode in our environments. Replication does not work consistently 
> after initial deployment and does not start working even after some time 
> (24h+).
> *Environment & replication set-up*
>  * 2 regions with a separate Kafka cluster (let's call them Region A and 
> Region B)
>  * 3 instances of Mirror maker are deployed at the same time in Region B with 
> the same configuration
>  * Replication is set up to be bi-directional (regionA->regionB & 
> regionB->regionA)
> *Container Version*
> Observed with both {{confluentinc/cp-kafka:5.5.1}} & 
> {{confluentinc/cp-kafka:6.0.1}}
> *Mirror maker 2 configuration*
> {code:java}
> clusters=regionA,regionB
> regionA.bootstrap.servers=regionA-kafka:9092
> regionB.bootstrap.servers=regionB-kafka:9092
> regionA->regionB.enabled=true
> regionA->regionB.topics=testTopic
> regionB->regionA.enabled=true
> regionB->regionA.topics=testTopic
> sync.topic.acls.enabled=false
> tasks.max=9
> {code}
> *Observed behavior*
>  * After deploying the 3 Mirror Maker instances (at the same time), 
> replication for 1 or both mirrors does not work
>  ** If we scale down to a single instance of mirror maker and wait for about 
> 5 minutes (refresh.topics.interval.seconds?) replication starts working. 
> After this scaling up to 3 correctly distributes the load between the 
> deployed instances
> *Expected behavior*
>  * Replication should work for all configured mirrors when running in 
> distributed mode
>  * When starting multiple instances of Mirror Maker at the same time, 
> replication should work; a one-by-one rollout should not be required
> *Additional details*
>  * When replication is not working, we observe that in the internal config 
> topics from Mirror Maker the partitions are not assigned to the tasks, e.g. 
> {{task.assigned.partitions}} are not set at all under the properties object.
> *Workaround*
>  * As a workaround, we start Mirror Maker instances one by one with some delay 
> between each instance. This allows the first instance to set up the 
> configuration in the internal topics correctly. Doing this seems to ensure 
> that replication works as expected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-10586) Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2023-02-14 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-10586.
---
Resolution: Done

> Full support for distributed mode in dedicated MirrorMaker 2.0 clusters
> ---
>
> Key: KAFKA-10586
> URL: https://issues.apache.org/jira/browse/KAFKA-10586
> Project: Kafka
>  Issue Type: Task
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Chris Egerton
>Priority: Major
>  Labels: cloudera
>
> KIP-382 introduced MirrorMaker 2.0. Because of scoping issues, the dedicated 
> MirrorMaker 2.0 cluster does not utilize the Connect REST API. This means 
> that with specific workloads, the dedicated MM2 cluster can become unable to 
> react to dynamic topic and group filter changes.
> (This occurs when after a rebalance operation, the leader node has no 
> MirrorSourceConnectorTasks. Because of this, the MirrorSourceConnector is 
> stopped on the leader, meaning it cannot detect config changes by itself. 
> Followers still running the connector can detect config changes, but they 
> cannot query the leader for config updates.)
> Besides the REST support, config provider references should be evaluated 
> lazily in connector configurations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14718) Flaky DedicatedMirrorIntegrationTest test suite

2023-02-14 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-14718:
-

 Summary: Flaky DedicatedMirrorIntegrationTest test suite
 Key: KAFKA-14718
 URL: https://issues.apache.org/jira/browse/KAFKA-14718
 Project: Kafka
  Issue Type: Test
  Components: mirrormaker
Reporter: Chris Egerton


These tests were added recently in [https://github.com/apache/kafka/pull/13137] 
and have been failing occasionally on Jenkins. For example, in 
[https://github.com/apache/kafka/pull/13163],
 both test cases (testSingleNodeCluster and testMultiNodeCluster) failed on a 
[single Jenkins 
node|https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-13163/4/pipeline/11/]
 with timeout errors:
{quote}[2023-02-14T16:43:19.054Z] Gradle Test Run 
:connect:mirror:integrationTest > Gradle Test Executor 155 > 
DedicatedMirrorIntegrationTest > testSingleNodeCluster() FAILED 
[2023-02-14T16:43:19.054Z] org.opentest4j.AssertionFailedError: Condition not 
met within timeout 3. topic A.test-topic-1 was not created on cluster B in 
time ==> expected: <true> but was: <false> [2023-02-14T16:43:19.054Z] at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
 [2023-02-14T16:43:19.054Z] at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
 [2023-02-14T16:43:19.054Z] at 
org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) 
[2023-02-14T16:43:19.054Z] at 
org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) 
[2023-02-14T16:43:19.054Z] at 
org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:211) 
[2023-02-14T16:43:19.054Z] at 
org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337) 
[2023-02-14T16:43:19.054Z] at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385) 
[2023-02-14T16:43:19.054Z] at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334) 
[2023-02-14T16:43:19.054Z] at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318) 
[2023-02-14T16:43:19.054Z] at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:308) 
[2023-02-14T16:43:19.054Z] at 
org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.awaitTopicCreation(DedicatedMirrorIntegrationTest.java:255)
 [2023-02-14T16:43:19.054Z] at 
org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.testSingleNodeCluster(DedicatedMirrorIntegrationTest.java:153){quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1585

2023-02-14 Thread Apache Jenkins Server
See 




[press] potential RCE patch, comment request

2023-02-14 Thread Charlie Osborne
Good afternoon team, my name is Charlie from The Daily Swig. I'm covering
the Apache Kafka bug (CVE-2023-25194) and wanted to reach out with a few
follow-up questions if I may:

-How useful are bug bounty programs like Aiven’s and indeed bug bounty
programs in general in fixing flaws in Apache projects?
-How did the disclosure and remediation process go?
-Would you like to add further comment for the story?

many thanks
charlie
-

-- 
Charlie Osborne
Technology news, ZDNet, Red Ventures, The Daily Swig


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-02-14 Thread Mickael Maison
Hi Ismael,

Good call. I shifted all dates by 2 weeks and moved them to Wednesdays.

Thanks,
Mickael

On Tue, Feb 14, 2023 at 6:01 PM Ismael Juma  wrote:
>
> Thanks Mickael. A couple of notes:
>
> 1. We typically choose a Wednesday for the various freeze dates - there are
> often 1-2 day slips and it's better if that doesn't require people
> working through the weekend.
> 2. Looks like we're over a month later compared to the equivalent release
> last year (
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.0). I
> understand that some of it is due to 3.4.0 slipping, but I wonder if we
> could perhaps aim for the KIP freeze to be one or two weeks earlier.
>
> Ismael
>
> On Tue, Feb 14, 2023 at 8:00 AM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > I've created a release plan for 3.5.0 in the wiki:
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
> >
> > Current dates are:
> > 1) KIP Freeze: 07 Apr 2023
> > 2) Feature Freeze: 27 Apr 2023
> > 3) Code Freeze: 11 May 2023
> >
> > Please take a look at the plan. Let me know if there are other KIPs
> > targeting 3.5.0.
> > Also if you are the author of one of the KIPs that's missing a status
> > (or the status is incorrect) please update it and let me know.
> >
> > Thanks,
> > Mickael
> >
> >
> > On Thu, Feb 9, 2023 at 9:23 AM Bruno Cadonna  wrote:
> > >
> > > Thanks, Mickael!
> > >
> > > Best,
> > > Bruno
> > >
> > > On 09.02.23 03:15, Luke Chen wrote:
> > > > Hi Mickael,
> > > > Thanks for volunteering!
> > > >
> > > > Luke
> > > >
> > > > On Thu, Feb 9, 2023 at 6:23 AM Chris Egerton 
> > > > wrote:
> > > >
> > > >> Thanks for volunteering, Mickael!
> > > >>
> > > >> On Wed, Feb 8, 2023 at 1:12 PM José Armando García Sancio
> > > >>  wrote:
> > > >>
> > > >>> Thanks for volunteering Mickael.
> > > >>>
> > > >>> --
> > > >>> -José
> > > >>>
> > > >>
> > > >
> >


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #153

2023-02-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 421103 lines...]
[2023-02-14T20:15:03.997Z] 
[2023-02-14T20:15:03.997Z] 
org.apache.kafka.streams.integration.StandbyTaskCreationIntegrationTest > 
shouldCreateStandByTasksForMaterializedAndOptimizedSourceTables STARTED
[2023-02-14T20:15:06.616Z] 
[2023-02-14T20:15:06.616Z] 
org.apache.kafka.streams.integration.StandbyTaskCreationIntegrationTest > 
shouldCreateStandByTasksForMaterializedAndOptimizedSourceTables PASSED
[2023-02-14T20:15:07.671Z] 
[2023-02-14T20:15:07.671Z] 
org.apache.kafka.streams.integration.StateDirectoryIntegrationTest > 
testNotCleanUpStateDirIfNotEmpty STARTED
[2023-02-14T20:15:10.294Z] 
[2023-02-14T20:15:10.294Z] 
org.apache.kafka.streams.integration.StateDirectoryIntegrationTest > 
testNotCleanUpStateDirIfNotEmpty PASSED
[2023-02-14T20:15:10.294Z] 
[2023-02-14T20:15:10.294Z] 
org.apache.kafka.streams.integration.StateDirectoryIntegrationTest > 
testCleanUpStateDirIfEmpty STARTED
[2023-02-14T20:15:11.228Z] 
[2023-02-14T20:15:11.228Z] 
org.apache.kafka.streams.integration.StateDirectoryIntegrationTest > 
testCleanUpStateDirIfEmpty PASSED
[2023-02-14T20:15:18.234Z] 
[2023-02-14T20:15:18.234Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificActivePartitionStores STARTED
[2023-02-14T20:15:23.017Z] 
[2023-02-14T20:15:23.017Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificActivePartitionStores PASSED
[2023-02-14T20:15:23.017Z] 
[2023-02-14T20:15:23.017Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQueryAllStalePartitionStores STARTED
[2023-02-14T20:15:27.802Z] 
[2023-02-14T20:15:27.802Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQueryAllStalePartitionStores PASSED
[2023-02-14T20:15:27.802Z] 
[2023-02-14T20:15:27.802Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreads STARTED
[2023-02-14T20:15:32.412Z] 
[2023-02-14T20:15:32.412Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreads PASSED
[2023-02-14T20:15:32.412Z] 
[2023-02-14T20:15:32.412Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStores STARTED
[2023-02-14T20:15:38.375Z] 
[2023-02-14T20:15:38.375Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStores PASSED
[2023-02-14T20:15:38.375Z] 
[2023-02-14T20:15:38.375Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQueryOnlyActivePartitionStoresByDefault STARTED
[2023-02-14T20:15:43.318Z] 
[2023-02-14T20:15:43.318Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQueryOnlyActivePartitionStoresByDefault PASSED
[2023-02-14T20:15:43.318Z] 
[2023-02-14T20:15:43.318Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQueryStoresAfterAddingAndRemovingStreamThread STARTED
[2023-02-14T20:15:50.320Z] 
[2023-02-14T20:15:50.320Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQueryStoresAfterAddingAndRemovingStreamThread PASSED
[2023-02-14T20:15:50.320Z] 
[2023-02-14T20:15:50.320Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreadsNamedTopology STARTED
[2023-02-14T20:15:55.086Z] 
[2023-02-14T20:15:55.086Z] 
org.apache.kafka.streams.integration.StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreadsNamedTopology PASSED
[2023-02-14T20:15:55.086Z] 
[2023-02-14T20:15:55.086Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testInnerRepartitioned[caching enabled = true] STARTED
[2023-02-14T20:16:00.993Z] 
[2023-02-14T20:16:00.993Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testInnerRepartitioned[caching enabled = true] PASSED
[2023-02-14T20:16:00.993Z] 
[2023-02-14T20:16:00.993Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testOuterRepartitioned[caching enabled = true] STARTED
[2023-02-14T20:16:09.400Z] 
[2023-02-14T20:16:09.400Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testOuterRepartitioned[caching enabled = true] PASSED
[2023-02-14T20:16:09.400Z] 
[2023-02-14T20:16:09.400Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testInner[caching enabled = true] STARTED
[2023-02-14T20:16:12.971Z] 
[2023-02-14T20:16:12.971Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testInner[caching enabled = true] PASSED
[2023-02-14T20:16:12.971Z] 
[2023-02-14T20:16:12.971Z] 
org.apache.kafka.streams.integration.StreamStreamJoinIntegrationTest > 
testOuter[caching enabled = true] STARTED
[2023-02-14T20:16:17.574Z] 
[20

[jira] [Created] (KAFKA-14717) KafkaStreams can't get running if the rebalance happens before StreamThread gets shutdown completely

2023-02-14 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-14717:
--

 Summary: KafkaStreams can't get running if the rebalance happens 
before StreamThread gets shutdown completely
 Key: KAFKA-14717
 URL: https://issues.apache.org/jira/browse/KAFKA-14717
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai
 Fix For: 3.5.0


I noticed this issue when tracing KAFKA-7109

StreamThread closes the consumer before changing its state to DEAD. If a 
partition rebalance happens quickly, the other StreamThreads can't change the 
KafkaStreams state from REBALANCING to RUNNING since there is a PENDING_SHUTDOWN 
StreamThread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] KIP-904 Kafka Streams - Guarantee subtractor is called before adder if key has not changed

2023-02-14 Thread Guozhang Wang
Thanks Farooq, that looks good to me.

Guozhang

On Sun, Feb 12, 2023 at 9:01 AM Dharin Shah  wrote:
>
> Hello Farooq,
>
> This is actually a great idea, we have dealt with this by using an array
> instead of a set.
> +1 to this :)
>
> Thank you,
> Dharin
>
> On Sun, Feb 12, 2023 at 8:11 PM Fq Public  wrote:
>
> > Hi Guozhang,
> >
> > Thanks for reading over my proposal!
> >
> > > Regarding the format, I'm just thinking if we can change the type of
> > "INT newDataLength" to UINT32?
> >
> > Good idea, I've updated the KIP to reflect UINT32 since it makes clear the
> > value can never be less than zero.
> >
> > > `.equals` default implementation on Object is by reference, so if the
> > groupBy did not generate a new object, that may still pass. This means that
> > even if user does not implement the `.equals` function, if the same object
> > is returned then this feature would still be triggered, is that correct?
> >
> > Correct, I've updated the KIP to call out this edge-case clearly as
> > follows:
> >
> > > Since the default `.equals` implementation for an `Object`  is by
> > reference, if a user's `groupBy` returns the same reference for the key,
> > then the oldKey and the newKey will naturally `.equals`  each other. This
> > will result in a single event being sent to the repartition topic. This
> > change in behaviour should be considered a "bug-fix" rather than a
> > "breaking change" as the semantics of the operation remain unchanged, the
> > only thing that changes for users is they no longer see transient
> > "inconsistent" states.  In the worst case, users in this situation will
> > need to update any strict tests that check specifically for the presence of
> > transient "inconsistent" states.
> >
> > What do you think?
> >
> > Thanks,
> > Farooq
> >
> > On 2023/02/07 18:02:24 Guozhang Wang wrote:
> > > Hello Farooq,
> > >
> > > Thanks for the very detailed proposal! I think this is a great idea.
> > > Just a few thoughts:
> > >
> > > 1. I regret that we over-optimized the Changed serde format for
> > > footprint while making it less extensible. It seems to me that a two
> > > rolling bounce migration is unavoidable.. Regarding the format, I'm
> > > just thinking if we can change the type of "INT newDataLength" to
> > > UINT32?
> > >
> > > 2. `.equals` default implementation on Object is by reference, so if
> > > the groupBy did not generate a new object, that may still pass. This
> > > means that even if user does not implement the `.equals` function, if
> > > the same object is returned then this feature would still be
> > > triggered, is that correct?
> > >
> > >
> > > Guozhang
> > >
> > > On Mon, Feb 6, 2023 at 5:05 AM Fq Public  wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I'd like to share a new KIP for discussion:
> > > > https://cwiki.apache.org/confluence/x/P5VbDg
> > > >
> > > > This could be considered mostly as a "bug fix" but we wanted to raise
> > a KIP
> > > > for discussion because it involves changes to the serialization format
> > of
> > > > an internal topic which raises backward compatibility considerations.
> > > >
> > > > Please take a look and let me know what you think.
> > > >
> > > > Thanks,
> > > > Farooq
> > >
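For readers skimming the thread, here is a self-contained illustration of the
reference-equality point above, assuming a plain key class that does not override
equals() (the class names are made up for the example):

{code:java}
// Without an equals() override, Object.equals is reference equality. So if a
// groupBy returns the *same* key object, oldKey.equals(newKey) is true and only a
// single record goes to the repartition topic; a new-but-value-equal object still
// compares false unless equals() is implemented.
final class CategoryKey {
    final String name;
    CategoryKey(String name) { this.name = name; }
    // no equals()/hashCode(): inherits reference equality from Object
}

public class EqualsByReferenceDemo {
    public static void main(String[] args) {
        CategoryKey key = new CategoryKey("drinks");
        CategoryKey sameRef = key;                          // groupBy reused the object
        CategoryKey equalValue = new CategoryKey("drinks"); // groupBy built a new object

        System.out.println(key.equals(sameRef));    // true  -> single repartition record
        System.out.println(key.equals(equalValue)); // false -> separate subtract + add records
    }
}
{code}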


Re: [DISCUSS] Zookeeper deprecation

2023-02-14 Thread Ismael Juma
Hi Divij,

I think we'll make that call a bit later depending on the progress we make
on things like zk to kraft migration. The deprecation notice is meant to
indicate that we recommend customers use kraft mode for new clusters and
that zk mode will be going away in 4.0. It doesn't change the support we
provide for the 3.x series.

Ismael

On Tue, Feb 14, 2023 at 9:10 AM Divij Vaidya 
wrote:

> Hi folks
>
> As we begin to prepare a release plan for 3.5, I was wondering if we plan
> to deprecate Zookeeper support with this release? I am asking this because
> some prior KIPs ([1] and [2]) mentioned 3.5 as the targeted version when we
> would mark Zookeeper support as deprecated.
>
> Personally, I would prefer that we update the Zk version with this release
> [3] and deprecate support for Zk in the future versions. Upgrading the Zk
> version and deprecating future support in the same Apache Kafka version may
> be confusing for the users.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> [2]
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready
>
> [3] https://issues.apache.org/jira/browse/KAFKA-14661
>
> Regards,
> Divij Vaidya
>


Re: [VOTE] KIP-889 Versioned State Stores

2023-02-14 Thread Matthias J. Sax

Thanks Victoria. Makes sense to me.


On 2/13/23 5:55 PM, Victoria Xia wrote:

Hi everyone,

I have just pushed two minor amendments to KIP-889:

- Updated the versioned store specification to clarify that the *"history
retention" parameter is also used as "grace period,"* which means that
writes (including inserts, updates, and deletes) to the store will not be
accepted if the associated timestamp is older than the store's grace period
(i.e., history retention) relative to the current observed stream time.
   - Additional context: previously, the KIP was not explicit about
   if/when old writes would no longer be accepted. The reason for enforcing
   a strict grace period after which writes will no longer be accepted is
   that otherwise tombstones must be retained indefinitely -- if the latest
   value for a key is a very old tombstone, we would not be able to expire
   it from the store, because if there's an even older non-null put to the
   store later, then without the tombstone the store would accept this write
   as the latest value for the key, even though it isn't. In the spirit of
   not adding more to this KIP, which has already been accepted, I do not
   propose to add additional interfaces to allow users to configure grace
   period separately from history retention at this time. Such options can
   be introduced in a future KIP in a backwards-compatible way.
- Added a *new method to TopologyTestDriver* for getting a versioned
store: getVersionedKeyValueStore().
   - This new method is analogous to existing methods for other types of
   stores, and its previous omission from the KIP was an oversight.

If there are no concerns / objections, then perhaps these updates are minor
enough that we can proceed without re-voting.

Happy to discuss,
Victoria
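As a rough sketch of reading from the versioned store interface, using the types
and method names as they are written in the KIP (the released signatures may
differ slightly):

{code:java}
import org.apache.kafka.streams.state.VersionedKeyValueStore;
import org.apache.kafka.streams.state.VersionedRecord;

// Sketch only: a timestamped lookup against the store interface proposed in
// KIP-889. Returns null if no version is retained for that timestamp, e.g.
// because it falls outside the store's history retention.
public final class VersionedLookup {
    private VersionedLookup() { }

    public static <K, V> V valueAsOf(VersionedKeyValueStore<K, V> store,
                                     K key,
                                     long asOfTimestamp) {
        VersionedRecord<V> record = store.get(key, asOfTimestamp);
        return record == null ? null : record.value();
    }
}
{code}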

On Wed, Dec 21, 2022 at 8:22 AM Victoria Xia 
wrote:


Hi everyone,

We have 3 binding and 1 non-binding vote in favor of this KIP (and no
objections) so KIP-889 is now accepted.

Thanks for voting, and for your excellent comments in the KIP discussion
thread!

Happy holidays,
Victoria

On Tue, Dec 20, 2022 at 12:24 PM Sagar  wrote:


Hi Victoria,

+1 (non-binding).

Thanks!
Sagar.

On Tue, Dec 20, 2022 at 1:39 PM Bruno Cadonna  wrote:


Hi Victoria,

Thanks for the KIP!

+1 (binding)

Best,
Bruno

On 19.12.22 20:03, Matthias J. Sax wrote:

+1 (binding)

On 12/15/22 1:27 PM, John Roesler wrote:

Thanks for the thorough KIP, Victoria!

I'm +1 (binding)

-John

On 2022/12/15 19:56:21 Victoria Xia wrote:

Hi all,

I'd like to start a vote on KIP-889 for introducing versioned

key-value

state stores to Kafka Streams:




https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores


The discussion thread has been open for a few weeks now and has
converged
among the current participants.

Thanks,
Victoria











Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-14 Thread Chris Egerton
Hi Yash,

I was actually envisioning something like `void
open(Collection<TopicPartition> originalPartitions,
Collection<TopicPartition> transformedPartitions)`,
since we already convert and transform each batch of records that we poll
from the sink task's consumer en masse, meaning we could discover several
new transformed partitions in between consecutive calls to SinkTask::put.
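
(Purely for concreteness, a hypothetical sketch of that shape; the two-argument
variant below is not part of the released SinkTask API, and the class name is
made up:)

{code:java}
import java.util.Collection;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sketch only: illustrates the proposed two-argument open(); it does
// not exist in the current SinkTask API.
public abstract class TopicAwareSinkTask extends SinkTask {

    // Existing hook: called with pre-transform (consumer) topic partitions.
    @Override
    public void open(Collection<TopicPartition> partitions) {
        // treat pre- and post-transform partitions as the same if the runtime
        // never calls the new variant
        open(partitions, partitions);
    }

    // Proposed variant: receives the original (pre-transform) partitions plus the
    // transformed partitions discovered in the most recent batch.
    public void open(Collection<TopicPartition> originalPartitions,
                     Collection<TopicPartition> transformedPartitions) {
        // allocate per-partition writers/buffers keyed by transformedPartitions here
    }
}
{code}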

It's also worth noting that we'll probably want to deprecate the existing
open/close methods, at which point keeping one non-deprecated variant of
each seems more appealing and less complex than keeping two.

Honestly though, I think we're both on the same page enough that I wouldn't
object to either approach. We've probably reached the saturation point for
ROI here and as long as we provide developers a way to get the information
they need from the runtime and take care to add Javadocs and update our
docs page (possibly including the connector development quickstart), it
should be fine.

At this point, it might be worth updating the KIP based on recent
discussion so that others can see the latest proposal, and we can both take
a look and make sure everything looks good enough before opening a vote
thread.

Finally, I think you make a convincing case for a time-based eviction
policy. I wasn't thinking about the fairly common SMT pattern of deriving a
topic name from, e.g., a record field or header.

Cheers,

Chris

On Tue, Feb 14, 2023 at 11:42 AM Yash Mayya  wrote:

> Hi Chris,
>
> > Plus, if a connector is intentionally designed to
> > use pre-transformation topic partitions in its
> > open/close methods, wouldn't we just be trading
> > one form of the problem for another by making this
> > switch?
>
> Thanks, this makes sense, and given that the KIP already proposes a way for
> sink connector implementations to distinguish between pre-transform and
> post-transform topics per record, I think I'm convinced that going with new
> `open()` / `close()` methods is the right approach. However, I still feel
> like having overloaded methods will make it a lot less unintuitive given
> that the two sets of methods would be different in terms of when they're
> called and what arguments they are passed (also I'm presuming that the
> overloaded methods you're prescribing will only have a single
> `TopicPartition` rather than a `Collection<TopicPartition>` as their
> parameters). I guess my concern is largely around the fact that it won't be
> possible to distinguish between the overloaded methods' use cases just from
> the method signatures. I agree that naming is going to be difficult here,
> but I think that having two sets of `SinkTask::openXyz` /
> `SinkTask::closeXyz` methods will be less complicated to understand from a
> connector developer perspective (as compared to overloaded methods with
> only differing documentation). Of your suggested options, I think
> `openPreTransform` / `openPostTransform` are the most comprehensible ones.
>
> > BTW, I wouldn't say that we can't make assumptions
> > about the relationships between pre- and post-transformation
> >  topic partitions.
>
> I meant that the framework wouldn't be able to deterministically know when
> to close a post-transform topic partition given that SMTs could use
> per-record data / metadata to manipulate the topic names as and how
> required (which supports the suggestion to use an eviction policy based
> mechanism to call SinkTask::close for post-transform topic partitions).
>
> > We might utilize a policy that assumes a deterministic
> > mapping from the former to the latter, for example.
>
> Wouldn't this be making the assumption that SMTs only use the topic name
> itself and no other data / metadata while computing the new topic name? Are
> you suggesting that since this assumption could work for a majority of
> SMTs, it might be more efficient overall in terms of reducing the number of
> "false-positive" calls to `SinkTask::closePostTransform` (and we'll also be
> able to call `SinkTask::closePostTransform` immediately after topic
> partitions are revoked from the consumer)? I was thinking something more
> generic along the lines of a simple time based eviction policy that
> wouldn't be making any assumptions regarding the SMT implementations.
> Either way, I do like your earlier suggestion of keeping this logic
> internal and not painting ourselves into a corner by promising any
> particular behavior in the KIP.
>
> Thanks,
> Yash
>
> On Tue, Feb 14, 2023 at 1:08 AM Chris Egerton 
> wrote:
>
> > Hi Yash,
> >
> > I think the key difference between adding methods/overloads related to
> > SinkTask::open/SinkTask::close and SinkTask::put is that this isn't
> > auxiliary information that may or may not be useful to connector
> > developers. It's actually critical for them to understand the difference
> > between the two concepts here, even if they look very similar. And yes, I
> > do believe that switching from pre-transform to post-transform topic
> > partitions is too big a change in behavior here. Plus, if a connect

[jira] [Resolved] (KAFKA-14693) KRaft Controller and ProcessExitingFaultHandler can deadlock shutdown

2023-02-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-14693.

Resolution: Fixed

> KRaft Controller and ProcessExitingFaultHandler can deadlock shutdown
> -
>
> Key: KAFKA-14693
> URL: https://issues.apache.org/jira/browse/KAFKA-14693
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 3.4.0
>Reporter: José Armando García Sancio
>Assignee: José Armando García Sancio
>Priority: Critical
> Fix For: 3.5.0, 3.4.1
>
>
> h1. Problem
> When the kraft controller encounters an error that it cannot handle it calls 
> {{ProcessExitingFaultHandler}} which calls {{Exit.exit}} which calls 
> {{{}Runtime.exit{}}}.
> Based on the Runtime.exit documentation:
> {quote}All registered [shutdown 
> hooks|https://docs.oracle.com/javase/8/docs/api/java/lang/Runtime.html#addShutdownHook-java.lang.Thread-],
>  if any, are started in some unspecified order and allowed to run 
> concurrently until they finish. Once this is done the virtual machine 
> [halts|https://docs.oracle.com/javase/8/docs/api/java/lang/Runtime.html#halt-int-].
> {quote}
> One of the shutdown hooks registered by Kafka is {{{}Server.shutdown(){}}}. 
> This shutdown hook eventually calls {{{}KafkaEventQueue.close{}}}. This last 
> close method joins on the controller thread. Unfortunately, the controller 
> thread also joined waiting for the shutdown hook thread to finish.
> Here are sample thread stacks:
> {code:java}
>"QuorumControllerEventHandler" #45 prio=5 os_prio=0 cpu=429352.87ms 
> elapsed=620807.49s allocated=38544M defined_classes=353 
> tid=0x7f5aeb31f800 nid=0x80c in Object.wait()  [0x7f5a658fb000]
>  java.lang.Thread.State: WAITING (on object monitor)  
>   
>   
>   at java.lang.Object.wait(java.base@17.0.5/Native 
> Method)
>   - waiting on 
>   at java.lang.Thread.join(java.base@17.0.5/Thread.java:1304)
>   - locked <0xa29241f8> (a 
> org.apache.kafka.common.utils.KafkaThread)
>   at java.lang.Thread.join(java.base@17.0.5/Thread.java:1372)
>   at 
> java.lang.ApplicationShutdownHooks.runHooks(java.base@17.0.5/ApplicationShutdownHooks.java:107)
>   at 
> java.lang.ApplicationShutdownHooks$1.run(java.base@17.0.5/ApplicationShutdownHooks.java:46)
>   at java.lang.Shutdown.runHooks(java.base@17.0.5/Shutdown.java:130)
>   at java.lang.Shutdown.exit(java.base@17.0.5/Shutdown.java:173)
>   - locked <0xffe020b8> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Runtime.exit(java.base@17.0.5/Runtime.java:115)
>   at java.lang.System.exit(java.base@17.0.5/System.java:1860)
>   at org.apache.kafka.common.utils.Exit$2.execute(Exit.java:43)
>   at org.apache.kafka.common.utils.Exit.exit(Exit.java:66)
>   at org.apache.kafka.common.utils.Exit.exit(Exit.java:62)
>   at 
> org.apache.kafka.server.fault.ProcessExitingFaultHandler.handleFault(ProcessExitingFaultHandler.java:54)
>   at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent$1.apply(QuorumController.java:891)
>   at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent$1.apply(QuorumController.java:874)
>   at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:969){code}
> and
> {code:java}
>   "kafka-shutdown-hook" #35 prio=5 os_prio=0 cpu=43.42ms elapsed=378593.04s 
> allocated=4732K defined_classes=74 tid=0x7f5a7c09d800 nid=0x4f37 in 
> Object.wait()  [0x7f5a47afd000]
>  java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(java.base@17.0.5/Native Method)
>   - waiting on 
>   at java.lang.Thread.join(java.base@17.0.5/Thread.java:1304)
>   - locked <0xa272bcb0> (a 
> org.apache.kafka.common.utils.KafkaThread)
>   at java.lang.Thread.join(java.base@17.0.5/Thread.java:1372)
>   at 
> org.apache.kafka.queue.KafkaEventQueue.close(KafkaEventQueue.java:509)
>   at 
> org.apache.kafka.controller.QuorumController.close(QuorumController.java:2553)
>   at 
> kafka.server.ControllerServer.shutdown(ControllerServer.scala:521)
>   at kafka.server.KafkaRaftServer.shutdown(KafkaRaftServer.scala:184)
>   at kafka.Kafka$.$anonfun$main$3(Kafka.scala:99)
>   at kafka.Kafka$$$Lambda$406/0x000800fb9730.apply$mcV$sp(Unknown 
> Source)
>   at kafka.utils.Exit$.$an
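
The same deadlock pattern can be reproduced with a JDK-only sketch (not Kafka code;
the thread names are only illustrative): the worker thread calls System.exit(),
which waits for all shutdown hooks, while the shutdown hook join()s the worker.

{code:java}
// JDK-only sketch of the pattern above: a worker calls System.exit(), which blocks
// until shutdown hooks finish, while the shutdown hook join()s the worker. Neither
// side can make progress, so the JVM hangs instead of exiting.
public class ShutdownHookDeadlockDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            // simulates the controller thread handling a fatal fault
            System.exit(1); // blocks until all shutdown hooks complete
        }, "controller-event-handler");

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                worker.join(); // waits for the thread that is waiting for this hook
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }, "kafka-shutdown-hook"));

        worker.start();
        worker.join(); // never returns: exit() and the hook wait on each other
    }
}
{code}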

[DISCUSS] Zookeeper deprecation

2023-02-14 Thread Divij Vaidya
Hi folks

As we begin to prepare a release plan for 3.5, I was wondering if we plan
to deprecate Zookeeper support with this release? I am asking this because
some prior KIPs ([1] and [2]) mentioned 3.5 as the targeted version when we
would mark Zookeeper support as deprecated.

Personally, I would prefer that we update the Zk version with this release
[3] and deprecate support for Zk in the future versions. Upgrading the Zk
version and deprecating future support in the same Apache Kafka version may
be confusing for the users.

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
[2]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready

[3] https://issues.apache.org/jira/browse/KAFKA-14661

Regards,
Divij Vaidya


Re: [DISCUSS] KIP-900: KRaft kafka-storage.sh API additions to support SCRAM

2023-02-14 Thread Proven Provenzano
The original idea was to pass in the JSON string representing the
UserScramCredentialsRecord
directly to make this simple and to require no parsing at all. Here is a
example of the JSON object:
{"name":"alice","mechanism":1,"salt":"MWx2NHBkbnc0ZndxN25vdGN4bTB5eTFrN3E=","SaltedPassword":"mT0yyUUxnlJaC99HXgRTSYlbuqa4FSGtJCJfTMvjYCE=","iterations":8192}
Note that it isn't very friendly. The mechanism is an integer value, 1 or 2,
not an enum such as SCRAM-SHA-256 or SCRAM-SHA-512. The salt and iterations
are required, and there is no password, just a SaltedPassword which the
customer would have to generate externally.

Moving away from the above, we will have to parse and validate arguments and
then generate the UserScramCredentialsRecord from them. The question is what
that looks like. Should it be closer to what kafka-configs uses, or should it
be our own made-up JSON format? Whichever we choose, one format should be
sufficient, as it will only be used in this application.

The requirements so far for argument parsing are:

   - We want to specify the mechanism as a customer-friendly enum,
   SCRAM-SHA-256 or SCRAM-SHA-512.
   - We want the salt and iterations to be optional, with defaults if
   not specified.
   - We want the customer to be able to specify a password, which then
   generates the salted password.
   - We want to allow the customer to specify a salted password if they
   so choose.
   - We want the user to specify a user for the credentials or to specify
   default-entity.

This is on top of the arguments needed for kafka-storage. We should also
look forward to when we add additional
record types that need to be parsed and stored for bootstrap.

What I am suggesting is that we have an --add-config argument that requires
at least one key=value subargument
which indicates which record type to add. An example would be
SCRAM-SHA-256=[iterations=8192,password=alice-secret]
This indicates that the record to add is a UserScramCredentialsRecord with
mechanism 1 (SCRAM-SHA-256)
and there are some key values to add to the record. This is very much like
kafka-configs. Note that this record is still incomplete
and that we need to specify a user to apply it to and that is where entity-type
users entity-name alice subarguments are needed.
If during parsing of the arguments the record is incomplete, then
kafka-storage will exit with a failure.

--Proven



On Mon, Feb 13, 2023 at 4:54 PM José Armando García Sancio
 wrote:

> Comments below.
>
> On Mon, Feb 13, 2023 at 11:44 AM Proven Provenzano
>  wrote:
> >
> > Hi Jose
> >
> > I want to clarify that the file parsing that Argparse4j provides is just
> a
> > mechanism for
> > taking command line args and putting them in a file. It doesn't
> > actually change what the
> > command line args are for processing the file. So I can add any
> > kafka-storage command
> > line arg into the file including say the storage UUID. I see that the
> file
> > parsing will be useful
> > in the future as we add more record types to add for the bootstrap
> process.
>
> Understood.
>
> > I'm against adding specific parsing for a list of configs vs. a separate
> > JSON file as it is adding
> > more surface area that needs testing for a feature that is used
> > infrequently. One config method
> > should be sufficient for one or more SCRAM records that a customer wants
> to
> > bootstrap with.
>
> Does this mean that the storage tool won't parse and validate SCRAM
> configuration? How will the user know that their SCRAM configuration
> is correct? Do they need to start the cluster to discover if their
> SCRAM configuration is correct?
>
> Thanks,
> --
> -José
>


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-02-14 Thread Ismael Juma
Thanks Mickael. A couple of notes:

1. We typically choose a Wednesday for the various freeze dates - there are
often 1-2 day slips and it's better if that doesn't require people
working through the weekend.
2. Looks like we're over a month later compared to the equivalent release
last year (
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.0). I
understand that some of it is due to 3.4.0 slipping, but I wonder if we
could perhaps aim for the KIP freeze to be one or two weeks earlier.

Ismael

On Tue, Feb 14, 2023 at 8:00 AM Mickael Maison 
wrote:

> Hi,
>
> I've created a release plan for 3.5.0 in the wiki:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
>
> Current dates are:
> 1) KIP Freeze: 07 Apr 2023
> 2) Feature Freeze: 27 Apr 2023
> 3) Code Freeze: 11 May 2023
>
> Please take a look at the plan. Let me know if there are other KIPs
> targeting 3.5.0.
> Also if you are the author of one of the KIPs that's missing a status
> (or the status is incorrect) please update it and let me know.
>
> Thanks,
> Mickael
>
>
> On Thu, Feb 9, 2023 at 9:23 AM Bruno Cadonna  wrote:
> >
> > Thanks, Mickael!
> >
> > Best,
> > Bruno
> >
> > On 09.02.23 03:15, Luke Chen wrote:
> > > Hi Mickael,
> > > Thanks for volunteering!
> > >
> > > Luke
> > >
> > > On Thu, Feb 9, 2023 at 6:23 AM Chris Egerton 
> > > wrote:
> > >
> > >> Thanks for volunteering, Mickael!
> > >>
> > >> On Wed, Feb 8, 2023 at 1:12 PM José Armando García Sancio
> > >>  wrote:
> > >>
> > >>> Thanks for volunteering Mickael.
> > >>>
> > >>> --
> > >>> -José
> > >>>
> > >>
> > >
>


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-02-14 Thread Josep Prat
Thanks for the work Mickael!

On Tue, Feb 14, 2023 at 5:01 PM Mickael Maison 
wrote:

> Hi,
>
> I've created a release plan for 3.5.0 in the wiki:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
>
> Current dates are:
> 1) KIP Freeze: 07 Apr 2023
> 2) Feature Freeze: 27 Apr 2023
> 3) Code Freeze: 11 May 2023
>
> Please take a look at the plan. Let me know if there are other KIPs
> targeting 3.5.0.
> Also if you are the author of one of the KIPs that's missing a status
> (or the status is incorrect) please update it and let me know.
>
> Thanks,
> Mickael
>
>
> On Thu, Feb 9, 2023 at 9:23 AM Bruno Cadonna  wrote:
> >
> > Thanks, Mickael!
> >
> > Best,
> > Bruno
> >
> > On 09.02.23 03:15, Luke Chen wrote:
> > > Hi Mickael,
> > > Thanks for volunteering!
> > >
> > > Luke
> > >
> > > On Thu, Feb 9, 2023 at 6:23 AM Chris Egerton 
> > > wrote:
> > >
> > >> Thanks for volunteering, Mickael!
> > >>
> > >> On Wed, Feb 8, 2023 at 1:12 PM José Armando García Sancio
> > >>  wrote:
> > >>
> > >>> Thanks for volunteering Mickael.
> > >>>
> > >>> --
> > >>> -José
> > >>>
> > >>
> > >
>


-- 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-14 Thread Yash Mayya
Hi Chris,

> Plus, if a connector is intentionally designed to
> use pre-transformation topic partitions in its
> open/close methods, wouldn't we just be trading
> one form of the problem for another by making this
> switch?

Thanks, this makes sense, and given that the KIP already proposes a way for
sink connector implementations to distinguish between pre-transform and
post-transform topics per record, I think I'm convinced that going with new
`open()` / `close()` methods is the right approach. However, I still feel
like having overloaded methods will make it a lot less unintuitive given
that the two sets of methods would be different in terms of when they're
called and what arguments they are passed (also I'm presuming that the
overloaded methods you're prescribing will only have a single
`TopicPartition` rather than a `Collection<TopicPartition>` as their
parameters). I guess my concern is largely around the fact that it won't be
possible to distinguish between the overloaded methods' use cases just from
the method signatures. I agree that naming is going to be difficult here,
but I think that having two sets of `SinkTask::openXyz` /
`SinkTask::closeXyz` methods will be less complicated to understand from a
connector developer perspective (as compared to overloaded methods with
only differing documentation). Of your suggested options, I think
`openPreTransform` / `openPostTransform` are the most comprehensible ones.

> BTW, I wouldn't say that we can't make assumptions
> about the relationships between pre- and post-transformation
>  topic partitions.

I meant that the framework wouldn't be able to deterministically know when
to close a post-transform topic partition given that SMTs could use
per-record data / metadata to manipulate the topic names as and how
required (which supports the suggestion to use an eviction policy based
mechanism to call SinkTask::close for post-transform topic partitions).

> We might utilize a policy that assumes a deterministic
> mapping from the former to the latter, for example.

Wouldn't this be making the assumption that SMTs only use the topic name
itself and no other data / metadata while computing the new topic name? Are
you suggesting that since this assumption could work for a majority of
SMTs, it might be more efficient overall in terms of reducing the number of
"false-positive" calls to `SinkTask::closePostTransform` (and we'll also be
able to call `SinkTask::closePostTransform` immediately after topic
partitions are revoked from the consumer)? I was thinking something more
generic along the lines of a simple time based eviction policy that
wouldn't be making any assumptions regarding the SMT implementations.
Either way, I do like your earlier suggestion of keeping this logic
internal and not painting ourselves into a corner by promising any
particular behavior in the KIP.
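
For what it's worth, the kind of framework-internal, time-based eviction I
have in mind is roughly the sketch below (illustrative only; the class, its
name and the idle timeout are all made up and nothing here is proposed as
public API):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Illustrative only: tracks when each post-transform topic partition was last
// seen in put() and evicts (i.e. closes) the ones idle longer than a timeout.
class PostTransformPartitionTracker {
    private final Map<TopicPartition, Long> lastSeenMs = new HashMap<>();
    private final long idleTimeoutMs;

    PostTransformPartitionTracker(long idleTimeoutMs) {
        this.idleTimeoutMs = idleTimeoutMs;
    }

    void recordSeen(TopicPartition tp, long nowMs) {
        lastSeenMs.put(tp, nowMs);
    }

    // Returns partitions that have been idle too long; the framework would
    // pass these to the task's post-transform close method and forget them.
    List<TopicPartition> evictIdle(long nowMs) {
        List<TopicPartition> evicted = new ArrayList<>();
        lastSeenMs.entrySet().removeIf(e -> {
            if (nowMs - e.getValue() >= idleTimeoutMs) {
                evicted.add(e.getKey());
                return true;
            }
            return false;
        });
        return evicted;
    }
}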

Thanks,
Yash

On Tue, Feb 14, 2023 at 1:08 AM Chris Egerton 
wrote:

> Hi Yash,
>
> I think the key difference between adding methods/overloads related to
> SinkTask::open/SinkTask::close and SinkTask::put is that this isn't
> auxiliary information that may or may not be useful to connector
> developers. It's actually critical for them to understand the difference
> between the two concepts here, even if they look very similar. And yes, I
> do believe that switching from pre-transform to post-transform topic
> partitions is too big a change in behavior here. Plus, if a connector is
> intentionally designed to use pre-transformation topic partitions in its
> open/close methods, wouldn't we just be trading one form of the problem for
> another by making this switch?
>
> One possible alternative to overloading the existing methods is to split
> SinkTask::open into openOriginal (or possibly openPhysical or
> openPreTransform) and openTransformed (or openLogical or
> openPostTransform), with a similar change for SinkTask::close. The default
> implementation for SinkTask::openOriginal can be to call SinkTask::open,
> and the same can go for SinkTask::close. However, I prefer overloading the
> existing methods since this alternative increases complexity and none of
> the names are very informative.
>
> BTW, I wouldn't say that we can't make assumptions about the relationships
> between pre- and post-transformation topic partitions. We might utilize a
> policy that assumes a deterministic mapping from the former to the latter,
> for example. The distinction I'd draw is that the assumptions we make can
> and probably should favor some cases in terms of performance (i.e.,
> reducing the number of unnecessary calls to close/open over a given sink
> task's lifetime), but should not lead to guaranteed resource leaks or
> failure to obey API contract in any cases.
>
> Cheers,
>
> Chris
>
> On Mon, Feb 13, 2023 at 10:54 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > > especially if connectors are intentionally designed around
> > > original topic partitions instead of transformed ones.
> >
> > Ha, that's a good point and re

Re: [DISCUSS] KIP-894: Use incrementalAlterConfigs API for syncing topic configurations

2023-02-14 Thread Chris Egerton
Hi Tina,

While I agree that it's reasonable for users to want to favor the source
cluster's defaults over the target cluster's, I'm hesitant to change this
behavior in an opt-out fashion. IMO it's better to allow users to opt into
this (by adding a method to the ConfigPropertyFilter interface, and
possibly extending the DefaultConfigPropertyFilter with configuration
properties related to how it should handle source cluster defaults), but we
should try to preserve the existing behavior by default.
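
To sketch what I mean (the method name shouldReplicateSourceDefault is only
borrowed from the earlier discussion, nothing here is final):

// Sketch of the opt-in hook, modelled on MirrorMaker's ConfigPropertyFilter
// (org.apache.kafka.connect.mirror). Only shouldReplicateConfigProperty exists
// today; shouldReplicateSourceDefault is the hypothetical addition, defaulting
// to false so that the current behavior (don't sync source-side defaults) is
// preserved unless a user opts in.
public interface ConfigPropertyFilterSketch {

    // Existing semantics: decide whether a given topic config property is
    // synced to the target cluster at all.
    boolean shouldReplicateConfigProperty(String prop);

    // Proposed addition: decide whether a property whose value on the source
    // cluster is just the default should still be synced.
    default boolean shouldReplicateSourceDefault(String prop) {
        return false;
    }
}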

Cheers,

Chris

On Mon, Feb 13, 2023 at 5:10 PM Gantigmaa Selenge 
wrote:

> Hi Chris
>
> My comment on the second point is not correct. Please ignore the part about
> the config source (the config source does get set back to DEFAULT_CONFIG
> when a config is deleted). I got diverted off the issue a little bit.
> 
> With the legacy API, we do propagate deletions because we reset all the
> configs on the target topic that are not being replicated. However, with the
> incrementalAlterConfigs API, this changes. If we delete a config that was
> previously altered on the source topic, the config on the target topic is
> left with the previous value, as the default configs are not replicated. The
> reason for favouring the source defaults was that it would set the
> config on the target topic to the source's default in this situation.
>
> Regards,
> Tina
>
> On Mon, Feb 13, 2023 at 8:15 AM Gantigmaa Selenge 
> wrote:
>
> > Hi Chris and Luke,
> >
> > Thank you very much for your feedback.
> >
> > I have addressed some of the suggestions and would like to clarify a few
> > things on the others:
> >
> > > 1) The current logic for syncing topic configs in MM2 is basically
> > fire-and-forget; all we do is log a warning message [1] if an attempt
> > fails. When "use.incremental.alter.configs" is set to "requested", we'll
> > need to know whether attempts using the incremental API succeed or not,
> and
> > then adjust our behavior accordingly. Will the plan here be to block on
> the
> > success/failure of the first request before sending any more, or will we
> > just switch over to the legacy API as soon as any request fails due to
> > targeting an incompatible broker, possibly after multiple requests with
> the
> > new API have been sent or at least queued up in the admin client?
> >
> > When it's set to "requested", I think the suggestion was to just switch
> > over to the legacy API as soon as any request fails due to
> > targeting an incompatible broker. We could keep using the legacy API
> until
> > the mirrormaker setting is changed by the user or the mirrormaker
> restarts
> > instead of trying the new API every time it syncs the configurations. I'm
> > not sure which approach is the best here.
> >
> > > 2) We don't replicate default properties from the source cluster right
> > now
> > [2].
> > Won't making ConfigPropertyFilter::shouldReplicateSourceDefault return
> true
> > by default change that behavior? If so, what's the motivation for
> favoring
> > defaults from the source cluster over defaults for the target cluster,
> > especially given that users may already be relying on the opposite?
> >
> > The concern was around what happens when a topic config is deleted in
> > the source cluster. I initially thought that, because the legacy API
> > resets all the configs other than the ones being altered, it effectively
> > propagates the deletion of a config in the source cluster. I thought that
> > by migrating to the incrementalAlterConfigs API, the deletion would no
> > longer get propagated, but looking into it again, that may not be the
> > case.
> >
> > If the user deletes a config in the source cluster, it would be reset to
> > the default value but the config source does not change back to
> > DEFAULT_CONFIG. For example:
> >
> > ./kafka-configs.sh --alter --entity-type topics --entity-name
> > quickstart-events --add-config retention.ms=72 --bootstrap-server
> > localhost:9092
> >
> >
> > /kafka-configs.sh --describe --entity-type topics --entity-name
> > quickstart-events --bootstrap-server localhost:9092
> >
> > Dynamic configs for topic quickstart-events are:
> >
> >   retention.ms=72 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:
> > retention.ms=72}
> >
> >
> > ./kafka-configs.sh --alter --entity-type topics --entity-name
> > quickstart-events --delete-config retention.ms  --bootstrap-server
> > localhost:9092
> >
> >
> > /kafka-configs.sh --describe --entity-type topics --entity-name
> > quickstart-events --bootstrap-server localhost:9092
> >
> >   retention.ms=60480 sensitive=false synonyms={}
> >
> >
> > Therefore, even with the legacy API, the deletion of the source topic
> does
> > not actually get propagated because isDefault() check here
> > <
> https://github.com/apache/kafka/blob/6e2b86597d9cd7c8b2019cffb895522deb63c93a/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceConnector.java#L460
> >
> > would no longer return true. Basically, if a user deletes a config for
> the
> > source top

[jira] [Created] (KAFKA-14716) Connect schema does not allow struct default values

2023-02-14 Thread Daniel Urban (Jira)
Daniel Urban created KAFKA-14716:


 Summary: Connect schema does not allow struct default values
 Key: KAFKA-14716
 URL: https://issues.apache.org/jira/browse/KAFKA-14716
 Project: Kafka
  Issue Type: Bug
Reporter: Daniel Urban
Assignee: Daniel Urban


The ConnectSchema API should allow specifying a composite (struct) default 
value for a field, but with the current API, it is impossible to do so.
 # There is a circular dependency between creating a struct as a default value 
and creating the schema which holds it as the default value. The Struct 
constructor expects a Schema object, and the default value setter of 
SchemaBuilder checks schema conformity by using 
ConnectSchema.validateValue, which in turn uses ConnectSchema.equals. This can 
only be bypassed if the struct references a SchemaBuilder instance and 
defaultValue is called on that builder instance, but this goes against the 
Struct docs stating that "Struct objects must specify a complete {@link 
Schema} up front". See the snippet after this list.
 # ConnectSchema.equals is not prepared to be used with other Schema 
implementations, so equals checks between ConnectSchema and SchemaBuilder 
instances will always fail. This only causes an issue if equals has to be 
used for schema conformity checks.
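
For illustration, a minimal snippet showing the chicken-and-egg problem in 
point 1 (field names and values are made up; the calls are the existing 
Connect data API):

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

public class StructDefaultValueExample {
    public static void main(String[] args) {
        // We want a struct schema whose default value is itself a struct.
        SchemaBuilder builder = SchemaBuilder.struct()
                .field("id", Schema.INT64_SCHEMA)
                .field("name", Schema.STRING_SCHEMA);

        // To build the default value we already need the schema...
        Struct defaultValue = new Struct(builder)
                .put("id", 0L)
                .put("name", "unknown");

        // ...but setting it triggers a conformity check (validateValue ->
        // equals) against the not-yet-built schema, so this either fails or
        // only "works" by leaking the mutable SchemaBuilder into the Struct,
        // contrary to the Struct documentation.
        Schema schema = builder.defaultValue(defaultValue).build();
        System.out.println(schema.defaultValue());
    }
}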



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-02-14 Thread Mickael Maison
Hi,

I've created a release plan for 3.5.0 in the wiki:
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0

Current dates are:
1) KIP Freeze: 07 Apr 2023
2) Feature Freeze: 27 Apr 2023
3) Code Freeze: 11 May 2023

Please take a look at the plan. Let me know if there are other KIPs
targeting 3.5.0.
Also if you are the author of one of the KIPs that's missing a status
(or the status is incorrect) please update it and let me know.

Thanks,
Mickael


On Thu, Feb 9, 2023 at 9:23 AM Bruno Cadonna  wrote:
>
> Thanks, Mickael!
>
> Best,
> Bruno
>
> On 09.02.23 03:15, Luke Chen wrote:
> > Hi Mickael,
> > Thanks for volunteering!
> >
> > Luke
> >
> > On Thu, Feb 9, 2023 at 6:23 AM Chris Egerton 
> > wrote:
> >
> >> Thanks for volunteering, Mickael!
> >>
> >> On Wed, Feb 8, 2023 at 1:12 PM José Armando García Sancio
> >>  wrote:
> >>
> >>> Thanks for volunteering Mickael.
> >>>
> >>> --
> >>> -José
> >>>
> >>
> >


[jira] [Created] (KAFKA-14715) Reduce Fetcher#parseRecord() memory copy

2023-02-14 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-14715:
--

 Summary: Reduce Fetcher#parseRecord() memory copy
 Key: KAFKA-14715
 URL: https://issues.apache.org/jira/browse/KAFKA-14715
 Project: Kafka
  Issue Type: Improvement
Reporter: Mickael Maison
Assignee: LinShunkang


Jira ticket for KIP-863: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=225152035



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-02-14 Thread Divij Vaidya
Hey Jun

It has been a while since this KIP got some attention. While we wait for
Satish to chime in here, perhaps I can answer your question.

> Could you explain how you exposed the log size in your KIP-405
implementation?

The APIs available in RLMM as per KIP-405 are:
addRemoteLogSegmentMetadata(), updateRemoteLogSegmentMetadata(),
remoteLogSegmentMetadata(), highestOffsetForEpoch(),
putRemotePartitionDeleteMetadata(), listRemoteLogSegments(),
onPartitionLeadershipChanges() and onStopPartitions(). None of these APIs
allow us to expose the log size, hence the only option that remains is to
list all segments using listRemoteLogSegments() and aggregate them every
time we need to calculate the size. Based on our prior discussion, this
requires reading all segment metadata, which won't work for non-local RLMM
implementations. Satish's implementation also performs a full scan and
calculates the aggregate. See:
https://github.com/satishd/kafka/blob/2.8.x-tiered-storage/core/src/main/scala/kafka/log/remote/RemoteLogManager.scala#L619
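
To spell that out, the full-scan aggregation looks roughly like the sketch
below (treat it as a sketch: package and method names follow the current
storage-api module, but exact signatures may differ across branches):

import java.util.Iterator;
import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;

// Sketch: computing the remote log size by scanning every segment's metadata.
// For a non-local RLMM this means fetching all segment metadata over the
// network on every size calculation, which is what the KIP tries to avoid.
final class RemoteLogSizeCalculator {
    private final RemoteLogMetadataManager rlmm;

    RemoteLogSizeCalculator(RemoteLogMetadataManager rlmm) {
        this.rlmm = rlmm;
    }

    long remoteLogSizeBytes(TopicIdPartition topicIdPartition) throws Exception {
        long totalBytes = 0L;
        Iterator<RemoteLogSegmentMetadata> segments =
                rlmm.listRemoteLogSegments(topicIdPartition);
        while (segments.hasNext()) {
            totalBytes += segments.next().segmentSizeInBytes();
        }
        return totalBytes;
    }
}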


Does this answer your question?

--
Divij Vaidya



On Tue, Dec 20, 2022 at 8:40 PM Jun Rao  wrote:

> Hi, Divij,
>
> Thanks for the explanation.
>
> Good question.
>
> Hi, Satish,
>
> Could you explain how you exposed the log size in your KIP-405
> implementation?
>
> Thanks,
>
> Jun
>
> On Tue, Dec 20, 2022 at 4:59 AM Divij Vaidya 
> wrote:
>
> > Hey Jun
> >
> > Yes, it is possible to maintain the log size in the cache (see rejected
> > alternative #3 in the KIP) but I did not understand how it is possible to
> > retrieve it without the new API. The log size could be calculated on
> > startup by scanning through the segments (though I would disagree that
> > this is the right approach since the scan itself takes on the order of
> > minutes and hence delays the start of the archive process), and
> > incrementally maintained afterwards; even then, we would need an API in
> > RemoteLogMetadataManager so that RLM could fetch the cached size!
> >
> > If we wish to cache the size without adding a new API, then we need to
> > cache the size in RLM itself (instead of the RLMM implementation) and
> > incrementally manage it. The downside of a longer archive time at startup
> > (due to the initial scan) still remains valid in this situation.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Fri, Dec 16, 2022 at 12:43 AM Jun Rao 
> wrote:
> >
> > > Hi, Divij,
> > >
> > > Thanks for the explanation.
> > >
> > > If there is in-memory cache, could we maintain the log size in the
> cache
> > > with the existing API? For example, a replica could make a
> > > listRemoteLogSegments(TopicIdPartition topicIdPartition) call on
> startup
> > to
> > > get the remote segment size before the current leaderEpoch. The leader
> > > could then maintain the size incrementally afterwards. On leader
> change,
> > > other replicas can make a listRemoteLogSegments(TopicIdPartition
> > > topicIdPartition, int leaderEpoch) call to get the size of newly
> > generated
> > > segments.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Dec 14, 2022 at 3:27 AM Divij Vaidya 
> > > wrote:
> > >
> > > > > Is the new method enough for doing size-based retention?
> > > >
> > > > Yes. You are right in assuming that this API only provides the Remote
> > > > storage size (for current epoch chain). We would use this API for
> size
> > > > based retention along with a value of localOnlyLogSegmentSize which
> is
> > > > computed as Log.sizeInBytes(logSegments.filter(_.baseOffset >
> > > > highestOffsetWithRemoteIndex)). Hence, (total_log_size =
> > > > remoteLogSizeBytes + log.localOnlyLogSegmentSize). I have updated the
> > KIP
> > > > with this information. You can also check an example implementation
> at
> > > >
> > > >
> > >
> >
> https://github.com/satishd/kafka/blob/2.8.x-tiered-storage/core/src/main/scala/kafka/log/Log.scala#L2077
> > > >
> > > >
> > > > > Do you imagine all accesses to remote metadata will be across the
> > > network
> > > > or will there be some local in-memory cache?
> > > >
> > > > I would expect a disk-less implementation to maintain a finite
> > in-memory
> > > > cache for segment metadata to optimize the number of network calls
> made
> > > to
> > > > fetch the data. In future, we can think about bringing this finite
> size
> > > > cache into RLM itself but that's probably a conversation for a
> > different
> > > > KIP. There are many other things we would like to do to optimize the
> > > Tiered
> > > > storage interface such as introducing a circular buffer / streaming
> > > > interface from RSM (so that we don't have to wait to fetch the entire
> > > > segment before starting to send records to the consumer), caching the
> > > > segments fetched from RSM locally (I would assume all RSM plugin
> > > > implementations to do this, might as well add it to RLM) etc.
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Mon, Dec 12, 2022 at 7:35 PM Jun Rao 
> > > wrote:
> > > >
> > > > >

[jira] [Resolved] (KAFKA-14696) CVE-2023-25194: Apache Kafka: Possible RCE/Denial of service attack via SASL JAAS JndiLoginModule configuration using Kafka Connect

2023-02-14 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-14696.

Resolution: Fixed

> CVE-2023-25194: Apache Kafka: Possible RCE/Denial of service attack via SASL 
> JAAS JndiLoginModule configuration using Kafka Connect
> ---
>
> Key: KAFKA-14696
> URL: https://issues.apache.org/jira/browse/KAFKA-14696
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.8.1, 2.8.2
>Reporter: MillieZhang
>Priority: Major
> Fix For: 3.4.0
>
>
> CVE Reference: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-34917]
>  
> Will Kafka 2.8.X provide a patch to fix this vulnerability?
> If yes, when will the patch be provided?
>  
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14714) Move/Rewrite RollParams, LogAppendInfo, and LeaderHwChange to storage module.

2023-02-14 Thread Satish Duggana (Jira)
Satish Duggana created KAFKA-14714:
--

 Summary: Move/Rewrite RollParams, LogAppendInfo, and 
LeaderHwChange to storage module.
 Key: KAFKA-14714
 URL: https://issues.apache.org/jira/browse/KAFKA-14714
 Project: Kafka
  Issue Type: Sub-task
Reporter: Satish Duggana
Assignee: Satish Duggana






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1584

2023-02-14 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-14 Thread Tamas (Jira)
Tamas created KAFKA-14713:
-

 Summary: Kafka Streams global table startup takes too long
 Key: KAFKA-14713
 URL: https://issues.apache.org/jira/browse/KAFKA-14713
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Tamas


*Some context first*

We have a spring based kafka streams application. This application is listening 
to two topics. Let's call them apartment and visitor. The apartments are stored 
in a global table, while the visitors are in the stream we are processing, and 
at one point we are joining the visitor stream together with the apartment 
table. In our test environment, both topics contain 10 partitions.

*Issue*

At first deployment, everything goes fine, the global table is built and all 
entries in the stream are processed.

After everything is finished, we shut down the application, restart it and send 
out a new set of visitors. The application seemingly does not respond.

After some more debugging it turned out that it simply takes 5 minutes to start 
up, because the global table takes 30 seconds (default value for the global 
request timeout) to accept that there are no messages in the apartment topics, 
for each and every partition. If we send out the list of apartments as new 
messages, the application starts up immediately.

To make matters worse, we have clients with 96 partitions, where the startup 
time would be 48 minutes. Not having messages in the topics between application 
shutdown and restart is a valid use case, so this is quite a big problem.

*Possible workarounds*

We could reduce the request timeout, but since this value is not specific to 
the global table initialization and is instead a general request timeout for a 
lot of things, we do not know what else it would affect, so we are not very 
keen on doing that. Even then, it would mean a 1.5 minute delay for this 
particular client (more if we have other use cases in the future where we need 
to use more global tables), which is far too much, considering that the 
application would otherwise be able to start in about 20 seconds.
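
For completeness, the narrowest form of that first workaround we found is to 
override the timeout only for the global consumer via the global.consumer. 
prefix; a minimal sketch (the 5 second value, application id and bootstrap 
servers are only examples, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class GlobalConsumerTimeoutWorkaround {
    // Scopes the lower request timeout to the global consumer only, so other
    // clients created by Kafka Streams keep their defaults.
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "visitor-apartment-join");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.GLOBAL_CONSUMER_PREFIX
                + ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, 5000);
        return props;
    }
}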

*Potential solutions we see*
 # Introduce a specific global table initialization timeout in 
GlobalStateManagerImpl. Then we would be able to safely modify that value 
without fear of making some other part of kafka unstable.
 # Parallelize the initialization of the global table partitions in 
GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
of linear with the number of partitions would be a huge help.
 # As long as we receive a response, accept the empty map in the KafkaConsumer, 
and continue instead of going into a busy-waiting loop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.4 #81

2023-02-14 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #152

2023-02-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 418986 lines...]
[2023-02-14T11:50:05.464Z] > Task :connect:api:compileTestJava UP-TO-DATE
[2023-02-14T11:50:05.464Z] > Task :connect:api:testClasses UP-TO-DATE
[2023-02-14T11:50:05.464Z] > Task :connect:api:testJar
[2023-02-14T11:50:05.464Z] > Task :connect:api:testSrcJar
[2023-02-14T11:50:05.464Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2023-02-14T11:50:05.464Z] > Task :connect:api:publishToMavenLocal
[2023-02-14T11:50:05.464Z] > Task 
:connect:json:publishMavenJavaPublicationToMavenLocal
[2023-02-14T11:50:05.464Z] > Task :connect:json:publishToMavenLocal
[2023-02-14T11:50:06.471Z] 
[2023-02-14T11:50:06.471Z] > Task :streams:javadoc
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:84:
 warning - Tag @link: reference not found: DefaultPartitioner
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:136:
 warning - Tag @link: reference not found: DefaultPartitioner
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:147:
 warning - Tag @link: reference not found: DefaultPartitioner
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:101:
 warning - Tag @link: reference not found: DefaultPartitioner
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:167:
 warning - Tag @link: reference not found: DefaultPartitioner
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:58:
 warning - Tag @link: missing '#': "org.apache.kafka.streams.StreamsBuilder()"
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:58:
 warning - Tag @link: can't find org.apache.kafka.streams.StreamsBuilder() in 
org.apache.kafka.streams.TopologyConfig
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/TopologyDescription.java:38:
 warning - Tag @link: reference not found: ProcessorContext#forward(Object, 
Object) forwards
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/Position.java:44:
 warning - Tag @link: can't find query(Query,
[2023-02-14T11:50:06.471Z]  PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:44:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:36:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:57:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:74:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:110:
 warning - Tag @link: reference not found: this#getResult()
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:117:
 warning - Tag @link: reference not found: this#getFailureReason()
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:117:
 warning - Tag @link: reference not found: this#getFailureMessage()
[2023-02-14T11:50:06.471Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:155:
 warning - Tag @link: reference not found: this#isSuccess()
[2023-02-14T11:50:06.471Z] 
/home/jen

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1583

2023-02-14 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-14704) Follower should truncate before incrementing high watermark

2023-02-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14704.
-
Fix Version/s: 3.5.0
   3.4.1
   3.3.3
 Reviewer: Jason Gustafson
   Resolution: Fixed

> Follower should truncate before incrementing high watermark
> ---
>
> Key: KAFKA-14704
> URL: https://issues.apache.org/jira/browse/KAFKA-14704
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.5.0, 3.4.1, 3.3.3
>
>
> When a leader becomes a follower, it is likely that it has uncommitted 
> records in its log. When it reaches out to the leader, the leader will detect 
> that they have diverged and it will return the diverging epoch and offset. 
> The follower truncates its log based on this.
> There is a small caveat in this process. When the leader returns the diverging 
> epoch and offset, it also includes its high watermark, low watermark, start 
> offset and end offset. The current code in the `AbstractFetcherThread` works 
> as follows. First it processes the partition data and then it checks whether 
> there is a diverging epoch/offset. The former may accidentally expose 
> uncommitted records, as this step updates the local high watermark to whatever 
> is received from the leader. As the follower, or former leader, may have 
> uncommitted records, it will be able to update the high watermark to a 
> larger offset if the leader has a higher watermark than the current local 
> one. This results in exposing uncommitted records until the log is finally 
> truncated. The time window is short, but a fetch request coming at the right 
> time to the follower could read those records. This is especially true for 
> clients out there which use recent versions of the fetch request but do not 
> implement KIP-320.
> When this happens, the follower logs the following message: `Non-monotonic 
> update of high watermark from (offset=21437 segment=[20998:98390]) to 
> (offset=21434 segment=[20998:97843])`.
> This patch proposes to mitigate the issue by first checking whether a 
> diverging epoch/offset is provided by the leader and skipping the processing 
> of the partition data if it is. This basically means that the first fetch 
> request will result in truncating the log and a subsequent fetch request will 
> update the log/high watermarks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-14 Thread David Jacot
+1 (binding). Thanks for the KIP, Calvin!

On Tue, Feb 14, 2023 at 1:10 AM Jun Rao  wrote:

> Hi, Calvin,
>
> Thanks for the KIP. +1
>
> Jun
>
> On Fri, Feb 10, 2023 at 11:03 AM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
> > +1 (non-binding).
> >
> > On Fri, Feb 10, 2023 at 19:02, Calvin Liu 
> > wrote:
> > >
> > > Hi all,
> > >
> > > I'd like to call for a vote on KIP-903, which proposes a fix to the
> > broker
> > > reboot data loss KAFKA-14139
> > > 
> > > It changes the Fetch and AlterPartition requests to include the broker
> > > epochs. Then the controller can use these epochs to help reject the
> stale
> > > AlterPartition request.
> > >
> > > KIP:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
> > >
> > > Discussion thread:
> > > https://lists.apache.org/thread/vk9h68qpqqq69nlzbvxqx2yfbmzcd0mo
> >
>


[jira] [Resolved] (KAFKA-14254) Format timestamps in assignor logs as dates instead of integers

2023-02-14 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-14254.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Format timestamps in assignor logs as dates instead of integers
> ---
>
> Key: KAFKA-14254
> URL: https://issues.apache.org/jira/browse/KAFKA-14254
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: Ashmeet Lamba
>Priority: Minor
>  Labels: newbie, newbie++
> Fix For: 3.4.0
>
>
> This is a follow-on task from [https://github.com/apache/kafka/pull/12582]
> There is another log line that prints the same timestamp: "Triggering the 
> followup rebalance scheduled for ...", which should also be printed as a 
> date/time in the same manner as PR 12582.
> We should also search the codebase a little to see if we're printing 
> timestamps in other log lines that would be better off as date/times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)