[PR] MINOR: add docs for tiered storage quick start [kafka-site]

2023-10-16 Thread via GitHub


showuon opened a new pull request, #562:
URL: https://github.com/apache/kafka-site/pull/562

   Updated docs from kafka repo:
   1. tiered storage quick start guide: 
https://github.com/apache/kafka/pull/14528
   2. Fix docs for ReplicationBytes(Out|In)PerSec metrics: 
https://github.com/apache/kafka/pull/14228


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2294

2023-10-16 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 315340 lines...]
Gradle Test Run :core:test > Gradle Test Executor 88 > ZkAclMigrationClientTest 
> testAclsMigrateAndDualWrite() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkAclMigrationClientTest 
> testAclsMigrateAndDualWrite() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testUpdateExistingPartitions() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testUpdateExistingPartitions() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testEmptyWrite() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testEmptyWrite() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testReadMigrateAndWriteProducerId() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testReadMigrateAndWriteProducerId() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testExistingKRaftControllerClaim() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testExistingKRaftControllerClaim() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testMigrateTopicConfigs() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testMigrateTopicConfigs() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testNonIncreasingKRaftEpoch() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testNonIncreasingKRaftEpoch() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testMigrateEmptyZk() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testMigrateEmptyZk() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testTopicAndBrokerConfigsMigrationWithSnapshots() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testTopicAndBrokerConfigsMigrationWithSnapshots() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testClaimAndReleaseExistingController() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testClaimAndReleaseExistingController() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testClaimAbsentController() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testClaimAbsentController() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testIdempotentCreateTopics() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testIdempotentCreateTopics() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testCreateNewTopic() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testCreateNewTopic() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testUpdateExistingTopicWithNewAndChangedPartitions() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZkMigrationClientTest > 
testUpdateExistingTopicWithNewAndChangedPartitions() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testZNodeChangeHandlerForDataChange() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testZNodeChangeHandlerForDataChange() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testZooKeeperSessionStateMetric() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testZooKeeperSessionStateMetric() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testExceptionInBeforeInitializingSession() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testExceptionInBeforeInitializingSession() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testGetChildrenExistingZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testGetChildrenExistingZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testConnection() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testConnection() PASSED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testZNodeChangeHandlerForCreation() STARTED

Gradle Test Run :core:test > Gradle Test Executor 88 > ZooKeeperClientTest > 
testZNodeChangeHandlerForCreation() PASSED

Gradle 

[jira] [Created] (KAFKA-15619) Kafka

2023-10-16 Thread Deng Ziming (Jira)
Deng Ziming created KAFKA-15619:
---

 Summary: Kafka
 Key: KAFKA-15619
 URL: https://issues.apache.org/jira/browse/KAFKA-15619
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.5.1, 3.5.0
Reporter: Deng Ziming


Deleted topics come back again in a Spark Structured Streaming test after 
upgrading Kafka from 3.4.0 to 3.5.0; the related ticket is 
https://issues.apache.org/jira/browse/SPARK-45529

 

I have basically inferred that this bug comes from 
https://github.com/apache/kafka/pull/12590



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2293

2023-10-16 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 211080 lines...]
Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldRemoveTasksFromAndClearInputQueueOnShutdown() 
STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldRemoveTasksFromAndClearInputQueueOnShutdown() 
PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldThrowIfAddingActiveTasksWithSameId() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldThrowIfAddingActiveTasksWithSameId() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > 
shouldRestoreActiveStatefulTasksAndUpdateStandbyTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > 
shouldRestoreActiveStatefulTasksAndUpdateStandbyTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldPauseStandbyTask() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldPauseStandbyTask() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldThrowIfStatefulTaskNotInStateRestoring() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultStateUpdaterTest > shouldThrowIfStatefulTaskNotInStateRestoring() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldSetUncaughtStreamsException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldSetUncaughtStreamsException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldProcessTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldProcessTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 85 > 

Re: [VOTE] KIP-985 Add reverseRange and reverseAll query over kv-store in IQv2

2023-10-16 Thread Matthias J. Sax

+1 (binding)


On 10/13/23 9:24 AM, Hanyu (Peter) Zheng wrote:

Hello everyone,

I would like to start a vote for KIP-985 that Add reverseRange and
reverseAll query over kv-store in IQv2.

Sincerely,
Hanyu

On Fri, Oct 13, 2023 at 9:15 AM Hanyu (Peter) Zheng 
wrote:



https://cwiki.apache.org/confluence/display/KAFKA/KIP-985:+Add+reverseRange+and+reverseAll+query+over+kv-store+in+IQv2
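For readers unfamiliar with the proposal, the semantics of reverseRange and reverseAll over a sorted key-value store can be sketched with a plain TreeMap. This is not the IQv2 API itself; the class and method names below are illustrative only.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of the reverseRange/reverseAll semantics proposed by KIP-985,
// using a TreeMap to stand in for a sorted key-value store.
public class ReverseRangeSketch {
    public static void main(String[] args) {
        NavigableMap<String, Integer> store = new TreeMap<>();
        store.put("a", 1);
        store.put("b", 2);
        store.put("c", 3);
        store.put("d", 4);

        // reverseAll: iterate every entry in descending key order
        for (Map.Entry<String, Integer> e : store.descendingMap().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }

        // reverseRange("b", "d"): entries with b <= key <= d, descending
        NavigableMap<String, Integer> range =
            store.subMap("b", true, "d", true).descendingMap();
        System.out.println(range.keySet()); // [d, c, b]
    }
}
```

The actual query types and store interfaces are defined in the KIP page linked above.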

--







[jira] [Created] (KAFKA-15618) Implement MetricsCollector which collects metrics in OTLP format

2023-10-16 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15618:
-

 Summary: Implement MetricsCollector which collects metrics in OTLP 
format
 Key: KAFKA-15618
 URL: https://issues.apache.org/jira/browse/KAFKA-15618
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


The KIP requires metrics to be collected and emitted to the broker in OTLP format.

 

The changes should include:
 # Implementation of MetricsCollector which can collect metrics.
 # Implementation for serializing metrics in OTLP format.
 # Implementation to collect both delta and cumulative metrics. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15617) Determine if testFetchingPendingPartitions and testInflightFetchOnPendingPartitions overlap

2023-10-16 Thread Kirk True (Jira)
Kirk True created KAFKA-15617:
-

 Summary: Determine if testFetchingPendingPartitions and 
testInflightFetchOnPendingPartitions overlap
 Key: KAFKA-15617
 URL: https://issues.apache.org/jira/browse/KAFKA-15617
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kirk True


In FetcherTest, the two tests testFetchingPendingPartitions and 
testInflightFetchOnPendingPartitions have significant overlap. Perhaps the 
former subsumes the latter?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15616) Define client telemetry states and their transitions

2023-10-16 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15616:
-

 Summary: Define client telemetry states and their transitions
 Key: KAFKA-15616
 URL: https://issues.apache.org/jira/browse/KAFKA-15616
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


The client emitting metrics to the broker needs to maintain state that 
specifies which action the client should take next, i.e. request subscriptions, 
push telemetry, etc.

 

The changes should include a comprehensive definition of all states a client 
can move into and the transitions between them.
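A state machine like this is commonly modeled as an enum plus an explicit transition table. The state names and allowed transitions below are hypothetical placeholders, not the ones the KIP will define.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a client telemetry state machine; the real state
// names and transitions are defined by the KIP, not by this example.
public class TelemetryStateSketch {
    enum State { SUBSCRIPTION_NEEDED, SUBSCRIPTION_IN_PROGRESS, PUSH_NEEDED, PUSH_IN_PROGRESS, TERMINATED }

    static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.SUBSCRIPTION_NEEDED, EnumSet.of(State.SUBSCRIPTION_IN_PROGRESS, State.TERMINATED));
        TRANSITIONS.put(State.SUBSCRIPTION_IN_PROGRESS, EnumSet.of(State.PUSH_NEEDED, State.SUBSCRIPTION_NEEDED, State.TERMINATED));
        TRANSITIONS.put(State.PUSH_NEEDED, EnumSet.of(State.PUSH_IN_PROGRESS, State.TERMINATED));
        TRANSITIONS.put(State.PUSH_IN_PROGRESS, EnumSet.of(State.SUBSCRIPTION_NEEDED, State.TERMINATED));
        TRANSITIONS.put(State.TERMINATED, EnumSet.noneOf(State.class));
    }

    static boolean canTransition(State from, State to) {
        return TRANSITIONS.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.SUBSCRIPTION_NEEDED, State.SUBSCRIPTION_IN_PROGRESS)); // true
        System.out.println(canTransition(State.TERMINATED, State.PUSH_NEEDED)); // false
    }
}
```

Keeping the transitions in a table makes invalid moves (e.g. pushing telemetry before a subscription exists) detectable at a single check point.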



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15615) Improve handling of fetching during metadata updates

2023-10-16 Thread Kirk True (Jira)
Kirk True created KAFKA-15615:
-

 Summary: Improve handling of fetching during metadata updates
 Key: KAFKA-15615
 URL: https://issues.apache.org/jira/browse/KAFKA-15615
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kirk True
Assignee: Kirk True






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15614) Define interfaces/classes for capturing telemetry metrics

2023-10-16 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15614:
-

 Summary: Define interfaces/classes for capturing telemetry metrics 
 Key: KAFKA-15614
 URL: https://issues.apache.org/jira/browse/KAFKA-15614
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


Define classes/interfaces for capturing metrics required for KIP-714.

The changes should include major interfaces and modifications to existing 
interfaces that outline the behaviour for KIP-714 client metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2292

2023-10-16 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 317082 lines...]
> Task :connect:api:copyDependantLibs UP-TO-DATE
> Task :connect:api:jar UP-TO-DATE
> Task :connect:json:testJar
> Task :connect:api:generateMetadataFileForMavenJavaPublication
> Task :connect:json:copyDependantLibs UP-TO-DATE
> Task :connect:json:jar UP-TO-DATE
> Task :connect:api:compileTestJava UP-TO-DATE
> Task :connect:api:testClasses UP-TO-DATE
> Task :connect:json:generateMetadataFileForMavenJavaPublication
> Task :connect:json:testSrcJar
> Task :connect:api:testJar
> Task :connect:api:testSrcJar
> Task :connect:api:publishMavenJavaPublicationToMavenLocal
> Task :connect:json:publishMavenJavaPublicationToMavenLocal
> Task :connect:json:publishToMavenLocal
> Task :connect:api:publishToMavenLocal
> Task :clients:generateMetadataFileForMavenJavaPublication
> Task :storage:storage-api:compileTestJava
> Task :storage:storage-api:testClasses
> Task :server-common:compileTestJava
> Task :server-common:testClasses
> Task :raft:compileTestJava
> Task :raft:testClasses
> Task :group-coordinator:compileTestJava
> Task :group-coordinator:testClasses

> Task :clients:javadoc
/home/jenkins/workspace/Kafka_kafka_trunk/clients/src/main/java/org/apache/kafka/clients/admin/ScramMechanism.java:32:
 warning - Tag @see: missing final '>': "https://cwiki.apache.org/confluence/display/KAFKA/KIP-554%3A+Add+Broker-side+SCRAM+Config+API;>KIP-554:
 Add Broker-side SCRAM Config API

 This code is duplicated in 
org.apache.kafka.common.security.scram.internals.ScramMechanism.
 The type field in both files must match and must not change. The type field
 is used both for passing ScramCredentialUpsertion and for the internal
 UserScramCredentialRecord. Do not change the type field."
/home/jenkins/workspace/Kafka_kafka_trunk/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/secured/package-info.java:21:
 warning - Tag @link: reference not found: 
org.apache.kafka.common.security.oauthbearer
2 warnings

> Task :metadata:compileTestJava
> Task :metadata:testClasses
> Task :core:compileScala
> Task :clients:javadocJar
> Task :clients:srcJar
> Task :clients:testJar
> Task :clients:testSrcJar
> Task :clients:publishMavenJavaPublicationToMavenLocal
> Task :clients:publishToMavenLocal
> Task :core:classes
> Task :core:compileTestJava NO-SOURCE
> Task :core:compileTestScala
> Task :core:testClasses
> Task :streams:compileTestJava
> Task :streams:testClasses
> Task :streams:testJar
> Task :streams:testSrcJar
> Task :streams:publishMavenJavaPublicationToMavenLocal
> Task :streams:publishToMavenLocal

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

For more on this, please refer to 
https://docs.gradle.org/8.3/userguide/command_line_interface.html#sec:command_line_warnings
 in the Gradle documentation.

BUILD SUCCESSFUL in 4m 33s
94 actionable tasks: 41 executed, 53 up-to-date

Publishing build scan...
https://ge.apache.org/s/o6xxa2d326dyq

[Pipeline] sh
+ grep ^version= gradle.properties
+ cut -d= -f 2
[Pipeline] dir
Running in /home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart
[Pipeline] {
[Pipeline] sh
+ mvn clean install -Dgpg.skip
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Kafka Streams :: Quickstart[pom]
[INFO] streams-quickstart-java[maven-archetype]
[INFO] 
[INFO] < org.apache.kafka:streams-quickstart >-
[INFO] Building Kafka Streams :: Quickstart 3.7.0-SNAPSHOT[1/2]
[INFO]   from pom.xml
[INFO] [ pom ]-
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ streams-quickstart ---
[INFO] 
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
streams-quickstart ---
[INFO] 
[INFO] --- site:3.5.1:attach-descriptor (attach-descriptor) @ 
streams-quickstart ---
[INFO] 
[INFO] --- gpg:1.6:sign (sign-artifacts) @ streams-quickstart ---
[INFO] 
[INFO] --- install:2.5.2:install (default-install) @ streams-quickstart ---
[INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.7.0-SNAPSHOT/streams-quickstart-3.7.0-SNAPSHOT.pom
[INFO] 
[INFO] --< org.apache.kafka:streams-quickstart-java >--
[INFO] Building streams-quickstart-java 3.7.0-SNAPSHOT[2/2]
[INFO]   from java/pom.xml
[INFO] --[ maven-archetype ]---
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ 

[jira] [Created] (KAFKA-15613) Add Client API in interfaces and client configurations.

2023-10-16 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15613:
-

 Summary: Add Client API in interfaces and client configurations.
 Key: KAFKA-15613
 URL: https://issues.apache.org/jira/browse/KAFKA-15613
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


Define the client API 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-ClientAPI)
 and the client configurations 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-Clientconfiguration).

 

The change shall require:
 # Defining Client configurations in Producer, Consumer and Admin client config.
 # Defining method in client interfaces and respective implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-10-16 Thread Qichao Chu
Hi Divij and Kirk,

Thank you both for providing the valuable feedback and sorry for the delay.
I have just updated the KIP to address the comments.

   1. Instead of a topic-level control, a global verbosity control makes
   more sense if we want to extend it in the future; it would be very
   difficult to apply a topic allowlist everywhere.
   2. Also, the topic allowlist was not dynamic, which made everything
   quite complex, especially for topic lifecycle management. By using the
   dynamic global config, debugging becomes easier, and management of the
   config is also simplified.
   3. More details are included in the test section.

One thing that is still missing is the performance numbers. I will gather
them with our internal clusters and share them soon.

Many thanks for the review!
Qichao
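The dynamic global verbosity control described above can be pictured as a runtime-updatable switch that decides whether the "partition" dimension is attached to a metric. This is an illustrative sketch under that assumption, not the KIP's implementation; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a dynamically updatable global verbosity switch deciding
// whether a metric carries the extra "partition" dimension.
public class VerbositySketch {
    enum Verbosity { LOW, HIGH }

    // Dynamic config: can be flipped at runtime without a broker restart.
    static final AtomicReference<Verbosity> LEVEL = new AtomicReference<>(Verbosity.LOW);

    static String metricName(String base, String topic, int partition) {
        if (LEVEL.get() == Verbosity.HIGH) {
            return base + ",topic=" + topic + ",partition=" + partition;
        }
        return base + ",topic=" + topic;
    }

    public static void main(String[] args) {
        System.out.println(metricName("BytesInPerSec", "orders", 3)); // topic tag only
        LEVEL.set(Verbosity.HIGH);
        System.out.println(metricName("BytesInPerSec", "orders", 3)); // includes partition=3
    }
}
```

A single global switch like this avoids the per-topic allowlist lifecycle problems mentioned in point 2.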

On Tue, Sep 12, 2023 at 8:31 AM Kirk True  wrote:

> Oh, and does metrics.partition.level.reporting.topics allow for regex?
>
> > On Sep 12, 2023, at 8:26 AM, Kirk True  wrote:
> >
> > Hi Qichao,
> >
> > Thanks for the KIP!
> >
> > Divij—questions/comments inline...
> >
> >> On Sep 11, 2023, at 4:32 AM, Divij Vaidya 
> wrote:
> >>
> >> Thank you for the proposal Qichao.
> >>
> >> I agree with the motivation here and understand the tradeoff here
> >> between observability vs. increased metric dimensions (metric fan-out
> >> as you say in the KIP).
> >>
> >> High level comments:
> >>
> >> 1. I would urge you to consider the extensibility of the proposal for
> >> other types of metrics. Tomorrow, if we want to selectively add
> >> "partition" dimension to another metric, would we have to modify the
> >> code where each metric is emitted? Alternatively, could we abstract
> >> out this config in a "Kafka Metrics" library. The code provides all
> >> information about this library and this library can choose which
> >> dimensions it wants to add to the final metrics that are emitted based
> >> on declarative configuration.
> >
> > I’d agree with this if it doesn’t place a burden on the callers. Are
> there any potential call sites that don’t have the partition information
> readily available?
> >
> >> 2. Can we offload the handling of this dimension filtering to the
> >> metric framework? Have you explored whether prometheus or other
> >> libraries provide the ability to dynamically change dimensions
> >> associated with metrics?
> >
> > I’m not familiar with the downstream metrics providers’ capabilities.
> This is a greatest common denominator scenario, right? We’d have to be
> reasonable sure that the heavily used providers *all* support such dynamic
> filtering.
> >
> > Also—and correct me as needed as I’m not familiar with the area—if we
> relegate partition filtering to a lower layer, we’d still need to store the
> metric data in memory until it’s flushed, yes? If so, is that overhead of
> any concern?
> >
> >> Implementation level comments:
> >>
> >> 1. In the test plan section, please mention what kind of integ and/or
> >> unit tests will be added and what they will assert. As an example, you
> >> can add a section, "functionality tests", which would assert that new
> >> metric config is being respected and another section, "performance
> >> tests", which could be a system test and assert that no regression
> >> caused wrt resources occupied by metrics from one version to another.
> >> 2. Please mention why or why not are we considering dynamically
> >> setting the configuration (i.e. without a broker restart)? I would
> >> imagine that the ability to dynamically configure for a specific topic
> >> will be very useful especially to debug production situations that you
> >> mention in the motivation.
> >
> > +1
> >
> >> 3. You mention that we want to start with metrics closely related to
> >> producer & consumers first, which is fair. Could you please add a
> >> statement on the work required to extend this to other metrics in
> >> future?
> >
> > +1
> >
> >> 4. In the compatibility section, you mention that this change is
> >> backward compatible. I don't fully understand that. During a version
> >> upgrade, we will start with an empty list of topics to maintain
> >> backward compatibility. I assume after the upgrade, we will update the
> >> new config with topic names that we desire to monitor. But updating
> >> the config will require a broker restart (a rolling restart since
> >> config is read-only). We will be in a situation where some brokers are
> >> sending metrics with a new "partition" dimension and some brokers are
> >> sending metrics with no partition dimension. Is that acceptable to JMX
> >> / prometheus collectors? Would it break them? Please clarify how
> >> upgrades will work in the compatibility section.
> >> 5. Could you please quantify (with an experiment) the expected perf
> >> impact of adding the partition dimension? This could be done as part
> >> of "test plan" section and would serve as a data point for users to
> >> understand the potential impact if they decide 

Re: [DISCUSS] KIP-982: Access SslPrincipalMapper and kerberosShortNamer in Custom KafkaPrincipalBuilder

2023-10-16 Thread Manikumar
Hi Raghu,

Thanks for the KIP. Proposed changes look good to me.

Thanks,
Manikumar

On Fri, Sep 22, 2023 at 11:44 PM Raghu B  wrote:

> Hi everyone,
>
> I would like to start the discussion on the KIP-982 to Access
> SslPrincipalMapper and kerberosShortNamer in Custom KafkaPrincipalBuilder
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-982%3A+Access+SslPrincipalMapper+and+kerberosShortNamer+in+Custom+KafkaPrincipalBuilder
>
> Looking forward to your feedback!
>
> Thanks,
> Raghu
>


[jira] [Resolved] (KAFKA-15543) Send HB request right after reconciliation completes

2023-10-16 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15543.
---
Resolution: Duplicate

> Send HB request right after reconciliation completes
> 
>
> Key: KAFKA-15543
> URL: https://issues.apache.org/jira/browse/KAFKA-15543
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Priority: Blocker
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
>
> HeartbeatRequest manager should send HB request outside of the interval, 
> right after the reconciliation process completes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14761) Integration Tests for the New Consumer Implementation

2023-10-16 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-14761.
---
Fix Version/s: 3.6.0
   Resolution: Fixed

> Integration Tests for the New Consumer Implementation
> -
>
> Key: KAFKA-14761
> URL: https://issues.apache.org/jira/browse/KAFKA-14761
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: consumer-threading-refactor, kip-848-preview
> Fix For: 3.6.0
>
>
> This Jira tracks the integration testing effort for the new consumer we 
> are implementing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2291

2023-10-16 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-980: Allow creating connectors in a stopped state

2023-10-16 Thread Yash Mayya
Hi all,

Thanks for participating in the discussion and voting! KIP-980 has been
accepted with the following +1 votes:

   - Chris Egerton (binding)
   - Knowles Atchison Jr (non-binding)
   - Greg Harris (binding)
   - Mickael Maison (binding)
   - Yash Mayya (binding)

The target release for this KIP is 3.7.0

Thanks,
Yash
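For context, KIP-980 adds an `initial_state` field to the connector creation request. The payload below is a sketch of what such a request body might look like; the connector config shown is an arbitrary example, and the exact field shape is defined by the KIP, not this illustration.

```json
{
  "name": "my-sink-connector",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "topics": "orders",
    "file": "/tmp/orders.out"
  },
  "initial_state": "STOPPED"
}
```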

On Mon, Oct 16, 2023 at 7:39 PM Mickael Maison 
wrote:

> +1 (binding)
> Thanks for the KIP
>
> Mickael
>
> On Mon, Oct 16, 2023 at 9:05 AM Yash Mayya  wrote:
> >
> > Hi all,
> >
> > Bumping up this vote thread - we have two binding +1 votes and one
> > non-binding +1 vote so far.
> >
> > Thanks,
> > Yash
> >
> > On Mon, Oct 9, 2023 at 11:57 PM Greg Harris  >
> > wrote:
> >
> > > Thanks Yash for the well written KIP!
> > >
> > > And thank you for finally adding JSON support to the standalone mode
> > > that isn't file extension sensitive. That will be very useful.
> > >
> > > +1 (binding)
> > >
> > > On Mon, Oct 9, 2023 at 10:45 AM Knowles Atchison Jr
> > >  wrote:
> > > >
> > > > This is super useful for pipeline setup!
> > > >
> > > > +1 (non binding)
> > > >
> > > > On Mon, Oct 9, 2023, 7:57 AM Chris Egerton 
> > > wrote:
> > > >
> > > > > Thanks for the KIP, Yash!
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Mon, Oct 9, 2023, 01:12 Yash Mayya 
> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start a vote on KIP-980 which proposes allowing the
> > > creation
> > > > > of
> > > > > > connectors in a stopped (or paused) state.
> > > > > >
> > > > > > KIP -
> > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > >
> > > > > > Discussion Thread -
> > > > > > https://lists.apache.org/thread/om803vl191ysf711qm7czv94285rtt5d
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > >
> > >
>


[jira] [Created] (KAFKA-15612) Followup on whether the indexes need to be materialized before they are passed to RSM for writing them to tiered storage.

2023-10-16 Thread Satish Duggana (Jira)
Satish Duggana created KAFKA-15612:
--

 Summary: Followup on whether the indexes need to be materialized 
before they are passed to RSM for writing them to tiered storage. 
 Key: KAFKA-15612
 URL: https://issues.apache.org/jira/browse/KAFKA-15612
 Project: Kafka
  Issue Type: Task
Reporter: Satish Duggana


Followup on the PR comment: 
https://github.com/apache/kafka/pull/14529#discussion_r1355263700



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15611) Use virtual threads in the Connect framework

2023-10-16 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15611:
---

 Summary: Use virtual threads in the Connect framework
 Key: KAFKA-15611
 URL: https://issues.apache.org/jira/browse/KAFKA-15611
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Greg Harris


Virtual threads have been finalized in JDK21, so we may include optional 
support for them in Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-980: Allow creating connectors in a stopped state

2023-10-16 Thread Mickael Maison
+1 (binding)
Thanks for the KIP

Mickael

On Mon, Oct 16, 2023 at 9:05 AM Yash Mayya  wrote:
>
> Hi all,
>
> Bumping up this vote thread - we have two binding +1 votes and one
> non-binding +1 vote so far.
>
> Thanks,
> Yash
>
> On Mon, Oct 9, 2023 at 11:57 PM Greg Harris 
> wrote:
>
> > Thanks Yash for the well written KIP!
> >
> > And thank you for finally adding JSON support to the standalone mode
> > that isn't file extension sensitive. That will be very useful.
> >
> > +1 (binding)
> >
> > On Mon, Oct 9, 2023 at 10:45 AM Knowles Atchison Jr
> >  wrote:
> > >
> > > This is super useful for pipeline setup!
> > >
> > > +1 (non binding)
> > >
> > > On Mon, Oct 9, 2023, 7:57 AM Chris Egerton 
> > wrote:
> > >
> > > > Thanks for the KIP, Yash!
> > > >
> > > > +1 (binding)
> > > >
> > > > On Mon, Oct 9, 2023, 01:12 Yash Mayya  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a vote on KIP-980 which proposes allowing the
> > creation
> > > > of
> > > > > connectors in a stopped (or paused) state.
> > > > >
> > > > > KIP -
> > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > >
> > > > > Discussion Thread -
> > > > > https://lists.apache.org/thread/om803vl191ysf711qm7czv94285rtt5d
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> > > >
> >


Re: [DISCUSS] KIP-988 Streams Standby Task Update Listener

2023-10-16 Thread Bruno Cadonna

Thanks for the KIP, Colt and Eduwer,

Are you sure there is no significant performance impact from passing 
`currentEndOffset` into the callback?


I am asking because the comment here: 
https://github.com/apache/kafka/blob/c32d2338a7e0079e539b74eb16f0095380a1ce85/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java#L129


says that the end-offset is only updated once for standby tasks whose 
changelog topic is not piggy-backed on input topics. I also could not 
find where the end-offset is updated for those standbys.



Best,
Bruno
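For reference, a self-contained sketch of the callback shape under discussion. The types are simplified to strings; the real proposal uses Kafka's TopicPartition, and the exact signatures may differ from what lands in the KIP.

```java
public class StandbyListenerSketch {
    public enum SuspendReason { MIGRATED, PROMOTED }

    // Simplified stand-in for the proposed standby update listener.
    interface StandbyUpdateListener {
        void onUpdateStart(String topicPartition, String storeName, long startingOffset);
        void onBatchLoaded(String topicPartition, String storeName, long batchEndOffset,
                           long numRestored, long currentEndOffset);
        void onUpdateSuspended(String topicPartition, String storeName,
                               long storeEndOffset, SuspendReason reason);
    }

    // The lag a listener could derive from onBatchLoaded -- which is only
    // meaningful if currentEndOffset is actually kept up to date, the
    // point raised above.
    public static long lagAfterBatch(long currentEndOffset, long batchEndOffset) {
        return Math.max(0L, currentEndOffset - batchEndOffset);
    }
}
```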

On 10/16/23 10:55 AM, Lucas Brutschy wrote:

Hi all,

it's a nice improvement! I don't have anything to add on top of the
previous comments, just came here to say that it seems to me consensus
has been reached and the result looks good to me.

Thanks Colt and Eduwer!
Lucas

On Sun, Oct 15, 2023 at 9:11 AM Colt McNealy  wrote:


Thanks, Guozhang. I've updated the KIP and will start a vote.

Colt McNealy

*Founder, LittleHorse.dev*


On Sat, Oct 14, 2023 at 10:27 AM Guozhang Wang 
wrote:


Thanks for the summary, that looks good to me.

Guozhang

On Fri, Oct 13, 2023 at 8:57 PM Colt McNealy  wrote:


Hello there!

Thanks everyone for the comments. There's a lot of back-and-forth going on,
so I'll do my best to summarize what everyone's said in TLDR format:

1. Rename `onStandbyUpdateStart()` -> `onUpdateStart()`, and do similarly
for the other methods.
2. Keep `SuspendReason.PROMOTED` and `SuspendReason.MIGRATED`.
3. Remove the `earliestOffset` parameter for performance reasons.

If that's all fine with everyone, I'll update the KIP and we—well, mostly
Edu (:  —will open a PR.

Cheers,
Colt McNealy

*Founder, LittleHorse.dev*


On Fri, Oct 13, 2023 at 7:58 PM Eduwer Camacaro 
wrote:


Hello everyone,

Thanks for all your feedback for this KIP!

I think that the key to choosing proper names for this API is understanding
the terms used inside the StoreChangelogReader. Currently, this class has
two possible states: ACTIVE_RESTORING and STANDBY_UPDATING. In my opinion,
using StandbyUpdateListener for the interface fits better with these terms.
The same applies to onUpdateStart/Suspended.

StoreChangelogReader uses "the same mechanism" for active task restoration
and standby task updates, but this is an implementation detail. Under
normal circumstances (no rebalances or task migrations), the changelog
reader will be in STANDBY_UPDATING, which means it will be updating standby
tasks as long as there are new records in the changelog topic. That's why I
prefer onStandbyUpdated instead of onBatchUpdated, even if it doesn't 100%
align with StateRestoreListener, but either one is fine.

Edu

On Fri, Oct 13, 2023 at 8:53 PM Guozhang Wang <

guozhang.wang...@gmail.com>

wrote:


Hello Colt,

Thanks for writing the KIP! I have read through the updated KIP and
overall it looks great. I only have minor naming comments (well,
aren't naming the least boring stuff to discuss and that takes the
most of the time for KIPs :P):

1. I tend to agree with Sophie regarding whether or not to include
"Standby" in the functions of "onStandbyUpdateStart/Suspended", since
it is also more consistent with the functions of
"StateRestoreListener" where we do not name it as
"onStateRestoreState" etc.

2. I know in community discussions we sometimes say "a standby is
promoted to active", but in the official code / java docs we did not
have a term of "promotion", since what the code does is really recycle
the task (while keeping its state stores open), and create a new
active task that takes in the recycled state stores and just changing
the other fields like task type etc. After thinking about this for a
bit, I tend to feel that "promoted" is indeed a better name for user
facing purposes while "recycle" is more of a technical detail inside
the code and could be abstracted away from users. So I feel keeping
the name "PROMOTED" is fine.

3. Regarding "earliestOffset", it does feel like we cannot always
avoid another call to the Kafka API. And on the other hand, I also
tend to think that such bookkeeping may be better done at the app
level than from the Streams' public API level. I.e. the app could keep
a "first ever starting offset" per "topic-partition-store" key, and
when we have rolling restart and hence some standby task keeps
"jumping" from one client to another via task assignment, the app
would update this value just once when it finds the
"topic-partition-store" was never triggered before. What do you
think?

4. I do not have a strong opinion either, but what about "onBatchUpdated"?



Guozhang
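The app-level bookkeeping suggested in point 3 could be as simple as a map keyed by topic-partition-store, recorded only the first time the key is seen. This is a sketch of the idea, not Streams API code; the class and key format are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FirstOffsetBookkeeping {
    private final Map<String, Long> firstStartingOffset = new ConcurrentHashMap<>();

    // Record the starting offset only the first time this
    // topic-partition-store key is ever seen by the app; later
    // standby "jumps" between clients leave the value unchanged.
    public long recordIfFirst(String topicPartitionStore, long startingOffset) {
        return firstStartingOffset.computeIfAbsent(topicPartitionStore, k -> startingOffset);
    }
}
```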

On Wed, Oct 11, 2023 at 9:31 PM Colt McNealy 

wrote:


Sohpie—

Thank you very much for such a detailed review of the KIP. It might
actually be longer than the original KIP in the first place!

1. Ack'ed and fixed.

2. Correct, this is a confusing passage and requires context:

One thing on our list of TODO's regarding reliability is to determine how


[jira] [Created] (KAFKA-15610) Fix `CoreUtils.swallow()` test gaps

2023-10-16 Thread Ismael Juma (Jira)
Ismael Juma created KAFKA-15610:
---

 Summary: Fix `CoreUtils.swallow()` test gaps
 Key: KAFKA-15610
 URL: https://issues.apache.org/jira/browse/KAFKA-15610
 Project: Kafka
  Issue Type: Bug
Reporter: Ismael Juma


For example, it should verify that the passed in `logging` is used in case of 
an exception. We found that there is no test for this in 
https://github.com/apache/kafka/pull/14529#discussion_r1355277747.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] MINOR: Fix docs for ReplicationBytes(Out|In)PerSec metrics [kafka-site]

2023-10-16 Thread via GitHub


mimaison merged PR #561:
URL: https://github.com/apache/kafka-site/pull/561


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Reopened] (KAFKA-15355) Message schema changes

2023-10-16 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-15355:
-

closed by mistake

> Message schema changes
> --
>
> Key: KAFKA-15355
> URL: https://issues.apache.org/jira/browse/KAFKA-15355
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
> Fix For: 3.7.0
>
>
> Metadata records changes:
>  * BrokerRegistrationChangeRecord
>  * PartitionChangeRecord
>  * PartitionRecord
>  * RegisterBrokerRecord
> New RPCs
>  * AssignReplicasToDirsRequest
>  * AssignReplicasToDirsResponse
> RPC changes:
>  * BrokerHeartbeatRequest
>  * BrokerRegistrationRequest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] MINOR: Fix docs for ReplicationBytes(Out|In)PerSec metrics [kafka-site]

2023-10-16 Thread via GitHub


bachmanity1 opened a new pull request, #561:
URL: https://github.com/apache/kafka-site/pull/561

   Related to the https://github.com/apache/kafka/pull/14228


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] KIP-989: RocksDB Iterator Metrics

2023-10-16 Thread Lucas Brutschy
Hi Nick!

thanks for the KIP! I think this could be quite useful, given the
problems that we had with leaks due to RocksDB resources not being
closed.

I don't see any pressing issues preventing us from accepting it as is,
just one minor point for discussion: would it also make sense to make
it possible to identify a few very long-running / leaked iterators? I
can imagine on a busy node it would be hard to spot that 1% of the
iterators never close when looking only at closed iterators or the
total number of iterators. But it could still be good to identify those
leaks early. One option would be to add `iterator-duration-max` and
take open iterators into account when computing the metric.

Cheers,
Lucas
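The `iterator-duration-max` idea above, with still-open iterators folded into the max, could look like this sketch. Class and method names are hypothetical, not KIP-989's actual API; a leaked iterator would steadily push the reported max up over time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IteratorMetricsSketch {
    private final Map<Object, Long> openIterators = new ConcurrentHashMap<>();
    private volatile long closedMaxNanos = 0L;

    public void onOpen(Object iterator, long nowNanos) {
        openIterators.put(iterator, nowNanos);
    }

    public void onClose(Object iterator, long nowNanos) {
        Long startedAt = openIterators.remove(iterator);
        if (startedAt != null) {
            closedMaxNanos = Math.max(closedMaxNanos, nowNanos - startedAt);
        }
    }

    // Take open iterators into account when computing the max, so a
    // never-closed iterator is visible in the metric.
    public long durationMaxNanos(long nowNanos) {
        long max = closedMaxNanos;
        for (long startedAt : openIterators.values()) {
            max = Math.max(max, nowNanos - startedAt);
        }
        return max;
    }
}
```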


On Thu, Oct 5, 2023 at 3:50 PM Nick Telford  wrote:
>
> Hi Colt,
>
> I kept the details out of the KIP, because KIPs generally document
> high-level design, but the way I'm doing it is like this:
>
> final ManagedKeyValueIterator
> rocksDbPrefixSeekIterator = cf.prefixScan(accessor, prefixBytes);
> --> final long startedAt = System.nanoTime();
> openIterators.add(rocksDbPrefixSeekIterator);
> rocksDbPrefixSeekIterator.onClose(() -> {
> --> metricsRecorder.recordIteratorDuration(System.nanoTime() -
> startedAt);
> openIterators.remove(rocksDbPrefixSeekIterator);
> });
>
> The lines with the arrow are the new code. This pattern is repeated
> throughout RocksDBStore, wherever a new RocksDbIterator is created.
>
> Regards,
> Nick
>
> On Thu, 5 Oct 2023 at 12:32, Colt McNealy  wrote:
>
> > Thank you for the KIP, Nick!
> >
> > This would be highly useful for many reasons. Much more sane than checking
> > for leaked iterators by profiling memory usage while running 100's of
> > thousands of range scans via interactive queries (:
> >
> > One small question:
> >
> > >The iterator-duration metrics will be updated whenever an Iterator's
> > close() method is called
> >
> > Does the Iterator have its own "createdAt()" or equivalent field, or do we
> > need to keep track of the Iterator's start time upon creation?
> >
> > Cheers,
> > Colt McNealy
> >
> > *Founder, LittleHorse.dev*
> >
> >
> > On Wed, Oct 4, 2023 at 9:07 AM Nick Telford 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > KIP-989 is a small Kafka Streams KIP to add a few new metrics around the
> > > creation and use of RocksDB Iterators, to aid users in identifying
> > > "Iterator leaks" that could cause applications to leak native memory.
> > >
> > > Let me know what you think!
> > >
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics
> > >
> > > P.S. I'm not too sure about the formatting of the "New Metrics" table,
> > any
> > > advice there would be appreciated.
> > >
> > > Regards,
> > > Nick
> > >
> >


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2290

2023-10-16 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-10-16 Thread Lucas Brutschy
Hi all,

I think I liked your suggestion of allowing EOS with READ_UNCOMMITTED,
but keep wiping the state on error, and I'd vote for this solution
when introducing `default.state.isolation.level`. This way, we'd have
the most low-risk roll-out of this feature (no behavior change without
reconfiguration), with the possibility of switching to the most sane /
battle-tested default settings in 4.0. Essentially, we'd have a
feature flag but call it `default.state.isolation.level` and don't
have to deprecate it later.

So the possible configurations would then be this:

1. ALOS/READ_UNCOMMITTED (default) = processing uses direct-to-DB, IQ
reads from DB.
2. ALOS/READ_COMMITTED = processing uses WriteBatch, IQ reads from
WriteBatch/DB. Flush on error (see note below).
3. EOS/READ_UNCOMMITTED (default) = processing uses direct-to-DB, IQ
reads from DB. Wipe state on error.
4. EOS/READ_COMMITTED = processing uses WriteBatch, IQ reads from WriteBatch/DB.
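The four combinations above reduce to two independent predicates; a minimal sketch follows. The enum and method names are mine for illustration, not Streams config constants.

```java
public class IsolationSketch {
    public enum ProcessingMode { ALOS, EOS }
    public enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED }

    // Processing writes through a WriteBatch only when IQ must not see
    // uncommitted data; otherwise it writes direct-to-DB.
    public static boolean usesWriteBatch(IsolationLevel level) {
        return level == IsolationLevel.READ_COMMITTED;
    }

    // State is wiped on error only under EOS with direct-to-DB writes,
    // since dirty on-disk state cannot be rolled back.
    public static boolean wipesStateOnError(ProcessingMode mode, IsolationLevel level) {
        return mode == ProcessingMode.EOS && level == IsolationLevel.READ_UNCOMMITTED;
    }
}
```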

I believe the feature is important enough that we will see good
adoption even without changing the default. In 4.0, when we have seen
this being adopted and battle-tested, we can make READ_COMMITTED the
default for EOS, or even READ_COMMITTED the default everywhere, depending
on our experiences. And we could add a clever implementation of
READ_UNCOMMITTED with WriteBatches later.

The only smell here is that `default.state.isolation.level` wouldn't
be purely an IQ setting, but it would also (slightly) change the
behavior of the processing, but that seems unavoidable as long as we
haven't solved READ_UNCOMMITTED IQ with WriteBatches.

Minor: As for Bruno's point 4, I think if we are concerned about this
behavior (we don't necessarily have to be, because it doesn't violate
ALOS guarantees as far as I can see), we could make
ALOS/READ_COMMITTED more similar to ALOS/READ_UNCOMMITTED by flushing
the WriteBatch on error (obviously, only if we have a chance to do
that).

Cheers,
Lucas

On Mon, Oct 16, 2023 at 12:19 PM Nick Telford  wrote:
>
> Hi Guozhang,
>
> The KIP as it stands introduces a new configuration,
> default.state.isolation.level, which is independent of processing.mode.
> It's intended that this new configuration be used to configure a global IQ
> isolation level in the short term, with a future KIP introducing the
> capability to change the isolation level on a per-query basis, falling back
> to the "default" defined by this config. That's why I called it "default",
> for future-proofing.
>
> However, it currently includes the caveat that READ_UNCOMMITTED is not
> available under EOS. I think this is the coupling you are alluding to?
>
> This isn't intended to be a restriction of the API, but is currently a
> technical limitation. However, after discussing with some users about
> use-cases that would require READ_UNCOMMITTED under EOS, I'm inclined to
> remove that clause and put in the necessary work to make that combination
> possible now.
>
> I currently see two possible approaches:
>
>1. Disable TX StateStores internally when the IsolationLevel is
>READ_UNCOMMITTED and the processing.mode is EOS. This is more difficult
>than it sounds, as there are many assumptions being made throughout the
>internals about the guarantees StateStores provide. It would definitely add
>a lot of extra "if (read_uncommitted && eos)" branches, complicating
>maintenance and testing.
>2. Invest the time *now* to make READ_UNCOMMITTED of EOS StateStores
>possible. I have some ideas on how this could be achieved, but they would
>need testing and could introduce some additional issues. The benefit of
>this approach is that it would make query-time IsolationLevels much simpler
>to implement in the future.
>
> Unfortunately, both will require considerable work that will further delay
> this KIP, which was the reason I placed the restriction in the KIP in the
> first place.
>
> Regards,
> Nick
>
> On Sat, 14 Oct 2023 at 03:30, Guozhang Wang 
> wrote:
>
> > Hello Nick,
> >
> > First of all, thanks a lot for the great effort you've put in driving
> > this KIP! I really like it coming through finally, as many people in
> > the community have raised this. At the same time I honestly feel a bit
> > ashamed for not putting enough of my time supporting it and pushing it
> > through the finish line (you raised this KIP almost a year ago).
> >
> > I briefly passed through the DISCUSS thread so far, not sure I've 100
> > percent digested all the bullet points. But with the goal of trying to
> > help take it through the finish line in mind, I'd want to throw
> > thoughts on top of my head only on the point #4 above which I felt may
> > be the main hurdle for the current KIP to drive to a consensus now.
> >
> > The general question I asked myself is, whether we want to couple "IQ
> > reading mode" with "processing mode". While technically I tend to
> > agree with you that it feels like a bug if some single user chose
> > "EOS" for processing mode while 

[jira] [Created] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-10-16 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-15609:


 Summary: Corrupted index uploaded to remote tier
 Key: KAFKA-15609
 URL: https://issues.apache.org/jira/browse/KAFKA-15609
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.6.0
Reporter: Divij Vaidya


While testing Tiered Storage, we have observed corrupt indexes present in 
the remote tier. One such situation is covered at 
https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
possible case of corruption.

Potential cause of index corruption:

We want to ensure that the file we pass to the RSM plugin contains all the 
data present in the MappedByteBuffer, i.e. we should have flushed the 
MappedByteBuffer to the file using force(). In Kafka, when we close a segment, 
indexes are flushed asynchronously. Hence, it is possible that when we pass 
the file to the RSM, the file doesn't yet contain the flushed data, and we 
may end up uploading indexes which haven't been flushed. Ideally, the 
contract should enforce that we force-flush the contents of the 
MappedByteBuffer before handing the file to the RSM. This will ensure that 
indexes are not corrupted/incomplete.
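The contract described here — flush the memory-mapped buffer with force() before handing the file over — can be sketched with plain NIO. This is an illustration of the flush semantics, not Kafka's actual index code; the class and method names are made up.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexFlushSketch {
    // Write through a memory-mapped buffer, then force() so the bytes on
    // disk match what is in memory before the file is handed to the RSM.
    static Path writeAndFlush(Path file, byte[] payload) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            mmap.put(payload);
            mmap.force(); // flush dirty pages before upload
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("index", ".tmp");
        writeAndFlush(f, new byte[]{1, 2, 3});
        System.out.println(Files.readAllBytes(f).length);
    }
}
```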



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2289

2023-10-16 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 314419 lines...]


Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-10-16 Thread Nick Telford
Hi Guozhang,

The KIP as it stands introduces a new configuration,
default.state.isolation.level, which is independent of processing.mode.
It's intended that this new configuration be used to configure a global IQ
isolation level in the short term, with a future KIP introducing the
capability to change the isolation level on a per-query basis, falling back
to the "default" defined by this config. That's why I called it "default",
for future-proofing.

However, it currently includes the caveat that READ_UNCOMMITTED is not
available under EOS. I think this is the coupling you are alluding to?

This isn't intended to be a restriction of the API, but is currently a
technical limitation. However, after discussing with some users about
use-cases that would require READ_UNCOMMITTED under EOS, I'm inclined to
remove that clause and put in the necessary work to make that combination
possible now.

I currently see two possible approaches:

   1. Disable TX StateStores internally when the IsolationLevel is
   READ_UNCOMMITTED and the processing.mode is EOS. This is more difficult
   than it sounds, as there are many assumptions being made throughout the
   internals about the guarantees StateStores provide. It would definitely add
   a lot of extra "if (read_uncommitted && eos)" branches, complicating
   maintenance and testing.
   2. Invest the time *now* to make READ_UNCOMMITTED of EOS StateStores
   possible. I have some ideas on how this could be achieved, but they would
   need testing and could introduce some additional issues. The benefit of
   this approach is that it would make query-time IsolationLevels much simpler
   to implement in the future.

Unfortunately, both will require considerable work that will further delay
this KIP, which was the reason I placed the restriction in the KIP in the
first place.

Regards,
Nick

On Sat, 14 Oct 2023 at 03:30, Guozhang Wang 
wrote:

> Hello Nick,
>
> First of all, thanks a lot for the great effort you've put in driving
> this KIP! I really like it coming through finally, as many people in
> the community have raised this. At the same time I honestly feel a bit
> ashamed for not putting enough of my time supporting it and pushing it
> through the finish line (you raised this KIP almost a year ago).
>
> I briefly passed through the DISCUSS thread so far, not sure I've 100
> percent digested all the bullet points. But with the goal of trying to
> help take it through the finish line in mind, I'd want to throw
> thoughts on top of my head only on the point #4 above which I felt may
> be the main hurdle for the current KIP to drive to a consensus now.
>
> The general question I asked myself is, whether we want to couple "IQ
> reading mode" with "processing mode". While technically I tend to
> agree with you that it feels like a bug if some single user chose
> "EOS" for processing mode while choosing "read uncommitted" for IQ
> reading mode, at the same time, I'm thinking if it's possible that
> there could be two different persons (or even two teams) that would be
> using the stream API to build the app, and the IQ API to query the
> running state of the app. I know this is less of a technical thing but
> rather a more design stuff, but if it could be ever the case, I'm
> wondering if the person using the IQ API knows about the risks of
> using read uncommitted but still chose so for the favor of
> performance, no matter if the underlying stream processing mode
> configured by another person is EOS or not. In that regard, I'm
> leaning towards a "leaving the door open, and close it later if we
> found it's a bad idea" aspect with a configuration that we can
> potentially deprecate than "shut the door, clean for everyone". More
> specifically, allowing the processing mode / IQ read mode to be
> decoupled, and if we found that there's no such cases as I speculated
> above or people started complaining a lot, we can still enforce
> coupling them.
>
> Again, just my 2c here. Thanks again for the great patience and
> diligence on this KIP.
>
>
> Guozhang
>
>
>
> On Fri, Oct 13, 2023 at 8:48 AM Nick Telford 
> wrote:
> >
> > Hi Bruno,
> >
> > 4.
> > I'll hold off on making that change until we have a consensus as to what
> > configuration to use to control all of this, as it'll be affected by the
> > decision on EOS isolation levels.
> >
> > 5.
> > Done. I've chosen "committedOffsets".
> >
> > Regards,
> > Nick
> >
> > On Fri, 13 Oct 2023 at 16:23, Bruno Cadonna  wrote:
> >
> > > Hi Nick,
> > >
> > > 1.
> > > Yeah, you are probably right that it does not make too much sense.
> > > Thanks for the clarification!
> > >
> > >
> > > 4.
> > > Yes, sorry for the back and forth, but I think for the sake of the KIP
> > > it is better to let the ALOS behavior as it is for now due to the
> > > possible issues you would run into. Maybe we can find a solution in the
> > > future. Now the question returns to whether we really need
> > > default.state.isolation.level. Maybe the 

[jira] [Created] (KAFKA-15608) Uncopied leader epoch cache cause repeated OffsetOutOfRangeException

2023-10-16 Thread Drawxy (Jira)
Drawxy created KAFKA-15608:
--

 Summary: Uncopied leader epoch cache cause repeated 
OffsetOutOfRangeException
 Key: KAFKA-15608
 URL: https://issues.apache.org/jira/browse/KAFKA-15608
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.1.0
Reporter: Drawxy


Recently, I encountered an issue where one partition always had only 1 ISR 
(with no produce traffic on the topic). The bug is related to altering log 
dirs. When replacing the current log with the future log, the broker doesn't 
copy the leader epoch checkpoint cache, which records the current leader 
epoch and log start offset. The cache for each partition is updated only when 
appending new messages or becoming leader. If there is no traffic and the 
replica is already the leader, the cache will not be updated any more. 
However, when handling a fetch request, the partition leader fetches its 
leader epoch from the cache and compares it with the leader epoch sent by the 
follower. If the former is missing or less than the latter, the leader 
interrupts the process and returns an OffsetOutOfRangeException to the 
follower. The follower might fall out of sync over time.

Take the following case as an example, all the key points are listed in 
chronological order:
 # Reassigner submitted a partition reassignment for partition foo-1
{quote}{"topic": "foo","partition": 1,"replicas": [5002,3003,4001],"logDirs": 
["\data\kafka-logs-0","any","any"]}{quote}
 # Reassignment completed immediately because there is no traffic on this topic.
 # Controller sent LeaderAndISR requests to all the replicas.
 # Newly added replica 5002 became the new leader, and the current log 
updated the leader epoch offset cache. Replica 5002 successfully handled the 
LeaderAndISR request.
 # Altering the log dir completed, and the newly promoted current log didn't 
have leader epoch offset information.
 # Replica 5002 handled fetch requests (which include the fetch offset and 
current leader epoch) from followers and returned OffsetOutOfRangeException 
because the leader epoch offset cache hadn't been updated. So replica 5002 
couldn't update the fetch state for each follower, and reported an ISR shrink 
later. Followers 3003 and 4001 would repeatedly print the following log:

{quote}WARN [ReplicaFetcher replicaId=4001, leaderId=5002, fetcherId=2] Reset 
fetch offset for partition foo-1 from 231196 to current leader's start offset 
231196 (kafka.server.ReplicaFetcherThread)
INFO [ReplicaFetcher replicaId=4001, leaderId=5002, fetcherId=2] Current offset 
231196 for partition foo-1 is out of range, which typically implies a leader 
change. Reset fetch offset to 231196 (kafka.server.ReplicaFetcherThread)
{quote}
This issue arises only when all three conditions are met:
 # There is no produce traffic on the partition.
 # A newly added replica becomes the new leader.
 # The LeaderAndISR request is handled successfully before altering the log 
dir completes.
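A guess at the shape of a remedy — carrying the leader-epoch checkpoint entries over when the future log replaces the current log — sketched with invented types; this is not the actual Kafka patch.

```java
import java.util.ArrayList;
import java.util.List;

public class EpochCacheSketch {
    public record EpochEntry(int epoch, long startOffset) {}

    // Hypothetical remedy: when the future log replaces the current log
    // after an inter-dir move, carry over the leader-epoch checkpoint
    // entries instead of leaving the promoted log with an empty cache.
    public static List<EpochEntry> promoteFutureLog(List<EpochEntry> currentLogCache,
                                                    List<EpochEntry> futureLogCache) {
        return futureLogCache.isEmpty()
                ? new ArrayList<>(currentLogCache)   // preserve epoch history
                : new ArrayList<>(futureLogCache);
    }
}
```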



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14684) Replace EasyMock and PowerMock with Mockito in WorkerSinkTaskThreadedTest

2023-10-16 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-14684.

Fix Version/s: 3.7.0
   Resolution: Fixed

> Replace EasyMock and PowerMock with Mockito in WorkerSinkTaskThreadedTest
> -
>
> Key: KAFKA-14684
> URL: https://issues.apache.org/jira/browse/KAFKA-14684
> Project: Kafka
>  Issue Type: Sub-task
>  Components: KafkaConnect
>Reporter: Hector Geraldino
>Assignee: Hector Geraldino
>Priority: Minor
> Fix For: 3.7.0
>
>






Re: [VOTE] KIP-988 Streams StandbyUpdateListener

2023-10-16 Thread Lucas Brutschy
Hi,

thanks again for the KIP!

+1 (binding)

Cheers,
Lucas



On Sun, Oct 15, 2023 at 9:13 AM Colt McNealy  wrote:
>
> Hello there,
>
> I'd like to call a vote on KIP-988 (co-authored by my friend and colleague
> Eduwer Camacaro). We are hoping to get it in before the 3.7.0 release.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-988%3A+Streams+Standby+Task+Update+Listener
>
> Cheers,
> Colt McNealy
>
> *Founder, LittleHorse.dev*


Re: [DISCUSS] KIP-988 Streams Standby Task Update Listener

2023-10-16 Thread Lucas Brutschy
Hi all,

it's a nice improvement! I don't have anything to add on top of the
previous comments, just came here to say that it seems to me consensus
has been reached and the result looks good to me.

Thanks Colt and Eduwer!
Lucas
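
For readers following along, the listener shape this thread converged on can be sketched roughly as below (method and enum names follow the summary in the thread; exact signatures in the final KIP may differ):

```java
// Sketch of the standby update listener discussed in this thread:
// onUpdateStart / onBatchLoaded / onUpdateSuspended, with
// SuspendReason.MIGRATED and SuspendReason.PROMOTED.
public class StandbyListenerSketch {
    enum SuspendReason { MIGRATED, PROMOTED }

    interface StandbyUpdateListener {
        void onUpdateStart(String topicPartition, String storeName, long startingOffset);
        void onBatchLoaded(String topicPartition, String storeName, long batchEndOffset, long numRecords);
        void onUpdateSuspended(String topicPartition, String storeName, long storeEndOffset, SuspendReason reason);
    }

    // Drives one standby lifecycle through a recording listener and returns the trace.
    static String simulate() {
        StringBuilder trace = new StringBuilder();
        StandbyUpdateListener listener = new StandbyUpdateListener() {
            public void onUpdateStart(String tp, String store, long off) {
                trace.append("start:").append(off).append(' ');
            }
            public void onBatchLoaded(String tp, String store, long end, long n) {
                trace.append("batch:").append(n).append(' ');
            }
            public void onUpdateSuspended(String tp, String store, long end, SuspendReason r) {
                trace.append("suspended:").append(r);
            }
        };
        listener.onUpdateStart("foo-1", "my-store", 0L);
        listener.onBatchLoaded("foo-1", "my-store", 100L, 100L);
        // The standby is promoted to active, so updating is suspended.
        listener.onUpdateSuspended("foo-1", "my-store", 100L, SuspendReason.PROMOTED);
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // start:0 batch:100 suspended:PROMOTED
    }
}
```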

On Sun, Oct 15, 2023 at 9:11 AM Colt McNealy  wrote:
>
> Thanks, Guozhang. I've updated the KIP and will start a vote.
>
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>
>
> On Sat, Oct 14, 2023 at 10:27 AM Guozhang Wang 
> wrote:
>
> > Thanks for the summary, that looks good to me.
> >
> > Guozhang
> >
> > On Fri, Oct 13, 2023 at 8:57 PM Colt McNealy  wrote:
> > >
> > > Hello there!
> > >
> > > Thanks everyone for the comments. There's a lot of back-and-forth going
> > on,
> > > so I'll do my best to summarize what everyone's said in TLDR format:
> > >
> > > 1. Rename `onStandbyUpdateStart()` -> `onUpdateStart()`,  and do
> > similarly
> > > for the other methods.
> > > 2. Keep `SuspendReason.PROMOTED` and `SuspendReason.MIGRATED`.
> > > 3. Remove the `earliestOffset` parameter for performance reasons.
> > >
> > > If that's all fine with everyone, I'll update the KIP and we—well, mostly
> > > Edu (:  —will open a PR.
> > >
> > > Cheers,
> > > Colt McNealy
> > >
> > > *Founder, LittleHorse.dev*
> > >
> > >
> > > On Fri, Oct 13, 2023 at 7:58 PM Eduwer Camacaro 
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > Thanks for all your feedback for this KIP!
> > > >
> > > > I think that the key to choosing proper names for this API is
> > understanding
> > > > the terms used inside the StoreChangelogReader. Currently, this class
> > has
> > > > two possible states: ACTIVE_RESTORING and STANDBY_UPDATING. In my
> > opinion,
> > > > using StandbyUpdateListener for the interface fits better on these
> > terms.
> > > > Same applies for onUpdateStart/Suspended.
> > > >
> > > > StoreChangelogReader uses "the same mechanism" for active task
> > restoration
> > > > and standby task updates, but this is an implementation detail. Under
> > > > normal circumstances (no rebalances or task migrations), the changelog
> > > > reader will be in STANDBY_UPDATING, which means it will be updating
> > standby
> > > > tasks as long as there are new records in the changelog topic. That's
> > why I
> > > > prefer onStandbyUpdated instead of onBatchUpdated, even if it doesn't
> > 100%
> > > > align with StateRestoreListener, but either one is fine.
> > > >
> > > > Edu
> > > >
> > > > On Fri, Oct 13, 2023 at 8:53 PM Guozhang Wang <
> > guozhang.wang...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Colt,
> > > > >
> > > > > Thanks for writing the KIP! I have read through the updated KIP and
> > > > > overall it looks great. I only have minor naming comments (well,
> > > > > aren't naming the least boring stuff to discuss and that takes the
> > > > > most of the time for KIPs :P):
> > > > >
> > > > > 1. I tend to agree with Sophie regarding whether or not to include
> > > > > "Standby" in the functions of "onStandbyUpdateStart/Suspended", since
> > > > > it is also more consistent with the functions of
> > > > > "StateRestoreListener" where we do not name it as
> > > > > "onStateRestoreState" etc.
> > > > >
> > > > > 2. I know in community discussions we sometimes say "a standby is
> > > > > promoted to active", but in the official code / java docs we did not
> > > > > have a term of "promotion", since what the code does is really
> > recycle
> > > > > the task (while keeping its state stores open), and create a new
> > > > > active task that takes in the recycled state stores and just changing
> > > > > the other fields like task type etc. After thinking about this for a
> > > > > bit, I tend to feel that "promoted" is indeed a better name for user
> > > > > facing purposes while "recycle" is more of a technical detail inside
> > > > > the code and could be abstracted away from users. So I feel keeping
> > > > > the name "PROMOTED" is fine.
> > > > >
> > > > > 3. Regarding "earliestOffset", it does feel like we cannot always
> > > > > avoid another call to the Kafka API. And on the other hand, I also
> > > > > tend to think that such bookkeeping may be better done at the app
> > > > > level than from the Streams' public API level. I.e. the app could
> > keep
> > > > > a "first ever starting offset" per "topic-partition-store" key, and a
> > > > > when we have rolling restart and hence some standby task keeps
> > > > > "jumping" from one client to another via task assignment, the app
> > > > > would update this value just one when it finds the
> > > > > ""topic-partition-store" was never triggered before. What do you
> > > > > think?
> > > > >
> > > > > 4. I do not have a strong opinion either, but what about
> > > > "onBatchUpdated" ?
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Wed, Oct 11, 2023 at 9:31 PM Colt McNealy 
> > > > wrote:
> > > > > >
> > > > > > Sohpie—
> > > > > >
> > > > > > Thank you very much for such a detailed review of the KIP. It might
> > > > > > actually be 

Re: [VOTE] KIP-714: Client metrics and observability

2023-10-16 Thread Andrew Schofield
The vote for KIP-714 has now concluded and the KIP is APPROVED.

The votes are:
Binding:
   +4 (Jason, Matthias, Sophie, Jun)
Non-binding:
   +3 (Milind, Kirk, Philip)
   -1 (Ryanne)

This KIP aims to improve monitoring and troubleshooting of client
performance by enabling clients to push metrics to brokers. The lack of
consistent telemetry across clients is an operational gap, and many cluster
operators do not have control over the clients. Often, asking the client owner
to change the configuration or even application code in order to troubleshoot
problems is not workable. This is why the KIP enables the broker to request
metrics from clients, giving a consistent, cross-platform mechanism.

The feature is enabled by configuring a metrics plugin on the brokers which
implements the ClientTelemetry interface. In the absence of a plugin with this
interface, the brokers do not even support the new RPCs in this KIP and the
clients will not attempt or be able to push metrics. So, a vanilla Apache Kafka
broker will not collect metrics.
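
As a rough illustration of that plugin mechanism, the broker-side shape can be sketched as below (a simplified, self-contained mirror — the real interfaces live in the broker's telemetry package and take richer request-context and payload types):

```java
// Simplified mirror of the broker-side plugin shape: a metrics reporter that
// also implements ClientTelemetry enables metrics push; without such a plugin
// the new RPCs stay unsupported and no metrics are collected.
public class TelemetryPluginSketch {
    interface ClientTelemetryReceiver {
        void exportMetrics(String clientInstanceId, byte[] serializedOtlpMetrics);
    }

    interface ClientTelemetry {
        ClientTelemetryReceiver clientReceiver();
    }

    // Tallies pushed payload bytes per client instance, as a stand-in backend.
    static final java.util.Map<String, Integer> received = new java.util.HashMap<>();

    static ClientTelemetry plugin() {
        return () -> (clientId, payload) -> received.merge(clientId, payload.length, Integer::sum);
    }

    public static void main(String[] args) {
        ClientTelemetryReceiver receiver = plugin().clientReceiver();
        receiver.exportMetrics("client-abc", new byte[] {1, 2, 3});
        System.out.println(received.get("client-abc")); // 3
    }
}
```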

I would like to make available an open-source implementation of the 
ClientTelemetry
interface that works with an open-source monitoring solution.

The KIP does put support for OTLP serialisation into the client, so there are
new dependencies in the Java client, which are bundled and relocated (shaded).
OTLP also opens up other use cases involving OpenTelemetry, which is emerging
as the de facto standard for telemetry and observability in general.

Thanks to everyone who has contributed to KIP-714 since Magnus Edenhill
kicked it all off in February 2021.

Andrew

> On 14 Oct 2023, at 01:52, Jun Rao  wrote:
>
> Hi, Andrew,
>
> Thanks for the KIP. +1 from me too.
>
> Jun
>
> On Wed, Oct 11, 2023 at 4:00 PM Sophie Blee-Goldman 
> wrote:
>
>> This looks great! +1 (binding)
>>
>> Sophie
>>
>> On Wed, Oct 11, 2023 at 1:46 PM Matthias J. Sax  wrote:
>>
>>> +1 (binding)
>>>
>>> On 9/13/23 5:48 PM, Jason Gustafson wrote:
 Hey Andrew,

 +1 on the KIP. For many users of Kafka, it may not be fully understood
>>> how
 much of a challenge client monitoring is. With tens of clients in a
 cluster, it is already difficult to coordinate metrics collection. When
 there are thousands of clients, and when the cluster operator has no
 control over them, it is essentially impossible. For the fat clients
>> that
 we have, the lack of useful telemetry is a huge operational gap.
 Consistency between clients has also been a major challenge. I think
>> the
 effort toward standardization in this KIP will have some positive
>> impact
 even in deployments which have effective client-side monitoring.
>>> Overall, I
 think this proposal will provide a lot of value across the board.

 Best,
 Jason

 On Wed, Sep 13, 2023 at 9:50 AM Philip Nee 
>> wrote:

> Hey Andrew -
>
> Thank you for taking the time to reply to my questions. I'm just
>> adding
> some notes to this discussion.
>
> 1. epoch: It can be helpful to know the delta of the client side and
>> the
> actual leader epoch.  It is helpful to understand why sometimes commit
> fails/client not making progress.
> 2. Client connection: If the client selects the "wrong" connection to
>>> push
> out the data, I assume the request would timeout; which should lead to
> disconnecting from the node and reselecting another node as you
>>> mentioned,
> via the least loaded node.
>
> Cheers,
> P
>
>
> On Tue, Sep 12, 2023 at 10:40 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Philip,
>> Thanks for your vote and interest in the KIP.
>>
>> KIP-714 does not introduce any new client metrics, and that’s
> intentional.
>> It does
>> tell how that all of the client metrics can have their names
>>> transformed
>> into
>> equivalent "telemetry metric names”, and then potentially used in
>>> metrics
>> subscriptions.
>>
>> I am interested in the idea of client’s leader epoch in this context,
>>> but
>> I don’t have
>> an immediate plan for how best to do this, and it would take another
>>> KIP
>> to enhance
>> existing metrics or introduce some new ones. Those would then
>> naturally
> be
>> applicable to the metrics push introduced in KIP-714.
>>
>> In a similar vein, there are no existing client metrics specifically
>>> for
>> auto-commit.
>> We could add them to Kafka, but I really think this is just an
>> example
>>> of
>> asynchronous
>> commit in which the application has decided not to specify when the
> commit
>> should
>> begin.
>>
>> It is possible to increase the cadence of pushing by modifying the
>> interval.ms
>> configuration property of the CLIENT_METRICS resource.
>>
>> There is an “assigned-partitions” metric for each consumer, but not
>> 

[jira] [Resolved] (KAFKA-15572) Race condition between log roll and log rename

2023-10-16 Thread Lucian Ilie (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucian Ilie resolved KAFKA-15572.
-
Resolution: Fixed

> Race condition between log roll and log rename
> --
>
> Key: KAFKA-15572
> URL: https://issues.apache.org/jira/browse/KAFKA-15572
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.0.0
>Reporter: Lucian Ilie
>Priority: Major
> Attachments: kafka-alter-log-dir-nosuchfileexception.log
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> We are using a Kafka cluster with Zookeeper, deployed on top of Kubernetes, 
> using banzaicloud/koperator.
> We have multiple disks per broker.
> We are using Cruise Control remove disk operation in order to aggregate 
> multiple smaller disks into a single bigger disk. This CC operation is 
> calling Kafka admin with alter replica log dirs operation.
> During this operation, *the flush operation fails apparently randomly with 
> NoSuchFileException, while alterReplicaLogDirs is executed*. Attached a 
> sample of logs for the exception and the previous operations taking place.
> Will further detail the cause of this issue.
> Say we have 3 brokers:
>  * broker 101 with disks /kafka-logs1/kafka, /kafka-logs2/kafka and a bigger 
> disk /new-kafka-logs1/kafka
>  * broker 201 with same disks
>  * broker 301 with same disks
> When Cruise Control executes a remove disk operation, it calls Kafka 
> "adminClient.alterReplicaLogDirs(replicaAssignment)" with such an assignment 
> as to move all data from /kafka-logs1/kafka and /kafka-logs2/kafka to 
> /new-kafka-logs1/kafka.
> During the alter log dir operation, future logs are created (to move data 
> from e.g. "/kafka-logs1/kafka/topic-partition" to 
> "/new-kafka-logs1/kafka/topic-partition.hash-future"), data is moved and 
> finally the log dir will be renamed from 
> "/new-kafka-logs1/kafka/topic-partition.hash-future" to 
> "/new-kafka-logs1/kafka/topic-partition". This operation is started in 
> [UnifiedLog.renameDir|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/UnifiedLog.scala#L713]
>  and is locked using the [UnifiedLog 
> lock|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/UnifiedLog.scala#L268].
>  The rename is then delegated to 
> [LocalLog.renameDir|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/LocalLog.scala#L111-L113].
>  This is the 1st code part that is involved in the race condition.
> Meanwhile, log dirs can be rolled based on known conditions (e.g. getting 
> full), which will call 
> [UnifiedLog.roll|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/UnifiedLog.scala#L1547],
>  which is locked using the [UnifiedLog 
> lock|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/UnifiedLog.scala#L268].
>  However, the further delegation to UnifiedLog.flushUptoOffsetExclusive is 
> not sharing that lock, since it is [done as a scheduled task in a separate 
> thread|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/UnifiedLog.scala#L1547].
>  This means that further operations are [not locked at UnifiedLog 
> level|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/UnifiedLog.scala#L268].
>  The operation is further delegated to 
> [LocalLog.flush|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/LocalLog.scala#L177],
>  which will also try to [flush the log 
> dir|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/LocalLog.scala#L177].
>  This is the 2nd code part that is involved in the race condition.
> Since the log dir flush does not share the lock with the rename dir operation 
> (as it is scheduled via the scheduler), the rename dir operation might 
> succeed in moving the log dir on disk to "topic-partition", but the 
> LocalLog._dir will remain set to "topic-partition.hash-future", and when the 
> flush will attempt to flush the "topic-partition.hash-future" directory, it 
> will throw NoSuchFileException: "topic-partition.hash-future". Basically, 
> [this 
> line|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/LocalLog.scala#L111]
>  might succeed, and before [this other 
> line|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/LocalLog.scala#L111]
>  is executed, flush tries to [flush the future 
> dir|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/LocalLog.scala#L177].
> We tested a fix with a patch on Kafka 3.4.1, on our clusters and it solved 
> the issue by synchronizing the flush dir operation. Will reply with a link to 
> a PR.
> Note that this bug replicates for every version since 3.0.0, caused by [this 
> 
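
The fix direction described — making the directory flush share the rename lock — can be sketched as below (an illustration with invented names, not the actual patch):

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of the fix: flush takes the same lock as renameDir, so it
// always sees the post-rename directory instead of the stale ".hash-future" path.
public class LogDirSyncSketch {
    private final ReentrantLock lock = new ReentrantLock(); // stands in for UnifiedLog's lock
    private volatile String dir;
    String lastFlushedDir;

    LogDirSyncSketch(String dir) { this.dir = dir; }

    void renameDir(String newDir) {
        lock.lock();
        try {
            dir = newDir; // move on disk, then update the in-memory pointer
        } finally {
            lock.unlock();
        }
    }

    void flushDir() {
        lock.lock(); // without this lock, flush could race with renameDir
        try {
            lastFlushedDir = dir; // "fsync" the directory we currently point at
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LogDirSyncSketch log = new LogDirSyncSketch("topic-1.hash-future");
        log.renameDir("topic-1");
        log.flushDir();
        System.out.println(log.lastFlushedDir); // topic-1
    }
}
```

In the unsynchronized version, a scheduled flush could read the old future-dir path after the rename had already happened on disk, producing the NoSuchFileException described above.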

Re: KIP-991: Allow DropHeaders SMT to drop headers by wildcard/regexp

2023-10-16 Thread Roman Schmitz
Hi Andrew,

Ok, thanks for the feedback! I added a few more details and code examples
to explain the proposed changes.

Thanks,
Roman
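
The core of the proposed matching behavior can be sketched as below (names here are hypothetical; the real change would extend the existing DropHeaders SMT's configuration):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch of dropping record headers whose names match a
// configured regexp, instead of only exact names.
public class DropHeadersByPatternSketch {
    static List<String> dropMatching(List<String> headerNames, String regex) {
        Pattern p = Pattern.compile(regex);
        return headerNames.stream()
                .filter(name -> !p.matcher(name).matches()) // keep non-matching headers
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> kept = dropMatching(
                List.of("trace-id", "internal.audit", "internal.debug", "app-key"),
                "internal\\..*");
        System.out.println(kept); // [trace-id, app-key]
    }
}
```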

Am So., 15. Okt. 2023 um 22:12 Uhr schrieb Andrew Schofield <
andrew_schofield_j...@outlook.com>:

> Hi Roman,
> Thanks for the KIP. I think it’s an interesting idea, but I think the KIP
> document needs some
> more details added before it’s ready for review. For example, here’s a KIP
> in the same
> area which was delivered in an earlier version of Kafka. I think this is a
> good KIP to copy
> for a suitable level of detail and description (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Filter+and+Conditional+SMTs
> ).
>
> Hope this helps.
>
> Thanks,
> Andrew
>
> > On 15 Oct 2023, at 21:02, Roman Schmitz  wrote:
> >
> > Hi all,
> >
> > While working with different customers I came across the case several
> times
> > that we'd like to not only explicitly remove headers by name but by
> pattern
> > / regexp. Here is a KIP for this feature!
> >
> > Please let me know if you have any comments, questions, or suggestions!
> >
> > https://cwiki.apache.org/confluence/x/oYtEE
> >
> > Thanks,
> > Roman
>
>


Re: [VOTE] KIP-980: Allow creating connectors in a stopped state

2023-10-16 Thread Yash Mayya
Hi all,

Bumping up this vote thread - we have two binding +1 votes and one
non-binding +1 vote so far.

Thanks,
Yash

On Mon, Oct 9, 2023 at 11:57 PM Greg Harris 
wrote:

> Thanks Yash for the well written KIP!
>
> And thank you for finally adding JSON support to the standalone mode
> that isn't file extension sensitive. That will be very useful.
>
> +1 (binding)
>
> On Mon, Oct 9, 2023 at 10:45 AM Knowles Atchison Jr
>  wrote:
> >
> > This is super useful for pipeline setup!
> >
> > +1 (non binding)
> >
> > On Mon, Oct 9, 2023, 7:57 AM Chris Egerton 
> wrote:
> >
> > > Thanks for the KIP, Yash!
> > >
> > > +1 (binding)
> > >
> > > On Mon, Oct 9, 2023, 01:12 Yash Mayya  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to start a vote on KIP-980 which proposes allowing the
> creation
> > > of
> > > > connectors in a stopped (or paused) state.
> > > >
> > > > KIP -
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > >
> > > > Discussion Thread -
> > > > https://lists.apache.org/thread/om803vl191ysf711qm7czv94285rtt5d
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > >
>