[jira] [Created] (KAFKA-16849) ERROR Failed to read /opt/kafka/data/meta.properties

2024-05-28 Thread Agostino Sarubbo (Jira)
Agostino Sarubbo created KAFKA-16849:


 Summary: ERROR Failed to read /opt/kafka/data/meta.properties
 Key: KAFKA-16849
 URL: https://issues.apache.org/jira/browse/KAFKA-16849
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.8.1
 Environment: RockyLinux-9
openjdk version "1.8.0_412"
Reporter: Agostino Sarubbo


I'm running a kafka-2.8.1 cluster with 5 machines.
Because of how it is configured (I have set a replication factor of 3), I can
completely destroy a machine and re-create it from scratch. After it has been
re-created, it rejoins the cluster and everything works as before.

So, right now I'm trying to migrate the hosts from CentOS-7 to RockyLinux-9. The
idea was to destroy and re-create the machines one by one.

I have increased the log level to DEBUG and this is what I get:



```
[2024-05-28 07:35:09,287] INFO Registered kafka:type=kafka.Log4jController 
MBean (kafka.utils.Log4jControllerRegistration$) 
[2024-05-28 07:35:09,676] INFO Registered signal handlers for TERM, INT, HUP 
(org.apache.kafka.common.utils.LoggingSignalHandler) 
[2024-05-28 07:35:09,680] INFO starting (kafka.server.KafkaServer) 
[2024-05-28 07:35:09,680] INFO Connecting to zookeeper on 
zookeeper1.my.domain:2281,zookeeper2.my.domain:2281,zookeeper3.my.domain:2281,zookeeper4.my.domain:2281,zookeeper5.my.domain:
2281 (kafka.server.KafkaServer) 
[2024-05-28 07:35:09,681] DEBUG Checking login config for Zookeeper JAAS 
context [java.security.auth.login.config=null, 
zookeeper.sasl.client=default:true, zookeeper.sasl.clientconfig
=default:Client] (org.apache.kafka.common.security.JaasUtils) 
[2024-05-28 07:35:09,695] INFO [ZooKeeperClient Kafka server] Initializing a 
new session to 
zookeeper1.my.domain:2281,zookeeper2.my.domain:2281,zookeeper3.my.domain:2281,zookeeper4.my
.domain:2281,zookeeper5.my.domain:2281. (kafka.zookeeper.ZooKeeperClient) 
[2024-05-28 07:35:09,758] DEBUG Initializing task scheduler. 
(kafka.utils.KafkaScheduler) 
[2024-05-28 07:35:09,759] INFO [ZooKeeperClient Kafka server] Waiting until 
connected. (kafka.zookeeper.ZooKeeperClient) 
[2024-05-28 07:35:10,051] DEBUG [ZooKeeperClient Kafka server] Received event: 
WatchedEvent state:SyncConnected type:None path:null 
(kafka.zookeeper.ZooKeeperClient) 
[2024-05-28 07:35:10,053] INFO [ZooKeeperClient Kafka server] Connected. 
(kafka.zookeeper.ZooKeeperClient) 
[2024-05-28 07:35:10,139] INFO [feature-zk-node-event-process-thread]: Starting 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread) 
[2024-05-28 07:35:10,144] DEBUG Reading feature ZK node at path: /feature 
(kafka.server.FinalizedFeatureChangeListener) 
[2024-05-28 07:35:10,251] INFO Updated cache from existing  to latest 
FinalizedFeaturesAndEpoch(features=Features{}, epoch=1). 
(kafka.server.FinalizedFeatureCache) 
[2024-05-28 07:35:10,255] INFO Cluster ID = 3G4teZlrS-uT6-Sk5MkbPQ 
(kafka.server.KafkaServer) 
[2024-05-28 07:35:10,258] ERROR Failed to read /opt/kafka/data/meta.properties 
(kafka.server.BrokerMetadataCheckpoint$) 
java.nio.file.AccessDeniedException: /opt/kafka/data/meta.properties.tmp 
   at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) 
   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) 
   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) 
   at 
sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) 
   at 
sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
 
   at java.nio.file.Files.deleteIfExists(Files.java:1165) 
   at 
kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:224) 
   at 
kafka.server.BrokerMetadataCheckpoint$.$anonfun$getBrokerMetadataAndOfflineDirs$2(BrokerMetadataCheckpoint.scala:158)
 
   at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) 
   at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) 
   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) 
   at 
kafka.server.BrokerMetadataCheckpoint$.getBrokerMetadataAndOfflineDirs(BrokerMetadataCheckpoint.scala:153)
 
   at kafka.server.KafkaServer.startup(KafkaServer.scala:206) 
   at kafka.Kafka$.main(Kafka.scala:109) 
   at kafka.Kafka.main(Kafka.scala) 
[2024-05-28 07:35:10,326] INFO KafkaConfig values:  
   advertised.host.name = null 
   advertised.listeners = null 
   advertised.port = null 
   alter.config.policy.class.name = null 
   alter.log.dirs.replication.quota.window.num = 11 
   alter.log.dirs.replication.quota.window.size.seconds = 1 
   authorizer.class.name =  
   auto.create.topics.enable = true 
   auto.leader.rebalance.enable = true 
   background.threads = 10 
   broker.heartbeat.interval.ms = 2000 
   broker.id = 1
```
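For reference, the stack trace above shows that BrokerMetadataCheckpoint.read() deletes any leftover meta.properties.tmp before reading meta.properties, and it is that delete which fails. A minimal sketch of the failing call (path taken from the log; the ownership/SELinux diagnosis is only a guess for a freshly rebuilt RockyLinux-9 host):

```
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MetaPropertiesTmpCheck {
    public static void main(String[] args) {
        // BrokerMetadataCheckpoint.read() removes a stale temp file left over
        // from an interrupted write before reading meta.properties.
        Path tmp = Paths.get("/opt/kafka/data/meta.properties.tmp");
        try {
            Files.deleteIfExists(tmp);
            System.out.println("delete ok (or file absent)");
        } catch (AccessDeniedException e) {
            // Raised when the broker user lacks write permission on
            // /opt/kafka/data; worth re-checking directory ownership and,
            // on RockyLinux-9, the SELinux context of the rebuilt data dir.
            System.err.println("cannot delete " + tmp + ": " + e);
        } catch (IOException e) {
            System.err.println("I/O error: " + e);
        }
    }
}
```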

Re: Action requested: Changes to CI for JDK 11 & 17 builds on Pull Requests

2024-05-28 Thread Chia-Ping Tsai
Please take a look at the following QA. ALL PASS!!!

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15889/25/pipeline

I almost cried, and BIG thanks to Harris!!!

On 2024/05/28 03:20:01 Greg Harris wrote:
> Hello Apache Kafka Developers,
> 
> In order to better utilize scarce CI resources shared with other Apache
> projects, the Kafka project will no longer be running full test suites for
> the JDK 11 & 17 components of PR builds.
> 
> *Action requested: If you have an active pull request, please merge or
> rebase the latest trunk into your branch* before continuing development as
> normal. You may wait to push the resulting branch until you make another
> commit, or push the result immediately.
> 
> What to expect with this change:
> * Trunk (and release branch) builds will not be affected.
> * JDK 8 and 21 builds will not be affected.
> * Compilation will not be affected.
> * Static analysis (spotbugs, checkstyle, etc) will not be affected.
> * Overall build execution time should be similar or slightly better than
> before.
> * You can expect fewer tests to be run on your PRs (~6 instead of
> ~12).
> * Test flakiness should be similar or slightly better than before.
> 
> And as a reminder, build failures (red indicators in CloudBees) are always
> blockers for merging. Starting now, the 11 and 17 builds should always pass
> (green indicators in CloudBees) before merging, as failed tests (yellow
> indicators in CloudBees) should no longer be present.
> 
> Thanks everyone,
> Greg Harris
> 


Re: Action requested: Changes to CI for JDK 11 & 17 builds on Pull Requests

2024-05-28 Thread Luke Chen
Wow! I've never seen this beautiful green on jenkins for years!
Thanks Greg!!

Luke

On Tue, May 28, 2024 at 4:12 PM Chia-Ping Tsai  wrote:

> Please take a look at the following QA. ALL PASS!!!
>
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15889/25/pipeline
>
> I almost cried, and BIG thanks to Harris!!!
>
> On 2024/05/28 03:20:01 Greg Harris wrote:
> > Hello Apache Kafka Developers,
> >
> > In order to better utilize scarce CI resources shared with other Apache
> > projects, the Kafka project will no longer be running full test suites
> for
> > the JDK 11 & 17 components of PR builds.
> >
> > *Action requested: If you have an active pull request, please merge or
> > rebase the latest trunk into your branch* before continuing development
> as
> > normal. You may wait to push the resulting branch until you make another
> > commit, or push the result immediately.
> >
> > What to expect with this change:
> > * Trunk (and release branch) builds will not be affected.
> > * JDK 8 and 21 builds will not be affected.
> > * Compilation will not be affected.
> > * Static analysis (spotbugs, checkstyle, etc) will not be affected.
> > * Overall build execution time should be similar or slightly better than
> > before.
> > * You can expect fewer tests to be run on your PRs (~6 instead of
> > ~12).
> > * Test flakiness should be similar or slightly better than before.
> >
> > And as a reminder, build failures (red indicators in CloudBees) are
> always
> > blockers for merging. Starting now, the 11 and 17 builds should always
> pass
> > (green indicators in CloudBees) before merging, as failed tests (yellow
> > indicators in CloudBees) should no longer be present.
> >
> > Thanks everyone,
> > Greg Harris
> >
>


[jira] [Resolved] (KAFKA-16805) Stop using a ClosureBackedAction to configure Spotbugs reports

2024-05-28 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16805.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Stop using a ClosureBackedAction to configure Spotbugs reports
> --
>
> Key: KAFKA-16805
> URL: https://issues.apache.org/jira/browse/KAFKA-16805
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Greg Harris
>Assignee: 黃竣陽
>Priority: Major
>  Labels: newbie
> Fix For: 3.8.0
>
>
> The org.gradle.util.ClosureBackedAction type has been deprecated.    
> This is scheduled to be removed in Gradle 9.0.    
> [Documentation|https://docs.gradle.org/8.7/userguide/upgrading_version_7.html#org_gradle_util_reports_deprecations]
>     
> 1 usage    
> [https://github.com/apache/kafka/blob/95adb7bfbfc69b3e9f3538cc5d6f7c6a577d30ee/build.gradle#L745-L749]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16850) KRaft - Add v2 of TopicRecord

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16850:
-

 Summary: KRaft - Add v2 of TopicRecord
 Key: KAFKA-16850
 URL: https://issues.apache.org/jira/browse/KAFKA-16850
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16852) Add *.thread.pool.size

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16852:
-

 Summary: Add *.thread.pool.size
 Key: KAFKA-16852
 URL: https://issues.apache.org/jira/browse/KAFKA-16852
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Add the remote.log.manager.copier.thread.pool.size and 
remote.log.manager.expiration.thread.pool.size configurations as internal-only.
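A minimal sketch of how internal-only configs are typically declared with Kafka's ConfigDef (property names from the ticket; the defaults and importance here are assumptions):

{code:java}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class RemoteLogThreadPoolConfigs {
    public static final String COPIER_THREAD_POOL_SIZE_PROP =
        "remote.log.manager.copier.thread.pool.size";
    public static final String EXPIRATION_THREAD_POOL_SIZE_PROP =
        "remote.log.manager.expiration.thread.pool.size";

    public static ConfigDef configDef() {
        // defineInternal keeps a config out of the generated public docs,
        // matching the "internal-only" requirement; defaults are placeholders.
        return new ConfigDef()
            .defineInternal(COPIER_THREAD_POOL_SIZE_PROP, Type.INT, 10, Importance.MEDIUM)
            .defineInternal(EXPIRATION_THREAD_POOL_SIZE_PROP, Type.INT, 10, Importance.MEDIUM);
    }
}
{code}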



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16851) Add remote.log.disable.policy

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16851:
-

 Summary: Add remote.log.disable.policy
 Key: KAFKA-16851
 URL: https://issues.apache.org/jira/browse/KAFKA-16851
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Add the configuration as internal-only to begin with. Do not wire it to 
anything yet; just ensure that it is settable dynamically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16853) Split RemoteLogManagerScheduledThreadPool

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16853:
-

 Summary: Split RemoteLogManagerScheduledThreadPool
 Key: KAFKA-16853
 URL: https://issues.apache.org/jira/browse/KAFKA-16853
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

To begin with, create just the RemoteDataExpirationThreadPool and move 
expiration to it. Keep all settings as if the only thread pool was the 
RemoteLogManagerScheduledThreadPool. Ensure that the new thread pool is wired 
correctly to the RemoteLogManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16854) Zookeeper - Add v5 of StopReplica

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16854:
-

 Summary: Zookeeper - Add v5 of StopReplica
 Key: KAFKA-16854
 URL: https://issues.apache.org/jira/browse/KAFKA-16854
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16855) KRaft - Wire replaying a TopicRecord

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16855:
-

 Summary: KRaft - Wire replaying a TopicRecord
 Key: KAFKA-16855
 URL: https://issues.apache.org/jira/browse/KAFKA-16855
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Replaying a TopicRecord containing a new TieredEpoch and TieredState needs to 
interact with the two thread pools in the RemoteLogManager to add/remove the 
correct tasks from each



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16856) Zookeeper - Add new exception

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16856:
-

 Summary: Zookeeper - Add new exception
 Key: KAFKA-16856
 URL: https://issues.apache.org/jira/browse/KAFKA-16856
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Add TIERED_STORAGE_DISABLEMENT_IN_PROGRESS exception



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16857) Zookeeper - Add new ZNodes

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16857:
-

 Summary: Zookeeper - Add new ZNodes
 Key: KAFKA-16857
 URL: https://issues.apache.org/jira/browse/KAFKA-16857
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Additional information needs to be stored in new ZNodes as part of disablement. 
Ensure that said information makes it into Zookeeper.
{code:java}
/brokers/topics/{topic-name}/partitions
    /tieredstorage/
        /tiered_epoch
        /state
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

2024-05-28 Thread Muralidhar Basani
Hi all,

Any other suggestions or objections to the proposal?

Thanks,
Murali

On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani <
muralidhar.bas...@aiven.io> wrote:

> Thanks Greg.
> I have updated KIP with details on optimisation of LITERAL too.
>
> Whether introducing MATCH limits performance is definitely a fair question.
> - By optimizing LITERAL and PREFIX we are gaining a few nanoseconds, I think
> (described in the KIP).
> - MATCH can be introduced only with a configuration. So by default, we can
> disable the MATCH check (e.g. acls.pattern.match.enable=false), and only add
> the lookup in matchingAcls() if the customer enables the config.
> - With the proposal already described in the KIP for MATCH, we may not have
> any limitations; rather, it will be efficient.
>
> Maybe we are in good shape with these propositions.
>
> Thanks,
> Murali
>
>
>
> On Tue, May 21, 2024 at 6:15 PM Greg Harris 
> wrote:
>
>> Hi Murali,
>>
>> I don't have a trie library in mind. I looked at the current
>> implementation of the StandardAuthorizer and found that we are already
>> benefiting from the prefix structure in the implementation [1]. The
>> current implementation appears to be a TreePSet [2].
>>
>> Now, we've already made this tradeoff once with PREFIX: Prefixes are
>> less structured than literals, because with literals you can use a
>> hashing algorithm to jump directly to your relevant ACLs in O(1), but
>> with a prefix you either need to do multiple lookups, or some sort of
>> O(log(n)) lookup. And we determined that the ultimate limitation in
>> performance was worth it for the expressiveness.
>> We're making this tradeoff again with MATCH acls: wildcards are less
>> structured than prefixes or literals, because of the reasons I
>> mentioned earlier. We need to judge now if the ultimate limitation in
>> performance is worth it.
>>
>> I think your strategy for using the optimized layout for prefix and
>> literal matches is smart, because there does seem to be a gap in
>> performance possible. It makes me wonder why the optimized layout for
>> literals was not used when prefixes were added. Literal lookups still
>> go through the tree lookup, when they could be moved to a hash-lookup
>> instead.
>> That would allow users to "choose" for themselves on a convenience vs
>> performance scale: Smaller use-cases can add a single convenient
>> MATCH, and larger use-cases can add the multiple optimized PREFIXes.
>>
>> [1]
>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339
>> [2]
>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34
>>
>> Thanks,
>> Greg
>>
>> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani
>>  wrote:
>> >
>> > @greg which library of trie you were thinking of ? There is one in the
>> > commons-collection I see.
>> >
>> > On Fri, May 17, 2024 at 3:55 PM Claude Warren  wrote:
>> >
>> > > >
>> > > > This has implications for execution complexity: If we can't compute
>> > > > whether two patterns overlap, then we need to run both of them on
>> each
>> > > > piece of input to test if they both match. Under the current
>> > > > LITERAL/PREFIX system, we can optimize execution with a trie, but
>> that
>> > > > option wouldn't be available to us with MATCH.
>> > > >
>> > >
>> > > If we consider the case of an asterisk representing 1 or more
>> characters
>> > > then determining if 2 patterns overlap can be fairly simple and very
>> fast.
>> > > Let's call the text from the ACL the target, and the text from the
>> wildcard
>> > > matcher the candidate.
>> > >
>> > > When a wildcard pattern, excluding '*', is created:
>> > >
>> > >    - the candidate text is broken into fragments separated by characters
>> > >    that are not Character.isLetterOrDigit() (see
>> > >    https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html).
>> > >    - fragments that contain 1 character are ignored.
>> > >    - fragments that contain 2 or more characters are scanned, and every
>> > >    pair of characters is used to create a Hasher (see
>> > >    https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html);
>> > >    that Hasher is added to a Bloom filter associated with the wildcard
>> > >    pattern.
>> > >
>> > > When a target is being evaluated and matching wildcard entries are to be
>> > > located, split the target and create a Bloom filter using the same
>> > > strategy as for the wildcard patterns above. These Bloom filters will
>> > > have had more pairs of characters added than for a matching wildcard
>> > > pattern.
>> > >
>> > > Each filter contains the pattern for the unique pairs in the fragments of
>> > > the original text:
>> > >
>> > > Now we 
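A self-contained sketch of the fragment-and-character-pair idea described above, using a plain BitSet in place of the commons-collections Hasher/Bloom filter API (class name, filter size, and hash mixing are illustrative, not from the KIP):

```
import java.util.BitSet;

public class PairBloom {
    private static final int BITS = 1 << 16;
    private final BitSet bits = new BitSet(BITS);

    public static PairBloom fromText(String text) {
        PairBloom f = new PairBloom();
        // Split on anything that is not a letter or digit (ASCII here; the
        // email proposes Character.isLetterOrDigit, which is Unicode-aware).
        for (String fragment : text.split("[^\\p{Alnum}]+")) {
            if (fragment.length() < 2) continue; // 1-char fragments are ignored
            for (int i = 0; i + 1 < fragment.length(); i++) {
                f.addPair(fragment.charAt(i), fragment.charAt(i + 1));
            }
        }
        return f;
    }

    private void addPair(char a, char b) {
        int h = 31 * a + b;
        // Two cheap "hash functions" derived from the character pair.
        bits.set(Math.floorMod(h, BITS));
        bits.set(Math.floorMod(h * 0x9E3779B1, BITS));
    }

    /** A wildcard pattern can only match a target if the target's filter
     *  contains every bit set by the pattern's filter. */
    public boolean mightMatch(PairBloom target) {
        BitSet copy = (BitSet) bits.clone();
        copy.andNot(target.bits);
        return copy.isEmpty();
    }
}
```

Calling patternFilter.mightMatch(targetFilter) cheaply rules out wildcard ACLs that cannot possibly match a target, before running the full pattern match on the survivors.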

Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-28 Thread Mickael Maison
Hi,

I agree with Chia-Ping, I think we could drop the ZK variant
altogether, especially if this is not going to make it in 3.8.0.
Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of
new ZooKeeper-related code in that release to delete it all right
after in 4.0.

Thanks,
Mickael

On Fri, May 24, 2024 at 5:03 PM Christo Lolov  wrote:
>
> Hello!
>
> I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping and
> Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!
>
> Realistically, I don't think I have the bandwidth to get this in 3.8.0.
> Due to this, I will tentatively mark the Zookeeper part for 3.9 if the
> community decides that they do in fact want one more 3.x release.
> I will mark the KRaft part as ready to be started and aiming for either 4.0
> or 3.9.
>
> Best,
> Christo


[jira] [Created] (KAFKA-16858) Flatten SMT throws NPE

2024-05-28 Thread Adam Strickland (Jira)
Adam Strickland created KAFKA-16858:
---

 Summary: Flatten SMT throws NPE
 Key: KAFKA-16858
 URL: https://issues.apache.org/jira/browse/KAFKA-16858
 Project: Kafka
  Issue Type: Bug
  Components: connect
Affects Versions: 3.6.0
 Environment: Kafka 3.6 by way of CP 7.6.0
Reporter: Adam Strickland
 Attachments: FlattenTest.java

{{ConnectSchema.expectedClassesFor}} sometimes throws an NPE as part of a 
call to an SMT chain. Stack trace snippet:

{{at 
com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.apply(MomentFlatten.java:84)}}
{{at 
com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.applyWithSchema(MomentFlatten.java:173)}}
{{at 
com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:280)}}
{{at 
com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:280)}}
{{at 
com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:280)}}
{{at 
com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:274)}}
{{at org.apache.kafka.connect.data.Struct.put(Struct.java:203)}}
{{at org.apache.kafka.connect.data.Struct.put(Struct.java:216)}}
{{at 
org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:255)}}
{{at 
org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:213)}}
{{at 
org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:224)}}
{{at 
org.apache.kafka.connect.data.ConnectSchema.expectedClassesFor(ConnectSchema.java:268)}}

(the above transform is a sub-class of 
{{{}o.a.k.connect.transforms.Flatten{}}}; we have confirmed that the error 
occurs regardless).

The field being transformed is an array of structs. If the call to 
{{Schema#valueSchema()}} (o.a.k.connect.data.ConnectSchema.java:255) returns 
{{{}null{}}}, the subsequent call to {{Schema#name()}} at 
o.a.k.connect.data.ConnectSchema:268 throws an NPE.

The strange thing that we have observed is that this doesn't always happen; 
*sometimes* the struct's schema is found and sometimes it is not. We have been 
unable to determine the root cause, but have constructed a test that replicates 
the problem as observed (see attachment).

In our case we have worked around the issue with the aforementioned sub-class 
of {{{}Flatten{}}}, catching and logging the NPE on that specific use-case.
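A minimal sketch of that workaround (the subclass name here is hypothetical; the reporter's actual transform is MomentFlatten):

{code:java}
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Flatten;

// Hypothetical subclass illustrating the workaround: catch the NPE thrown
// from deep inside schema validation and pass the record through instead of
// failing the task.
public class SafeFlatten<R extends ConnectRecord<R>> extends Flatten.Value<R> {
    @Override
    public R apply(R record) {
        try {
            return super.apply(record);
        } catch (NullPointerException e) {
            // Schema#valueSchema() occasionally returns null for the
            // array-of-structs field; skip flattening rather than crash.
            System.err.println("Flatten NPE, passing record through: " + e);
            return record;
        }
    }
}
{code}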



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6208) Reduce startup time for Kafka Connect workers

2024-05-28 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-6208.

Fix Version/s: 3.6.0
   Resolution: Fixed

This is fixed by setting plugin.discovery=service_load on 3.6+, see 
KAFKA-14627/KIP-898 for more details.

> Reduce startup time for Kafka Connect workers
> -
>
> Key: KAFKA-6208
> URL: https://issues.apache.org/jira/browse/KAFKA-6208
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Affects Versions: 1.0.0
>Reporter: Randall Hauch
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> Kafka Connect startup times are excessive with a handful of connectors on the 
> plugin path or classpath. We should not be scanning three times (once for 
> connectors, once for SMTs, and once for converters), and hopefully we can 
> avoid scanning directories that are clearly not plugin directories. 
> We should also consider using Java's Service Loader to quickly identify 
> connectors. The latter would require a KIP and would require time for 
> connectors to migrate, but we could be smarter about only scanning plugin 
> directories that need to be scanned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-5451) Kafka Connect should scan classpath asynchronously

2024-05-28 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-5451.

Resolution: Won't Do

Kafka Connect needs the results of plugin scanning to answer basic REST 
queries, and to be assigned workloads via joining the group. Without finalized 
scan results, neither of these operations can meaningfully complete.

Rather than make the scanning asynchronous, we have elected to make it faster 
via KAFKA-14627/KIP-898. It no longer makes sense to async-process something 
that takes <1s.

> Kafka Connect should scan classpath asynchronously
> --
>
> Key: KAFKA-5451
> URL: https://issues.apache.org/jira/browse/KAFKA-5451
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>  Labels: performance
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When Kafka Connect workers start up, they scan the classpath and module paths 
> for connectors, transformations, and converters. This takes anywhere from 
> 15-30sec or longer depending upon how many JARs are included. Currently, this 
> scanning is done synchronously during startup of the Kafka Connect workers, 
> even though the workers may not need the result of the scan.
> The scanning logic should be asynchronous and should only block any 
> components that require the result of the scan. This will improve startup 
> time of the workers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16759) Invalid client telemetry transition on consumer close

2024-05-28 Thread Andrew Schofield (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schofield resolved KAFKA-16759.
--
Resolution: Fixed

> Invalid client telemetry transition on consumer close
> -
>
> Key: KAFKA-16759
> URL: https://issues.apache.org/jira/browse/KAFKA-16759
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Minor
> Fix For: 3.8.0
>
>
> Using the console consumer with client telemetry enabled, I hit an invalid 
> state transition when closing the consumer with CTRL-C. The consumer sends a 
> final "terminating" telemetry push which puts the client telemetry reporter 
> into TERMINATING_PUSH_IN_PROGRESS state. When it receives a response in this 
> state, it attempts an invalid state transition.
>  
> {noformat}
> [2024-05-13 19:19:35,804] WARN Error updating client telemetry state, 
> disabled telemetry 
> (org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter)
> java.lang.IllegalStateException: Invalid telemetry state transition from 
> TERMINATING_PUSH_IN_PROGRESS to PUSH_NEEDED; the valid telemetry state 
> transitions from TERMINATING_PUSH_IN_PROGRESS are: TERMINATED
>   at 
> org.apache.kafka.common.telemetry.ClientTelemetryState.validateTransition(ClientTelemetryState.java:163)
>   at 
> org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.maybeSetState(ClientTelemetryReporter.java:827)
>   at 
> org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.handleResponse(ClientTelemetryReporter.java:520)
>   at 
> org.apache.kafka.clients.NetworkClient$TelemetrySender.handleResponse(NetworkClient.java:1321)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:948)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:594)
>   at 
> org.apache.kafka.clients.consumer.internals.NetworkClientDelegate.poll(NetworkClientDelegate.java:130)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.sendUnsentRequests(ConsumerNetworkThread.java:262)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:275)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:95)
> [2024-05-13 19:19:35,805] WARN Unable to transition state after successful 
> push telemetry from state TERMINATING_PUSH_IN_PROGRESS 
> (org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter){noformat}
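An illustrative reduction of the transition check that produces the error above; the states and the single valid edge are taken from the log message, not from the full ClientTelemetryState enum:

{code:java}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative state machine; only the edge named in the log is modeled.
enum TelemetryStateSketch {
    PUSH_NEEDED, TERMINATING_PUSH_IN_PROGRESS, TERMINATED;

    private static final Map<TelemetryStateSketch, Set<TelemetryStateSketch>> VALID =
        new EnumMap<>(TelemetryStateSketch.class);
    static {
        // Per the exception text, the only valid transition from
        // TERMINATING_PUSH_IN_PROGRESS is to TERMINATED; the bug is that a
        // push response in this state tries to move back to PUSH_NEEDED.
        VALID.put(TERMINATING_PUSH_IN_PROGRESS, EnumSet.of(TERMINATED));
    }

    TelemetryStateSketch validateTransition(TelemetryStateSketch next) {
        if (!VALID.getOrDefault(this, EnumSet.noneOf(TelemetryStateSketch.class)).contains(next)) {
            throw new IllegalStateException(
                "Invalid telemetry state transition from " + this + " to " + next);
        }
        return next;
    }
}
{code}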



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2939

2024-05-28 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-16515) Fix the ZK Metadata cache use of voter static configuration

2024-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-16515.

Resolution: Fixed

> Fix the ZK Metadata cache use of voter static configuration
> ---
>
> Key: KAFKA-16515
> URL: https://issues.apache.org/jira/browse/KAFKA-16515
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: José Armando García Sancio
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.8.0
>
>
> It looks like, because of the ZK migration to KRaft, the ZK Metadata cache was
> changed to read the voter static configuration. This needs to change to use
> the voter nodes reported by the raft manager or the kraft client.
> The injection code is in KafkaServer where it constructs 
> MetadataCache.zkMetadata.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16516) Fix the controller node provider for broker to control channel

2024-05-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-16516.

Resolution: Fixed

> Fix the controller node provider for broker to control channel
> --
>
> Key: KAFKA-16516
> URL: https://issues.apache.org/jira/browse/KAFKA-16516
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: José Armando García Sancio
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.8.0
>
>
> The broker to controller channel gets the set of voters directly from the 
> static configuration. This needs to change so that the leader node comes 
> from the kraft client/manager.
> The code is in KafkaServer where it constructs the RaftControllerNodeProvider.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16859) Cleanup check if tiered storage is enabled

2024-05-28 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-16859:
--

 Summary: Cleanup check if tiered storage is enabled
 Key: KAFKA-16859
 URL: https://issues.apache.org/jira/browse/KAFKA-16859
 Project: Kafka
  Issue Type: Task
Reporter: Mickael Maison


We have 2 ways to detect whether tiered storage is enabled:
- KafkaConfig.isRemoteLogStorageSystemEnabled
- KafkaConfig.remoteLogManagerConfig().enableRemoteStorageSystem()

We use both in various files. We should stick with one way to do it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16796) Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder

2024-05-28 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16796.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Introduce new org.apache.kafka.tools.api.Decoder to replace 
> kafka.serializer.Decoder
> 
>
> Key: KAFKA-16796
> URL: https://issues.apache.org/jira/browse/KAFKA-16796
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: need-kip
> Fix For: 3.8.0
>
>
> We need a replacement in order to complete 
> https://issues.apache.org/jira/browse/KAFKA-14579 in Kafka 4.0.
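For context, the old kafka.serializer.Decoder (used by tools such as DumpLogSegments) is a single-method Scala trait; a hedged sketch of what a tools-side replacement looks like (the exact interface shipped in 3.8 may differ):

{code:java}
package org.apache.kafka.tools.api;

// Hedged sketch: a single-method, Scala-free decoder that the tools module
// can load, mirroring kafka.serializer.Decoder#fromBytes.
@FunctionalInterface
public interface Decoder<T> {
    T fromBytes(byte[] value);
}
{code}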



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-05-28 Thread Sophie Blee-Goldman
Hey all,

Two more quick updates to the KIP, please let me know if you have any
questions or feedback or naming suggestions:

1. We'd like to introduce an additional error code with the following
signature:
 * MISSING_PROCESS_ID: A ProcessId present in the input ApplicationState
was not present in the output TaskAssignment

2. While implementing the new TaskInfo class, specifically the
#sourceTopicPartitions and #changelogTopicPartitions APIs, we realized that
the source topic changelog optimization would create some overlap between
these two sets, which might be confusing for users as the API seems to
suggest these are disjoint sets. To make this distinction more clear, we
would like to introduce another small container class called the
TaskTopicPartition, which just contains metadata about how a TopicPartition
relates to a given task, such as whether it is a source topic and whether
it is a changelog topic. The TaskInfo API will then be simplified by
removing the separate #inputTopicPartitions, #changelogTopicPartitions, and
#partitionToRackIds methods, and replacing these with a single method:

Set<TaskTopicPartition> topicPartitions();

Please refer to the updated KIP for the complete definition of the new
TaskTopicPartition class
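For readers without the KIP at hand, a sketch of the container based only on the description above (the authoritative definition is in KIP-924):

```
import java.util.Optional;
import java.util.Set;
import org.apache.kafka.common.TopicPartition;

// Sketch only; see KIP-924 for the authoritative definition.
public interface TaskTopicPartition {
    TopicPartition topicPartition();
    boolean isSource();              // partition is an input topic for the task
    boolean isChangelog();           // partition backs a state store changelog
    Optional<Set<String>> rackIds(); // replaces the old #partitionToRackIds view
}
```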


Thanks!
Sophie


On Wed, May 15, 2024 at 3:41 PM Sophie Blee-Goldman 
wrote:

> Thanks Bruno!
>
> First, need to make one quick fix to what I said in the previous email --
> the new rackId() getter will be added to KafkaStreamsState, not
> KafkaStreamsApplication (The KIP is correct, but what I put in the email
> was not)
>
> U1. I would actually prefer to keep the constructors as is, for reasons I
> realize I forgot to mention. Let me know if this makes sense to you or you
> would still prefer to break up the constructors anyways:
>
> The KafkaStreamsApplication class has two required parameters and one
> optional one. The required params are of course the processId and
> assignment so imo it would not make sense to break these up across two
> different constructors, since both have to be supplied. The
> followupRebalanceDeadline on the other hand is purely optional, which is
> why that one is in a separate, non-static constructor
>
> Re: for vs of, unfortunately for is a reserved keyword in java. I'm open
> to other naming suggestions though. I actually personally prefer the more
> direct naming style, eg something like #ofProcessIdAndAssignment (or
> #forProcessIdAndAssignment if you'd prefer?) But we tend to trend towards
> the shorter fluent naming style and others have mentioned a preference for
> names like "#of" in the past. I'm happy with any name (any name that's
> allowed in java, that is :P)
>
> U3: This is a very good point, and in fact the existence of TaskMetadata
> was why I went with TaskInfo here. I'm not super happy with it but at least
> this TaskInfo is pretty well encapsulated in the assignment code and
> generally won't be mixed up with the TaskMetadata/Task/etc part of the
> code. If anyone would like to submit a KIP to clean all this up at some
> point, I would definitely be supportive!
>
> On Wed, May 15, 2024 at 2:44 AM Bruno Cadonna  wrote:
>
>> Hi,
>>
>> Thank you for the updates!
>>
>> I have a couple of comments.
>>
>> U1
>> Yeah, that makes sense to me. However, could we break out the assignment
>> part to make it more readable? Something like:
>>
>>
>> KafkaStreamsAssignment.for(processId).withAssignment(assignment).withFollowRebalance(rebalanceDeadline)
>>
>> nits:
>> U1.a I slightly prefer "for" compared to "of", but I leave the decision
>> to others.
>> U1.b An alternative to "withAssignment()" could be simply "with()".
>>
>>
>> U3
>> Yeah, I like the TaskInfo approach. However, task metadata interfaces
>> start to proliferate a bit too much in our code base. We have
>> TaskMetadata, TaskInfo, and finally Task that provide similar methods. I
>> think we should try to consolidate those interfaces. Does not need to
>> happen in this KIP, but we should consider it.
>>
>> Best,
>> Bruno
>>
>>
>> On 5/15/24 6:01 AM, Sophie Blee-Goldman wrote:
>> > Hey all,
>> >
>> > I have a few updates to mention that have come up during the
>> implementation
>> > phase. The KIP should reflect the latest proposal and what has been
>> merged
>> > so far, but I'll list everything again here so that you don't have to
>> sift
>> > through the entire KIP:
>> >
>> > U1: The KafkaStreamsAssignment interface was converted to a class, and
>> two
>> > public constructors were added in keeping with the fluent API used
>> > elsewhere in Streams:
>> >
>> >
>> >> public class KafkaStreamsAssignment {
>> >>
>> >>     public static KafkaStreamsAssignment of(final ProcessId processId,
>> >>         final Set<AssignedTask> assignment);
>> >>
>> >>     public KafkaStreamsAssignment withFollowupRebalance(final Instant
>> >>         rebalanceDeadline);
>> >> }
>> >
>> >
>> > U2: Any lag-related APIs in the KafkaStreamsState interface will throw
>> an
>> > UnsupportedOperationException if the user opted out of computing 

Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-05-28 Thread Sophie Blee-Goldman
Ah, one more very small thing:

3. We changed the name of a KafkaStreamsAssignment method from #assignment
to just #tasks. The new signature is

 public Set<AssignedTask> tasks();

The reason for this is that the term "assignment" is used a lot already,
and if we call the object itself an "assignment" then we should refer to
the specific tasks that make up this assignment as just the "tasks"

Also, with the original name, this is a valid but very silly sounding
method call chain: TaskAssignment.assignment().get(0).assignment() (like I
said, too much "assignment" in the mix)

On Tue, May 28, 2024 at 1:13 PM Sophie Blee-Goldman 
wrote:

> Hey all,
>
> Two more quick updates to the KIP, please let me know if you have any
> questions or feedback or naming suggestions:
>
> 1. We'd like to introduce an additional error code with the following
> signature:
>  * MISSING_PROCESS_ID: A ProcessId present in the input ApplicationState
> was not present in the output TaskAssignment
>
> 2. While implementing the new TaskInfo class, specifically the
> #sourceTopicPartitions and #changelogTopicPartitions APIs, we realized that
> the source topic changelog optimization would create some overlap between
> these two sets, which might be confusing for users as the API seems to
> suggest these are disjoint sets. To make this distinction more clear, we
> would like to introduce another small container class called the
> TaskTopicPartition, which just contains metadata about how a TopicPartition
> relates to a given task, such as whether it is a source topic and whether
> it is a changelog topic. The TaskInfo API will then be simplified by
> removing the separate #inputTopicPartitions, #changelogTopicPartitions, and
> #partitionToRackIds methods, and replacing these with a single method:
>
> Set<TaskTopicPartition> topicPartitions();
>
> Please refer to the updated KIP for the complete definition of the new
> TaskTopicPartition class
>
>
> Thanks!
> Sophie
>
>
> On Wed, May 15, 2024 at 3:41 PM Sophie Blee-Goldman 
> wrote:
>
>> Thanks Bruno!
>>
>> First, need to make one quick fix to what I said in the previous email --
>> the new rackId() getter will be added to KafkaStreamsState, not
>> KafkaStreamsApplication (The KIP is correct, but what I put in the email
>> was not)
>>
>> U1. I would actually prefer to keep the constructors as is, for reasons I
>> realize I forgot to mention. Let me know if this makes sense to you or you
>> would still prefer to break up the constructors anyways:
>>
>> The KafkaStreamsApplication class has two required parameters and one
>> optional one. The required params are of course the processId and
>> assignment so imo it would not make sense to break these up across two
>> different constructors, since both have to be supplied. The
>> followupRebalanceDeadline on the other hand is purely optional, which is
>> why that one is in a separate, non-static constructor
>>
>> Re: for vs of, unfortunately for is a reserved keyword in java. I'm open
>> to other naming suggestions though. I actually personally prefer the more
>> direct naming style, eg something like #ofProcessIdAndAssignment (or
>> #forProcessIdAndAssignment if you'd prefer?) But we tend to trend towards
>> the shorter fluent naming style and others have mentioned a preference for
>> names like "#of" in the past. I'm happy with any name (any name that's
>> allowed in java, that is :P)
>>
>> U3: This is a very good point, and in fact the existence of TaskMetadata
>> was why I went with TaskInfo here. I'm not super happy with it but at least
>> this TaskInfo is pretty well encapsulated in the assignment code and
>> generally won't be mixed up with the TaskMetadata/Task/etc part of the
>> code. If anyone would like to submit a KIP to clean all this up at some
>> point, I would definitely be supportive!
>>
>> On Wed, May 15, 2024 at 2:44 AM Bruno Cadonna  wrote:
>>
>>> Hi,
>>>
>>> Thank you for the updates!
>>>
>>> I have a couple of comments.
>>>
>>> U1
>>> Yeah, that makes sense to me. However, could we break out the assignment
>>> part to make it more readable? Something like:
>>>
>>>
>>> KafkaStreamsAssignment.for(processId).withAssignment(assignment).withFollowRebalance(rebalanceDeadline)
>>>
>>> nits:
>>> U1.a I slightly prefer "for" compared to "of", but I leave the decision
>>> to others.
>>> U1.b An alternative to "withAssignment()" could be simply "with()".
>>>
>>>
>>> U3
>>> Yeah, I like the TaskInfo approach. However, task metadata interfaces
>>> start to proliferate a bit too much in our code base. We have
>>> TaskMetadata, TaskInfo, and finally Task that provide similar methods. I
>>> think we should try to consolidate those interfaces. Does not need to
>>> happen in this KIP, but we should consider it.
>>>
>>> Best,
>>> Bruno
>>>
>>>
>>> On 5/15/24 6:01 AM, Sophie Blee-Goldman wrote:
>>> > Hey all,
>>> >
>>> > I have a few updates to mention that have come up during the
>>> implementation
>>> > phase. The KIP should reflect the latest proposal and what has been

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2940

2024-05-28 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-1035: StateStore managed changelog offsets

2024-05-28 Thread Matthias J. Sax
Sounds like a good plan. -- I think we are still wrapping up the 3.8 
release, but would also like to move forward with this one.


Should we start a VOTE?

For merging PRs we need to wait until after code freeze, and the 3.8 branch 
was cut. But we could start reviewing PRs before this already.



-Matthias

On 5/17/24 3:05 AM, Nick Telford wrote:

Hi everyone,

As discussed on the Zoom call, we're going to handle rebalance meta-data by:

- On start-up, Streams will open each store and read its changelog offsets
into an in-memory cache. This cache will be shared among all StreamThreads.
- On rebalance, the cache will be consulted for Task offsets for any Task
that is not active on any instance-local StreamThreads. If the Task is
active on *any* instance-local StreamThread, we will report the Task lag as
"up to date" (i.e. -1), because we know that the local state is currently
up-to-date.

We will avoid caching offsets across restarts in the legacy ".checkpoint"
file, so that we can eliminate the logic for handling this class. If
performance of opening/closing many state stores is poor, we can
parallelise it by forking off a thread for each Task directory when reading
the offsets.

I'll update the KIP later today to reflect this design, but I will try to
keep it high-level, so that the exact implementation can vary.

Regards,

Nick

On Thu, 16 May 2024 at 03:12, Sophie Blee-Goldman 
wrote:


103: I like the idea of immediately deprecating #managesOffsets and aiming
to make offset management mandatory in the long run. I assume we would also
log a warning for any custom stores that return "false" from this method to
encourage custom store implementations to start doing so? My only
question/concern is that if we want folks to start managing their own
offsets then we should make this transition easy for them, perhaps by
exposing some public utility APIs for things that are currently handled by
Kafka Streams such as reading/writing checkpoint files. Maybe it would be
useful to include a small example in the KIP of what it would actually mean
to "manage your own offsets" -- I know (all too well) that plugging in
custom storage implementations is not easy and most people who do this are
probably fairly advanced users, but offset management will be a totally new
ballgame to most people and this kind of feels like throwing them
off the deep end. We should at least provide a lifejacket via some kind of
utility API and/or example

200. There's been a lot of back and forth on the rebalance metadata/task
lag computation question, so forgive me if I missed any part of this, but I
think we've landed at the right idea here. To summarize: the "tl;dr"
explanation is that we'll write the checkpoint file only on close and will
account for hard-crash scenarios by opening up the stores on startup and
writing a checkpoint file for any missing tasks. Does that sound about
right?

A few clarifications:
I think we're all more or less on the same page here but just to be
absolutely clear, the task lags for each task directory found on disk will
be reported by only one of the StreamThreads, and each StreamThread will
report lags only for tasks that it already owns or are not assigned to any
other StreamThread in the client. In other words, we only need to get the
task lag for completely unassigned/unlocked tasks, which means if there is
a checkpoint file at all then it must be up-to-date, because there is no
other StreamThread actively writing to that state store (if so then only
that StreamThread would report lag for that particular task).

This still leaves the "no checkpoint at all" case which as previously
mentioned can occur after a  hard-crash. Luckily we only have to worry
about this once, after starting up again following said hard crash. We can
simply open up each of the state stores before ever joining the group, get
the offsets from rocksdb, and write them to a new checkpoint file. After
that, we can depend on the checkpoints written at close and won't have to
open up any stores that aren't already assigned for the reasons laid out in
the paragraph above.

As for the specific mechanism and which thread-does-what, since there were
some questions, this is how I'm imagining the process:

1.   The general idea is that we simply go through each task directory
with state but no checkpoint file and open the StateStore, call
#committedOffset, and then write it to the checkpoint file. We can then
close these stores and let things proceed as normal.
2.  This only has to happen once, during startup, but we have two
options:
   1. Do this from KafkaStreams#start, ie before we even create the
   StreamThreads
   2.  Do this from StreamThread#start, following a similar lock-based
   approach to the one used #computeTaskLags, where each StreamThread
just
   makes a pass over the task directories on disk and attempts to lock
them
   one by one. If they obtain the lock, check whether there is stat
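A rough, self-contained sketch of the startup pass described in step 1; StateStore and StoreOpener here are stand-ins rather than Kafka Streams APIs, since the KIP deliberately leaves the exact implementation open:

```
import java.io.File;
import java.util.Map;
import java.util.function.BiConsumer;

// Open each task directory that has state but no checkpoint file, read the
// committed offsets from the store itself (e.g. RocksDB), write a fresh
// checkpoint, and close the store again.
public class StartupCheckpointPass {

    interface StateStore extends AutoCloseable {
        Map<Integer, Long> committedOffsets(); // changelog partition -> offset
        @Override void close();
    }

    interface StoreOpener {
        StateStore open(File taskDir);
    }

    static void writeMissingCheckpoints(File stateDir,
                                        StoreOpener opener,
                                        BiConsumer<File, Map<Integer, Long>> writeCheckpoint) {
        File[] taskDirs = stateDir.listFiles(File::isDirectory);
        if (taskDirs == null) {
            return;
        }
        for (File taskDir : taskDirs) {
            File checkpoint = new File(taskDir, ".checkpoint");
            if (checkpoint.exists()) {
                continue; // a clean shutdown already wrote it
            }
            try (StateStore store = opener.open(taskDir)) {
                writeCheckpoint.accept(checkpoint, store.committedOffsets());
            }
        }
    }
}
```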

[jira] [Resolved] (KAFKA-16847) Revise the README for recent CI changes

2024-05-28 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16847.

Resolution: Invalid

> Revise the README for recent CI changes 
> 
>
> Key: KAFKA-16847
> URL: https://issues.apache.org/jira/browse/KAFKA-16847
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Cheng-Kai, Zhang
>Priority: Minor
>
> The recent changes [0] remove the tests for JDK 11 and 17, which is good for 
> our CI resources. However, in the root README we still claim "We build and 
> test Apache Kafka with Java 8, 11, 17 and 21".
> [0] 
> https://github.com/apache/kafka/commit/adab48df6830259d33bd9705b91885c4f384f267
> [1] https://github.com/apache/kafka/blob/trunk/README.md?plain=1#L7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1035: StateStore managed changelog offsets

2024-05-28 Thread Bruno Cadonna

Totally agree on moving forward and starting the VOTE!

However, the KIP should be updated with the new info before starting the 
VOTE.


Best,
Bruno

On 5/29/24 2:36 AM, Matthias J. Sax wrote:
Sounds like a good plan. -- I think we are still wrapping up the 3.8 
release, but would also like to move forward with this one.


Should we start a VOTE?

For merging PRs we need to wait until after code freeze, and the 3.8 branch 
was cut. But we could start reviewing PRs before this already.



-Matthias

On 5/17/24 3:05 AM, Nick Telford wrote:

Hi everyone,

As discussed on the Zoom call, we're going to handle rebalance 
meta-data by:


- On start-up, Streams will open each store and read its changelog 
offsets
into an in-memory cache. This cache will be shared among all 
StreamThreads.

- On rebalance, the cache will be consulted for Task offsets for any Task
that is not active on any instance-local StreamThreads. If the Task is
active on *any* instance-local StreamThread, we will report the Task 
lag as

"up to date" (i.e. -1), because we know that the local state is currently
up-to-date.

We will avoid caching offsets across restarts in the legacy ".checkpoint"
file, so that we can eliminate the logic for handling this class. If
performance of opening/closing many state stores is poor, we can
parallelise it by forking off a thread for each Task directory when 
reading

the offsets.

I'll update the KIP later today to reflect this design, but I will try to
keep it high-level, so that the exact implementation can vary.

Regards,

Nick

On Thu, 16 May 2024 at 03:12, Sophie Blee-Goldman 
wrote:

103: I like the idea of immediately deprecating #managesOffsets and 
aiming
to make offset management mandatory in the long run. I assume we 
would also
log a warning for any custom stores that return "false" from this 
method to

encourage custom store implementations to start doing so? My only
question/concern is that if we want folks to start managing their own
offsets then we should make this transition easy for them, perhaps by
exposing some public utility APIs for things that are currently 
handled by
Kafka Streams such as reading/writing checkpoint files. Maybe it 
would be
useful to include a small example in the KIP of what it would 
actually mean

to "manage your own offsets" -- I know (all too well) that plugging in
custom storage implementations is not easy and most people who do 
this are
probably fairly advanced users, but offset management will be a 
totally new

ballgame to most people and this kind of feels like throwing them
off the deep end. We should at least provide a lifejacket via some 
kind of

utility API and/or example

200. There's been a lot of back and forth on the rebalance metadata/task
lag computation question, so forgive me if I missed any part of this, 
but I

think we've landed at the right idea here. To summarize: the "tl;dr"
explanation is that we'll write the checkpoint file only on close and 
will

account for hard-crash scenarios by opening up the stores on startup and
writing a checkpoint file for any missing tasks. Does that sound about
right?

A few clarifications:
I think we're all more or less on the same page here but just to be
absolutely clear, the task lags for each task directory found on disk 
will

be reported by only one of the StreamThreads, and each StreamThread will
report lags only for tasks that it already owns or are not assigned 
to any
other StreamThread in the client. In other words, we only need to get 
the
task lag for completely unassigned/unlocked tasks, which means if 
there is

a checkpoint file at all then it must be up-to-date, because there is no
other StreamThread actively writing to that state store (if so then only
that StreamThread would report lag for that particular task).

This still leaves the "no checkpoint at all" case which as previously
mentioned can occur after a  hard-crash. Luckily we only have to worry
about this once, after starting up again following said hard crash. 
We can
simply open up each of the state stores before ever joining the 
group, get

the offsets from rocksdb, and write them to a new checkpoint file. After
that, we can depend on the checkpoints written at close and won't 
have to
open up any stores that aren't already assigned for the reasons laid 
out in

the paragraph above.

As for the specific mechanism and which thread-does-what, since there 
were

some questions, this is how I'm imagining the process:

    1.   The general idea is that we simply go through each task directory

    with state but no checkpoint file and open the StateStore, call
    #committedOffset, and then write it to the checkpoint file. We 
can then

    close these stores and let things proceed as normal.
    2.  This only has to happen once, during startup, but we have two
    options:
   1. Do this from KafkaStreams#start, ie before we even create the
   StreamThreads
   2.  Do this from StreamThread#start, following a similar 
lo

[jira] [Created] (KAFKA-16860) Introduce `group.version` feature flag

2024-05-28 Thread David Jacot (Jira)
David Jacot created KAFKA-16860:
---

 Summary: Introduce `group.version` feature flag
 Key: KAFKA-16860
 URL: https://issues.apache.org/jira/browse/KAFKA-16860
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)