Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2639

2024-02-12 Thread Apache Jenkins Server
See 




Re: Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread Luke Chen
Hi David,

I agree it'll be great if we could distinguish between "flakyFailure" and
"alwaysFailure".
And I also agree we should still show the "flakyFailure" tests because
there could be some potential bugs there.

Thanks for working on it.
Luke

On Tue, Feb 13, 2024 at 12:57 AM Ismael Juma  wrote:

> Sounds good. I am supportive of this change.
>
> Ismael
>
> On Mon, Feb 12, 2024 at 7:43 AM David Jacot 
> wrote:
>
> > Hi Bruno,
> >
> > Yes, you're right. Sorry for the typo.
> >
> > Hi Ismael,
> >
> > You're right. Jenkins does not support the flakyFailure element and
> > hence the information is not at all in the Jenkins report. I am still
> > experimenting with printing the flaky tests somewhere. I will update this
> > thread if I get something working. In the meantime, I wanted to gauge
> > whether there is support for it.
> >
> > Cheers,
> > David
> >
> > On Mon, Feb 12, 2024 at 3:59 PM Ismael Juma  wrote:
> >
> > > Hi David,
> > >
> > > Your message didn't make this clear, but you are saying that Jenkins
> does
> > > _not_ support the flakyFailure element and hence this information will
> be
> > > completely missing from the Jenkins report. Have we considered
> including
> > > the flakyFailure information ourselves? I have seen that being done and
> > it
> > > seems strictly better than totally ignoring it.
> > >
> > > Ismael
> > >
> > > On Mon, Feb 12, 2024 at 12:11 AM David Jacot
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I have been playing with `reports.junitXml.mergeReruns` setting in
> > gradle
> > > > [1]. From the gradle doc:
> > > >
> > > > > When mergeReruns is enabled, if a test fails but is then retried
> > > > > and succeeds, its failures will be recorded as <flakyFailure> instead
> > > > > of <failure>, within one <testcase>. This is effectively the reporting
> > > > > produced by the surefire plugin of Apache Maven™ when enabling reruns.
> > > > > If your CI server understands this format, it will indicate that the
> > > > > test was flaky. If it does not, it will indicate that the test
> > > > > succeeded as it will ignore the <flakyFailure> information. If the
> > > > > test does not succeed (i.e. it fails for every retry), it will be
> > > > > indicated as having failed whether your tool understands this format
> > > > > or not.
> > > >
> > > > With this, we get really close to having green builds [2] all the
> time.
> > > > There are only a few tests which are too flaky. We should address or
> > > > disable those.
> > > >
> > > > I think that this would help us a lot because it would reduce the
> noise
> > > > that we get in pull requests. At the moment, there are just too many
> > > failed
> > > > tests reported so it is really hard to know whether a pull request is
> > > > actually fine or not.
> > > >
> > > > [1] applies it to both unit and integration tests. Following the
> > > discussion
> > > > in the `github build queue` thread, it may be better to only apply it
> > to
> > > > the integration tests. Being stricter with unit tests would make
> sense.
> > > >
> > > > This does not mean that we should continue our effort to reduce the
> > > number
> > > > of flaky tests. For this, I propose to keep using Gradle Enterprise.
> It
> > > > provides a nice report for them that we can leverage.
> > > >
> > > > Thoughts?
> > > >
> > > > Best,
> > > > David
> > > >
> > > > [1] https://github.com/apache/kafka/pull/14862
> > > > [2]
> > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests
> > > >
> > >
> >
>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2638

2024-02-12 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-9376) Plugin class loader not found using MM2

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-9376.

Fix Version/s: 2.5.0
   Resolution: Fixed

> Plugin class loader not found using MM2
> ---
>
> Key: KAFKA-9376
> URL: https://issues.apache.org/jira/browse/KAFKA-9376
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0
>Reporter: Sinóros-Szabó Péter
>Priority: Minor
> Fix For: 2.5.0
>
>
> I am using MM2 (release 2.4.0 with Scala 2.12) and I get a bunch of classloader 
> errors. MM2 seems to be working, but I do not know if all of its components 
> are working as expected, as this is the first time I have used MM2.
> I run MM2 with the following command:
> {code:java}
> ./bin/connect-mirror-maker.sh config/connect-mirror-maker.properties
> {code}
> Errors are:
> {code:java}
> [2020-01-07 15:06:17,892] ERROR Plugin class loader for connector: 
> 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not found. 
> Returning: 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@6ebf0f36 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
> [2020-01-07 15:06:17,889] ERROR Plugin class loader for connector: 
> 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not found. 
> Returning: 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@6ebf0f36 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
> [2020-01-07 15:06:17,904] INFO ConnectorConfig values:
>  config.action.reload = restart
>  connector.class = org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
>  errors.log.enable = false
>  errors.log.include.messages = false
>  errors.retry.delay.max.ms = 6
>  errors.retry.timeout = 0
>  errors.tolerance = none
>  header.converter = null
>  key.converter = null
>  name = MirrorHeartbeatConnector
>  tasks.max = 1
>  transforms = []
>  value.converter = null
>  (org.apache.kafka.connect.runtime.ConnectorConfig:347)
> [2020-01-07 15:06:17,904] INFO EnrichedConnectorConfig values:
>  config.action.reload = restart
>  connector.class = org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
>  errors.log.enable = false
>  errors.log.include.messages = false
>  errors.retry.delay.max.ms = 6
>  errors.retry.timeout = 0
>  errors.tolerance = none
>  header.converter = null
>  key.converter = null
>  name = MirrorHeartbeatConnector
>  tasks.max = 1
>  transforms = []
>  value.converter = null
>  
> (org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig:347)
> [2020-01-07 15:06:17,905] INFO TaskConfig values:
>  task.class = class org.apache.kafka.connect.mirror.MirrorHeartbeatTask
>  (org.apache.kafka.connect.runtime.TaskConfig:347)
> [2020-01-07 15:06:17,905] INFO Instantiated task MirrorHeartbeatConnector-0 
> with version 1 of type org.apache.kafka.connect.mirror.MirrorHeartbeatTask 
> (org.apache.kafka.connect.runtime.Worker:434){code}
> After a while, these errors are not logged any more.
> Config is:
> {code:java}
> clusters = eucmain, euwbackup
> eucmain.bootstrap.servers = kafka1:9092,kafka2:9092
> euwbackup.bootstrap.servers = 172.30.197.203:9092,172.30.213.104:9092
> eucmain->euwbackup.enabled = true
> eucmain->euwbackup.topics = .*
> eucmain->euwbackup.topics.blacklist = ^(kafka|kmf|__|pricing).*
> eucmain->euwbackup.rename.topics = false
> rename.topics = false
> eucmain->euwbackup.sync.topic.acls.enabled = false
> sync.topic.acls.enabled = false{code}
> Using OpenJDK 8 or 11, I get the same error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-7217) Loading dynamic topic data into kafka connector sink using regex

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-7217.

Fix Version/s: 1.1.0
   Resolution: Fixed

> Loading dynamic topic data into kafka connector sink using regex
> 
>
> Key: KAFKA-7217
> URL: https://issues.apache.org/jira/browse/KAFKA-7217
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Affects Versions: 1.1.0
>Reporter: Pratik Gaglani
>Priority: Major
> Fix For: 1.1.0
>
>
> KAFKA-3074 added the new feature to use a regex for topics in connectors;
> however, it seems that the topic data from newly added topics, created after 
> the connector has been started, is not consumed until the connector is 
> restarted. We have a need to dynamically add new topics and have the connector 
> consume them based on the regex defined in the connector properties. How can it 
> be achieved? Ex: regex: topic-.*, topics: topic-1, topic-2. If I introduce a 
> new topic topic-3, then how can I make the connector consume its data without 
> restarting it?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16245) DescribeConsumerGroupTest failing

2024-02-12 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-16245:
--

 Summary: DescribeConsumerGroupTest failing
 Key: KAFKA-16245
 URL: https://issues.apache.org/jira/browse/KAFKA-16245
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


The first instances on trunk are in this PR: 
https://github.com/apache/kafka/pull/15275
This PR seems to have the test failing consistently in its builds, when it wasn't 
failing this consistently before.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6340) Support of transactions in KafkaConnect

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-6340.

Fix Version/s: 3.3.0
   Resolution: Fixed

> Support of transactions in KafkaConnect
> ---
>
> Key: KAFKA-6340
> URL: https://issues.apache.org/jira/browse/KAFKA-6340
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Reporter: Oleg Kuznetsov
>Priority: Major
> Fix For: 3.3.0
>
>
> Now, Kafka Connect source connectors commit source offsets in a periodic task.
> The proposed approach is to produce the data record batch and the config update 
> records (related to the batch) within a Kafka transaction, which would prevent 
> publishing duplicate records on connector restart.
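
For context, a minimal sketch of the kind of transactional produce the proposal describes, using the plain producer API (the producer is assumed to be configured with a transactional.id; the topic names and record variables are illustrative assumptions, not Connect internals):

{code:java}
producer.initTransactions();
producer.beginTransaction();
try {
    // The data batch and the related config/offset update records commit atomically,
    // so a connector restart cannot republish one without the other.
    producer.send(new ProducerRecord<>("data-topic", dataKey, dataValue));
    producer.send(new ProducerRecord<>("connect-offsets", offsetKey, offsetValue));
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
    throw e;
}
{code}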



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-4961) Mirrormaker crash with org.apache.kafka.common.protocol.types.SchemaException

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-4961.

Resolution: Won't Fix

> Mirrormaker crash with org.apache.kafka.common.protocol.types.SchemaException
> -
>
> Key: KAFKA-4961
> URL: https://issues.apache.org/jira/browse/KAFKA-4961
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Di Shang
>Priority: Major
>  Labels: mirror-maker
>
> We are running a cluster of 3 brokers and using mirrormaker to replicate a 
> topic to a different 3-broker cluster. Occasionally we find that when the 
> source cluster is under heavy load with lots of messages coming in, 
> mirrormaker will crash with SchemaException. 
> {noformat}
> 27 Mar 2017 19:02:22.030 [mirrormaker-thread-0] DEBUG 
> org.apache.kafka.clients.NetworkClient handleTimedOutRequests(line:399) 
> Disconnecting from node 5 due to request timeout.
> 27 Mar 2017 19:02:22.032 [mirrormaker-thread-0] DEBUG 
> org.apache.kafka.clients.NetworkClient handleTimedOutRequests(line:399) 
> Disconnecting from node 7 due to request timeout.
> 27 Mar 2017 19:02:22.033 [mirrormaker-thread-0] DEBUG 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient 
> onComplete(line:376) Cancelled FETCH request 
> ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@96db60c9,
>  
> request=RequestSend(header={api_key=1,api_version=1,correlation_id=76978,client_id=dx-stg02-wdc04-0},
>  
> body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=0,fetch_offset=129037541,max_bytes=1048576},{partition=1,fetch_offset=120329329,max_bytes=1048576},{partition=33,fetch_offset=125526115,max_bytes=1048576},{partition=36,fetch_offset=125526627,max_bytes=1048576},{partition=5,fetch_offset=121654333,max_bytes=1048576},{partition=37,fetch_offset=120262628,max_bytes=1048576},{partition=9,fetch_offset=125568321,max_bytes=1048576},{partition=41,fetch_offset=121593740,max_bytes=1048576},{partition=12,fetch_offset=125563836,max_bytes=1048576},{partition=13,fetch_offset=122044962,max_bytes=1048576},{partition=45,fetch_offset=125504213,max_bytes=1048576},{partition=48,fetch_offset=125506892,max_bytes=1048576},{partition=17,fetch_offset=121635934,max_bytes=1048576},{partition=49,fetch_offset=121985309,max_bytes=1048576},{partition=21,fetch_offset=125549718,max_bytes=1048576},{partition=24,fetch_offset=125548506,max_bytes=1048576},{partition=25,fetch_offset=120289719,max_bytes=1048576},{partition=29,fetch_offset=121612535,max_bytes=1048576}]}]}),
>  createdTimeMs=1490641301465, sendTimeMs=1490641301465) with correlation id 
> 76978 due to node 5 being disconnected
> 27 Mar 2017 19:02:22.035 [mirrormaker-thread-0] DEBUG 
> org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch 
> failed
> org.apache.kafka.common.errors.DisconnectException: null
> 27 Mar 2017 19:02:22.037 [mirrormaker-thread-0] DEBUG 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient 
> onComplete(line:376) Cancelled FETCH request 
> ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@fcb8c50d,
>  
> request=RequestSend(header={api_key=1,api_version=1,correlation_id=76980,client_id=dx-stg02-wdc04-0},
>  
> body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=32,fetch_offset=125478125,max_bytes=1048576},{partition=2,fetch_offset=121280695,max_bytes=1048576},{partition=3,fetch_offset=125515146,max_bytes=1048576},{partition=35,fetch_offset=121216188,max_bytes=1048576},{partition=38,fetch_offset=121220634,max_bytes=1048576},{partition=7,fetch_offset=121634123,max_bytes=1048576},{partition=39,fetch_offset=125464566,max_bytes=1048576},{partition=8,fetch_offset=125515210,max_bytes=1048576},{partition=11,fetch_offset=121257359,max_bytes=1048576},{partition=43,fetch_offset=121571984,max_bytes=1048576},{partition=44,fetch_offset=125455538,max_bytes=1048576},{partition=14,fetch_offset=121264791,max_bytes=1048576},{partition=15,fetch_offset=125495034,max_bytes=1048576},{partition=47,fetch_offset=121199057,max_bytes=1048576},{partition=19,fetch_offset=121613792,max_bytes=1048576},{partition=20,fetch_offset=125495807,max_bytes=1048576},{partition=23,fetch_offset=121237155,max_bytes=1048576},{partition=26,fetch_offset=121249178,max_bytes=1048576},{partition=27,fetch_offset=125317927,max_bytes=1048576},{partition=31,fetch_offset=121591702,max_bytes=1048576}]}]}),
>  createdTimeMs=1490641301466, sendTimeMs=1490641301466) with correlation id 
> 76980 due to node 7 being disconnected

[jira] [Resolved] (KAFKA-3213) [CONNECT] It looks like we are not backing off properly when reconfiguring tasks

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-3213.

Fix Version/s: 3.5.0
   Resolution: Fixed

> [CONNECT] It looks like we are not backing off properly when reconfiguring 
> tasks
> 
>
> Key: KAFKA-3213
> URL: https://issues.apache.org/jira/browse/KAFKA-3213
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Reporter: Gwen Shapira
>Assignee: Liquan Pei
>Priority: Major
> Fix For: 3.5.0
>
>
> Looking at logs of attempt to reconfigure connector while leader is 
> restarting, I see:
> {code}
> [2016-01-29 20:31:01,799] ERROR IO error forwarding REST request:  
> (org.apache.kafka.connect.runtime.rest.RestServer)
> java.net.ConnectException: Connection refused
> [2016-01-29 20:31:01,802] ERROR Request to leader to reconfigure connector 
> tasks failed (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: IO Error 
> trying to forward REST request: Connection refused
> [2016-01-29 20:31:01,802] ERROR Failed to reconfigure connector's tasks, 
> retrying after backoff: 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: IO Error 
> trying to forward REST request: Connection refused
> [2016-01-29 20:31:01,803] DEBUG Sending POST with input 
> [{"tables":"bar","table.poll.interval.ms":"1000","incrementing.column.name":"id","connection.url":"jdbc:mysql://worker1:3306/testdb?user=root","name":"test-mysql-jdbc","tasks.max":"3","task.class":"io.confluent.connect.jdbc.JdbcSourceTask","poll.interval.ms":"1000","topic.prefix":"test-","connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector","mode":"incrementing"},{"tables":"foo","table.poll.interval.ms":"1000","incrementing.column.name":"id","connection.url":"jdbc:mysql://worker1:3306/testdb?user=root","name":"test-mysql-jdbc","tasks.max":"3","task.class":"io.confluent.connect.jdbc.JdbcSourceTask","poll.interval.ms":"1000","topic.prefix":"test-","connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector","mode":"incrementing"}]
>  to http://worker2:8083/connectors/test-mysql-jdbc/tasks 
> (org.apache.kafka.connect.runtime.rest.RestServer)
> [2016-01-29 20:31:01,803] ERROR IO error forwarding REST request:  
> (org.apache.kafka.connect.runtime.rest.RestServer)
> java.net.ConnectException: Connection refused
> [2016-01-29 20:31:01,804] ERROR Request to leader to reconfigure connector 
> tasks failed (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: IO Error 
> trying to forward REST request: Connection refused
> {code}
> Note that it looks like we are retrying every 1ms, while I'd expect a retry 
> every 250ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage

2024-02-12 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16229.

Resolution: Fixed

> Slow expiration of Producer IDs leading to high CPU usage
> -
>
> Key: KAFKA-16229
> URL: https://issues.apache.org/jira/browse/KAFKA-16229
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Expiration of ProducerIds is implemented with a slow removal of map keys:
> ```
>         producers.keySet().removeAll(keys);
> ```
> This unnecessarily goes through all producer ids in order to throw out the 
> expired keys.
> This leads to roughly quadratic time in the worst case, when most/all keys need 
> to be removed:
> ```
> Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
> ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds             1  avgt    3    44957983.550 ±    9389011.290  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds            10  avgt    3  5683374164.167 ± 1446242131.466  ns/op
> ```
> A simple fix is to use map#remove(key) instead, leading to a more linear 
> growth:
> ```
> Benchmark                                        (numProducerIds)  Mode  Cnt        Score         Error  Units
> ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±     651.389  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±   21875.644  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds             1  avgt    3   643887.031 ±  600475.302  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds            10  avgt    3  7741689.539 ± 3218317.079  ns/op
> ```
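
To illustrate the difference with a toy HashMap and key list (not the actual ProducerStateManager code): when the map is no larger than the list of keys to remove - the "most/all keys expire" case - the set view's removeAll falls back to calling expiredKeys.contains(...) for every map entry, an O(n) scan each time, so the whole call is roughly O(n^2), whereas removing each key directly stays linear.

```
Map<Long, Long> producers = new HashMap<>();
for (long id = 0; id < 100_000; id++) {
    producers.put(id, id);
}
List<Long> expiredKeys = new ArrayList<>(producers.keySet());

// Slow: producers.keySet().removeAll(expiredKeys) scans expiredKeys once per map entry.
// producers.keySet().removeAll(expiredKeys);

// Fast: one O(1) HashMap removal per expired id, O(n) overall.
for (Long id : expiredKeys) {
    producers.remove(id);
}
```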



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13338) kafka upgrade 6.1.1 asking to change log.cleanup.policy from delete to compact

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-13338.
-
Resolution: Invalid

> kafka upgrade 6.1.1 asking to change log.cleanup.policy from delete to compact
> --
>
> Key: KAFKA-13338
> URL: https://issues.apache.org/jira/browse/KAFKA-13338
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Fátima Galera
>Priority: Major
> Attachments: image003.png
>
>
> Hi all,
>  
> During the Kafka upgrade from 5.3.1 to 6.1.1.1 we get the below error when starting 
> the kafka-connector services
>  
> ERROR [Worker clientId=connect-1, groupId=connect-cluster] Uncaught exception 
> in herder work thread, exiting: 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> 87738-org.apache.kafka.common.config.ConfigException: Topic 'connect-offsets' 
> supplied via the 'offset.storage.topic' property is required to have 
> 'cleanup.policy=compact' to guarantee consistency and durability of source 
> connector offsets, but found the topic currently has 'cleanup.policy=delete'. 
> Continuing would likely result in eventually losing source connector offsets 
> and problems restarting this Connect cluster in the future. Change the 
> 'offset.storage.topic' property in the Connect worker configurations to use a 
> topic with 'cleanup.policy=compact'.
>  
> After that error we added log.cleanup.policy=compact parameter on 
> /etc/kafka/server.properties and kafka-connector service started but a few 
> days later all the connectors are down with attached error.
>  
> Could you please let us know how we can resolve the issue?
>  
> Best regards and thanks
> Fátima
> NGA -Alight
>  
>  
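
For reference, the error message above points at the supported remedy: the topic named by offset.storage.topic must itself use cleanup.policy=compact, rather than changing the broker-wide log.cleanup.policy in server.properties. A minimal illustrative sketch with the Java Admin client (the topic name is taken from the error message; the `admin` instance is an assumption):

{code:java}
// Set cleanup.policy=compact on the Connect offsets topic only.
ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "connect-offsets");
AlterConfigOp setCompact = new AlterConfigOp(
    new ConfigEntry("cleanup.policy", "compact"), AlterConfigOp.OpType.SET);
admin.incrementalAlterConfigs(
    Collections.singletonMap(topic, Collections.singletonList(setCompact))).all().get();
{code}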



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13163) MySQL Sink Connector - JsonConverter - DataException: Unknown schema type: null

2024-02-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-13163.
-
Resolution: Invalid

> MySQL Sink Connector - JsonConverter - DataException: Unknown schema type: 
> null
> ---
>
> Key: KAFKA-13163
> URL: https://issues.apache.org/jira/browse/KAFKA-13163
> Project: Kafka
>  Issue Type: Task
>  Components: connect
>Affects Versions: 2.1.1
> Environment: PreProd
>Reporter: Muddam Pullaiah Yadav
>Priority: Major
>
> Please help with the following issue. Really appreciate it! 
>  
> We are using Azure HDInsight Kafka cluster 
> My sink Properties:
>  
> cat mysql-sink-connector
>  {
>  "name":"mysql-sink-connector",
>  "config":
> { "tasks.max":"2", "batch.size":"1000", "batch.max.rows":"1000", 
> "poll.interval.ms":"500", 
> "connector.class":"org.apache.kafka.connect.file.FileStreamSinkConnector", 
> "connection.url":"jdbc:mysql://moddevdb.mysql.database.azure.com:3306/db_test_dev",
>  "table.name":"db_test_dev.tbl_clients_merchants", "topics":"test", 
> "connection.user":"grabmod", "connection.password":"#admin", 
> "auto.create":"true", "auto.evolve":"true", 
> "value.converter":"org.apache.kafka.connect.json.JsonConverter", 
> "value.converter.schemas.enable":"false", 
> "key.converter":"org.apache.kafka.connect.json.JsonConverter", 
> "key.converter.schemas.enable":"true" }
> }
>  
> [2021-08-04 11:18:30,234] ERROR WorkerSinkTask\{id=mysql-sink-connector-0} 
> Task threw an uncaught and unrecoverable exception 
> (org.apache.kafka.connect.runtime.WorkerTask:177)
>  org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in 
> error handler
>  at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
>  at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
>  at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:514)
>  at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:491)
>  at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
>  at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)
>  at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)
>  at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
>  at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.kafka.connect.errors.DataException: Unknown schema 
> type: null
>  at 
> org.apache.kafka.connect.json.JsonConverter.convertToConnect(JsonConverter.java:743)
>  at 
> org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:363)
>  at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:514)
>  at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
>  at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
>  ... 13 more
>  [2021-08-04 11:18:30,234] ERROR WorkerSinkTask\{id=mysql-sink-connector-0} 
> Task is being killed and will not recover until manually restarted 
> (org.apache.kafka.connect.runtime.WorkerTask:178)
>  [2021-08-04 11:18:30,235] INFO [Consumer clientId=consumer-18, 
> groupId=connect-mysql-sink-connector] Sending LeaveGroup request to 
> coordinator 
> wn2-grabde.fkgw2p1emuqu5d21xcbqrhqqbf.rx.internal.cloudapp.net:9092 (id: 
> 2147482646 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:782)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.7.0 RC4

2024-02-12 Thread Jakub Scholz
+1 (non-binding). I used the staged binaries with Scala 2.13 and the staged
Maven artifacts to run my tests. All seems to work fine. Thanks.

Jakub

On Fri, Feb 9, 2024 at 4:20 PM Stanislav Kozlovski
 wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate we are considering for release of Apache Kafka
> 3.7.0.
>
> Major changes include:
> - Early Access to KIP-848 - the next generation of the consumer rebalance
> protocol
> - Early Access to KIP-858: Adding JBOD support to KRaft
> - KIP-714: Observability into Client metrics via a standardized interface
>
> Release notes for the 3.7.0 release:
>
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc4/RELEASE_NOTES.html
>
> *** Please download, test and vote by Thursday, February 15th, 9AM PST ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc4/
>
> * Docker release artifact to be voted upon:
> apache/kafka:3.7.0-rc4
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc4/javadoc/
>
> * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
> https://github.com/apache/kafka/releases/tag/3.7.0-rc4
>
> * Documentation:
> https://kafka.apache.org/37/documentation.html
>
> * Protocol:
> https://kafka.apache.org/37/protocol.html
>
> * Successful Jenkins builds for the 3.7 branch:
>
> Unit/integration tests: I am in the process of running and analyzing these.
> System tests: I am in the process of running these.
>
> Expect a follow-up over the weekend
>
> * Successful Docker Image Github Actions Pipeline for 3.7 branch:
> Docker Build Test Pipeline:
> https://github.com/apache/kafka/actions/runs/7845614846
>
> /**
>
> Best,
> Stanislav
>


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-12 Thread Abhijeet Kumar
Comments inline

On Wed, Dec 6, 2023 at 1:12 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. A few comments.
>
> 10. remote.log.manager.write.quota.default:
> 10.1 For other configs, we
> use replica.alter.log.dirs.io.max.bytes.per.second. To be consistent,
> perhaps this can be sth like remote.log.manager.write.max.bytes.per.second.
>

This makes sense, we can rename the following configs to be consistent.

remote.log.manager.write.quota.default ->
remote.log.manager.write.max.bytes.per.second

remote.log.manager.read.quota.default ->
remote.log.manager.read.max.bytes.per.second



> 10.2 Could we list the new metrics associated with the new quota.
>

We will add the following metrics as mentioned in the other response.
*RemoteFetchThrottleTime* - The amount of time needed to bring the observed
remote fetch rate within the read quota.
*RemoteCopyThrottleTime* - The amount of time needed to bring the observed
remote copy rate within the copy quota.

> 10.3 Is this dynamically configurable? If so, could we document the impact
> to tools like kafka-configs.sh and AdminClient?
>

Yes, the quotas are dynamically configurable. We will add them as Dynamic
Broker Configs. Users will be able to change
the following configs using either kafka-configs.sh or AdminClient by
specifying the config name and the new value. For example:

Using kafka-configs.sh

bin/kafka-configs.sh --bootstrap-server  --entity-type
brokers --entity-default --alter --add-config
remote.log.manager.write.max.bytes.per.second=52428800

Using AdminClient

ConfigEntry configEntry = new
ConfigEntry("remote.log.manager.write.max.bytes.per.second", "5242800");
AlterConfigOp alterConfigOp = new AlterConfigOp(configEntry,
AlterConfigOp.OpType.SET);
List<AlterConfigOp> alterConfigOps =
Collections.singletonList(alterConfigOp);

ConfigResource resource = new ConfigResource(ConfigResource.Type.BROKER,
"");
Map<ConfigResource, Collection<AlterConfigOp>> updateConfig =
ImmutableMap.of(resource, alterConfigOps);
adminClient.incrementalAlterConfigs(updateConfig);
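
As a follow-up illustration (not part of the KIP), the override can be read back with the same client, for example against a specific broker; the broker id "0" below is an assumption:

ConfigResource broker0 = new ConfigResource(ConfigResource.Type.BROKER, "0");
ConfigEntry updated = adminClient.describeConfigs(Collections.singletonList(broker0))
    .all().get().get(broker0)
    .get("remote.log.manager.write.max.bytes.per.second");
System.out.println(updated.value() + " (source: " + updated.source() + ")");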


>
> Jun
>
> On Tue, Nov 28, 2023 at 2:19 AM Luke Chen  wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the KIP!
> > This is an important feature for tiered storage.
> >
> > Some comments:
> > 1. Will we introduce new metrics for this tiered storage quotas?
> > This is important because the admin can know the throttling status by
> > checking the metrics while the remote write/read are slow, like the rate
> of
> > uploading/reading byte rate, the throttled time for upload/read... etc.
> >
> > 2. Could you give some examples for the throttling algorithm in the KIP
> to
> > explain it? That will make it much clearer.
> >
> > 3. To solve this problem, we can break down the RLMTask into two smaller
> > tasks - one for segment upload and the other for handling expired
> segments.
> > How do we handle the situation when a segment is still waiting for
> > offloading while this segment is expired and eligible to be deleted?
> > Maybe it'll be easier to not block the RLMTask when quota exceeded, and
> > just check it each time the RLMTask runs?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > wrote:
> >
> > > Hi All,
> > >
> > > I have created KIP-956 for defining read and write quota for tiered
> > > storage.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > >
> > > Feedback and suggestions are welcome.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> >
>


-- 
Abhijeet.


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-12 Thread Abhijeet Kumar
Comments inline

On Tue, Nov 28, 2023 at 3:50 PM Luke Chen  wrote:
>
> Hi Abhijeet,
>
> Thanks for the KIP!
> This is an important feature for tiered storage.
>
> Some comments:
> 1. Will we introduce new metrics for this tiered storage quotas?
> This is important because the admin can know the throttling status by
> checking the metrics while the remote write/read are slow, like the rate
of
> uploading/reading byte rate, the throttled time for upload/read... etc.
>

Good point. When the fetches/copies on a broker are getting throttled,
we'll notice that the broker's fetch and copy
activities stabilize according to the predefined limit. We plan to add the
following extra metrics to indicate the throttled
time for fetches/copies, similar to the local fetch throttle metrics.

*RemoteFetchThrottleTime* - The amount of time needed to bring the observed
remote fetch rate within the read quota.
*RemoteCopyThrottleTime* - The amount of time needed to bring the observed
remote copy rate within the copy quota.
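
As an illustration (not wording from the KIP), such throttle-time metrics could be registered with the common metrics library (org.apache.kafka.common.metrics) roughly as below; the `metrics` instance, the sensor/metric names, and `throttleTimeMs` are assumptions for the sketch:

Sensor fetchThrottle = metrics.sensor("remote-fetch-throttle-time");
fetchThrottle.add(metrics.metricName("remote-fetch-throttle-time-avg",
    "RemoteLogManager", "Average throttle time in ms"), new Avg());
fetchThrottle.add(metrics.metricName("remote-fetch-throttle-time-max",
    "RemoteLogManager", "Maximum throttle time in ms"), new Max());
// Recorded whenever a remote fetch is delayed to stay within the read quota:
fetchThrottle.record(throttleTimeMs);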

> 2. Could you give some examples for the throttling algorithm in the KIP to
> explain it? That will make it much clearer.
>

The QuotaManagers look something like:

class RlmWriteQuotaManager {
   public void updateQuota(Quota newQuota) { /* implementation */ }
   public boolean isQuotaExceeded() { /* implementation */ }
   public void record(Double value) { /* implementation */ }
}

class RlmReadQuotaManager {
   public void updateQuota(Quota newQuota) { /* implementation */ }
   public boolean isQuotaExceeded() { /* implementation */ }
   public void record(Double value) { /* implementation */ }
}

*Read Throttling*

RemoteLogReader will capture the read rate once it has fetched the remote
log segment to serve the fetch request.
Changes would look something like below.

class RemoteLogReader {
    @Override
    public Void call() {
        // ...
        quotaManager.record(result.fetchDataInfo.map(info -> (double) info.records.sizeInBytes()).orElse(0.0));
        callback.accept(result);
    }
}

Read throttling will be carried out by the ReplicaManager. When it gets
offset out of range error, it will check for quota
getting exhausted before returning delayedRemoteStorageFetch info.

def handleOffsetOutOfRangeError() {
  // ...
  if (params.isFromFollower) {
    // ...
  } else {
    // ...
    val fetchDataInfo = if (isRemoteLogReadQuotaExceeded) {
      info("Read quota exceeded. Returning empty remote storage fetch info")
      FetchDataInfo(
        LogOffsetMetadata.UnknownOffsetMetadata,
        MemoryRecords.EMPTY,
        delayedRemoteStorageFetch = None
      )
    } else {
      // Regular case when quota is not exceeded
      FetchDataInfo(
        LogOffsetMetadata(fetchInfo.fetchOffset),
        MemoryRecords.EMPTY,
        delayedRemoteStorageFetch = Some(
          RemoteStorageFetchInfo(adjustedMaxBytes, minOneMessage, tp, fetchInfo, fetchIsolation)))
    }
    // ...
  }
}

When the read quota is exceeded, we return an empty remote storage fetch info.
We do not want to send an exception in the LogReadResult response (like we do
in other cases when we send UnknownOffsetMetadata), because then it is
classified as an error in reading data and a response is immediately sent back
to the client. Instead, we want to be able to serve data for other topic
partitions in the fetch request via delayed fetch if required (when an
immediate response is sent, the delayed fetch is skipped). Also, immediately
sending a response would make the consumer retry again right away, which may
run into the quota-exceeded situation again and thus get it into a loop.

*Write Throttling*

RemoteLogManager.java

private ReentrantLock writeQuotaManagerLock = new ReentrantLock(true);
private Condition lockCondition = writeQuotaManagerLock.newCondition();

private void copyLogSegment() {
  // ...
  writeQuotaManagerLock.lock();
  try {
    while (rlmWriteQuotaManager.isQuotaExceeded()) {
      // Quota exceeded, waiting for quota to be available
      lockCondition.await(timeout, TimeUnit.MILLISECONDS);
    }
    rlmWriteQuotaManager.record(segment.log.sizeInBytes());
    // Signal waiting threads to check the quota
    lockCondition.signalAll();
  } finally {
    writeQuotaManagerLock.unlock();
  }

  // Actual copy operation
}

Before beginning to copy the segment to remote, the RLMTask checks if the
quota is exceeded. If it has, the task waits for a specified period before
rechecking the quota status. Once the RLMTask confirms that the quota is
available, it records the size of the log it plans to upload with the quota
manager before proceeding with the segment upload. This step is crucial to
ensure that the quota is accurately updated and prevents other RLMTasks from
mistakenly assuming that quota space is available for them as well.

> 3. To solve this problem, we can break down the RLMTask into two smaller
> tasks - one for segment upload and the other for handling expired
segments.
> How do we handle the situation when a segment is still waiting for
> offloading while this segment is expire

Re: [PR] 3.7: Add blog post for Kafka 3.7 [kafka-site]

2024-02-12 Thread via GitHub


OmniaGM commented on code in PR #578:
URL: https://github.com/apache/kafka-site/pull/578#discussion_r1486622896


##
blog.html:
##
@@ -22,6 +22,119 @@
 
 
 Blog
+
+
+
+Apache 
Kafka 3.7.0 Release Announcement
+
+TODO: January 2024 - Stanislav Kozlovski (https://twitter.com/0xeed @BdKozlovski)
+We are proud to announce the release of Apache Kafka 3.7.0. 
This release contains many new features and improvements. This blog post will 
highlight some of the more prominent features. For a full list of changes, be 
sure to check the https://downloads.apache.org/kafka/3.7.0/RELEASE_NOTES.html release 
notes.
+See the https://kafka.apache.org/36/documentation.html#upgrade_3_7_0 Upgrading 
to 3.7.0 from any version 0.8.x through 3.6.x section in the documentation 
for the list of notable changes and detailed upgrade steps.
+
+In the last release, 3.6,
+https://kafka.apache.org/documentation/#kraft_zk_migration the ability 
to migrate Kafka clusters from a ZooKeeper metadata system
+to a KRaft metadata system was ready for usage in 
production environments with one caveat - JBOD was not yet available for KRaft 
clusters.
+This release, 3.7, we are shipping support for JBOD in 
KRaft. (See https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft KIP-858
 for details)

Review Comment:
   https://github.com/apache/kafka-site/pull/579/files 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINIOR: Mark JBOD as an early release [kafka-site]

2024-02-12 Thread via GitHub


OmniaGM commented on PR #579:
URL: https://github.com/apache/kafka-site/pull/579#issuecomment-1939334651

   @stanislavkozlovski can you have a look into this please? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] MINIOR: Mark JBOD as an early release [kafka-site]

2024-02-12 Thread via GitHub


OmniaGM opened a new pull request, #579:
URL: https://github.com/apache/kafka-site/pull/579

   Update the blog to mark JBOD as early access


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] 3.7: Add blog post for Kafka 3.7 [kafka-site]

2024-02-12 Thread via GitHub


OmniaGM commented on code in PR #578:
URL: https://github.com/apache/kafka-site/pull/578#discussion_r1486615067


##
blog.html:
##
@@ -22,6 +22,119 @@
 
 
 Blog
+
+
+
+Apache 
Kafka 3.7.0 Release Announcement
+
+TODO: January 2024 - Stanislav Kozlovski (https://twitter.com/0xeed @BdKozlovski)
+We are proud to announce the release of Apache Kafka 3.7.0. 
This release contains many new features and improvements. This blog post will 
highlight some of the more prominent features. For a full list of changes, be 
sure to check the https://downloads.apache.org/kafka/3.7.0/RELEASE_NOTES.html release 
notes.
+See the https://kafka.apache.org/36/documentation.html#upgrade_3_7_0 Upgrading 
to 3.7.0 from any version 0.8.x through 3.6.x section in the documentation 
for the list of notable changes and detailed upgrade steps.
+
+In the last release, 3.6,
+https://kafka.apache.org/documentation/#kraft_zk_migration the ability 
to migrate Kafka clusters from a ZooKeeper metadata system
+to a KRaft metadata system was ready for usage in 
production environments with one caveat - JBOD was not yet available for KRaft 
clusters.
+This release, 3.7, we are shipping support for JBOD in 
KRaft. (See https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft KIP-858
 for details)

Review Comment:
   Hi @stanislavkozlovski, I will push an update this line to mark JBOD as 
early access in 3.7.0 blog release notes. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2637

2024-02-12 Thread Apache Jenkins Server
See 




Re: Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread Ismael Juma
Sounds good. I am supportive of this change.

Ismael

On Mon, Feb 12, 2024 at 7:43 AM David Jacot 
wrote:

> Hi Bruno,
>
> Yes, you're right. Sorry for the typo.
>
> Hi Ismael,
>
> You're right. Jenkins does not support the flakyFailure element and
> hence the information is not at all in the Jenkins report. I am still
> experimenting with printing the flaky tests somewhere. I will update this
> thread if I get something working. In the meantime, I wanted to gauge
> whether there is support for it.
>
> Cheers,
> David
>
> On Mon, Feb 12, 2024 at 3:59 PM Ismael Juma  wrote:
>
> > Hi David,
> >
> > Your message didn't make this clear, but you are saying that Jenkins does
> > _not_ support the flakyFailure element and hence this information will be
> > completely missing from the Jenkins report. Have we considered including
> > the flakyFailure information ourselves? I have seen that being done and
> it
> > seems strictly better than totally ignoring it.
> >
> > Ismael
> >
> > On Mon, Feb 12, 2024 at 12:11 AM David Jacot  >
> > wrote:
> >
> > > Hi folks,
> > >
> > > I have been playing with `reports.junitXml.mergeReruns` setting in
> gradle
> > > [1]. From the gradle doc:
> > >
> > > > When mergeReruns is enabled, if a test fails but is then retried and
> > > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > > <failure>, within one <testcase>. This is effectively the reporting
> > > > produced by the surefire plugin of Apache Maven™ when enabling reruns.
> > > > If your CI server understands this format, it will indicate that the
> > > > test was flaky. If it does not, it will indicate that the test succeeded
> > > > as it will ignore the <flakyFailure> information. If the test does not
> > > > succeed (i.e. it fails for every retry), it will be indicated as having
> > > > failed whether your tool understands this format or not.
> > >
> > > With this, we get really close to having green builds [2] all the time.
> > > There are only a few tests which are too flaky. We should address or
> > > disable those.
> > >
> > > I think that this would help us a lot because it would reduce the noise
> > > that we get in pull requests. At the moment, there are just too many
> > failed
> > > tests reported so it is really hard to know whether a pull request is
> > > actually fine or not.
> > >
> > > [1] applies it to both unit and integration tests. Following the
> > discussion
> > > in the `github build queue` thread, it may be better to only apply it
> to
> > > the integration tests. Being stricter with unit tests would make sense.
> > >
> > > This does not mean that we should continue our effort to reduce the
> > number
> > > of flaky tests. For this, I propose to keep using Gradle Enterprise. It
> > > provides a nice report for them that we can leverage.
> > >
> > > Thoughts?
> > >
> > > Best,
> > > David
> > >
> > > [1] https://github.com/apache/kafka/pull/14862
> > > [2]
> > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests
> > >
> >
>


Re: Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread David Jacot
Hi Bruno,

Yes, you're right. Sorry for the typo.

Hi Ismael,

You're right. Jenkins does not support the flakyFailure element and
hence the information is not at all in the Jenkins report. I am still
experimenting with printing the flaky tests somewhere. I will update this
thread if I get something working. In the meantime, I wanted to gauge
whether there is support for it.
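
For illustration, one way to surface the flaky tests could be to post-process the merged JUnit XML reports and list every test case that contains a <flakyFailure> element; the report directory and class name in this sketch are assumptions, not actual tooling:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FlakyTestReport {
    public static void main(String[] args) throws Exception {
        File reportsDir = new File("build/test-results/test");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (File report : reportsDir.listFiles((dir, name) -> name.endsWith(".xml"))) {
            Document doc = builder.parse(report);
            NodeList cases = doc.getElementsByTagName("testcase");
            for (int i = 0; i < cases.getLength(); i++) {
                Element testCase = (Element) cases.item(i);
                // A test case that eventually passed but has a flakyFailure child was flaky.
                if (testCase.getElementsByTagName("flakyFailure").getLength() > 0) {
                    System.out.println("FLAKY: " + testCase.getAttribute("classname")
                        + "." + testCase.getAttribute("name"));
                }
            }
        }
    }
}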

Cheers,
David

On Mon, Feb 12, 2024 at 3:59 PM Ismael Juma  wrote:

> Hi David,
>
> Your message didn't make this clear, but you are saying that Jenkins does
> _not_ support the flakyFailure element and hence this information will be
> completely missing from the Jenkins report. Have we considered including
> the flakyFailure information ourselves? I have seen that being done and it
> seems strictly better than totally ignoring it.
>
> Ismael
>
> On Mon, Feb 12, 2024 at 12:11 AM David Jacot 
> wrote:
>
> > Hi folks,
> >
> > I have been playing with `reports.junitXml.mergeReruns` setting in gradle
> > [1]. From the gradle doc:
> >
> > > When mergeReruns is enabled, if a test fails but is then retried and
> > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > <failure>, within one <testcase>. This is effectively the reporting
> > > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > > your CI server understands this format, it will indicate that the test was
> > > flaky. If it does not, it will indicate that the test succeeded as it will
> > > ignore the <flakyFailure> information. If the test does not succeed (i.e.
> > > it fails for every retry), it will be indicated as having failed whether
> > > your tool understands this format or not.
> > With this, we get really close to having green builds [2] all the time.
> > There are only a few tests which are too flaky. We should address or
> > disable those.
> >
> > I think that this would help us a lot because it would reduce the noise
> > that we get in pull requests. At the moment, there are just too many
> failed
> > tests reported so it is really hard to know whether a pull request is
> > actually fine or not.
> >
> > [1] applies it to both unit and integration tests. Following the
> discussion
> > in the `github build queue` thread, it may be better to only apply it to
> > the integration tests. Being stricter with unit tests would make sense.
> >
> > This does not mean that we should continue our effort to reduce the
> number
> > of flaky tests. For this, I propose to keep using Gradle Enterprise. It
> > provides a nice report for them that we can leverage.
> >
> > Thoughts?
> >
> > Best,
> > David
> >
> > [1] https://github.com/apache/kafka/pull/14862
> > [2]
> >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests
> >
>


Re: [VOTE] 3.7.0 RC4

2024-02-12 Thread Josep Prat
Hi Stanislav,

Thanks for running the release. It gets a +1 (non-binding) from me.
I run the following steps to validate:
- Compiled with Java 17 and Scala 2.13.12
- Run unit and integration tests
  - "LogDirFailureTest > testIOExceptionDuringLogRoll(String).quorum=kraft"
failed[1] several times before I could get a clear build.
- Checked JavaDoc and clicked links pointing to JDK
- Run getting started with ZK and KRaft
- Verified artifact's signatures and hashes

Best,

[1]: Gradle Test Run :core:test > Gradle Test Executor 159 >
LogDirFailureTest > testIOExceptionDuringLogRoll(String) >
testIOExceptionDuringLogRoll(String).quorum=kraft FAILED
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at
app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at
app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
at
app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:179)
at
app//kafka.utils.TestUtils$.causeLogDirFailure(TestUtils.scala:1671)
at
app//kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:186)
at
app//kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:70)

On Fri, Feb 9, 2024 at 4:20 PM Stanislav Kozlovski
 wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate we are considering for release of Apache Kafka
> 3.7.0.
>
> Major changes include:
> - Early Access to KIP-848 - the next generation of the consumer rebalance
> protocol
> - Early Access to KIP-858: Adding JBOD support to KRaft
> - KIP-714: Observability into Client metrics via a standardized interface
>
> Release notes for the 3.7.0 release:
>
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc4/RELEASE_NOTES.html
>
> *** Please download, test and vote by Thursday, February 15th, 9AM PST ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc4/
>
> * Docker release artifact to be voted upon:
> apache/kafka:3.7.0-rc4
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc4/javadoc/
>
> * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
> https://github.com/apache/kafka/releases/tag/3.7.0-rc4
>
> * Documentation:
> https://kafka.apache.org/37/documentation.html
>
> * Protocol:
> https://kafka.apache.org/37/protocol.html
>
> * Successful Jenkins builds for the 3.7 branch:
>
> Unit/integration tests: I am in the process of running and analyzing these.
> System tests: I am in the process of running these.
>
> Expect a follow-up over the weekend
>
> * Successful Docker Image Github Actions Pipeline for 3.7 branch:
> Docker Build Test Pipeline:
> https://github.com/apache/kafka/actions/runs/7845614846
>
> /**
>
> Best,
> Stanislav
>


-- 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Re: Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread Ismael Juma
Hi David,

Your message didn't make this clear, but you are saying that Jenkins does
_not_ support the flakyFailure element and hence this information will be
completely missing from the Jenkins report. Have we considered including
the flakyFailure information ourselves? I have seen that being done and it
seems strictly better than totally ignoring it.

Ismael

On Mon, Feb 12, 2024 at 12:11 AM David Jacot 
wrote:

> Hi folks,
>
> I have been playing with `reports.junitXml.mergeReruns` setting in gradle
> [1]. From the gradle doc:
>
> > When mergeReruns is enabled, if a test fails but is then retried and
> > succeeds, its failures will be recorded as <flakyFailure> instead of
> > <failure>, within one <testcase>. This is effectively the reporting
> > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > your CI server understands this format, it will indicate that the test was
> > flaky. If it does not, it will indicate that the test succeeded as it will
> > ignore the <flakyFailure> information. If the test does not succeed (i.e.
> > it fails for every retry), it will be indicated as having failed whether
> > your tool understands this format or not.
>
> With this, we get really close to having green builds [2] all the time.
> There are only a few tests which are too flaky. We should address or
> disable those.
>
> I think that this would help us a lot because it would reduce the noise
> that we get in pull requests. At the moment, there are just too many failed
> tests reported so it is really hard to know whether a pull request is
> actually fine or not.
>
> [1] applies it to both unit and integration tests. Following the discussion
> in the `github build queue` thread, it may be better to only apply it to
> the integration tests. Being stricter with unit tests would make sense.
>
> This does not mean that we should continue our effort to reduce the number
> of flaky tests. For this, I propose to keep using Gradle Enterprise. It
> provides a nice report for them that we can leverage.
>
> Thoughts?
>
> Best,
> David
>
> [1] https://github.com/apache/kafka/pull/14862
> [2]
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests
>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #94

2024-02-12 Thread Apache Jenkins Server
See 




Re: Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread Bruno Cadonna

Hi David,

I guess you meant to say

"This does not mean that we should NOT continue our effort to reduce the 
number of flaky tests."


I totally agree with what you wrote. I am also +1 on considering all 
failures for unit tests.


Best,
Bruno

On 2/12/24 9:11 AM, David Jacot wrote:

Hi folks,

I have been playing with `reports.junitXml.mergeReruns` setting in gradle
[1]. From the gradle doc:


When mergeReruns is enabled, if a test fails but is then retried and
succeeds, its failures will be recorded as <flakyFailure> instead of
<failure>, within one <testcase>. This is effectively the reporting
produced by the surefire plugin of Apache Maven™ when enabling reruns. If
your CI server understands this format, it will indicate that the test was
flaky. If it does not, it will indicate that the test succeeded as it will
ignore the <flakyFailure> information. If the test does not succeed (i.e.
it fails for every retry), it will be indicated as having failed whether
your tool understands this format or not.

With this, we get really close to having green builds [2] all the time.
There are only a few tests which are too flaky. We should address or
disable those.

I think that this would help us a lot because it would reduce the noise
that we get in pull requests. At the moment, there are just too many failed
tests reported so it is really hard to know whether a pull request is
actually fine or not.

[1] applies it to both unit and integration tests. Following the discussion
in the `github build queue` thread, it may be better to only apply it to
the integration tests. Being stricter with unit tests would make sense.

This does not mean that we should continue our effort to reduce the number
of flaky tests. For this, I propose to keep using Gradle Enterprise. It
provides a nice report for them that we can leverage.

Thoughts?

Best,
David

[1] https://github.com/apache/kafka/pull/14862
[2]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests



[jira] [Created] (KAFKA-16244) Move code style exceptions from suppressions.xml to the code

2024-02-12 Thread David Jacot (Jira)
David Jacot created KAFKA-16244:
---

 Summary: Move code style exceptions from suppressions.xml to the 
code
 Key: KAFKA-16244
 URL: https://issues.apache.org/jira/browse/KAFKA-16244
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Apache Kafka 3.7.0 Release

2024-02-12 Thread Mayank Shekhar Narula
Stanislav


this *doesn't* seem like a blocker to me given the complexity,
> public API changes and

Fair enough, since it has only been seen for very high partition counts.
The PR  in trunk is under
active review; I will cherry-pick it for 3.7.1.


> the fact that it's for high partition counts (a
> definition here would help though)

This is hard to pin down, but so far it has only been seen with very high
partition counts, i.e. 36k.


On Mon, Feb 12, 2024 at 10:26 AM Stanislav Kozlovski
 wrote:

> Hey Divij, that is a good point regarding KIP-848.
>
> David, as the author of the KIP, would you be able to drive this?
>
> Similarly, would anybody be willing to drive such an EA Release Note for
> the JBOD feature?
>
> Mayank, this *doesn't* seem like a blocker to me given the complexity,
> public API changes and the fact that it's for high partition counts (a
> definition here would help though)
>
> Reminder - RC4 is out for vote right now.
>
> Best,
> Stanislav
>
> On Tue, Feb 6, 2024 at 5:25 PM Mayank Shekhar Narula <
> mayanks.nar...@gmail.com> wrote:
>
> > Hi Folks
> >
> > KIP-951 was delivered fully in AK 3.7. Its 1st optimisation was delivered
> > in 3.6.1, to skip backoff period for a produce batch being retried to new
> > leader i.e. KAFKA-15415.
> >
> > KAFKA-15415 current implementation introduced a performance regression,
> by
> > increasing synchronization on the produce path, especially for high
> > partition counts. The description section of
> > https://issues.apache.org/jira/browse/KAFKA-16226 goes more into details
> > of
> > the regression.
> >
> > I have put up a fix https://github.com/apache/kafka/pull/15323, which
> > removes this synchronization. The fix adds a new public method to
> > Cluster.java, and a public constructor to PartitionInfo.java.
> >
> > Is this a blocker for v3.7.0?
> >
> > PS - Posted in KIP-951's voting thread as well
> > .
> >
> >
> > On Fri, Feb 2, 2024 at 3:58 PM Divij Vaidya 
> > wrote:
> >
> > > Hey folks
> > >
> > > The release plan for 3.7.0 [1] calls out KIP 848 as "Targeting a
> Preview
> > in
> > > 3.7".
> > >
> > > Is that still true? If yes, then we should perhaps add that in the
> blog,
> > > call it out in the release notes and prepare a preview document similar
> > to
> > > what we did for Tiered Storage Early Access release[2]
> > >
> > > If not true, then we should update the release notes to reflect the
> > current
> > > state of the KIP.
> > >
> > > (I think the same is true for other KIPs like KIP-963)
> > >
> > > [1]
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.7.0
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
> > >
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Thu, Jan 11, 2024 at 1:03 PM Luke Chen  wrote:
> > >
> > > > Hi all,
> > > >
> > > > There is a bug KAFKA-16101
> > > >  reporting that
> > > "Kafka
> > > > cluster will be unavailable during KRaft migration rollback".
> > > > The impact for this issue is that if brokers try to rollback to ZK
> mode
> > > > during KRaft migration process, there will be a period of time the
> > > cluster
> > > > is unavailable.
> > > > Since ZK migrating to KRaft feature is a production ready feature, I
> > > think
> > > > this should be addressed soon.
> > > > Do you think this is a blocker for v3.7.0?
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > > On Thu, Jan 11, 2024 at 6:11 AM Stanislav Kozlovski
> > > >  wrote:
> > > >
> > > > > Thanks Colin,
> > > > >
> > > > > With that, I believe we are out of blockers. I was traveling today
> > and
> > > > > couldn't build an RC - expect one to be published tomorrow (barring
> > any
> > > > > problems).
> > > > >
> > > > > In the meanwhile - here is a PR for the 3.7 blog post -
> > > > > https://github.com/apache/kafka-site/pull/578
> > > > >
> > > > > Best,
> > > > > Stan
> > > > >
> > > > > On Wed, Jan 10, 2024 at 12:06 AM Colin McCabe 
> > > > wrote:
> > > > >
> > > > > > KAFKA-16094 has been fixed and backported to 3.7.
> > > > > >
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 8, 2024, at 14:52, Colin McCabe wrote:
> > > > > > > On an unrelated note, I found a blocker bug related to upgrades
> > > from
> > > > > > > 3.6 (and earlier) to 3.7.
> > > > > > >
> > > > > > > The JIRA is here:
> > > > > > >   https://issues.apache.org/jira/browse/KAFKA-16094
> > > > > > >
> > > > > > > Fix here:
> > > > > > >   https://github.com/apache/kafka/pull/15153
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jan 8, 2024, at 14:47, Colin McCabe wrote:
> > > > > > >> Hi Ismael,
> > > > > > >>
> > > > > > >> I wasn't aware of that. If we are required to publish all
> > modules,
> > > > > then
> > > > > > >> this is working as intended.
> > >

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-02-12 Thread Andrew Schofield
Hi Chirag,
Thanks for your question.

28. Good catch. Those options were omitted in error. I will update the KIP.

Thanks,
Andrew

> On 12 Feb 2024, at 13:06, Chirag Wadhwa  wrote:
>
> Hi Andrew,
>
> Thank you for the KIP, it is a great read ! I just have a small question.
>
> 28. I noticed that the "*--state*" and "*--timeout*" options are not
> mentioned for the kafka-share-groups.sh tool. Was this omission
> intentional, or is it possibly an oversight in the KIP?
> Thanks,
> Chirag
>
> On Mon, Feb 12, 2024 at 5:25 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Jun
>> Thanks for your comments.
>>
>> 10. For read-uncommitted isolation level, the consumer just reads all
>> records.
>> For read-committed isolation level, the share-partition leader does the
>> filtering to
>> enable correct skipping of aborted records. The consumers in a share group
>> are not
>> aware of the filtering, unlike consumers in consumer groups.
>>
>> 11. The “classic” type is the pre-KIP 848 consumer groups.
>>
>> 12. By setting the configuration for a group resource, you are saying
>> “when a new group is
>> created with this name, it must have this type”. It’s not changing the
>> type of an existing
>> group.
>>
>> 13. Good catch. The Server Assignor should be at group level. I will
>> change it.
>>
>> 14. That is true. I have maintained it to keep similarity with consumer
>> groups,
>> but it is not currently exposed to clients. It might be best to remove it.
>>
>> 15. I had intended that SimpleAssignor implements
>> org.apache.kafka.clients.consumer.ConsumerPartitionAssignor.
>> Actually, I think there’s benefit to using a new interface so that someone
>> doesn’t inadvertently
>> configure something like the RoundRobinAssignor for a share group. It
>> wouldn’t go well. I will
>> add a new interface to the KIP.
>>
>> 16. When an existing member issues a ShareGroupHeartbeatRequest to the new
>> coordinator,
>> the coordinator returns UNKNOWN_MEMBER_ID. The client then sends another
>> ShareGroupHeartbeatRequest
>> containing no member ID and epoch 0. The coordinator then returns the
>> member ID.
>>
>> 17. I don’t think so. What is the client going to do with the exception?
>> Share groups are
>> intentionally removing some of the details of using Kafka offsets from the
>> consumers. If the
>> SPSO needs to be reset due to retention, it just does that automatically.
>>
>> 18. The proposed use of control records needs some careful thought.
>> 18.1. They’re written by the share-partition leader, not the coordinator.
>> 18.2. If the client commits the acknowledgement, it is only confirmed to
>> the client
>> once it has been replicated to the other replica brokers. So, committing
>> an acknowledgement
>> is very similar to sending a record to a topic in terms of the behaviour.
>>
>> 19. You are correct. The possibility of record duplication exists in
>> failure scenarios. A future KIP
>> will add EOS support for share groups.
>>
>> 20.1. Yes, an exception. I was thinking InvalidOffsetException. I will
>> update the KIP with more
>> detail about protocol error codes and API exceptions.
>> 20.2. I think that’s a mistake. I’ll rectify it.
>>
>> 21. The message sets for the new control records would be filtered out for
>> all consumers.
>>
>> 22. Fetch from follower is not supported. I will update the KIP.
>>
>> 23.1. I am not quite happy with the explanation of the checkpoint and
>> delta records. Essentially,
>> there needs to be one checkpoint and then any number of deltas. Then
>> another checkpoint supersedes
>> the previous records, and can have its own sequence of deltas, and so on.
>> Because recovery requires the
>> leader to read the latest checkpoint and all subsequent deltas, you want
>> to take checkpoints frequently
>> enough to speed up recovery, but infrequently enough to minimise the
>> performance impact of reserializing
>> all the state.
>> 23.2. I’ll check the document again carefully, but the SHARE_DELTA should
>> always contain DeliveryCount
>> for every member of the States array.
>>
>> 24. I was anticipating added to the index files which are part of each log
>> segment.
>>
>> 25. The acknowledgements for each topic-partition are atomic. All this
>> really means is that we perform the
>> state checking and the state persistence atomically (one control record).
>> The callback tells you whether the
>> acknowledgements for the entire topic-partition succeeded or failed,
>> rather than each record individually.
>> I could have gone with a callback with a record-based interface. Would
>> that be preferable, do you think?
>> For one thing, that does give more flexibility for optimisations such as
>> fetch pipelining in the future.
>>
>> 26. The metadata is unused. This is re-using an existing class
>> (OffsetAndMetadata). Perhaps it would be better
>> not to.
>>
>> 27. Yes, agreed. I will add it.
>>
>> Thanks,
>> Andrew
>>
>>> On 9 Feb 2024, at 23:14, Jun Rao  wrote:
>>>
>>> Hi, Andrew,
>>

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-02-12 Thread Chirag Wadhwa
Hi Andrew,

Thank you for the KIP, it is a great read ! I just have a small question.

28. I noticed that the "*--state*" and "*--timeout*" options are not
mentioned for the kafka-share-groups.sh tool. Was this omission
intentional, or is it possibly an oversight in the KIP?
Thanks,
Chirag

On Mon, Feb 12, 2024 at 5:25 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi Jun
> Thanks for your comments.
>
> 10. For read-uncommitted isolation level, the consumer just reads all
> records.
> For read-committed isolation level, the share-partition leader does the
> filtering to
> enable correct skipping of aborted records. The consumers in a share group
> are not
> aware of the filtering, unlike consumers in consumer groups.
>
> 11. The “classic” type is the pre-KIP 848 consumer groups.
>
> 12. By setting the configuration for a group resource, you are saying
> “when a new group is
> created with this name, it must have this type”. It’s not changing the
> type of an existing
> group.
>
> 13. Good catch. The Server Assignor should be at group level. I will
> change it.
>
> 14. That is true. I have maintained it to keep similarity with consumer
> groups,
> but it is not currently exposed to clients. It might be best to remove it.
>
> 15. I had intended that SimpleAssignor implements
> org.apache.kafka.clients.consumer.ConsumerPartitionAssignor.
> Actually, I think there’s benefit to using a new interface so that someone
> doesn’t inadvertently
> configure something like the RoundRobinAssignor for a share group. It
> wouldn’t go well. I will
> add a new interface to the KIP.
>
> 16. When an existing member issues a ShareGroupHeartbeatRequest to the new
> coordinator,
> the coordinator returns UNKNOWN_MEMBER_ID. The client then sends another
> ShareGroupHeartbeatRequest
> containing no member ID and epoch 0. The coordinator then returns the
> member ID.
>
> 17. I don’t think so. What is the client going to do with the exception?
> Share groups are
> intentionally removing some of the details of using Kafka offsets from the
> consumers. If the
> SPSO needs to be reset due to retention, it just does that automatically.
>
> 18. The proposed use of control records needs some careful thought.
> 18.1. They’re written by the share-partition leader, not the coordinator.
> 18.2. If the client commits the acknowledgement, it is only confirmed to
> the client
> once it has been replicated to the other replica brokers. So, committing
> an acknowledgement
> is very similar to sending a record to a topic in terms of the behaviour.
>
> 19. You are correct. The possibility of record duplication exists in
> failure scenarios. A future KIP
> will add EOS support for share groups.
>
> 20.1. Yes, an exception. I was thinking InvalidOffsetException. I will
> update the KIP with more
> detail about protocol error codes and API exceptions.
> 20.2. I think that’s a mistake. I’ll rectify it.
>
> 21. The message sets for the new control records would be filtered out for
> all consumers.
>
> 22. Fetch from follower is not supported. I will update the KIP.
>
> 23.1. I am not quite happy with the explanation of the checkpoint and
> delta records. Essentially,
> there needs to be one checkpoint and then any number of deltas. Then
> another checkpoint supersedes
> the previous records, and can have its own sequence of deltas, and so on.
> Because recovery requires the
> leader to read the latest checkpoint and all subsequent deltas, you want
> to take checkpoints frequently
> enough to speed up recovery, but infrequently enough to minimise the
> performance impact of reserializing
> all the state.
> 23.2. I’ll check the document again carefully, but the SHARE_DELTA should
> always contain DeliveryCount
> for every member of the States array.
>
> 24. I was anticipating added to the index files which are part of each log
> segment.
>
> 25. The acknowledgements for each topic-partition are atomic. All this
> really means is that we perform the
> state checking and the state persistence atomically (one control record).
> The callback tells you whether the
> acknowledgements for the entire topic-partition succeeded or failed,
> rather than each record individually.
> I could have gone with a callback with a record-based interface. Would
> that be preferable, do you think?
> For one thing, that does give more flexibility for optimisations such as
> fetch pipelining in the future.
>
> 26. The metadata is unused. This is re-using an existing class
> (OffsetAndMetadata). Perhaps it would be better
> not to.
>
> 27. Yes, agreed. I will add it.
>
> Thanks,
> Andrew
>
> > On 9 Feb 2024, at 23:14, Jun Rao  wrote:
> >
> > Hi, Andrew,
> >
> > Thanks for the KIP. A few comments below.
> >
> > 10. ShareFetchResponse: To consume transactional data, currently
> > FetchResponse includes the AbortedTransactions fields for the client to
> > properly skip aborted records. ShareFetchResponse doesn't include that.
> How
> > do we prevent the consumer fr

[jira] [Resolved] (KAFKA-16239) Clean up references to non-existent IntegrationTestHelper

2024-02-12 Thread Divij Vaidya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-16239.
--
Resolution: Fixed

> Clean up references to non-existent IntegrationTestHelper
> -
>
> Key: KAFKA-16239
> URL: https://issues.apache.org/jira/browse/KAFKA-16239
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Divij Vaidya
>Priority: Minor
>  Labels: newbie
> Fix For: 3.8.0
>
>
> A bunch of places in the code javadocs and READ docs refer to a class called 
> IntegrationTestHelper. Such a class does not exist. 
> This task will clean up all referenced to IntegrationTestHelper from Kafka 
> code base.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2636

2024-02-12 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-932: Queues for Kafka

2024-02-12 Thread Andrew Schofield
Hi Jun
Thanks for your comments.

10. For read-uncommitted isolation level, the consumer just reads all records.
For read-committed isolation level, the share-partition leader does the 
filtering to
enable correct skipping of aborted records. The consumers in a share group are 
not
aware of the filtering, unlike consumers in consumer groups.

11. The “classic” type is the pre-KIP 848 consumer groups.

12. By setting the configuration for a group resource, you are saying “when a 
new group is
created with this name, it must have this type”. It’s not changing the type of 
an existing
group.

13. Good catch. The Server Assignor should be at group level. I will change it.

14. That is true. I have maintained it to keep similarity with consumer groups,
but it is not currently exposed to clients. It might be best to remove it.

15. I had intended that SimpleAssignor implements 
org.apache.kafka.clients.consumer.ConsumerPartitionAssignor.
Actually, I think there’s benefit to using a new interface so that someone 
doesn’t inadvertently
configure something like the RoundRobinAssignor for a share group. It wouldn’t 
go well. I will
add a new interface to the KIP.

16. When an existing member issues a ShareGroupHeartbeatRequest to the new 
coordinator,
the coordinator returns UNKNOWN_MEMBER_ID. The client then sends another 
ShareGroupHeartbeatRequest
containing no member ID and epoch 0. The coordinator then returns the member ID.

17. I don’t think so. What is the client going to do with the exception? Share 
groups are
intentionally removing some of the details of using Kafka offsets from the 
consumers. If the
SPSO needs to be reset due to retention, it just does that automatically.

18. The proposed use of control records needs some careful thought.
18.1. They’re written by the share-partition leader, not the coordinator.
18.2. If the client commits the acknowledgement, it is only confirmed to the 
client
once it has been replicated to the other replica brokers. So, committing an 
acknowledgement
is very similar to sending a record to a topic in terms of the behaviour.

19. You are correct. The possibility of record duplication exists in failure 
scenarios. A future KIP
will add EOS support for share groups.

20.1. Yes, an exception. I was thinking InvalidOffsetException. I will update 
the KIP with more
detail about protocol error codes and API exceptions.
20.2. I think that’s a mistake. I’ll rectify it.

21. The message sets for the new control records would be filtered out for all 
consumers.

22. Fetch from follower is not supported. I will update the KIP.

23.1. I am not quite happy with the explanation of the checkpoint and delta 
records. Essentially,
there needs to be one checkpoint and then any number of deltas. Then another 
checkpoint supersedes
the previous records, and can have its own sequence of deltas, and so on. 
Because recovery requires the
leader to read the latest checkpoint and all subsequent deltas, you want to 
take checkpoints frequently
enough to speed up recovery, but infrequently enough to minimise the 
performance impact of reserializing
all the state.
23.2. I’ll check the document again carefully, but the SHARE_DELTA should 
always contain DeliveryCount
for every member of the States array.

24. I was anticipating added to the index files which are part of each log 
segment.

25. The acknowledgements for each topic-partition are atomic. All this really 
means is that we perform the
state checking and the state persistence atomically (one control record). The 
callback tells you whether the
acknowledgements for the entire topic-partition succeeded or failed, rather 
than each record individually.
I could have gone with a callback with a record-based interface. Would that be 
preferable, do you think?
For one thing, that does give more flexibility for optimisations such as fetch 
pipelining in the future.

26. The metadata is unused. This is re-using an existing class 
(OffsetAndMetadata). Perhaps it would be better
not to.

27. Yes, agreed. I will add it.

Thanks,
Andrew

> On 9 Feb 2024, at 23:14, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the KIP. A few comments below.
> 
> 10. ShareFetchResponse: To consume transactional data, currently
> FetchResponse includes the AbortedTransactions fields for the client to
> properly skip aborted records. ShareFetchResponse doesn't include that. How
> do we prevent the consumer from reading aborted records in a share group?
> 
> 11. "adding "share"  to the existing group types of "consumer"  and
> "classic" "
> What's the "classic" type?
> 
> 12. bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type
> group --entity-name G1 --alter --add-config group.type=share
> So, one could change the group type? What happens to the states associated
> with the group (members, epoch, offsets, etc)?
> 
> 13. Why is Server Assignor at member level, instead of group level?
> 
> 14. Member.metadata: How is that being used? It isn't exposed to

[jira] [Resolved] (KAFKA-14041) Avoid the keyword var for a variable declaration in ConfigTransformer

2024-02-12 Thread Divij Vaidya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-14041.
--
Resolution: Fixed

> Avoid the keyword var for a variable declaration in ConfigTransformer
> -
>
> Key: KAFKA-14041
> URL: https://issues.apache.org/jira/browse/KAFKA-14041
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: QualiteSys QualiteSys
>Assignee: Andrew Schofield
>Priority: Major
> Fix For: 3.8.0
>
>
> In the file 
> clients\src\main\java\org\apache\kafka\common\config\ConfigTransformer.java a 
> variable named var is declared :
> line 84 : for (ConfigVariable var : vars) {
> Since it is a Java keyword, could the variable name be changed?
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Kafka-Streams-Scala for Scala 3

2024-02-12 Thread Matthias Berndt
The additional load on the ASF's CI infrastructure may or may not be a
problem, but either way I don't think it is to be addressed within the
scope of this particular change.

Best,
Matthias

Am Mo., 12. Feb. 2024 um 11:37 Uhr schrieb Josep Prat
:

> Hi Matthias,
> If I understand it right (I'm not an expert in CI infra), Kafka uses the
> ASF shared Jenkins, meaning this would potentially increase the
> time-to-build for the rest of the projects.
>
> Best,
>
> On Mon, Feb 12, 2024 at 11:30 AM Matthias Berndt <
> matthias.ber...@ttmzero.com> wrote:
>
> > Hey Josep,
> >
> > wouldn't it be possible to run this in a separate CI job that runs in
> > parallel? Then it shouldn't take any longer than today – assuming enough
> CI
> > resources are available.
> >
> > Best,
> > Matthias
> >
> > Am Mo., 12. Feb. 2024 um 09:14 Uhr schrieb Josep Prat
> > :
> >
> > > Hi Matthias,
> > >
> > > One of the problems of adding support for Scala 3 for the scala-streams
> > > submodule is that we would need to have another build run on CI (which
> is
> > > already extremely long). I guess we could restrict this only when
> having
> > > changes on the Stream module(s).
> > > If I remember it correctly, when I tried to port Kafka to Scala 3 (as a
> > > whole) the sentiment was that only 2 versions of Scala should be
> > supported
> > > at a time. Kafka 4.0.0 will remove support for Scala 2.12. I'll wait
> for
> > > others to chime in. Maybe Ismael has some thoughts about it.
> > >
> > > Best,
> > >
> > > On Sat, Feb 10, 2024 at 1:55 AM Matthias Berndt <
> > > matthias.ber...@ttmzero.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > I'd like to discuss a Scala 3 release of the Kafka-Streams-Scala
> > library.
> > > > As you might have seen already, I have recently created a ticket
> > > > https://issues.apache.org/jira/browse/KAFKA-16237
> > > > and a PR
> > > > https://github.com/apache/kafka/pull/15338
> > > > to move this forward. The changes required to make
> Kafka-Streams-Scala
> > > > compile with Scala 3 are trivial; the trickier part is the build
> system
> > > and
> > > > the release process
> > > > I have made some changes to the build system (feel free to comment on
> > the
> > > > above PR about that) that make it possible to test
> Kafka-Streams-Scala
> > > and
> > > > build the jar. What remains to be done is the CI and release process.
> > > There
> > > > is a `release.py` file in the Kafka repository's root directory,
> which
> > > > assumes that all artifacts are available for all supported Scala
> > > versions.
> > > > This is no longer the case with my changes because while porting
> > > > Kafka-Streams-Scala to Scala 3 is trivial, porting Kafka to Scala 3
> is
> > > less
> > > > so, and shouldn't hold back a Scala 3 release of
> Kafka-Streams-Scala. I
> > > > would appreciate some guidance as to what the release process should
> > look
> > > > like in the future.
> > > >
> > > > Oh and I've made a PR to remove a syntax error from release.py.
> > > > https://github.com/apache/kafka/pull/15350
> > > >
> > > > All the best,
> > > > Matthias
> > > >
> > >
> > >
> > > --
> > > [image: Aiven] 
> > >
> > > *Josep Prat*
> > > Open Source Engineering Director, *Aiven*
> > > josep.p...@aiven.io   |   +491715557497
> > > aiven.io    |   <
> > https://www.facebook.com/aivencloud
> > > >
> > >      <
> > > https://twitter.com/aiven_io>
> > > *Aiven Deutschland GmbH*
> > > Alexanderufer 3-7, 10117 Berlin
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> >
>
>
> --
> [image: Aiven] 
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io    |    >
>      <
> https://twitter.com/aiven_io>
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


Re: [DISCUSS] Kafka-Streams-Scala for Scala 3

2024-02-12 Thread Josep Prat
Hi Matthias,
If I understand it right (I'm not an expert in CI infra), Kafka uses the
ASF shared Jenkins, meaning this would potentially increase the
time-to-build for the rest of the projects.

Best,

On Mon, Feb 12, 2024 at 11:30 AM Matthias Berndt <
matthias.ber...@ttmzero.com> wrote:

> Hey Josep,
>
> wouldn't it be possible to run this in a separate CI job that runs in
> parallel? Then it shouldn't take any longer than today – assuming enough CI
> resources are available.
>
> Best,
> Matthias
>
> Am Mo., 12. Feb. 2024 um 09:14 Uhr schrieb Josep Prat
> :
>
> > Hi Matthias,
> >
> > One of the problems of adding support for Scala 3 for the scala-streams
> > submodule is that we would need to have another build run on CI (which is
> > already extremely long). I guess we could restrict this only when having
> > changes on the Stream module(s).
> > If I remember it correctly, when I tried to port Kafka to Scala 3 (as a
> > whole) the sentiment was that only 2 versions of Scala should be
> supported
> > at a time. Kafka 4.0.0 will remove support for Scala 2.12. I'll wait for
> > others to chime in. Maybe Ismael has some thoughts about it.
> >
> > Best,
> >
> > On Sat, Feb 10, 2024 at 1:55 AM Matthias Berndt <
> > matthias.ber...@ttmzero.com>
> > wrote:
> >
> > > Hey there,
> > >
> > > I'd like to discuss a Scala 3 release of the Kafka-Streams-Scala
> library.
> > > As you might have seen already, I have recently created a ticket
> > > https://issues.apache.org/jira/browse/KAFKA-16237
> > > and a PR
> > > https://github.com/apache/kafka/pull/15338
> > > to move this forward. The changes required to make Kafka-Streams-Scala
> > > compile with Scala 3 are trivial; the trickier part is the build system
> > and
> > > the release process
> > > I have made some changes to the build system (feel free to comment on
> the
> > > above PR about that) that make it possible to test Kafka-Streams-Scala
> > and
> > > build the jar. What remains to be done is the CI and release process.
> > There
> > > is a `release.py` file in the Kafka repository's root directory, which
> > > assumes that all artifacts are available for all supported Scala
> > versions.
> > > This is no longer the case with my changes because while porting
> > > Kafka-Streams-Scala to Scala 3 is trivial, porting Kafka to Scala 3 is
> > less
> > > so, and shouldn't hold back a Scala 3 release of Kafka-Streams-Scala. I
> > > would appreciate some guidance as to what the release process should
> look
> > > like in the future.
> > >
> > > Oh and I've made a PR to remove a syntax error from release.py.
> > > https://github.com/apache/kafka/pull/15350
> > >
> > > All the best,
> > > Matthias
> > >
> >
> >
> > --
> > [image: Aiven] 
> >
> > *Josep Prat*
> > Open Source Engineering Director, *Aiven*
> > josep.p...@aiven.io   |   +491715557497
> > aiven.io    |   <
> https://www.facebook.com/aivencloud
> > >
> >      <
> > https://twitter.com/aiven_io>
> > *Aiven Deutschland GmbH*
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
>


-- 
[image: Aiven] 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io    |   
     
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Re: [DISCUSS] Kafka-Streams-Scala for Scala 3

2024-02-12 Thread Matthias Berndt
Hey Josep,

wouldn't it be possible to run this in a separate CI job that runs in
parallel? Then it shouldn't take any longer than today – assuming enough CI
resources are available.
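
Something along these lines, for example (just a sketch to illustrate the idea --
the stage name, agent, and gradle invocation are assumptions on my side rather
than what the actual Jenkinsfile does, and it presumes the build accepts a
Scala 3 value for -PscalaVersion once the PR lands):

pipeline {
  agent any
  stages {
    stage('streams-scala with Scala 3') {
      steps {
        // Test only the streams-scala module against Scala 3 as its own job,
        // so the existing Scala 2.12/2.13 matrix does not get any longer.
        sh './gradlew -PscalaVersion=3.3.1 :streams:streams-scala:test'
      }
    }
  }
}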

Best,
Matthias

Am Mo., 12. Feb. 2024 um 09:14 Uhr schrieb Josep Prat
:

> Hi Matthias,
>
> One of the problems of adding support for Scala 3 for the scala-streams
> submodule is that we would need to have another build run on CI (which is
> already extremely long). I guess we could restrict this only when having
> changes on the Stream module(s).
> If I remember it correctly, when I tried to port Kafka to Scala 3 (as a
> whole) the sentiment was that only 2 versions of Scala should be supported
> at a time. Kafka 4.0.0 will remove support for Scala 2.12. I'll wait for
> others to chime in. Maybe Ismael has some thoughts about it.
>
> Best,
>
> On Sat, Feb 10, 2024 at 1:55 AM Matthias Berndt <
> matthias.ber...@ttmzero.com>
> wrote:
>
> > Hey there,
> >
> > I'd like to discuss a Scala 3 release of the Kafka-Streams-Scala library.
> > As you might have seen already, I have recently created a ticket
> > https://issues.apache.org/jira/browse/KAFKA-16237
> > and a PR
> > https://github.com/apache/kafka/pull/15338
> > to move this forward. The changes required to make Kafka-Streams-Scala
> > compile with Scala 3 are trivial; the trickier part is the build system
> and
> > the release process
> > I have made some changes to the build system (feel free to comment on the
> > above PR about that) that make it possible to test Kafka-Streams-Scala
> and
> > build the jar. What remains to be done is the CI and release process.
> There
> > is a `release.py` file in the Kafka repository's root directory, which
> > assumes that all artifacts are available for all supported Scala
> versions.
> > This is no longer the case with my changes because while porting
> > Kafka-Streams-Scala to Scala 3 is trivial, porting Kafka to Scala 3 is
> less
> > so, and shouldn't hold back a Scala 3 release of Kafka-Streams-Scala. I
> > would appreciate some guidance as to what the release process should look
> > like in the future.
> >
> > Oh and I've made a PR to remove a syntax error from release.py.
> > https://github.com/apache/kafka/pull/15350
> >
> > All the best,
> > Matthias
> >
>
>
> --
> [image: Aiven] 
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io    |    >
>      <
> https://twitter.com/aiven_io>
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


Re: Apache Kafka 3.7.0 Release

2024-02-12 Thread Stanislav Kozlovski
Hey Divij, that is a good point regarding KIP-848.

David, as the author of the KIP, would you be able to drive this?

Similarly, would anybody be willing to drive such an EA Release Note for
the JBOD feature?

Mayank, this *doesn't* seem like a blocker to me given the complexity,
public API changes and the fact that it's for high partition counts (a
definition here would help though)

Reminder - RC4 is out for vote right now.

Best,
Stanislav

On Tue, Feb 6, 2024 at 5:25 PM Mayank Shekhar Narula <
mayanks.nar...@gmail.com> wrote:

> Hi Folks
>
> KIP-951 was delivered fully in AK 3.7. Its 1st optimisation was delivered
> in 3.6.1, to skip backoff period for a produce batch being retried to new
> leader i.e. KAFKA-15415.
>
> KAFKA-15415 current implementation introduced a performance regression, by
> increasing synchronization on the produce path, especially for high
> partition counts. The description section of
> https://issues.apache.org/jira/browse/KAFKA-16226 goes more into details
> of
> the regression.
>
> I have put up a fix https://github.com/apache/kafka/pull/15323, which
> removes this synchronization. The fix adds a new public method to
> Cluster.java, and a public constructor to PartitionInfo.java.
>
> Is this a blocker for v3.7.0?
>
> PS - Posted in KIP-951's voting thread as well
> .
>
>
> On Fri, Feb 2, 2024 at 3:58 PM Divij Vaidya 
> wrote:
>
> > Hey folks
> >
> > The release plan for 3.7.0 [1] calls out KIP 848 as "Targeting a Preview
> in
> > 3.7".
> >
> > Is that still true? If yes, then we should perhaps add that in the blog,
> > call it out in the release notes and prepare a preview document similar
> to
> > what we did for Tiered Storage Early Access release[2]
> >
> > If not true, then we should update the release notes to reflect the
> current
> > state of the KIP.
> >
> > (I think the same is true for other KIPs like KIP-963)
> >
> > [1] https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.7.0
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
> >
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Thu, Jan 11, 2024 at 1:03 PM Luke Chen  wrote:
> >
> > > Hi all,
> > >
> > > There is a bug KAFKA-16101
> > >  reporting that
> > "Kafka
> > > cluster will be unavailable during KRaft migration rollback".
> > > The impact for this issue is that if brokers try to rollback to ZK mode
> > > during KRaft migration process, there will be a period of time the
> > cluster
> > > is unavailable.
> > > Since ZK migrating to KRaft feature is a production ready feature, I
> > think
> > > this should be addressed soon.
> > > Do you think this is a blocker for v3.7.0?
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Thu, Jan 11, 2024 at 6:11 AM Stanislav Kozlovski
> > >  wrote:
> > >
> > > > Thanks Colin,
> > > >
> > > > With that, I believe we are out of blockers. I was traveling today
> and
> > > > couldn't build an RC - expect one to be published tomorrow (barring
> any
> > > > problems).
> > > >
> > > > In the meanwhile - here is a PR for the 3.7 blog post -
> > > > https://github.com/apache/kafka-site/pull/578
> > > >
> > > > Best,
> > > > Stan
> > > >
> > > > On Wed, Jan 10, 2024 at 12:06 AM Colin McCabe 
> > > wrote:
> > > >
> > > > > KAFKA-16094 has been fixed and backported to 3.7.
> > > > >
> > > > > Colin
> > > > >
> > > > >
> > > > > On Mon, Jan 8, 2024, at 14:52, Colin McCabe wrote:
> > > > > > On an unrelated note, I found a blocker bug related to upgrades
> > from
> > > > > > 3.6 (and earlier) to 3.7.
> > > > > >
> > > > > > The JIRA is here:
> > > > > >   https://issues.apache.org/jira/browse/KAFKA-16094
> > > > > >
> > > > > > Fix here:
> > > > > >   https://github.com/apache/kafka/pull/15153
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 8, 2024, at 14:47, Colin McCabe wrote:
> > > > > >> Hi Ismael,
> > > > > >>
> > > > > >> I wasn't aware of that. If we are required to publish all
> modules,
> > > > then
> > > > > >> this is working as intended.
> > > > > >>
> > > > > >> I am a bit curious if we've discussed why we need to publish the
> > > > server
> > > > > >> modules to Sonatype. Is there a discussion about the pros and
> cons
> > > of
> > > > > >> this somewhere?
> > > > > >>
> > > > > >> regards,
> > > > > >> Colin
> > > > > >>
> > > > > >> On Mon, Jan 8, 2024, at 14:09, Ismael Juma wrote:
> > > > > >>> All modules are published to Sonatype - that's a requirement.
> You
> > > may
> > > > > be
> > > > > >>> missing the fact that `core` is published as `kafka_2.13` and
> > > > > `kafka_2.12`.
> > > > > >>>
> > > > > >>> Ismael
> > > > > >>>
> > > > > >>> On Tue, Jan 9, 2024 at 12:00 AM Colin McCabe <
> cmcc...@apache.org
> > >
> > > > > wrote:
> > > > > >>>
> > > > >  Hi Ismael,
> > > > > 
> > > > >  It seems like both the metadata gradl

[jira] [Created] (KAFKA-16243) Idle kafka-console-consumer with new consumer group protocol preemptively leaves group

2024-02-12 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16243:


 Summary: Idle kafka-console-consumer with new consumer group 
protocol preemptively leaves group
 Key: KAFKA-16243
 URL: https://issues.apache.org/jira/browse/KAFKA-16243
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 3.7.0
Reporter: Andrew Schofield
Assignee: Andrew Schofield


Using the new consumer group protocol with kafka-console-consumer.sh, I find 
that if I leave the consumer with no records to process for 5 minutes 
(max.poll.interval.ms = 300000ms), the tool logs the following warning message 
and leaves the group.

"consumer poll timeout has expired. This means the time between subsequent 
calls to poll() was longer than the configured max.poll.interval.ms, which 
typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records."

With the older consumer, this did not occur.

The reason is that the consumer keeps a poll timer which is used to ensure 
liveness of the application thread. With the older consumer, the poll timer is 
automatically updated while the `Consumer.poll(Duration)` method is blocked, 
whereas the newer consumer only updates the poll timer when a new call to 
`Consumer.poll(Duration)` is issued. This means that kafka-console-consumer.sh, 
which uses a very long poll timeout by default, behaves differently with the new 
consumer.
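
A minimal sketch of the scenario follows. It is illustrative rather than the 
actual tool code: the topic name and property values are placeholders, and it 
assumes the new protocol is selected with the `group.protocol=consumer` 
configuration.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdleConsumerRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "console-consumer-test");
        props.put("group.protocol", "consumer"); // new consumer group protocol
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("quiet-topic"));
            while (true) {
                // Like kafka-console-consumer.sh, block in poll() with a long timeout.
                // The classic consumer keeps the poll timer fresh while blocked here; the
                // new consumer only resets it on the next call to poll(), so if no records
                // arrive for longer than max.poll.interval.ms the member leaves the group.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMinutes(10));
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    }
}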



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16220) KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions is flaky

2024-02-12 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-16220.

Resolution: Fixed

> KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions
>  is flaky
> 
>
> Key: KAFKA-16220
> URL: https://issues.apache.org/jira/browse/KAFKA-16220
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Lucas Brutschy
>Assignee: Lucas Brutschy
>Priority: Major
>  Labels: flaky, flaky-test
>
> This test has seen significant flakiness
>  
> https://ge.apache.org/s/fac7lploprvuu/tests/task/:streams:test/details/org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest/shouldThrowIllegalArgumentExceptionWhenCustomPartitionerReturnsMultiplePartitions()?top-execution=1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Kafka-Streams-Scala for Scala 3

2024-02-12 Thread Josep Prat
Hi Matthias,

One of the problems of adding support for Scala 3 for the scala-streams
submodule is that we would need to have another build run on CI (which is
already extremely long). I guess we could restrict this only when having
changes on the Stream module(s).
If I remember it correctly, when I tried to port Kafka to Scala 3 (as a
whole) the sentiment was that only 2 versions of Scala should be supported
at a time. Kafka 4.0.0 will remove support for Scala 2.12. I'll wait for
others to chime in. Maybe Ismael has some thoughts about it.

Best,

On Sat, Feb 10, 2024 at 1:55 AM Matthias Berndt 
wrote:

> Hey there,
>
> I'd like to discuss a Scala 3 release of the Kafka-Streams-Scala library.
> As you might have seen already, I have recently created a ticket
> https://issues.apache.org/jira/browse/KAFKA-16237
> and a PR
> https://github.com/apache/kafka/pull/15338
> to move this forward. The changes required to make Kafka-Streams-Scala
> compile with Scala 3 are trivial; the trickier part is the build system and
> the release process
> I have made some changes to the build system (feel free to comment on the
> above PR about that) that make it possible to test Kafka-Streams-Scala and
> build the jar. What remains to be done is the CI and release process. There
> is a `release.py` file in the Kafka repository's root directory, which
> assumes that all artifacts are available for all supported Scala versions.
> This is no longer the case with my changes because while porting
> Kafka-Streams-Scala to Scala 3 is trivial, porting Kafka to Scala 3 is less
> so, and shouldn't hold back a Scala 3 release of Kafka-Streams-Scala. I
> would appreciate some guidance as to what the release process should look
> like in the future.
>
> Oh and I've made a PR to remove a syntax error from release.py.
> https://github.com/apache/kafka/pull/15350
>
> All the best,
> Matthias
>


-- 
[image: Aiven] 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io    |   
     
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread David Jacot
Hi folks,

I have been playing with `reports.junitXml.mergeReruns` setting in gradle
[1]. From the gradle doc:

> When mergeReruns is enabled, if a test fails but is then retried and
succeeds, its failures will be recorded as <flakyFailure> instead of
<failure>, within one <testcase>. This is effectively the reporting
produced by the surefire plugin of Apache Maven™ when enabling reruns. If
your CI server understands this format, it will indicate that the test was
flaky. If it does not, it will indicate that the test succeeded as it will
ignore the <flakyFailure> information. If the test does not succeed (i.e.
it fails for every retry), it will be indicated as having failed whether
your tool understands this format or not.
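
For illustration, enabling it in build.gradle could look something like the
sketch below. This is only a sketch, not necessarily what [1] does; it also
assumes that the reruns themselves come from a retry mechanism such as the
org.gradle.test-retry plugin.

tasks.withType(Test).configureEach {
    reports {
        junitXml {
            // A test that fails and then passes on a retry is reported as a
            // <flakyFailure> within a single <testcase>, instead of a <failure>.
            mergeReruns = true
        }
    }
    retry {
        // Retried runs are what mergeReruns folds into one <testcase>.
        maxRetries = 1
    }
}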

With this, we get really close to having green builds [2] all the time.
There are only a few tests which are too flaky. We should address or
disable those.

I think that this would help us a lot because it would reduce the noise
that we get in pull requests. At the moment, there are just too many failed
tests reported so it is really hard to know whether a pull request is
actually fine or not.

[1] applies it to both unit and integration tests. Following the discussion
in the `github build queue` thread, it may be better to only apply it to
the integration tests. Being stricter with unit tests would make sense.

This does not mean that we should continue our effort to reduce the number
of flaky tests. For this, I propose to keep using Gradle Enterprise. It
provides a nice report for them that we can leverage.

Thoughts?

Best,
David

[1] https://github.com/apache/kafka/pull/14862
[2]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests