[jira] [Commented] (KAFKA-16211) Inconsistent config values in CreateTopicsResult and DescribeConfigsResult

2024-02-20 Thread Deng Ziming (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819095#comment-17819095
 ] 

Deng Ziming commented on KAFKA-16211:
-

 
{code:java}
import java.util.Collections

import kafka.test.annotation.{ClusterTest, ClusterTestDefaults, Type}
import kafka.test.junit.ClusterTestExtensions
import kafka.test.{ClusterConfig, ClusterInstance}
import org.apache.kafka.clients.admin.NewTopic
import org.apache.kafka.common.config.ConfigResource
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.extension.ExtendWith

import scala.jdk.CollectionConverters._

@ExtendWith(value = Array(classOf[ClusterTestExtensions]))
@ClusterTestDefaults(clusterType = Type.ALL, brokers = 1)
class Test(cluster: ClusterInstance) {

  @BeforeEach
  def setup(config: ClusterConfig): Unit = {
    // Static broker config: note this is the "log.segment.bytes" broker
    // property, not the "segment.bytes" topic property.
    config.serverProperties().setProperty("log.segment.bytes", "573741824")
  }

  @ClusterTest
  def test(): Unit = {
    // KAFKA-16211
    val topic = "mytopic"
    val resource = new ConfigResource(ConfigResource.Type.TOPIC, topic)
    val newTopics = Seq(new NewTopic(topic, 1, 1.toShort))
    val admin = cluster.createAdminClient()
    val create =
      admin.createTopics(newTopics.asJava).config(topic).get().get("segment.bytes")
    val describe =
      admin.describeConfigs(Collections.singletonList(resource)).values().get(resource).get().get("segment.bytes")
    println(s"create: $create\ndescribe: $describe")
  }
}
{code}
This can be reproduced using the code above, with some adjustment to 
`KafkaClusterTestKit.Builder.createNodeConfig`; it seems we should use 
"log.segment.bytes" instead of "segment.bytes". 

 

> Inconsistent config values in CreateTopicsResult and DescribeConfigsResult
> --
>
> Key: KAFKA-16211
> URL: https://issues.apache.org/jira/browse/KAFKA-16211
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Gantigmaa Selenge
>Priority: Minor
>
> When creating a topic in a KRaft cluster, the config value returned in 
> CreateTopicsResult differs from what you get from describing the topic 
> configs, if the config was set in broker.properties or controller.properties, 
> or in both but with different values. 
>  
> For example, start a broker with `segment.bytes` set to 573741824 in the 
> properties file and then create a topic; the CreateTopicsResult contains:
> ConfigEntry(name=segment.bytes, value=1073741824, source=DEFAULT_CONFIG, 
> isSensitive=false, isReadOnly=false, synonyms=[], type=INT, 
> documentation=null)
>  because the controller was started without setting this config. 
> However, when you describe the configurations for the same topic, the config 
> value set by the broker is returned:
> ConfigEntry(name=segment.bytes, value=573741824, 
> source=STATIC_BROKER_CONFIG, isSensitive=false, isReadOnly=false, 
> synonyms=[], type=null, documentation=null)
>  
> Vice versa, if the controller is started with this config set to a different 
> value, the create-topic request returns the value set by the controller, and 
> then when you describe the config for the same topic, you get the value set 
> by the broker. This makes it confusing to understand which value is being 
> used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16190) Member should send full heartbeat when rejoining

2024-02-20 Thread Quoc Phong Dang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819081#comment-17819081
 ] 

Quoc Phong Dang commented on KAFKA-16190:
-

[~kirktrue] Sorry for the delay, I've just created a PR; you can have a look at 
it now.

Thanks

> Member should send full heartbeat when rejoining
> 
>
> Key: KAFKA-16190
> URL: https://issues.apache.org/jira/browse/KAFKA-16190
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Quoc Phong Dang
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support, newbie
> Fix For: 3.8.0
>
>
> The heartbeat request builder should make sure that all fields are sent in 
> the heartbeat request when the consumer rejoins (currently the 
> HeartbeatRequestManager request builder is reset on failure scenarios, which 
> should cover the fence+rejoin sequence). 
> Note that the existing HeartbeatRequestManagerTest.testHeartbeatState misses 
> this exact case, given that it explicitly changes the subscription when it 
> gets fenced. We should ensure we test a consumer that keeps its same initial 
> subscription when it rejoins after being fenced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] KAFKA-16190: Member should send full heartbeat when rejoining [kafka]

2024-02-20 Thread via GitHub


phong260702 opened a new pull request, #15401:
URL: https://github.com/apache/kafka/pull/15401

   When the consumer rejoins, the heartbeat request builder makes sure that all 
fields are sent in the heartbeat request.
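   
   A schematic sketch of the intended behavior (the names below are 
hypothetical, not the actual HeartbeatRequestManager API):
   
       // Hypothetical sketch: when the member rejoins (epoch reset to 0),
       // clear the "already sent" tracking so that the next heartbeat
       // carries every field instead of only the fields that changed.
       void onMemberEpochUpdated(int newEpoch) {
           if (newEpoch == 0) {          // fenced member rejoining
               heartbeatState.reset();   // next request is a full heartbeat
           }
       }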
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] (KAFKA-16190) Member should send full heartbeat when rejoining

2024-02-20 Thread Quoc Phong Dang (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16190 ]


Quoc Phong Dang deleted comment on KAFKA-16190:
-

was (Author: JIRAUSER303789):
[~kirktrue] Thank you and sorry for the delay. It took me some time to look into 
the KIP and to navigate the code. I'm trying to see how I should know whether a 
consumer is rejoining.

> Member should send full heartbeat when rejoining
> 
>
> Key: KAFKA-16190
> URL: https://issues.apache.org/jira/browse/KAFKA-16190
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Quoc Phong Dang
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support, newbie
> Fix For: 3.8.0
>
>
> The heartbeat request builder should make sure that all fields are sent in 
> the heartbeat request when the consumer rejoins (currently the 
> HeartbeatRequestManager request builder is reset on failure scenarios, which 
> should cover the fence+rejoin sequence). 
> Note that the existing HeartbeatRequestManagerTest.testHeartbeatState misses 
> this exact case, given that it explicitly changes the subscription when it 
> gets fenced. We should ensure we test a consumer that keeps its same initial 
> subscription when it rejoins after being fenced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] KAFKA-16259 Immutable MetadataCache to improve client performance [kafka]

2024-02-20 Thread via GitHub


ericzhifengchen commented on PR #15376:
URL: https://github.com/apache/kafka/pull/15376#issuecomment-1955880312

   Hi Mayank,
   
   My github account is ***@***.***
   
   Thanks,
   Zhifeng
   
   On Mon, Feb 19, 2024 at 4:22 AM Mayank Shekhar Narula <
   ***@***.***> wrote:
   
   > @ericzhifengchen  It seems you had a
   > similar idea on creating immutable metadata cache on the client to improve
   > latency :)
   >
   > I have created a follow-up #15385
   >  to add similar test to
   > testConcurrentUpdateAndGetCluster in this PR. I can add you as a
   > co-author to PR 15385, can you share the email with your github account?
   > See steps on getting this information here
   > 

   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
--
Description: 
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 1000 records to the topic, expecting 500 records in partition0, and 500 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Tested with Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
issue. It has probably existed for a long time.

 

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.
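
To make the skipping concrete, the following stand-alone sketch (class name 
made up) calls the partitioner twice per record, which is effectively what the 
abort-on-new-batch path does. With a 2-partition topic the counter starts at 0, 
so every record lands on partition 1 and partition 0 stays empty, matching the 
listings above:
{code:java}
import java.util.Arrays;
import java.util.Collections;

import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;

public class Kafka16283Demo {
    public static void main(String[] args) {
        Node node = new Node(0, "localhost", 9092);
        String topic = "quickstart-events4";
        Cluster cluster = new Cluster("test", Collections.singletonList(node),
            Arrays.asList(
                new PartitionInfo(topic, 0, node, new Node[]{node}, new Node[]{node}),
                new PartitionInfo(topic, 1, node, new Node[]{node}, new Node[]{node})),
            Collections.emptySet(), Collections.emptySet());
        try (RoundRobinPartitioner partitioner = new RoundRobinPartitioner()) {
            for (int i = 0; i < 4; i++) {
                // First call: partition chosen for the batch that gets aborted.
                int aborted = partitioner.partition(topic, null, null, null, null, cluster);
                // Second call: the record is re-partitioned after the abort, so
                // the round-robin counter advances twice per record.
                int actual = partitioner.partition(topic, null, null, null, null, cluster);
                System.out.println("aborted=" + aborted + " actual=" + actual);
            }
        }
    }
}
{code}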

  was:
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Tested with Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
issue. It has probably existed for a long time.

 

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.


> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> 

[jira] [Updated] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
--
Description: 
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Tested with Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
issue. It has probably existed for a long time.

 

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.

  was:
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Tested with Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
issue. It has probably existed for a long time.

 

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.


> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> URL: 

[jira] [Commented] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819057#comment-17819057
 ] 

ASF GitHub Bot commented on KAFKA-16283:


showuon opened a new pull request, #585:
URL: https://github.com/apache/kafka-site/pull/585

   Add notes in "config doc" to notify users about the bug: KAFKA-16283 and not 
to use `RoundRobinPartitioner`.




> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> URL: https://issues.apache.org/jira/browse/KAFKA-16283
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.0, 3.6.1
>Reporter: Luke Chen
>Priority: Major
>
> When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we 
> expect data to be sent to all partitions in a round-robin manner. But we found 
> that only half of the partitions got the data. This causes half of the 
> resources (storage, consumers, ...) to be wasted.
> {code:java}
> > bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> > localhost:9092 --partitions 2 
> Created topic quickstart-events4.
> # send 10 records to the topic, expecting 5 records in partition0, and 5 
> records in partition1
> > bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 
> > 1000 --record-size 1024 --throughput -1 --producer-props 
> > bootstrap.servers=localhost:9092 
> > partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
> 1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg 
> latency, 121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 
> 99.9th.
> > ls -al /tmp/kafka-logs/quickstart-events4-1
> total 24
> drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
> drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
> -rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
> -rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
> -rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
> .timeindex
> -rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
> -rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
> # No records in partition 0
> > ls -al /tmp/kafka-logs/quickstart-events4-0
> total 8
> drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
> drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
> -rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
> -rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
> -rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
> .timeindex
> -rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
> -rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
> {code}
> Tested with Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
> issue. It has probably existed for a long time.
>  
> Had a quick look: it's because we call abortOnNewBatch each time a new batch 
> is created.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] KAFKA-16283: notify users about RoundRobinPartitioner bug [kafka]

2024-02-20 Thread via GitHub


showuon opened a new pull request, #15400:
URL: https://github.com/apache/kafka/pull/15400

   Add notes in "3.7.0 notable changes" and "config doc" to notify users not to 
use `RoundRobinPartitioner`.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
--
Affects Version/s: 3.6.1
   3.0.0
   3.1.0

> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> URL: https://issues.apache.org/jira/browse/KAFKA-16283
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.0, 3.6.1
>Reporter: Luke Chen
>Priority: Major
>
> When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we 
> expect data to be sent to all partitions in a round-robin manner. But we found 
> that only half of the partitions got the data. This causes half of the 
> resources (storage, consumers, ...) to be wasted.
> {code:java}
> > bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> > localhost:9092 --partitions 2 
> Created topic quickstart-events4.
> # send 10 records to the topic, expecting 5 records in partition0, and 5 
> records in partition1
> > bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 
> > 1000 --record-size 1024 --throughput -1 --producer-props 
> > bootstrap.servers=localhost:9092 
> > partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
> 1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg 
> latency, 121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 
> 99.9th.
> > ls -al /tmp/kafka-logs/quickstart-events4-1
> total 24
> drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
> drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
> -rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
> -rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
> -rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
> .timeindex
> -rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
> -rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
> # No records in partition 0
> > ls -al /tmp/kafka-logs/quickstart-events4-0
> total 8
> drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
> drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
> -rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
> -rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
> -rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
> .timeindex
> -rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
> -rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
> {code}
> Had a quick look: it's because we call abortOnNewBatch each time a new batch 
> is created.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
--
Description: 
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Tested with Kafka 3.0.0, 3.2.3, and the latest trunk; they all have the same 
issue. It has probably existed for a long time.

 

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.

  was:
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.


> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> URL: https://issues.apache.org/jira/browse/KAFKA-16283
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 

[jira] [Updated] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
--
Description: 
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.
{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel   1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}
Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created.

  was:
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.


{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel  1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created. 


> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> URL: https://issues.apache.org/jira/browse/KAFKA-16283
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Priority: Major
>
> When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we 
> 

[jira] [Updated] (KAFKA-16283) RoundRobinPartitioner will only send to half of the partitions in a topic

2024-02-20 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-16283:
--
Description: 
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.


{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 1000 
> --record-size 1024 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

1000 records sent, 6535.947712 records/sec (6.38 MB/sec), 2.88 ms avg latency, 
121.00 ms max latency, 2 ms 50th, 7 ms 95th, 10 ms 99th, 121 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-1
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel  1037819  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 0
> ls -al /tmp/kafka-logs/quickstart-events4-0
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created. 

  was:
When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we expect 
data to be sent to all partitions in a round-robin manner. But we found that 
only half of the partitions got the data. This causes half of the 
resources (storage, consumers, ...) to be wasted.


{code:java}
> bin/kafka-topics.sh --create --topic quickstart-events4 --bootstrap-server 
> localhost:9092 --partitions 2 

Created topic quickstart-events4.

# send 10 records to the topic, expecting 5 records in partition0, and 5 
records in partition1
> bin/kafka-producer-perf-test.sh --topic quickstart-events4 --num-records 10 
> --record-size 100 --throughput -1 --producer-props 
> bootstrap.servers=localhost:9092 
> partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner

10 records sent, 72.463768 records/sec (0.01 MB/sec), 35.10 ms avg latency, 
132.00 ms max latency, 24 ms 50th, 132 ms 95th, 132 ms 99th, 132 ms 99.9th.

> ls -al /tmp/kafka-logs/quickstart-events4-0
total 24
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel  1151  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 8  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata

# No records in partition 1
> ls -al /tmp/kafka-logs/quickstart-events4-1
total 8
drwxr-xr-x   7 lukchen  wheel   224  2 20 19:53 .
drwxr-xr-x  70 lukchen  wheel  2240  2 20 19:53 ..
-rw-r--r--   1 lukchen  wheel  10485760  2 20 19:53 .index
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 .log
-rw-r--r--   1 lukchen  wheel  10485756  2 20 19:53 
.timeindex
-rw-r--r--   1 lukchen  wheel 0  2 20 19:53 leader-epoch-checkpoint
-rw-r--r--   1 lukchen  wheel43  2 20 19:53 partition.metadata
{code}

Had a quick look: it's because we call abortOnNewBatch each time a new batch is 
created. 


> RoundRobinPartitioner will only send to half of the partitions in a topic
> -
>
> Key: KAFKA-16283
> URL: https://issues.apache.org/jira/browse/KAFKA-16283
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Priority: Major
>
> When using `org.apache.kafka.clients.producer.RoundRobinPartitioner`, we 
> 

Re: [PR] KAFKA-16288, KAFKA-16289: Fix Values convertToDecimal exception and parseString corruption [kafka]

2024-02-20 Thread via GitHub


C0urante commented on code in PR #15399:
URL: https://github.com/apache/kafka/pull/15399#discussion_r1496759639


##
connect/api/src/test/java/org/apache/kafka/connect/data/ValuesTest.java:
##
@@ -744,6 +785,32 @@ public void shouldConvertTimestampValues() {
 assertEquals(current, ts4);
 }
 
+@Test
+public void shouldConvertDecimalValues() {
+// Various forms of the same number should all be parsed to the same 
BigDecimal
+Number number = 1.0f;
+String string = number.toString();
+BigDecimal value = new BigDecimal(string);
+byte[] bytes = Decimal.fromLogical(Decimal.schema(1), value);
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+assertEquals(value, Values.convertToDecimal(null, number, 1));
+assertEquals(value, Values.convertToDecimal(null, string, 1));
+assertEquals(value, Values.convertToDecimal(null, value, 1));
+assertEquals(value, Values.convertToDecimal(null, bytes, 1));
+assertEquals(value, Values.convertToDecimal(null, buffer, 1));
+}
+
+@Test
+public void shouldConvertDecimalValuesInList() {
+List decimals = Arrays.asList("\"1.0\"", 
BigDecimal.valueOf(Long.MAX_VALUE).add(BigDecimal.ONE), 
BigDecimal.valueOf(Long.MIN_VALUE).subtract(BigDecimal.ONE), BigDecimal.ONE, 
BigDecimal.ONE);
+String strings = decimals.toString();
+SchemaAndValue schemaAndValue = Values.parseString(strings);
+Schema schema = schemaAndValue.schema();
+assertEquals(Type.ARRAY, schema.type());
+assertNull(schema.valueSchema());

Review Comment:
   Is it too difficult to also add an assertion about the parsed values in the 
array? Not a blocker, but seems nice to have if possible, especially since we 
don't cover anything except various representations of `1` in the other test 
above.
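
   For illustration, one possible shape for that assertion (a hedged sketch 
reusing the `decimals` and `schemaAndValue` variables from the test above; the 
exact expected values depend on the schemas the fixed parser infers):
   
       @SuppressWarnings("unchecked")
       List<Object> parsed = (List<Object>) schemaAndValue.value();
       // With no common schema inferred, the parsed list should still
       // contain one element per input element.
       assertEquals(decimals.size(), parsed.size());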



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (KAFKA-16190) Member should send full heartbeat when rejoining

2024-02-20 Thread Quoc Phong Dang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818979#comment-17818979
 ] 

Quoc Phong Dang edited comment on KAFKA-16190 at 2/21/24 1:11 AM:
--

[~kirktrue] Thank you and sorry for the delay. It took me some time to look into 
the KIP and to navigate the code. I'm trying to see how I should know whether a 
consumer is rejoining.


was (Author: JIRAUSER303789):
[~kirktrue] Thank you and sorry for the delay. It took me some time to look into 
the KIP and to navigate the code. I'm not so sure the file I'm trying to change 
is the correct one; if you could point me to the location where the change needs 
to be made, that would be helpful.

> Member should send full heartbeat when rejoining
> 
>
> Key: KAFKA-16190
> URL: https://issues.apache.org/jira/browse/KAFKA-16190
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Quoc Phong Dang
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support, newbie
> Fix For: 3.8.0
>
>
> The heartbeat request builder should make sure that all fields are sent in 
> the heartbeat request when the consumer rejoins (currently the 
> HeartbeatRequestManager request builder is reset on failure scenarios, which 
> should cover the fence+rejoin sequence). 
> Note that the existing HeartbeatRequestManagerTest.testHeartbeatState misses 
> this exact case, given that it explicitly changes the subscription when it 
> gets fenced. We should ensure we test a consumer that keeps its same initial 
> subscription when it rejoins after being fenced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6675) Connect workers should log plugin path and available plugins more clearly

2024-02-20 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-6675.

Fix Version/s: 3.6.0
 Assignee: Greg Harris  (was: Valeria Vasylieva)
   Resolution: Fixed

This was incorporated into the bin/connect-plugin-path.sh list command, as 
specified in KIP-898: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery]
 . This can be used offline without starting the connect worker or loading any 
live configurations.
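
A usage sketch (flag names as proposed in KIP-898; check the tool's --help 
output for the exact options):
{code}
# List discovered plugins offline, without starting a Connect worker
bin/connect-plugin-path.sh list --plugin-location /usr/local/plugins
{code}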

> Connect workers should log plugin path and available plugins more clearly
> -
>
> Key: KAFKA-6675
> URL: https://issues.apache.org/jira/browse/KAFKA-6675
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Affects Versions: 0.11.0.1
>Reporter: Randall Hauch
>Assignee: Greg Harris
>Priority: Minor
> Fix For: 3.6.0
>
>
> Users struggle with setting the plugin path and properly installing plugins. 
> If users get any of this wrong, they get strange errors only after they run 
> the worker and attempt to deploy connectors or use transformations. 
> The Connect worker should more obviously output the plugin path directories 
> and the available plugins. For example, if the {{plugin.path}} were:
> {code}
> plugin.path=/usr/local/share/java,/usr/local/plugins
> {code}
> then the worker might output something like the following information in the 
> log:
> {noformat}
> Looking for plugins on classpath and inside plugin.path directories:
>   /usr/local/share/java
>   /usr/local/plugins
>  
> Source Connector(s):
>   FileStreamSource  (org.apache.kafka.connect.file.FileStreamSourceConnector) 
>   @ classpath
>   FileStreamSink(org.apache.kafka.connect.file.FileStreamSinkConnector)   
>   @ classpath
>   JdbcSource(io.confluent.connect.jdbc.JdbcSourceConnector)   
>   @ /usr/local/share/java/kafka-connect-jdbc
>   MySql (io.debezium.connector.mysql.MySqlConnector)  
>   @ /usr/local/plugins/debezium-connector-mysql
> Converter(s):
>   JsonConverter (org.apache.kafka.connect.json.JsonConverter) 
>   @ classpath
>   ByteArrayConverter
> (org.apache.kafka.connect.converters.ByteArrayConverter)@ classpath
>   SimpleHeaderConverter 
> (org.apache.kafka.connect.converters.SimpleHeaderConverter) @ classpath
>   AvroConverter (io.confluent.connect.avro.AvroConverter) 
>   @ /usr/local/share/java/kafka-serde-tools
> Transformation(s):
>   InsertField   (org.apache.kafka.connect.transforms.InsertField) 
>   @ classpath
>   ReplaceField  (org.apache.kafka.connect.transforms.ReplaceField)
>   @ classpath
>   MaskField (org.apache.kafka.connect.transforms.MaskField)   
>   @ classpath
>   ValueToKey(org.apache.kafka.connect.transforms.ValueToKey)  
>   @ classpath
>   HoistField(org.apache.kafka.connect.transforms.HoistField)  
>   @ classpath
>   ExtractField  (org.apache.kafka.connect.transforms.ExtractField)
>   @ classpath
>   SetSchemaMetadata (org.apache.kafka.connect.transforms.SetSchemaMetadata)   
>   @ classpath
>   RegexRouter   (org.apache.kafka.connect.transforms.RegexRouter) 
>   @ classpath
>   TimestampRouter   (org.apache.kafka.connect.transforms.TimestampRouter) 
>   @ classpath
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] MINOR: Document MetadataResponse invariants for name and ID [kafka]

2024-02-20 Thread via GitHub


gharris1727 commented on PR #15386:
URL: https://github.com/apache/kafka/pull/15386#issuecomment-1955537975

   Hi @dengziming @jolshan @rajinisivaram Could you PTAL at this documentation 
change?
   
   Going off of KIP-516 and the discussion on the PRs which added topic IDs to 
the metadata request/response, Anton and I think that these descriptions are 
accurate for fully upgraded and converged clusters, because the KIP-516 
migration to backfill topicIDs should apply to every topic in the cluster.
   
   Is this a property that we can document in the spec?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16249; Improve reconciliation state machine [kafka]

2024-02-20 Thread via GitHub


jeffkbkim commented on code in PR #15364:
URL: https://github.com/apache/kafka/pull/15364#discussion_r1496700583


##
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java:
##
@@ -1211,13 +1192,99 @@ private 
CoordinatorResult consumerGr
 // 1. The member reported its owned partitions;

Review Comment:
   do we need this condition because we can only compute a valid assignment if 
we're given the partitions a member owns?



##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java:
##
@@ -1807,9 +1776,9 @@ public void testReconciliationProcess() {
 
 assertRecordsEquals(Collections.singletonList(
 RecordHelpers.newCurrentAssignmentRecord(groupId, new 
ConsumerGroupMember.Builder(memberId1)
+.setState(MemberState.UNREVOKED_PARTITIONS)
 .setMemberEpoch(10)
-.setPreviousMemberEpoch(9)
-.setTargetMemberEpoch(11)
+.setPreviousMemberEpoch(10)

Review Comment:
   This was bumped from 9 to 10. This seems right because Member 1 was at epoch 
10 previously at L1656. How did the test pass before?



##
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java:
##
@@ -1211,13 +1192,99 @@ private 
CoordinatorResult consumerGr
 // 1. The member reported its owned partitions;
 // 2. The member just joined or rejoined to group (epoch equals to 
zero);
 // 3. The member's assignment has been updated.
-if (ownedTopicPartitions != null || memberEpoch == 0 || 
assignmentUpdated) {
+if (ownedTopicPartitions != null || memberEpoch == 0 || 
hasAssignedPartitionsChanged(member, updatedMember)) {
 response.setAssignment(createResponseAssignment(updatedMember));
 }
 
 return new CoordinatorResult<>(records, response);
 }
 
+/**
+ * Reconciles the current assignment of the member if needed.
+ *
+ * @param groupId   The group id.
+ * @param memberThe member to reconcile.
+ * @param currentPartitionEpoch The function returning the current epoch of
+ *  a given partition.
+ * @param targetAssignmentEpoch The target assignment epoch.
+ * @param targetAssignment  The target assignment.
+ * @param ownedTopicPartitions  The list of partitions owned by the 
member. This
+ *  is reported in the ConsumerGroupHeartbeat 
API and
+ *  it could be null if not provided.
+ * @param records   The list to accumulate any new records.
+ * @return The received member if no changes have been made; or a new
+ * member containing the new assignment.
+ */
+private ConsumerGroupMember maybeReconcile(
+String groupId,
+ConsumerGroupMember member,
+BiFunction currentPartitionEpoch,
+int targetAssignmentEpoch,
+Assignment targetAssignment,
+List 
ownedTopicPartitions,
+List records
+) {
+if (member.isReconciledTo(targetAssignmentEpoch)) {
+return member;
+}
+
+ConsumerGroupMember updatedMember = new 
CurrentAssignmentBuilder(member)
+.withTargetAssignment(targetAssignmentEpoch, targetAssignment)
+.withCurrentPartitionEpoch(currentPartitionEpoch)
+.withOwnedTopicPartitions(ownedTopicPartitions)
+.build();
+
+if (!updatedMember.equals(member)) {
+records.add(newCurrentAssignmentRecord(groupId, updatedMember));
+
+log.info("[GroupId {}] Member {} new assignment state: epoch={}, 
previousEpoch={}, state={}, "
+ + "assignedPartitions={} and revokedPartitions={}.",
+groupId, updatedMember.memberId(), 
updatedMember.memberEpoch(), updatedMember.previousMemberEpoch(), 
updatedMember.state(),
+formatAssignment(updatedMember.assignedPartitions()), 
formatAssignment(updatedMember.revokedPartitions()));
+
+if (updatedMember.state() == MemberState.UNREVOKED_PARTITIONS) {
+scheduleConsumerGroupRebalanceTimeout(
+groupId,
+updatedMember.memberId(),
+updatedMember.memberEpoch(),
+updatedMember.rebalanceTimeoutMs()
+);
+} else {

Review Comment:
   Then why do we "cancel consumer group rebalance timeout"? So I think you're 
saying that the rebalance timeout is actually the revocation timeout. Is this 
correct?



##
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/consumer/ConsumerGroup.java:
##
@@ -779,7 +779,7 @@ private void maybeUpdateGroupState() {
 newState = ASSIGNING;
 } else {
 for (ConsumerGroupMember 

[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-20 Thread Justine Olshan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819023#comment-17819023
 ] 

Justine Olshan commented on KAFKA-16212:


We use something similar in the fetch session cache when the topic ID is 
unknown. 
This transition state will last for as long as we support ZK, I suspect. 

 

> how would this look like that I need to be aware off during extending 
> ReplicaManager cache to be topicId aware
Not sure I understand this question.
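
For illustration, the proposed keying looks roughly like this (a sketch, not 
the actual ReplicaManager code):
{code:java}
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;

public class TopicIdKeyedCache {
    // A TopicIdPartition key distinguishes a recreated topic (same name,
    // new ID), which a plain TopicPartition key cannot.
    private final ConcurrentHashMap<TopicIdPartition, Object> allPartitions =
        new ConcurrentHashMap<>();

    public void register(Uuid topicId, String topic, int partition, Object state) {
        allPartitions.putIfAbsent(
            new TopicIdPartition(topicId, new TopicPartition(topic, partition)), state);
    }
}
{code}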

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] KAFKA-16288, KAFKA-16289: Fix Values convertToDecimal exception and parseString corruption [kafka]

2024-02-20 Thread via GitHub


gharris1727 opened a new pull request, #15399:
URL: https://github.com/apache/kafka/pull/15399

   See the descriptions of the tickets for full details:
   
   * https://issues.apache.org/jira/browse/KAFKA-16288 convertToDecimal
   * https://issues.apache.org/jira/browse/KAFKA-16289 parseString
   
   These both represent breaking changes in behavior, but only when using 
incompatible-type arrays and maps, such as the ones included in the tests. 
Since the behavior of these is so opaque and silent corruption is possible with 
the bugs, we should change the behavior unconditionally.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (KAFKA-16277) CooperativeStickyAssignor does not spread topics evenly among consumer group

2024-02-20 Thread Cameron Redpath (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819013#comment-17819013
 ] 

Cameron Redpath commented on KAFKA-16277:
-

Thanks [~ableegoldman] for the response - yes we will try to submit a patch 
soon.

> CooperativeStickyAssignor does not spread topics evenly among consumer group
> 
>
> Key: KAFKA-16277
> URL: https://issues.apache.org/jira/browse/KAFKA-16277
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Cameron Redpath
>Priority: Major
> Attachments: image-2024-02-19-13-00-28-306.png
>
>
> Consider the following scenario:
> `topic-1`: 12 partitions
> `topic-2`: 12 partitions
>  
> Of note, `topic-1` gets approximately 10 times more messages through it than 
> `topic-2`. 
>  
> Both of these topics are consumed by a single application, single consumer 
> group, which scales under load. Each member of the consumer group subscribes 
> to both topics. The `partition.assignment.strategy` being used is 
> `org.apache.kafka.clients.consumer.CooperativeStickyAssignor`. The 
> application may start with one consumer. It consumes all partitions from both 
> topics.
>  
> The problem begins when the application scales up to two consumers. What is 
> seen is that all partitions from `topic-1` go to one consumer, and all 
> partitions from `topic-2` go to the other consumer. In the case with one 
> topic receiving more messages than the other, this results in a very 
> imbalanced group where one consumer is receiving 10x the traffic of the other 
> due to partition assignment.
>  
> This is the issue being seen in our cluster at the moment. See this graph of 
> the number of messages being processed by each consumer as the group scales 
> from one to four consumers:
> !image-2024-02-19-13-00-28-306.png|width=537,height=612!
> Things to note from this graphic:
>  * With two consumers, the partitions for a topic all go to a single consumer 
> each
>  * With three consumers, the partitions for a topic are split between two 
> consumers each
>  * With four consumers, the partitions for a topic are split between three 
> consumers each
>  * The total number of messages being processed by each consumer in the group 
> is very imbalanced throughout the entire period
>  
> With regard to the number of _partitions_ being assigned to each consumer, 
> the group is balanced. However, the assignment appears to be biased so that 
> partitions from the same topic go to the same consumer. In our scenario, this 
> leads to very undesirable partition assignment.
>  
> I question if the behaviour of the assignor should be revised, so that each 
> topic has its partitions maximally spread across all available members of the 
> consumer group. In the above scenario, this would result in much more even 
> distribution of load. The behaviour would then be:
>  * With two consumers, 6 partitions from each topic go to each consumer
>  * With three consumers, 4 partitions from each topic go to each consumer
>  * With four consumers, 3 partitions from each topic go to each consumer
>  
> Of note, we only saw this behaviour after migrating to the 
> `CooperativeStickyAssignor`. It was not an issue with the default partition 
> assignment strategy.
>  
> It is possible this may be intended behaviour. In which case, what is the 
> preferred workaround for our scenario? Our current workaround if we decide to 
> go ahead with the update to `CooperativeStickyAssignor` may be to limit our 
> consumers so they only subscribe to one topic, and have two consumer threads 
> per instance of the application.  
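
For context, a minimal sketch of a consumer configured as described above (the 
broker address and group name are placeholders):
{code:java}
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TwoTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "two-topic-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // The assignment strategy whose per-topic clustering is described above.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
            CooperativeStickyAssignor.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("topic-1", "topic-2"));
            // poll loop elided
        }
    }
}
{code}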



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16289) Values.parseString on heterogeneous lists and maps sometimes corrupts data by inferring incorrect schema

2024-02-20 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16289:
---

 Summary: Values.parseString on heterogeneous lists and maps 
sometimes corrupts data by inferring incorrect schema
 Key: KAFKA-16289
 URL: https://issues.apache.org/jira/browse/KAFKA-16289
 Project: Kafka
  Issue Type: Bug
  Components: connect
Reporter: Greg Harris
Assignee: Greg Harris


The Values.parseString function makes a best-effort conversion of strings to 
Connect-schema'd data. It supports reading arrays and maps as delimited by 
`[,]` and `\{:,}` characters, and attempts to infer the common type of these 
structures from the types of the elements. The algorithm it follows is:

1. Parse the elements of the list in one pass, inferring the smallest/strictest 
type which can contain each value individually.
2. Iterate over the schemas inferred for each element, and repeatedly merge two 
schemas together to the smallest type which covers both element schemas.
3. Convert the parsed elements to the common element schema.

The implementation of step 2 here: 
[https://github.com/apache/kafka/blob/ead2431c37ace9255df88ffe819bb905311af088/connect/api/src/main/java/org/apache/kafka/connect/data/Values.java#L805-L823]
 has a flaw in it, however: the `elementSchema` variable uses `null` as a 
sentinel for both of the situations "no elements seen so far" and "no common 
schema possible" among the seen elements.

When processing the first element of the list, `null` is used to adopt the 
schema of the first element as the initial common schema. Later when an 
incompatible element is found, the common schema is set to null to indicate 
that there is no common element schema. However, a following iteration can 
misinterpret the `null` as being at the start of the list again, and inject a 
schema which works for some of the elements and not others.
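
A simplified illustration of that flawed merge (this is a sketch, not the 
actual Values.java code; plain schema equality stands in for the real 
schema-merging logic):
{code:java}
import java.util.List;
import org.apache.kafka.connect.data.Schema;

class SentinelDemo {
    // Returns the common schema, or null -- where null ambiguously means either
    // "no elements seen yet" or "no common schema possible".
    static Schema commonSchema(List<Schema> elementSchemas) {
        Schema elementSchema = null;
        for (Schema schema : elementSchemas) {
            if (elementSchema == null) {
                // Intended only for the first element, but also re-entered after
                // a conflict reset elementSchema to null, re-adopting a schema
                // that fits only some of the elements.
                elementSchema = schema;
            } else if (!elementSchema.equals(schema)) {
                elementSchema = null; // sentinel reused: no common schema possible
            }
        }
        return elementSchema;
    }
}
{code}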

When the values are converted in step 3, each element has one of the following 
happen:
1. The value is left as-is (e.g. no common schema inferred)
2. The value is converted correctly to the destination type (e.g. int -> long)
3. An exception is thrown because the type could not be converted (e.g. string 
-> struct)
4. The value is silently corrupted (e.g. long -> int, decimal -> long)

In normal circumstances either case (1) happens to all of the elements, or 
case (2) does, depending on whether a common schema was found. But when this 
bug is triggered by heterogeneous types, case (2), (3), or (4) can happen to 
some of the elements in the array.

The effects depend on the order of elements in the array, as the sentinel value 
bug is dependent on the iteration order of the elements. For example:

* `[1,2,"3"]` returns Byte, Byte, String
* `["1",2,3]` returns Byte, Byte, Byte (safely converts the data, case 2)
* `[1,2,{}]` returns Byte, Byte, Map
* `[{},2,3]` experiences an exception and returns String (exception, case 3)
* `[1, 2, 10]` returns Byte, Byte, BigDecimal
* `[10, 1, 2]` returns Byte, Byte, Byte (corruption, case 4)
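
The order dependence can be observed directly (hypothetical demo class; 
Values.parseString is the real Connect API entry point):
{code:java}
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.data.Values;

public class ParseOrderDemo {
    public static void main(String[] args) {
        // Same element values, different order, different inferred schemas.
        print(Values.parseString("[1,2,\"3\"]"));  // heterogeneous result
        print(Values.parseString("[\"1\",2,3]"));  // all elements converted to Byte
    }

    static void print(SchemaAndValue parsed) {
        System.out.println(parsed.schema() + " -> " + parsed.value());
    }
}
{code}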

Fixing this bug would entail changing all of these to return heterogeneous 
lists without a common schema, and not convert the values at all. However, this 
is a backwards-incompatible change because these are all situations in which we 
return data without an exception, so downstream users could be relying on the 
result.

However, this behavior is very opaque and unpredictable, and I think anyone 
that observes this in the wild would need to work around it or avoid it, rather 
than rely on it happening. I think that fixing it to address the silent 
corruption case is a bigger benefit to users than the harm done by changing the 
other cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop

2024-02-20 Thread Phuc Hong Tran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819009#comment-17819009
 ] 

Phuc Hong Tran edited comment on KAFKA-16160 at 2/20/24 11:15 PM:
--

Also, after some investigation, I'm thinking that this one is not triggered 
when the consumer is trying to connect to a disconnected node, as the check for 
a disconnected node is right before the check that would produce these logs (see 
https://github.com/apache/kafka/blob/4c70581eb63fe74494fbabf5a90e87c38e17996d/clients/src/main/java/org/apache/kafka/clients/consumer/internals/NetworkClientDelegate.java#L160)


was (Author: JIRAUSER301295):
Also, after some investigation, I'm thinking that this one is not triggered 
when the consumer is trying to connect to a disconnected node, as the check for 
a disconnected node is right before the check that would produce these logs

> AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
> --
>
> Key: KAFKA-16160
> URL: https://issues.apache.org/jira/browse/KAFKA-16160
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Observed excessive logging when running AsyncKafkaConsumer:
> {code:java}
> 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, 
> groupId=concurrent_consumer] Node is not ready, handle the request in the 
> next event loop: node=worker4:9092 (id: 2147483644 rack: null), 
> request=UnsentRequest{requestBuilder=ConsumerGroupHeartbeatRequestData(
> groupId='concurrent_consumer', memberId='laIqS789StuhXFpTwjh6hA', 
> memberEpoch=1, instanceId=null, rackId=null, rebalanceTimeoutMs=30, 
> subscribedTopicNames=[output-topic], serverAssignor=null, 
> topicPartitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), 
> handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b,
> node=Optional[worker4:9092 (id: 2147483644 rack: null)], 
> timer=org.apache.kafka.common.utils.Timer@55ed4733} 
> (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code}
> This seems to be triggered by a tight poll loop of the network thread. The 
> right thing to do is to back off a bit for that given node and retry later.
> This should be a blocker for 3.8 release.
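> 
> A minimal sketch of such a backoff, using Kafka's ExponentialBackoff utility 
> (the utility class is real; the wrapper, its names, and the timing values are 
> assumptions for illustration):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.kafka.common.utils.ExponentialBackoff;
> 
> class NodeBackoff {
>     // 50 ms initial delay, doubling up to 1 s, with 20% jitter (illustrative values).
>     private final ExponentialBackoff backoff = new ExponentialBackoff(50, 2, 1000, 0.2);
>     private final Map<Integer, Long> attempts = new HashMap<>();
> 
>     // Delay to wait before retrying the given node.
>     long nextRetryDelayMs(int nodeId) {
>         long n = attempts.merge(nodeId, 1L, Long::sum);
>         return backoff.backoff(n - 1);
>     }
> 
>     // Clear the counter once the node is ready again.
>     void reset(int nodeId) {
>         attempts.remove(nodeId);
>     }
> }
> {code}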



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] MINOR: extend transaction unit test to validate drain [kafka]

2024-02-20 Thread via GitHub


jolshan merged PR #15320:
URL: https://github.com/apache/kafka/pull/15320


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop

2024-02-20 Thread Phuc Hong Tran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819009#comment-17819009
 ] 

Phuc Hong Tran commented on KAFKA-16160:


Also, after some investigation, I'm thinking that this one is not triggered 
when the consumer is trying to connect to a disconnected node, as the check for 
a disconnected node is right before the check that would produce these logs

> AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
> --
>
> Key: KAFKA-16160
> URL: https://issues.apache.org/jira/browse/KAFKA-16160
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Observed excessive logging when running AsyncKafkaConsumer:
> {code:java}
> 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, 
> groupId=concurrent_consumer] Node is not ready, handle the request in the 
> next event loop: node=worker4:9092 (id: 2147483644 rack: null), 
> request=UnsentRequest{requestBuilder=ConsumerGroupHeartbeatRequestData(
> groupId='concurrent_consumer', memberId='laIqS789StuhXFpTwjh6hA', 
> memberEpoch=1, instanceId=null, rackId=null, rebalanceTimeoutMs=30, 
> subscribedTopicNames=[output-topic], serverAssignor=null, 
> topicPartitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), 
> handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b,
> node=Optional[worker4:9092 (id: 2147483644 rack: null)], 
> timer=org.apache.kafka.common.utils.Timer@55ed4733} 
> (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code}
> This seems to be triggered by a tight poll loop of the network thread. The 
> right thing to do is to back off a bit for that given node and retry later.
> This should be a blocker for 3.8 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15538) Client support for java regex based subscription

2024-02-20 Thread Phuc Hong Tran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819008#comment-17819008
 ] 

Phuc Hong Tran commented on KAFKA-15538:


I can put up a pull request for this ticket by this weekend. Currently I have 
no questions about it.

> Client support for java regex based subscription
> 
>
> Key: KAFKA-15538
> URL: https://issues.apache.org/jira/browse/KAFKA-15538
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Phuc Hong Tran
>Priority: Blocker
>  Labels: kip-848-client-support, newbie, regex
> Fix For: 3.8.0
>
>
> When using subscribe with a java regex (Pattern), we need to resolve it on 
> the client side to send the broker a list of topic names to subscribe to.
> Context:
> The new consumer group protocol uses [Google 
> RE2/J|https://github.com/google/re2j] for regular expressions and introduces 
> new methods in the consumer API to subscribe using a `SubscriptionPattern`. 
> Subscribing with a java `Pattern` will still be supported for a while but 
> eventually removed.
>  * When the subscribe with SubscriptionPattern is used, the client should 
> just send the regex to the broker and it will be resolved on the server side.
>  * In the case of the subscribe with Pattern, the regex should be resolved on 
> the client side.
> As part of this task, we should re-enable all integration tests defined in 
> the PlainTextAsyncConsumer that relate to subscription with pattern and that 
> are currently disabled for the new consumer + new protocol
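> 
> For illustration, a hedged sketch of the two subscription paths (the 
> SubscriptionPattern type follows the KIP-848 consumer API; exact signatures 
> may differ by version, and the consumer setup is elided):
> {code:java}
> import java.util.Properties;
> import java.util.regex.Pattern;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.clients.consumer.SubscriptionPattern;
> 
> Properties props = new Properties(); // bootstrap/group/deserializer config elided
> KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
> 
> // New path: the RE2/J pattern string is sent to the broker and resolved server-side.
> consumer.subscribe(new SubscriptionPattern("topic-.*"));
> 
> // Legacy path: the java Pattern must be resolved on the client side, so the
> // client sends the broker an explicit list of matching topic names.
> consumer.subscribe(Pattern.compile("topic-.*"));
> {code}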



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16281) Possible IllegalState with KIP-996

2024-02-20 Thread Calvin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819007#comment-17819007
 ] 

Calvin Liu commented on KAFKA-16281:


[~alivshits] Jack has corrected it: the issue is with KIP-996, not KIP-966.

> Possible IllegalState with KIP-996
> --
>
> Key: KAFKA-16281
> URL: https://issues.apache.org/jira/browse/KAFKA-16281
> Project: Kafka
>  Issue Type: Task
>  Components: kraft
>Reporter: Jack Vanlightly
>Priority: Major
>
> I have a TLA+ model of KIP-996 (pre-vote) and I have identified an 
> IllegalState exception that would occur with the existing 
> MaybeHandleCommonResponse behavior.
> The issue stems from the fact that a leader, let's call it r1, can resign 
> (either due to a restart or check quorum) and then later initiate a pre-vote 
> where it ends up in the same epoch as before. When r1 receives a response 
> from r2 who believes that r1 is still the leader, the logic in 
> MaybeHandleCommonResponse tries to transition r1 to follower of itself, 
> causing an IllegalState exception to be raised.
> This is an example history:
>  # r1 is the leader in epoch 1.
>  # r1 quorum resigns, or restarts and resigns.
>  # r1 experiences an election timeout and transitions to Prospective.
>  # r1 sends a pre-vote request to its peers.
>  # r2 thinks r1 is still the leader, sends a vote response, not granting its 
> vote and setting leaderId=r1 and epoch=1.
>  # r1 receives the vote response and executes MaybeHandleCommonResponse which 
> tries to transition r1 to Follower of itself and an illegal state occurs.
> The relevant else if statement in MaybeHandleCommonResponse is here: 
> [https://github.com/apache/kafka/blob/a26a1d847f1884a519561e7a4fb4cd13e051c824/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1538]
> In the TLA+ specification, I fixed this issue by adding a fourth condition to 
> this statement: that the replica must not be in the Prospective state (see the 
> sketch below). 
> [https://github.com/Vanlightly/kafka-tlaplus/blob/9b2600d1cd5c65930d666b12792d47362b64c015/kraft/kip_996/kraft_kip_996_functions.tla#L336|https://github.com/Vanlightly/kafka-tlaplus/blob/421f170ba4bd8c5eceb36b88b47901ee3d9c3d2a/kraft/kip_996/kraft_kip_996_functions.tla#L336]
>  
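> A hedged, self-contained sketch of that guard (simplified names and states, 
> not the actual KafkaRaftClient code):
> {code:java}
> class VoteResponseGuard {
>     enum State { LEADER, FOLLOWER, CANDIDATE, PROSPECTIVE, RESIGNED }
> 
>     // The transition-to-follower branch of MaybeHandleCommonResponse, with the
>     // proposed fourth condition appended: a Prospective replica must not become
>     // a follower of itself mid pre-vote.
>     static boolean shouldTransitionToFollower(int responseEpoch, int localEpoch,
>                                               Integer leaderId, State localState) {
>         return responseEpoch == localEpoch
>                 && leaderId != null
>                 && localState != State.LEADER
>                 && localState != State.PROSPECTIVE; // fourth condition from the TLA+ fix
>     }
> }
> {code}
>  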
> Note that I also had to implement the sending of the BeginQuorumEpoch 
> request by the leader to prevent a replica getting stuck in Prospective. If 
> the replica r2 has an election timeout due to a transient connectivity 
> issue with the leader, but has also fallen behind slightly, then r2 will 
> remain stuck as a Prospective because none of its peers, who have 
> connectivity to the leader, will grant it a pre-vote. To enable r2 to become 
> a functional member again, the leader must give it a nudge with a 
> BeginQuorumEpoch request. The alternative (which I have also modeled) is for 
> a Prospective to transition to Follower when it receives a negative pre-vote 
> response with a non-null leaderId. This comes with a separate liveness issue 
> which I can discuss if this "transition to Follower" approach is interesting. 
> Either way, a stuck Prospective needs a way to transition to follower 
> eventually, if all other members have a stable leader.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop

2024-02-20 Thread Phuc Hong Tran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819005#comment-17819005
 ] 

Phuc Hong Tran commented on KAFKA-16160:


[~kirktrue] I was not able to reproduce this scenario on my local machine. I'm 
not sure which test this was discovered in or how to trigger it.

> AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
> --
>
> Key: KAFKA-16160
> URL: https://issues.apache.org/jira/browse/KAFKA-16160
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Observed excessive logging when running AsyncKafkaConsumer:
> {code:java}
> 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, 
> groupId=concurrent_consumer] Node is not ready, handle the request in the 
> next event loop: node=worker4:9092 (id: 2147483644 rack: null), 
> request=UnsentRequest{requestBuilder=ConsumerGroupHeartbeatRequestData(
> groupId='concurrent_consumer', memberId='laIqS789StuhXFpTwjh6hA', 
> memberEpoch=1, instanceId=null, rackId=null, rebalanceTimeoutMs=30, 
> subscribedTopicNames=[output-topic], serverAssignor=null, 
> topicPartitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), 
> handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b,
> node=Optional[worker4:9092 (id: 2147483644 rack: null)], 
> timer=org.apache.kafka.common.utils.Timer@55ed4733} 
> (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code}
> This seems to be triggered by a tight poll loop of the network thread. The 
> right thing to do is to back off a bit for that given node and retry later.
> This should be a blocker for 3.8 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16281) Possible IllegalState with KIP-996

2024-02-20 Thread Artem Livshits (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819004#comment-17819004
 ] 

Artem Livshits commented on KAFKA-16281:


Is this a problem with KIP-966, or did a model that was built for validating 
KIP-966 find an issue in the KRaft protocol itself? I don't think KIP-966 
changes the voting protocol for KRaft.

> Possible IllegalState with KIP-996
> --
>
> Key: KAFKA-16281
> URL: https://issues.apache.org/jira/browse/KAFKA-16281
> Project: Kafka
>  Issue Type: Task
>  Components: kraft
>Reporter: Jack Vanlightly
>Priority: Major
>
> I have a TLA+ model of KIP-996 (pre-vote) and I have identified an 
> IllegalState exception that would occur with the existing 
> MaybeHandleCommonResponse behavior.
> The issue stems from the fact that a leader, let's call it r1, can resign 
> (either due to a restart or check quorum) and then later initiate a pre-vote 
> where it ends up in the same epoch as before. When r1 receives a response 
> from r2 who believes that r1 is still the leader, the logic in 
> MaybeHandleCommonResponse tries to transition r1 to follower of itself, 
> causing an IllegalState exception to be raised.
> This is an example history:
>  # r1 is the leader in epoch 1.
>  # r1 quorum resigns, or restarts and resigns.
>  # r1 experiences an election timeout and transitions to Prospective.
>  # r1 sends a pre-vote request to its peers.
>  # r2 thinks r1 is still the leader, sends a vote response, not granting its 
> vote and setting leaderId=r1 and epoch=1.
>  # r1 receives the vote response and executes MaybeHandleCommonResponse which 
> tries to transition r1 to Follower of itself and an illegal state occurs.
> The relevant else if statement in MaybeHandleCommonResponse is here: 
> [https://github.com/apache/kafka/blob/a26a1d847f1884a519561e7a4fb4cd13e051c824/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1538]
> In the TLA+ specification, I fixed this issue by adding a fourth condition to 
> this statement: that the replica must not be in the Prospective state. 
> [https://github.com/Vanlightly/kafka-tlaplus/blob/9b2600d1cd5c65930d666b12792d47362b64c015/kraft/kip_996/kraft_kip_996_functions.tla#L336|https://github.com/Vanlightly/kafka-tlaplus/blob/421f170ba4bd8c5eceb36b88b47901ee3d9c3d2a/kraft/kip_996/kraft_kip_996_functions.tla#L336]
>  
> Note that I also had to implement the sending of the BeginQuorumEpoch 
> request by the leader to prevent a replica getting stuck in Prospective. If 
> the replica r2 has an election timeout due to a transient connectivity 
> issue with the leader, but has also fallen behind slightly, then r2 will 
> remain stuck as a Prospective because none of its peers, who have 
> connectivity to the leader, will grant it a pre-vote. To enable r2 to become 
> a functional member again, the leader must give it a nudge with a 
> BeginQuorumEpoch request. The alternative (which I have also modeled) is for 
> a Prospective to transition to Follower when it receives a negative pre-vote 
> response with a non-null leaderId. This comes with a separate liveness issue 
> which I can discuss if this "transition to Follower" approach is interesting. 
> Either way, a stuck Prospective needs a way to transition to follower 
> eventually, if all other members have a stable leader.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop

2024-02-20 Thread Phuc Hong Tran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819003#comment-17819003
 ] 

Phuc Hong Tran commented on KAFKA-16160:


For this one, I do need some help.

> AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
> --
>
> Key: KAFKA-16160
> URL: https://issues.apache.org/jira/browse/KAFKA-16160
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Observed excessive logging when running AsyncKafkaConsumer:
> {code:java}
> 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, 
> groupId=concurrent_consumer] Node is not ready, handle the request in the 
> next event loop: node=worker4:9092 (id: 2147483644 rack: null), 
> request=UnsentRequest{requestBuilder=ConsumerGroupHeartbeatRequestData(
> groupId='concurrent_consumer', memberId='laIqS789StuhXFpTwjh6hA', 
> memberEpoch=1, instanceId=null, rackId=null, rebalanceTimeoutMs=30, 
> subscribedTopicNames=[output-topic], serverAssignor=null, 
> topicPartitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), 
> handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b,
> node=Optional[worker4:9092 (id: 2147483644 rack: null)], 
> timer=org.apache.kafka.common.utils.Timer@55ed4733} 
> (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code}
> This seems to be triggered by a tight poll loop of the network thread. The 
> right thing to do is to back off a bit for that given node and retry later.
> This should be a blocker for 3.8 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16008) Fix PlaintextConsumerTest.testMaxPollIntervalMs

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16008:
-

Assignee: Lucas Brutschy  (was: Philip Nee)

> Fix PlaintextConsumerTest.testMaxPollIntervalMs
> ---
>
> Key: KAFKA-16008
> URL: https://issues.apache.org/jira/browse/KAFKA-16008
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is 
> failing when using the {{AsyncKafkaConsumer}}.
> The error is:
> {code}
> org.opentest4j.AssertionFailedError: Timed out before expected rebalance 
> completed
>     at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
> at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
> at 
> kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317)
> at 
> kafka.api.PlaintextConsumerTest.testMaxPollIntervalMs(PlaintextConsumerTest.scala:194)
> {code}
> The logs include this line:
>  
> {code}
> [2023-12-13 15:11:16,134] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> {code} 
> I don't know if that's related or not.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16152) Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16152:
-

Assignee: Lucas Brutschy  (was: Kirk True)

> Fix 
> PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart
> --
>
> Key: KAFKA-16152
> URL: https://issues.apache.org/jira/browse/KAFKA-16152
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, 
> kip-848-client-support
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16010) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16010:
-

Assignee: Lucas Brutschy  (was: Kirk True)

> Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling
> --
>
> Key: KAFKA-16010
> URL: https://issues.apache.org/jira/browse/KAFKA-16010
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> The integration test 
> {{PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling}} is 
> failing when using the {{AsyncKafkaConsumer}}.
> The error is:
> {code}
> org.opentest4j.AssertionFailedError: Did not get valid assignment for 
> partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-5, topic1-1, 
> topic1-0, topic1-3] after one consumer left
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
>   at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
>   at 
> kafka.api.AbstractConsumerTest.validateGroupAssignment(AbstractConsumerTest.scala:286)
>   at 
> kafka.api.PlaintextConsumerTest.runMultiConsumerSessionTimeoutTest(PlaintextConsumerTest.scala:1883)
>   at 
> kafka.api.PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling(PlaintextConsumerTest.scala:1281)
> {code}
> The logs include these lines:
>  
> {code}
> [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> {code} 
> I don't know if that's related or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16167) Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16167:
-

Assignee: Lucas Brutschy  (was: Kirk True)

> Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
> --
>
> Key: KAFKA-16167
> URL: https://issues.apache.org/jira/browse/KAFKA-16167
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, 
> kip-848-client-support
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] MINOR: remove unnecessary logging [kafka]

2024-02-20 Thread via GitHub


wcarlson5 commented on code in PR #15396:
URL: https://github.com/apache/kafka/pull/15396#discussion_r1496604182


##
streams/src/main/java/org/apache/kafka/streams/state/internals/AbstractRocksDBSegmentedBytesStore.java:
##
@@ -264,7 +264,6 @@ public void put(final Bytes key,
 final S segment = segments.getOrCreateSegmentIfLive(segmentId, context, observedStreamTime);
 if (segment == null) {
 expiredRecordSensor.record(1.0d, context.currentSystemTimeMs());
-LOG.warn("Skipping record for expired segment.");

Review Comment:
   I'm indifferent. We can convert it if anyone finds value in it, but I don't 
really see it. But I also don't think it will crowd the debug logs too much 
either.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16200:
-

Assignee: Bruno Cadonna  (was: Kirk True)

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Bruno Cadonna
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16008) Fix PlaintextConsumerTest.testMaxPollIntervalMs

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16008:
-

Assignee: Philip Nee  (was: Kirk True)

> Fix PlaintextConsumerTest.testMaxPollIntervalMs
> ---
>
> Key: KAFKA-16008
> URL: https://issues.apache.org/jira/browse/KAFKA-16008
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Philip Nee
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is 
> failing when using the {{AsyncKafkaConsumer}}.
> The error is:
> {code}
> org.opentest4j.AssertionFailedError: Timed out before expected rebalance 
> completed
>     at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
> at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
> at 
> kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317)
> at 
> kafka.api.PlaintextConsumerTest.testMaxPollIntervalMs(PlaintextConsumerTest.scala:194)
> {code}
> The logs include this line:
>  
> {code}
> [2023-12-13 15:11:16,134] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> {code} 
> I don't know if that's related or not.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16241) Kafka Streams hits IllegalStateException trying to recycle a task

2024-02-20 Thread Walker Carlson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson updated KAFKA-16241:
---
Priority: Critical  (was: Major)

> Kafka Streams hits IllegalStateException trying to recycle a task
> -
>
> Key: KAFKA-16241
> URL: https://issues.apache.org/jira/browse/KAFKA-16241
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.6.1
>Reporter: Matthias J. Sax
>Priority: Critical
> Attachments: streams-1.zip, streams-2.zip, streams-3.zip
>
>
> Running with EOS-v2 (not sure if relevant or not) and hitting:
> {code:java}
> [2024-02-08 20:57:42,325] ERROR [i-0fede2697f39580f9-StreamThread-1] 
> stream-thread [i-0fede2697f39580f9-StreamThread-1] Failed to recycle task 1_0 
> cleanly. Attempting to close remaining tasks before re-throwing: 
> (org.apache.kafka.streams.processor.internals.TaskManager)
> java.lang.IllegalStateException: Illegal state RESTORING while recycling 
> active task 1_0
>     at 
> org.apache.kafka.streams.processor.internals.StreamTask.prepareRecycle(StreamTask.java:582)
>     at 
> org.apache.kafka.streams.processor.internals.StandbyTaskCreator.createStandbyTaskFromActive(StandbyTaskCreator.java:125)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.convertActiveToStandby(TaskManager.java:675)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.closeAndRecycleTasks(TaskManager.java:651)
>     at 
> org.apache.kafka.streams.processor.internals.TaskManager.handleAssignment(TaskManager.java:350)
>     at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.onAssignment(StreamsPartitionAssignor.java:1381)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokeOnAssignment(ConsumerCoordinator.java:315)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:469)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:478)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:389)
>     at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:564)
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1220)
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1179)
>     at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:1014)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollPhase(StreamThread.java:954)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:766)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:617)
>     at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:579)
>  {code}
> Logs of all three KS instances attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-16200:
---

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15475) Request might retry forever even if the user API timeout expires

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15475.
---
Resolution: Fixed

> Request might retry forever even if the user API timeout expires
> 
>
> Key: KAFKA-15475
> URL: https://issues.apache.org/jira/browse/KAFKA-15475
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> If the request times out in the background thread, it will be completed with 
> TimeoutException, which is Retriable. In the TopicMetadataRequestManager and 
> possibly other managers, the request might then continue to be retried forever.
>  
> There are two ways to fix this:
>  # Pass a timer to the manager to remove the inflight requests when it is 
> expired.
>  # Pass the future to the application layer and continue to retry.
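> 
> A minimal sketch of option 1 (names are hypothetical; 
> org.apache.kafka.common.utils.Timer is the real utility):
> {code:java}
> import java.util.Iterator;
> import java.util.List;
> import java.util.concurrent.CompletableFuture;
> import org.apache.kafka.common.errors.TimeoutException;
> import org.apache.kafka.common.utils.Timer;
> 
> class InflightRequest {
>     final Timer timer;                     // per-request deadline
>     final CompletableFuture<Void> future;  // completed exceptionally on expiry
> 
>     InflightRequest(Timer timer, CompletableFuture<Void> future) {
>         this.timer = timer;
>         this.future = future;
>     }
> }
> 
> class InflightExpirer {
>     // Fail and drop any request whose user-provided timeout has elapsed,
>     // so no manager keeps retrying it forever.
>     static void expire(List<InflightRequest> inflight) {
>         Iterator<InflightRequest> it = inflight.iterator();
>         while (it.hasNext()) {
>             InflightRequest req = it.next();
>             req.timer.update();
>             if (req.timer.isExpired()) {
>                 req.future.completeExceptionally(
>                     new TimeoutException("Request expired before completion"));
>                 it.remove();
>             }
>         }
>     }
> }
> {code}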



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] KAFKA-16265: KIP-994 (Part 1) Minor Enhancements to ListTransactionsRequest [kafka]

2024-02-20 Thread via GitHub


jolshan commented on code in PR #15384:
URL: https://github.com/apache/kafka/pull/15384#discussion_r1496555226


##
tools/src/main/java/org/apache/kafka/tools/TransactionsCommand.java:
##
@@ -436,16 +436,26 @@ public String name() {
 
 @Override
 public void addSubparser(Subparsers subparsers) {
-subparsers.addParser(name())
+Subparser subparser = subparsers.addParser(name())
 .help("list transactions");
+
+subparser.addArgument("--duration-filter")
+.help("Duration (in millis) to filter by: if < 0, all 
transactions will be returned; " +
+"otherwise, only transactions running longer than 
this duration will be returned")
+.action(store())
+.type(Long.class)
+.required(false);
 }
 
 @Override
 public void execute(Admin admin, Namespace ns, PrintStream out) throws 
Exception {
+ListTransactionsOptions options = new ListTransactionsOptions();
+Optional.ofNullable(ns.getLong("duration_filter")).ifPresent(options::durationFilter);

Review Comment:
   please update this line as well to reflect the new name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-20 Thread Omnia Ibrahim (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818991#comment-17818991
 ] 

Omnia Ibrahim commented on KAFKA-16212:
---

I don't believe topic IDs of zero or null have significant meaning for 
ReplicaManager. The KRaft-related code assumes there will always be a topic ID, 
while other code that doesn't care about topic IDs and interacts with 
ReplicaManager either hasn't been updated yet or wasn't designed with topic-ID 
awareness. So theoretically this will simplify proposal #1. 

However, we will have to 
1. add validation in various places to handle dummy topic-ID values, and 
2. possibly revert these dummy values and some of the validations later in the 
future. 

I think if we have been using a similar approach in other places then proposal 
#1 should be fine. 

With all of that said, I have one worry regarding code readability and 
maintenance, as having the topic ID as Option/Optional.empty/null/zero UUID 
dummy values in APIs in different places might be confusing when extending the 
code. Is there an agreement as part of KIP-516 on how long this transition 
state will last before topic IDs are present in most places, and what that 
will look like, that I need to be aware of while extending the ReplicaManager 
cache to be topic-ID aware?
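
For context, a minimal sketch (a hypothetical wrapper, not the actual 
ReplicaManager code) of what keying the partition cache by TopicIdPartition 
looks like:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.common.Uuid;

// Keying by TopicIdPartition disambiguates a topic that was deleted and
// re-created under the same name, which a TopicPartition key alone cannot do.
class PartitionCache<P> {
    private final ConcurrentHashMap<TopicIdPartition, P> allPartitions =
            new ConcurrentHashMap<>();

    void put(Uuid topicId, String topic, int partition, P state) {
        allPartitions.put(new TopicIdPartition(topicId, partition, topic), state);
    }

    P get(Uuid topicId, String topic, int partition) {
        return allPartitions.get(new TopicIdPartition(topicId, partition, topic));
    }
}
{code}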

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16233) Review auto-commit continuously committing when no progress

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16233:
-

Assignee: Philip Nee

> Review auto-commit continuously committing when no progress 
> 
>
> Key: KAFKA-16233
> URL: https://issues.apache.org/jira/browse/KAFKA-16233
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Philip Nee
>Priority: Minor
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> When auto-commit is enabled, the consumer (legacy and new) will continuously 
> send commit requests with the current positions, even if no progress is made 
> and positions remain unchanged. We should consider whether this is really 
> needed for some reason, or whether we could improve it and only send an 
> auto-commit on the interval if positions have moved, avoiding repeatedly 
> sending the same commit request.
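> 
> A minimal sketch of that improvement (a hypothetical helper; the commit 
> callback is an assumption):
> {code:java}
> import java.util.Map;
> import java.util.function.Consumer;
> import org.apache.kafka.clients.consumer.OffsetAndMetadata;
> import org.apache.kafka.common.TopicPartition;
> 
> class ChangeAwareAutoCommitter {
>     private Map<TopicPartition, OffsetAndMetadata> lastCommitted = Map.of();
>     private final Consumer<Map<TopicPartition, OffsetAndMetadata>> commitAsync;
> 
>     ChangeAwareAutoCommitter(Consumer<Map<TopicPartition, OffsetAndMetadata>> commitAsync) {
>         this.commitAsync = commitAsync;
>     }
> 
>     // On each auto-commit interval, skip the request when positions are unchanged.
>     void maybeAutoCommit(Map<TopicPartition, OffsetAndMetadata> positions) {
>         if (!positions.equals(lastCommitted)) {
>             commitAsync.accept(positions);
>             lastCommitted = Map.copyOf(positions);
>         }
>     }
> }
> {code}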



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16261) MembershipManagerImpl.updateSubscription fails if already empty subscription

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16261:
--
Fix Version/s: 3.8.0

> MembershipManagerImpl.updateSubscription fails if already empty subscription
> 
>
> Key: KAFKA-16261
> URL: https://issues.apache.org/jira/browse/KAFKA-16261
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Andrew Schofield
>Assignee: Lianet Magrans
>Priority: Minor
>  Labels: consumer-threading-refactor, kip-848-client-support
> Fix For: 3.8.0
>
>
> The internal SubscriptionState object keeps track of whether the assignment 
> is user-assigned, or auto-assigned. If there are no assigned partitions, the 
> assignment resets to NONE. If you call SubscriptionState.assignFromSubscribed 
> in this state it fails.
> The easiest thing is perhaps to check 
> SubscriptionState.hasAutoAssignedPartitions() to make sure that 
> assignFromSubscribed is going to be permitted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16233) Review auto-commit continuously committing when no progress

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16233:
--
Fix Version/s: 3.8.0

> Review auto-commit continuously committing when no progress 
> 
>
> Key: KAFKA-16233
> URL: https://issues.apache.org/jira/browse/KAFKA-16233
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Priority: Minor
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> When auto-commit is enabled, the consumer (legacy and new) will continuously 
> send commit requests with the current positions, even if no progress is made 
> and positions remain unchanged. We should consider whether this is really 
> needed for some reason, or whether we could improve it and only send an 
> auto-commit on the interval if positions have moved, avoiding repeatedly 
> sending the same commit request.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15694) New integration tests to have full coverage for preview

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15694:
--
Fix Version/s: 3.8.0

> New integration tests to have full coverage for preview
> ---
>
> Key: KAFKA-15694
> URL: https://issues.apache.org/jira/browse/KAFKA-15694
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Minor
>  Labels: kip-848, kip-848-client-support, kip-848-preview
> Fix For: 3.8.0
>
>
> These are to fix bugs discovered during PR reviews rather than by tests.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15840) Correct initialization of ConsumerGroupHeartbeat by client

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15840:
--
Fix Version/s: 3.7.0

> Correct initialization of ConsumerGroupHeartbeat by client
> --
>
> Key: KAFKA-15840
> URL: https://issues.apache.org/jira/browse/KAFKA-15840
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> The new consumer using the KIP-848 protocol currently leaves the 
> TopicPartitions set to null for the ConsumerGroupHeartbeat request, even when 
> the MemberEpoch is zero. This violates the KIP which expects the list to be 
> empty (but not null).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15651) Investigate auto commit guarantees during Consumer.assign()

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15651:
--
Fix Version/s: 3.7.0

> Investigate auto commit guarantees during Consumer.assign()
> ---
>
> Key: KAFKA-15651
> URL: https://issues.apache.org/jira/browse/KAFKA-15651
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, kip-848-preview
> Fix For: 3.7.0
>
>
> In the {{assign()}} method implementation, both {{KafkaConsumer}} and 
> {{PrototypeAsyncConsumer}} commit offsets asynchronously. Is this 
> intentional? [~junrao] asks in a [recent PR 
> review|https://github.com/apache/kafka/pull/14406/files/193af8230d0c61853d764cbbe29bca2fc6361af9#r1349023459]:
> {quote}Do we guarantee that the new owner of the unsubscribed partitions 
> could pick up the latest committed offset?
> {quote}
> Let's confirm whether the asynchronous approach is acceptable and correct. If 
> it is, great, let's enhance the documentation to briefly explain why. If it 
> is not, let's correct the behavior if it's within the API semantic 
> expectations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15631) Do not send new heartbeat request while another one in-flight

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15631:
--
Fix Version/s: 3.7.0

> Do not send new heartbeat request while another one in-flight
> -
>
> Key: KAFKA-15631
> URL: https://issues.apache.org/jira/browse/KAFKA-15631
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Philip Nee
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> Client consumer should not send a new heartbeat request while a previous one 
> is in flight. If a HB is in flight, we should wait for a response or timeout 
> before sending the next one.
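> 
> A minimal sketch of that gating (a hypothetical helper, not the actual 
> HeartbeatRequestManager code):
> {code:java}
> class HeartbeatGate {
>     private boolean inFlight = false;
> 
>     // Returns true if a heartbeat may be sent now; callers skip sending otherwise.
>     synchronized boolean tryAcquire() {
>         if (inFlight) {
>             return false; // previous heartbeat still awaiting a response or timeout
>         }
>         inFlight = true;
>         return true;
>     }
> 
>     // Invoked when the in-flight heartbeat receives a response or times out.
>     synchronized void release() {
>         inFlight = false;
>     }
> }
> {code}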



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16261) MembershipManagerImpl.updateSubscription fails if already empty subscription

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16261:
--
Component/s: consumer

> MembershipManagerImpl.updateSubscription fails if already empty subscription
> 
>
> Key: KAFKA-16261
> URL: https://issues.apache.org/jira/browse/KAFKA-16261
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Andrew Schofield
>Assignee: Lianet Magrans
>Priority: Minor
>  Labels: consumer-threading-refactor, kip-848-client-support
> Fix For: 3.8.0
>
>
> The internal SubscriptionState object keeps track of whether the assignment 
> is user-assigned, or auto-assigned. If there are no assigned partitions, the 
> assignment resets to NONE. If you call SubscriptionState.assignFromSubscribed 
> in this state it fails.
> The easiest thing is perhaps to check 
> SubscriptionState.hasAutoAssignedPartitions() to make sure that 
> assignFromSubscribed is going to be permitted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15543) Send HB request right after reconciliation completes

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15543:
--
Fix Version/s: 3.7.0

> Send HB request right after reconciliation completes
> 
>
> Key: KAFKA-15543
> URL: https://issues.apache.org/jira/browse/KAFKA-15543
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Priority: Blocker
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> The HeartbeatRequest manager should send an HB request outside of the 
> interval, right after the reconciliation process completes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15573) Implement auto-commit on partition assignment revocation

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15573:
--
Fix Version/s: 3.7.0

> Implement auto-commit on partition assignment revocation
> 
>
> Key: KAFKA-15573
> URL: https://issues.apache.org/jira/browse/KAFKA-15573
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lianet Magrans
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> When the group member's assignment changes and partitions are revoked and 
> auto-commit is enabled, we need to ensure that the commit request manager is 
> invoked to queue up the commits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15592) Member does not need to always try to join a group when a groupId is configured

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15592:
--
Fix Version/s: 3.7.0

> Member does not need to always try to join a group when a groupId is 
> configured
> ---
>
> Key: KAFKA-15592
> URL: https://issues.apache.org/jira/browse/KAFKA-15592
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> Currently, instantiating a membershipManager means the member will always 
> seek to join a group unless it has failed fatally.  However, this is not 
> always the case because the member should be able to join and leave a group 
> any time during its life cycle. Maybe we should include an "inactive" state 
> in the state machine indicating the member does not want to be in a rebalance 
> group.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16284) Performance regression in RocksDB

2024-02-20 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-16284:

Affects Version/s: 3.8.0

> Performance regression in RocksDB
> -
>
> Key: KAFKA-16284
> URL: https://issues.apache.org/jira/browse/KAFKA-16284
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Affects Versions: 3.8.0
>Reporter: Lucas Brutschy
>Assignee: Lucas Brutschy
>Priority: Major
>
> In benchmarks, we are noticing a performance regression in the performance of 
> `RocksDBStore`.
> The regression happens between those two commits:
>  
> {code:java}
> trunk - 70c8b8d0af - regressed - 2024-01-06T14:00:20Z
> trunk - d5aa341a18 - not regressed - 2023-12-31T11:47:14Z
> {code}
> The regression can be reproduced by the following test:
>  
> {code:java}
> package org.apache.kafka.streams.state.internals;
>
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.common.utils.Bytes;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.processor.StateStoreContext;
> import org.apache.kafka.test.InternalMockProcessorContext;
> import org.apache.kafka.test.MockRocksDbConfigSetter;
> import org.apache.kafka.test.StreamsTestUtils;
> import org.apache.kafka.test.TestUtils;
> import org.junit.Before;
> import org.junit.Test;
>
> import java.io.File;
> import java.nio.ByteBuffer;
> import java.util.Properties;
>
> public class RocksDBStorePerfTest {
>
>     InternalMockProcessorContext context;
>     RocksDBStore rocksDBStore;
>
>     final static String DB_NAME = "db-name";
>     final static String METRICS_SCOPE = "metrics-scope";
>
>     RocksDBStore getRocksDBStore() {
>         return new RocksDBStore(DB_NAME, METRICS_SCOPE);
>     }
>
>     @Before
>     public void setUp() {
>         final Properties props = StreamsTestUtils.getStreamsConfig();
>         props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, MockRocksDbConfigSetter.class);
>         File dir = TestUtils.tempDirectory();
>         context = new InternalMockProcessorContext<>(
>             dir,
>             Serdes.String(),
>             Serdes.String(),
>             new StreamsConfig(props)
>         );
>     }
>
>     @Test
>     public void testPerf() {
>         long start = System.currentTimeMillis();
>         for (int i = 0; i < 10; i++) {
>             System.out.println("Iteration: " + i + " Time: " + (System.currentTimeMillis() - start));
>             RocksDBStore rocksDBStore = getRocksDBStore();
>             rocksDBStore.init((StateStoreContext) context, rocksDBStore);
>             for (int j = 0; j < 100; j++) {
>                 rocksDBStore.put(new Bytes(ByteBuffer.allocate(4).putInt(j).array()), "perf".getBytes());
>             }
>             rocksDBStore.close();
>         }
>         long end = System.currentTimeMillis();
>         System.out.println("Time: " + (end - start));
>     }
> }
> {code}
>  
> I have isolated the regression to commit 
> [5bc3aa4|https://github.com/apache/kafka/commit/5bc3aa428067dff1f2b9075ff5d1351fb05d4b10].
>  On my machine, the test takes ~8 seconds before 
> [5bc3aa4|https://github.com/apache/kafka/commit/5bc3aa428067dff1f2b9075ff5d1351fb05d4b10]
>  and ~30 seconds after 
> [5bc3aa4|https://github.com/apache/kafka/commit/5bc3aa428067dff1f2b9075ff5d1351fb05d4b10].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15540) Handle heartbeat and revocation when consumer leaves group

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15540:
--
Fix Version/s: 3.7.0

> Handle heartbeat and revocation when consumer leaves group
> --
>
> Key: KAFKA-15540
> URL: https://issues.apache.org/jira/browse/KAFKA-15540
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> When a consumer intentionally leaves a group we should:
>  * release assignment (revoke partitions)
>  * send a last Heartbeat request with epoch -1 (or -2 if static member)
> Note that the revocation involves stopping fetching, committing offsets if 
> auto-commit is enabled, and invoking the onPartitionsRevoked callback.
>  
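> A hedged sketch of that order of operations (method names are hypothetical; the 
> epoch values follow the description above):
> {code:java}
> // Sketch only, not the actual implementation.
> class LeaveGroupSketch {
>     void leaveGroup(boolean isStaticMember) {
>         // 1. Release the assignment: stop fetching, commit offsets if
>         //    auto-commit is enabled, invoke onPartitionsRevoked.
>         revokePartitions();
>         // 2. Send a last heartbeat signalling the leave: epoch -1 for a
>         //    dynamic member, -2 for a static member.
>         sendHeartbeat(isStaticMember ? -2 : -1);
>     }
>
>     void revokePartitions() { /* stop fetch, commit, run callback */ }
>     void sendHeartbeat(int memberEpoch) { /* build and send the request */ }
> }
> {code}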



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15539) Client should stop fetching while partitions being revoked

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15539:
--
Fix Version/s: 3.7.0

> Client should stop fetching while partitions being revoked
> --
>
> Key: KAFKA-15539
> URL: https://issues.apache.org/jira/browse/KAFKA-15539
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-preview
> Fix For: 3.7.0
>
>
> When partitions are being revoked (client received revocation on heartbeat 
> and is in the process of invoking the callback), we need to make sure we do 
> not fetch from those partitions anymore:
>  * no new fetches should be sent out for the partitions being revoked
>  * no fetch responses should be handled for those partitions (the case where a 
> fetch was already in-flight when the partition revocation started).
> This does not seem to be handled in the current KafkaConsumer and the old 
> consumer protocol (only for the EAGER protocol). 
> Consider re-using the existing pendingRevocation logic that already exists in 
> the SubscriptionState and is used from the fetcher to determine whether a 
> partition is fetchable, as sketched below. 
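> Something along these lines (hypothetical names; the real check would live in 
> the SubscriptionState):
> {code:java}
> import java.util.Set;
> import org.apache.kafka.common.TopicPartition;
>
> // Sketch only: a partition stops being fetchable as soon as its revocation
> // starts, which covers both new fetches and in-flight fetch responses.
> class FetchableCheckSketch {
>     private final Set<TopicPartition> assigned;
>     private final Set<TopicPartition> pendingRevocation;
>
>     FetchableCheckSketch(Set<TopicPartition> assigned,
>                          Set<TopicPartition> pendingRevocation) {
>         this.assigned = assigned;
>         this.pendingRevocation = pendingRevocation;
>     }
>
>     boolean isFetchable(TopicPartition tp) {
>         return assigned.contains(tp) && !pendingRevocation.contains(tp);
>     }
> }
> {code}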



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15515) Remove duplicated integration tests for new consumer

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15515:
--
Fix Version/s: 3.7.0

> Remove duplicated integration tests for new consumer
> 
>
> Key: KAFKA-15515
> URL: https://issues.apache.org/jira/browse/KAFKA-15515
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Priority: Major
>  Labels: consumer-threading-refactor, integration-tests
> Fix For: 3.7.0
>
>
> This task involves removing the temporary `PlaintextAsyncConsumer` file 
> containing duplicated integration tests for the new consumer. The copy was 
> generated to catch regressions and validate functionality in the new consumer 
> while in development. It should be deleted when the new consumer is fully 
> implemented and the existing integration tests (`PlaintextConsumerTest`) can 
> be executed for both implementations.
>  
> Context:
>  
> For the current KafkaConsumer, a set of integration tests exists in the file 
> PlaintextConsumerTest. Those tests cannot be executed as such for the new 
> consumer implementation for 2 main reasons:
> - the new consumer is being developed as a new PrototypeAsyncConsumer class, 
> in parallel to the existing KafkaConsumer. 
> - the new consumer is under development, so it does not support all the 
> consumer functionality yet. 
>  
> In order to be able to run the subsets of tests that the new consumer 
> supports while the implementation completes, it was decided to:
>  - make a copy of the `PlaintextConsumerTest` class, named 
> PlaintextAsyncConsumer.
>  - leave all the existing integration tests that cover the simple consumer 
> case unchanged, and disable the tests that are not yet supported by the new 
> consumer. Disabled tests will be enabled as the async consumer evolves.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] KAFKA-16286; Notify listener of latest leader and epoch [kafka]

2024-02-20 Thread via GitHub


jsancio commented on PR #15397:
URL: https://github.com/apache/kafka/pull/15397#issuecomment-1955046620

   > Is there any chance we could fire leader change more than once on a 
transition? (I guess this might be allowable behavior?)
   
   Thanks. I updated the KRaft tests to check that there are no duplicate 
notifications.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (KAFKA-16224) Fix handling of deleted topic when auto-committing before revocation

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16224:
--
Fix Version/s: 3.8.0

> Fix handling of deleted topic when auto-committing before revocation
> 
>
> Key: KAFKA-16224
> URL: https://issues.apache.org/jira/browse/KAFKA-16224
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Major
>  Labels: client-transitions-issues, kip-848-client-support
> Fix For: 3.8.0
>
>
> Current logic for auto-committing offsets when partitions are revoked will 
> retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the 
> member not completing the revocation in time. We should consider this as an 
> indication of the topic being deleted, and in the context of committing 
> offsets to revoke partitions, we should abort the commit attempt and move on 
> to complete and ack the revocation (effectively considering 
> UnknownTopicOrPartitionException as non-retriable in this context) 
> Note that legacy coordinator behaviour around this seems to be the same as 
> the new consumer currently has.
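> A minimal sketch of the proposed behaviour (hypothetical shape, not the actual 
> auto-commit code):
> {code:java}
> import org.apache.kafka.common.errors.UnknownTopicOrPartitionException;
>
> // Sketch only: during the commit that precedes revocation, treat
> // UNKNOWN_TOPIC_OR_PARTITION as non-retriable and complete the revocation
> // anyway, instead of retrying until the rebalance timeout expires.
> class RevocationCommitSketch {
>     void commitBeforeRevocation(Runnable commit, Runnable completeRevocation) {
>         try {
>             commit.run();
>         } catch (UnknownTopicOrPartitionException e) {
>             // Topic was likely deleted; abort the commit attempt.
>         }
>         completeRevocation.run(); // ack the revocation either way
>     }
> }
> {code}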



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16251) Fenced member should not send heartbeats while waiting for onPartitionsLost to complete

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16251:
--
Fix Version/s: 3.8.0

> Fenced member should not send heartbeats while waiting for onPartitionsLost 
> to complete
> ---
>
> Key: KAFKA-16251
> URL: https://issues.apache.org/jira/browse/KAFKA-16251
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support
> Fix For: 3.8.0
>
>
> When a member gets fenced, it transitions to FENCED state and triggers the 
> onPartitionsLost callback to release its assignment. Members should stop 
> sending heartbeats while FENCED, and resume sending them only after completing 
> the callback, when the member transitions to JOINING.
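> Roughly, the gating could look like this (hypothetical names, not the actual 
> implementation):
> {code:java}
> // Sketch only: heartbeats are gated off while FENCED and resume once the
> // onPartitionsLost callback completes and the member transitions to JOINING.
> class HeartbeatGateSketch {
>     enum State { STABLE, FENCED, JOINING }
>
>     private State state = State.STABLE;
>
>     boolean shouldSendHeartbeat() {
>         return state != State.FENCED;
>     }
>
>     void onPartitionsLostComplete() {
>         state = State.JOINING; // heartbeats resume from here
>     }
> }
> {code}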



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16199) Prune the event queue if event timeout expired before starting

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16199.
---
Resolution: Duplicate

> Prune the event queue if event timeout expired before starting
> --
>
> Key: KAFKA-16199
> URL: https://issues.apache.org/jira/browse/KAFKA-16199
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16200.
---
Resolution: Duplicate

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16019) Some of the tests in PlaintextConsumer can't seem to deterministically invoke and verify the consumer callback

2024-02-20 Thread Kirk True (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818983#comment-17818983
 ] 

Kirk True commented on KAFKA-16019:
---

{{testPerPartitionLeadMetricsCleanUpWithSubscribe}} is now passing 
consistently, so marking this as fixed.

> Some of the tests in PlaintextConsumer can't seem to deterministically invoke 
> and verify the consumer callback
> --
>
> Key: KAFKA-16019
> URL: https://issues.apache.org/jira/browse/KAFKA-16019
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> I was running the PlaintextConsumer tests to test the async consumer; however, a 
> few tests were failing because they could not verify that the listener was 
> invoked correctly.
> For example, in `testPerPartitionLeadMetricsCleanUpWithSubscribe`, around 50% of 
> the time the listener's callsToAssigned was never incremented correctly. Even 
> after changing it to awaitUntilTrue, the result was the same:
> {code:java}
> consumer.subscribe(List(topic, topic2).asJava, listener)
> val records = awaitNonEmptyRecords(consumer, tp)
> assertEquals(1, listener.callsToAssigned, "should be assigned once") {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16019) Some of the tests in PlaintextConsumer can't seem to deterministically invoke and verify the consumer callback

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16019.
---
Resolution: Fixed

> Some of the tests in PlaintextConsumer can't seem to deterministically invoke 
> and verify the consumer callback
> --
>
> Key: KAFKA-16019
> URL: https://issues.apache.org/jira/browse/KAFKA-16019
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> I was running the PlaintextConsumer tests to test the async consumer; however, a 
> few tests were failing because they could not verify that the listener was 
> invoked correctly.
> For example, in `testPerPartitionLeadMetricsCleanUpWithSubscribe`, around 50% of 
> the time the listener's callsToAssigned was never incremented correctly. Even 
> after changing it to awaitUntilTrue, the result was the same:
> {code:java}
> consumer.subscribe(List(topic, topic2).asJava, listener)
> val records = awaitNonEmptyRecords(consumer, tp)
> assertEquals(1, listener.callsToAssigned, "should be assigned once") {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16023) PlaintextConsumerTest needs to wait for reconciliation to complete before proceeding

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16023.
---
Resolution: Fixed

{{testPerPartitionLagMetricsCleanUpWithSubscribe}} is now passing consistently, 
so marking this as fixed.

> PlaintextConsumerTest needs to wait for reconciliation to complete before 
> proceeding
> 
>
> Key: KAFKA-16023
> URL: https://issues.apache.org/jira/browse/KAFKA-16023
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> Several tests in PlaintextConsumerTest.scala (such as 
> testPerPartitionLagMetricsCleanUpWithSubscribe) use:
> assertEquals(1, listener.callsToAssigned, "should be assigned once")
> However, the timing of reconciliation completion is not deterministic due to 
> asynchronous processing, so we actually need to wait until the condition 
> happens.
> Another issue is the timeout: some of these tasks might not 
> complete within the 600ms timeout, so the tests are deemed to be flaky.
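> A sketch of that polling wait (Java, using the org.apache.kafka.test.TestUtils 
> helper; the actual tests are in Scala, but the shape is the same):
> {code:java}
> import org.apache.kafka.test.TestUtils;
>
> class AssignmentWaitSketch {
>     // Stands in for the test's rebalance listener counter (assumption).
>     static volatile int callsToAssigned = 0;
>
>     static void awaitAssigned() throws InterruptedException {
>         // Poll for the condition instead of asserting right after subscribe(),
>         // since reconciliation completes asynchronously.
>         TestUtils.waitForCondition(() -> callsToAssigned >= 1,
>             "should be assigned at least once");
>     }
> }
> {code}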



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15993) Enable max poll integration tests that depend on callback invocation

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15993.
---
Resolution: Duplicate

> Enable max poll integration tests that depend on callback invocation
> 
>
> Key: KAFKA-15993
> URL: https://issues.apache.org/jira/browse/KAFKA-15993
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> We will enable integration tests using the async consumer in KAFKA-15971.  
> However, we should also enable tests that rely on rebalance listeners after 
> KAFKA-15628 is closed.  One example would be testMaxPollIntervalMs, which 
> relies on the listener to verify the correctness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16010) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16010:
--
Issue Type: Bug  (was: Test)

> Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling
> --
>
> Key: KAFKA-16010
> URL: https://issues.apache.org/jira/browse/KAFKA-16010
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> The integration test 
> {{PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling}} is 
> failing when using the {{AsyncKafkaConsumer}}.
> The error is:
> {code}
> org.opentest4j.AssertionFailedError: Did not get valid assignment for 
> partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-5, topic1-1, 
> topic1-0, topic1-3] after one consumer left
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
>   at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
>   at 
> kafka.api.AbstractConsumerTest.validateGroupAssignment(AbstractConsumerTest.scala:286)
>   at 
> kafka.api.PlaintextConsumerTest.runMultiConsumerSessionTimeoutTest(PlaintextConsumerTest.scala:1883)
>   at 
> kafka.api.PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling(PlaintextConsumerTest.scala:1281)
> {code}
> The logs include these lines:
>  
> {code}
> [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
> groupId=my-test] consumer poll timeout has expired. This means the time 
> between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time processing messages. You can address this either by increasing 
> max.poll.interval.ms or by reducing the maximum size of batches returned in 
> poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
> {code} 
> I don't know if that's related or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15993) Enable max poll integration tests that depend on callback invocation

2024-02-20 Thread Kirk True (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818981#comment-17818981
 ] 

Kirk True commented on KAFKA-15993:
---

Closing this as it will be covered when the following Jiras are resolved:
 * KAFKA-16008
 * KAFKA-16010
 * KAFKA-16152
 * KAFKA-16167

> Enable max poll integration tests that depend on callback invocation
> 
>
> Key: KAFKA-15993
> URL: https://issues.apache.org/jira/browse/KAFKA-15993
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> We will enable integration tests using the async consumer in KAFKA-15971.  
> However, we should also enable tests that rely on rebalance listeners after 
> KAFKA-15628 is closed.  One example would be testMaxPollIntervalMs, which 
> relies on the listener to verify the correctness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16200:
--
Priority: Blocker  (was: Critical)

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] KAFKA-15770: IQv2 must return immutable position [kafka]

2024-02-20 Thread via GitHub


mjsax merged PR #15219:
URL: https://github.com/apache/kafka/pull/15219


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (KAFKA-15974) Enforce that event processing respects user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15974:
--
Description: 
The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
block waiting for the event to complete. The application thread will block for 
the timeout, but there is not yet a consistent manner in which events are timed 
out.

Enforce at the event handler/event processing layer that timeouts are respected 
per the design in KAFKA-15848.

  was:The intention of the {{CompletableApplicationEvent}} is for a 
{{Consumer}} to block waiting for the event to complete. The application thread 
will block for the timeout, but there is not yet a consistent manner in which 
events are timed out.


> Enforce that event processing respects user-provided timeout
> 
>
> Key: KAFKA-15974
> URL: https://issues.apache.org/jira/browse/KAFKA-15974
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the event handler/event processing layer that timeouts are 
> respected per the design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16200:
--
Description: 
The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
block waiting for the event to complete. The application thread will block for 
the timeout, but there is not yet a consistent manner in which events are timed 
out.

Enforce at the request manager layer that timeouts are respected per the design 
in KAFKA-15848.

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16200:
--
Summary: Enforce that RequestManager implementations respect user-provided 
timeout  (was: Ensure RequestManager handling of expired timeouts are 
consistent)

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15974) Enforce that event processing respects user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15974:
--
Summary: Enforce that event processing respects user-provided timeout  
(was: Enforce that events and requests respect user-provided timeout)

> Enforce that event processing respects user-provided timeout
> 
>
> Key: KAFKA-15974
> URL: https://issues.apache.org/jira/browse/KAFKA-15974
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16190) Member should send full heartbeat when rejoining

2024-02-20 Thread Quoc Phong Dang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818979#comment-17818979
 ] 

Quoc Phong Dang commented on KAFKA-16190:
-

[~kirktrue] Thank you, and sorry for the delay. It took me some time to look into 
the KIP and to navigate the code. I'm not so sure the file I'm trying to change 
is the correct one; could you point me to the location where the change needs to 
be made? That would be helpful.

> Member should send full heartbeat when rejoining
> 
>
> Key: KAFKA-16190
> URL: https://issues.apache.org/jira/browse/KAFKA-16190
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Quoc Phong Dang
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support, newbie
> Fix For: 3.8.0
>
>
> The heartbeat request builder should make sure that all fields are sent in 
> the heartbeat request when the consumer rejoins (currently the 
> HeartbeatRequestManager request builder is reset on failure scenarios, which 
> should cover the fence+rejoin sequence). 
> Note that the existing HeartbeatRequestManagerTest.testHeartbeatState misses 
> this exact case given that it does explicitly change the subscription when it 
> gets fenced. We should ensure we test a consumer that keeps the same initial 
> subscription when it rejoins after being fenced.
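> Roughly, the fix is to stop relying only on what changed since the previous 
> request when the member is rejoining (hypothetical sketch, not the actual 
> builder):
> {code:java}
> // Sketch only: on rejoin, reset the "already sent" tracking so that the next
> // heartbeat carries the full state (subscription, rebalance timeout, etc.).
> class HeartbeatStateSketch {
>     private final SentFields sentFields = new SentFields();
>
>     void onRejoin() {
>         sentFields.reset(); // next request re-sends all fields
>     }
>
>     static class SentFields {
>         void reset() { /* clear the per-field "already sent" markers */ }
>     }
> }
> {code}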



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15848) Design solution for inconsistency between ConsumerDelegate timeout policies

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15848:
--
Summary: Design solution for inconsistency between ConsumerDelegate timeout 
policies  (was: Consumer API timeout inconsistent between ConsumerDelegate 
implementations)

> Design solution for inconsistency between ConsumerDelegate timeout policies
> ---
>
> Key: KAFKA-15848
> URL: https://issues.apache.org/jira/browse/KAFKA-15848
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, documentation
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> The two {{ConsumerDelegate}} implementations ({{{}LegacyKafkaConsumer{}}} and 
> {{{}AsyncKafkaConsumer{}}}) have a fundamental difference related to their 
> use and interpretation of the {{Timer}} that is supplied.
> h3. tl;dr
> {{AsyncKafkaConsumer}} is very literal about the timeout, whereas 
> {{LegacyKafkaConsumer}} seems to give a little wiggle room.
> {{LegacyKafkaConsumer}} is structured so that the logic it uses can check for 
> success of its operations _before_ checking the timer:
>  # Submit operation asynchronously
>  # Wait for operation to complete using {{NetworkClient.poll()}}
>  # Check for result
>  ## If successful, return success
>  ## If fatal failure, return failure
>  # Check timer
>  ## If timer expired, return failure
> {{AsyncKafkaConsumer}} uses {{Future.get()}} to wait for its operations:
>  # Submit operation asynchronously
>  # Wait for operation to complete using {{Future.get()}}
>  ## If operation timed out, {{Future.get()}} will throw a timeout error
>  # Check for result
>  ## If successful, return success
>  ## Otherwise, return failure
> h3. How to reproduce
> This causes subtle timing issues, but they can be easily reproduced via any 
> of the {{KafkaConsumerTest}} unit tests that invoke the {{consumer.poll(0)}} 
> API. Here's a bit of code that illustrates the difference between the two 
> approaches.
> {{LegacyKafkaConsumer}} performs a lot of its network I/O operations in a 
> manner similar to this:
> {code:java}
> public int getCount(Timer timer) {
> do {
> final RequestFuture<Integer> future = sendSomeRequest(partitions);
> client.poll(future, timer);
> if (future.isDone())
> return future.get();
> } while (timer.notExpired());
> return -1;
> }
> {code}
> {{AsyncKafkaConsumer}} has similar logic, but it is structured like this:
> {code:java}
> private int getCount(Timer timer) {
> try {
> CompletableFuture<Integer> future = new CompletableFuture<>();
> applicationEventQueue.add(future);
> return future.get(timer.remainingMs(), TimeUnit.MILLISECONDS);
> } catch (TimeoutException e) {
> return -1;
> }
> }
> {code}
> The call to {{add}} enqueues the network operation, but it then _immediately_ 
> invokes {{Future.get()}} with the timeout to implement a time-bounded 
> blocking call. Since this method is being called with a timeout of 0, it 
> _immediately_ throws a {{{}TimeoutException{}}}. 
> h3. Suggested fix
> This task is to design and document the timeout policy for the new Consumer 
> implementation.
> The documentation lives here: 
> [https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16208) Design new Consumer timeout policy

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16208.
---
Resolution: Duplicate

> Design new Consumer timeout policy
> --
>
> Key: KAFKA-16208
> URL: https://issues.apache.org/jira/browse/KAFKA-16208
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer, documentation
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> This task is to design and document the timeout policy for the new Consumer 
> implementation.
> The documentation lives here: 
> https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15848) Consumer API timeout inconsistent between ConsumerDelegate implementations

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15848:
--
Description: 
The two {{ConsumerDelegate}} implementations ({{{}LegacyKafkaConsumer{}}} and 
{{{}AsyncKafkaConsumer{}}}) have a fundamental difference related to their use 
and interpretation of the {{Timer}} that is supplied.
h3. tl;dr

{{AsyncKafkaConsumer}} is very literal about the timeout, whereas 
{{LegacyKafkaConsumer}} seems to give a little wiggle room.

{{LegacyKafkaConsumer}} is structured so that the logic it uses can check for 
success of its operations _before_ checking the timer:
 # Submit operation asynchronously
 # Wait for operation to complete using {{NetworkClient.poll()}}
 # Check for result
 ## If successful, return success
 ## If fatal failure, return failure
 # Check timer
 ## If timer expired, return failure

{{AsyncKafkaConsumer}} uses {{Future.get()}} to wait for its operations:
 # Submit operation asynchronously
 # Wait for operation to complete using {{Future.get()}}
 ## If operation timed out, {{Future.get()}} will throw a timeout error
 # Check for result
 ## If successful, return success
 ## Otherwise, return failure

h3. How to reproduce

This causes subtle timing issues, but they can be easily reproduced via any of 
the {{KafkaConsumerTest}} unit tests that invoke the {{consumer.poll(0)}} API. 
Here's a bit of code that illustrates the difference between the two approaches.

{{LegacyKafkaConsumer}} performs a lot of its network I/O operations in a 
manner similar to this:
{code:java}
public int getCount(Timer timer) {
do {
final RequestFuture<Integer> future = sendSomeRequest(partitions);
client.poll(future, timer);

if (future.isDone())
return future.get();
} while (timer.notExpired());

return -1;
}
{code}
{{AsyncKafkaConsumer}} has similar logic, but it is structured like this:
{code:java}
private int getCount(Timer timer) {
try {
CompletableFuture<Integer> future = new CompletableFuture<>();
applicationEventQueue.add(future);
return future.get(timer.remainingMs(), TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
return -1;
}
}
{code}
The call to {{add}} enqueues the network operation, but it then _immediately_ 
invokes {{Future.get()}} with the timeout to implement a time-bounded blocking 
call. Since this method is being called with a timeout of 0, it _immediately_ 
throws a {{{}TimeoutException{}}}. 
h3. Suggested fix

This task is to design and document the timeout policy for the new Consumer 
implementation.

The documentation lives here: 
[https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts]

  was:
The two {{ConsumerDelegate}} implementations ({{{}LegacyKafkaConsumer{}}} and 
{{{}AsyncKafkaConsumer{}}}) have a fundamental difference related to their use 
and interpretation of the {{Timer}} that is supplied.
h3. tl;dr

{{AsyncKafkaConsumer}} is very literal about the timeout, whereas 
{{LegacyKafkaConsumer}} seems to give a little wiggle room.

{{LegacyKafkaConsumer}} is structured so that the logic it uses can check for 
success of its operations _before_ checking the timer:
 # Submit operation asynchronously
 # Wait for operation to complete using {{NetworkClient.poll()}}
 # Check for result
 ## If successful, return success
 ## If fatal failure, return failure
 # Check timer
 ## If timer expired, return failure

{{AsyncKafkaConsumer}} uses {{Future.get()}} to wait for its operations:
 # Submit operation asynchronously
 # Wait for operation to complete using {{Future.get()}}
 ## If operation timed out, {{Future.get()}} will throw a timeout error
 # Check for result
 ## If successful, return success
 ## Otherwise, return failure

h3. How to reproduce

This causes subtle timing issues, but they can be easily reproduced via any of 
the {{KafkaConsumerTest}} unit tests that invoke the {{consumer.poll(0)}} API. 
Here's a bit of code that illustrates the difference between the two approaches.

{{LegacyKafkaConsumer}} performs a lot of its network I/O operations in a 
manner similar to this:
{code:java}
public int getCount(Timer timer) {
do {
final RequestFuture<Integer> future = sendSomeRequest(partitions);
client.poll(future, timer);

if (future.isDone())
return future.get();
} while (timer.notExpired());

return -1;
}
{code}

{{AsyncKafkaConsumer}} has similar logic, but it is structured like this:

{code:java}
private int getCount(Timer timer) {
try {
CompletableFuture<Integer> future = new CompletableFuture<>();
applicationEventQueue.add(future);
return future.get(timer.remainingMs(), TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
return -1;
}
}
{code}
The call to {{add}} enqueues the network operation, but it then _immediately_ 
invokes {{Future.get()}} with the timeout to implement a time-bounded blocking 
call. Since this method is being called with a timeout of 0, it _immediately_ 
throws a {{{}TimeoutException{}}}. 

[jira] [Updated] (KAFKA-15848) Consumer API timeout inconsistent between ConsumerDelegate implementations

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-15848:
--
Component/s: documentation

> Consumer API timeout inconsistent between ConsumerDelegate implementations
> --
>
> Key: KAFKA-15848
> URL: https://issues.apache.org/jira/browse/KAFKA-15848
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, documentation
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> The two {{ConsumerDelegate}} implementations ({{{}LegacyKafkaConsumer{}}} and 
> {{{}AsyncKafkaConsumer{}}}) have a fundamental difference related to their 
> use and interpretation of the {{Timer}} that is supplied.
> h3. tl;dr
> {{AsyncKafkaConsumer}} is very literal about the timeout, whereas 
> {{LegacyKafkaConsumer}} seems to give a little wiggle room.
> {{LegacyKafkaConsumer}} is structured so that the logic it uses can check for 
> success of its operations _before_ checking the timer:
>  # Submit operation asynchronously
>  # Wait for operation to complete using {{NetworkClient.poll()}}
>  # Check for result
>  ## If successful, return success
>  ## If fatal failure, return failure
>  # Check timer
>  ## If timer expired, return failure
> {{AsyncKafkaConsumer}} uses {{Future.get()}} to wait for its operations:
>  # Submit operation asynchronously
>  # Wait for operation to complete using {{Future.get()}}
>  ## If operation timed out, {{Future.get()}} will throw a timeout error
>  # Check for result
>  ## If successful, return success
>  ## Otherwise, return failure
> h3. How to reproduce
> This causes subtle timing issues, but they can be easily reproduced via any 
> of the {{KafkaConsumerTest}} unit tests that invoke the {{consumer.poll(0)}} 
> API. Here's a bit of code that illustrates the difference between the two 
> approaches.
> {{LegacyKafkaConsumer}} performs a lot of its network I/O operations in a 
> manner similar to this:
> {code:java}
> public int getCount(Timer timer) {
> do {
> final RequestFuture<Integer> future = sendSomeRequest(partitions);
> client.poll(future, timer);
> if (future.isDone())
> return future.get();
> } while (timer.notExpired());
> return -1;
> }
> {code}
> {{AsyncKafkaConsumer}} has similar logic, but it is structured like this:
> {code:java}
> private int getCount(Timer timer) {
> try {
> CompletableFuture<Integer> future = new CompletableFuture<>();
> applicationEventQueue.add(future);
> return future.get(timer.remainingMs(), TimeUnit.MILLISECONDS);
> } catch (TimeoutException e) {
> return -1;
> }
> }
> {code}
> The call to {{add}} enqueues the network operation, but it then _immediately_ 
> invokes {{Future.get()}} with the timeout to implement a time-bounded 
> blocking call. Since this method is being called with a timeout of 0, it 
> _immediately_ throws a {{{}TimeoutException{}}}. 
> h3. Suggested fix
> This task is to design and document the timeout policy for the new Consumer 
> implementation.
> The documentation lives here: 
> [https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16285) Make group metadata available when a new assignment is set in async Kafka consumer

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16285:
--
Fix Version/s: 3.8.0

> Make group metadata available when a new assignment is set in async Kafka 
> consumer
> --
>
> Key: KAFKA-16285
> URL: https://issues.apache.org/jira/browse/KAFKA-16285
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Critical
>  Labels: kip-848-client-support
> Fix For: 3.8.0
>
>
> Currently, the new async Kafka consumer sends an event from the background 
> thread to the application thread when the group metadata is updated. Group 
> metadata is updated when the background thread receives a new assignment. 
> More specifically, the member epoch is updated each time a new assignment is 
> received and and the member ID is updated with the first assignment. 
> In contrast to the group metadata update, the assignment is directly set in 
> the subscription without sending an update event from the background thread 
> to the application thread. That means that there is a delay between the 
> application thread being aware of the update to the assignment and the 
> application thread being aware of the update to the group metadata. This 
> behavior differs with respect to the legacy consumer were the assignment and 
> the group metadata is updated at the same time.
> We should make the update to the group metadata available to the application 
> thread when the update to the assignment is made available to the application 
> thread so that assignment an group metadata are in sync.
> For example, {{producer.sendOffsetsToTransaction(offsetsToCommit, 
> groupMetadata);}} benefits from this improvement because if the offsets to 
> commit are consistent with the current assignment also the group metadata 
> would be. Currently, that is not guaranteed. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop

2024-02-20 Thread Kirk True (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818978#comment-17818978
 ] 

Kirk True commented on KAFKA-16160:
---

[~phuctran] same question on this one—do you need any more help from us on 
this? Thanks!

> AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
> --
>
> Key: KAFKA-16160
> URL: https://issues.apache.org/jira/browse/KAFKA-16160
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Observing some excessive logging running AsyncKafkaConsumer and observing 
> excessive logging of :
> {code:java}
> 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, 
> groupId=concurrent_consumer] Node is not ready, handle the request in the 
> next event loop: node=worker4:9092 (id: 2147483644 rack: null), 
> request=UnsentRequest{requestBuil     
> der=ConsumerGroupHeartbeatRequestData(groupId='concurrent_consumer', 
> memberId='laIqS789StuhXFpTwjh6hA', memberEpoch=1, instanceId=null, 
> rackId=null, rebalanceTimeoutMs=30, subscribedTopicNames=[output-topic], 
> serverAssignor=null, topicP     
> artitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), 
> handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b,
>  node=Optional[worker4:9092 (id: 2147483644 rack: null)]     , 
> timer=org.apache.kafka.common.utils.Timer@55ed4733} 
> (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code}
> This seems to be triggered by a tight poll loop of the network thread.  The 
> right thing to do is to back off a bit for that given node and retry later.
> This should be a blocker for 3.8 release.
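> A rough illustration of that suggestion (a generic per-node backoff; 
> hypothetical names, not the actual NetworkClientDelegate code):
> {code:java}
> // Sketch only: instead of retrying a not-ready node on every iteration of
> // the network thread's poll loop, wait a growing delay before the next try.
> class NodeBackoffSketch {
>     private long nextAttemptMs = 0L;
>     private int attempts = 0;
>
>     boolean canAttempt(long nowMs) {
>         return nowMs >= nextAttemptMs;
>     }
>
>     void onNodeNotReady(long nowMs) {
>         attempts++;
>         // 100 ms, doubling up to a 1 s cap
>         long delayMs = Math.min(1000L, 100L * (1L << Math.min(attempts - 1, 4)));
>         nextAttemptMs = nowMs + delayMs;
>     }
> }
> {code}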



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16160:
--
Priority: Major  (was: Critical)

> AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
> --
>
> Key: KAFKA-16160
> URL: https://issues.apache.org/jira/browse/KAFKA-16160
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Observing some excessive logging running AsyncKafkaConsumer and observing 
> excessive logging of :
> {code:java}
> 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, 
> groupId=concurrent_consumer] Node is not ready, handle the request in the 
> next event loop: node=worker4:9092 (id: 2147483644 rack: null), 
> request=UnsentRequest{requestBuil     
> der=ConsumerGroupHeartbeatRequestData(groupId='concurrent_consumer', 
> memberId='laIqS789StuhXFpTwjh6hA', memberEpoch=1, instanceId=null, 
> rackId=null, rebalanceTimeoutMs=30, subscribedTopicNames=[output-topic], 
> serverAssignor=null, topicP     
> artitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), 
> handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b,
>  node=Optional[worker4:9092 (id: 2147483644 rack: null)]     , 
> timer=org.apache.kafka.common.utils.Timer@55ed4733} 
> (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code}
> This seems to be triggered by a tight poll loop of the network thread.  The 
> right thing to do is to back off a bit for that given node and retry later.
> This should be a blocker for 3.8 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15538) Client support for java regex based subscription

2024-02-20 Thread Kirk True (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818977#comment-17818977
 ] 

Kirk True commented on KAFKA-15538:
---

[~phuctran] thanks for your willingness to help out on this task! :)

We're working against a tight deadline to get these blocker/critical tasks done.

Is there any other information/help you need? Do you have a sense of when 
you'll have a pull request ready to review?

Thanks!

> Client support for java regex based subscription
> 
>
> Key: KAFKA-15538
> URL: https://issues.apache.org/jira/browse/KAFKA-15538
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Phuc Hong Tran
>Priority: Blocker
>  Labels: kip-848-client-support, newbie, regex
> Fix For: 3.8.0
>
>
> When using subscribe with a java regex (Pattern), we need to resolve it on 
> the client side to send the broker a list of topic names to subscribe to.
> Context:
> The new consumer group protocol uses [Google 
> RE2/J|https://github.com/google/re2j] for regular expressions and introduces 
> new methods in the consumer API to subscribe using a `SubscribePattern`. The 
> subscribe using a java `Pattern` will be still supported for a while but 
> eventually removed.
>  * When the subscribe with SubscriptionPattern is used, the client should 
> just send the regex to the broker and it will be resolved on the server side.
>  * In the case of the subscribe with Pattern, the regex should be resolved on 
> the client side.
> As part of this task, we should re-enable all integration tests defined in 
> the PlaintextAsyncConsumer that relate to subscription with pattern and that 
> are currently disabled for the new consumer + new protocol



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16288) Values.convertToDecimal throws ClassCastExceptions on String inputs

2024-02-20 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16288:
---

 Summary: Values.convertToDecimal throws ClassCastExceptions on 
String inputs
 Key: KAFKA-16288
 URL: https://issues.apache.org/jira/browse/KAFKA-16288
 Project: Kafka
  Issue Type: Bug
  Components: connect
Affects Versions: 1.1.0
Reporter: Greg Harris
Assignee: Greg Harris


The convertToDecimal function does a best-effort conversion of an arbitrary 
Object to a BigDecimal. Generally when a conversion cannot take place (such as 
when an unknown subclass is passed-in) the function throws a DataException. 
However, specifically for String inputs with a valid number within, a 
ClassCastException is thrown.

This is because there is an extra "doubleValue" call in the implementation: 
[https://github.com/apache/kafka/blob/ead2431c37ace9255df88ffe819bb905311af088/connect/api/src/main/java/org/apache/kafka/connect/data/Values.java#L427]
 which immediately causes a ClassCastException in the caller: 
[https://github.com/apache/kafka/blob/ead2431c37ace9255df88ffe819bb905311af088/connect/api/src/main/java/org/apache/kafka/connect/data/Values.java#L305]
 

This appears accidental: the case for String is explicitly handled; it just 
behaves poorly. Instead of throwing a ClassCastException, the number should be 
parsed correctly.
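
A minimal reproduction sketch (assuming the public 
Values.convertToDecimal(Schema, Object, int) signature in the linked file):

{code:java}
import java.math.BigDecimal;
import org.apache.kafka.connect.data.Values;

public class ConvertToDecimalRepro {
    public static void main(String[] args) {
        // Expected: BigDecimal 1.5 (or, for truly unconvertible input, a
        // DataException). Actual on affected versions: ClassCastException,
        // because the parsed value takes an extra doubleValue() detour.
        BigDecimal result = Values.convertToDecimal(null, "1.5", 2);
        System.out.println(result);
    }
}
{code}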



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16190) Member should send full heartbeat when rejoining

2024-02-20 Thread Kirk True (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818976#comment-17818976
 ] 

Kirk True commented on KAFKA-16190:
---

[~phong260702] thanks for your willingness to help out on this task! :)

We're working against a tight deadline to get these blocker/critical tasks done.

Is there any other information/help you need? Do you have a sense of when 
you'll have a pull request ready to review?

Thanks!

> Member should send full heartbeat when rejoining
> 
>
> Key: KAFKA-16190
> URL: https://issues.apache.org/jira/browse/KAFKA-16190
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Quoc Phong Dang
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support, newbie
> Fix For: 3.8.0
>
>
> The heartbeat request builder should make sure that all fields are sent in 
> the heartbeat request when the consumer rejoins (currently the 
> HeartbeatRequestManager request builder is reset on failure scenarios, which 
> should cover the fence+rejoin sequence). 
> Note that the existing HeartbeatRequestManagerTest.testHeartbeatState misses 
> this exact case given that it does explicitly change the subscription when it 
> gets fenced. We should ensure we test a consumer that keeps the same initial 
> subscription when it rejoins after being fenced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16227) Console consumer fails with `IllegalStateException`

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned KAFKA-16227:
-

Assignee: Philip Nee

> Console consumer fails with `IllegalStateException`
> ---
>
> Key: KAFKA-16227
> URL: https://issues.apache.org/jira/browse/KAFKA-16227
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: David Jacot
>Assignee: Philip Nee
>Priority: Critical
>  Labels: kip-848-client-support
> Fix For: 3.8.0
>
>
> I have seen a few occurrences like the following one. There is a race between 
> the background thread and the foreground thread. I imagine the following 
> steps:
>  * quickstart-events-2 is assigned by the background thread;
>  * the foreground thread starts the initialization of the partition (e.g. 
> reset offset);
>  * quickstart-events-2 is removed by the background thread;
>  * the initialization completes and quickstart-events-2 does not exist 
> anymore.
>  
> {code:java}
> [2024-02-06 16:21:57,375] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> java.lang.IllegalStateException: No current assignment for partition 
> quickstart-events-2
>   at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367)
>   at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579)
>   at 
> org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283)
>   at 
> org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226)
>   at 
> org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110)
>   at 
> org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540)
>   at 
> org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525)
>   at 
> org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874)
>   at 
> kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473)
>   at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103)
>   at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77)
>   at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
>   at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16199) Prune the event queue if event timeout expired before starting

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16199:
--
Priority: Major  (was: Critical)

> Prune the event queue if event timeout expired before starting
> --
>
> Key: KAFKA-16199
> URL: https://issues.apache.org/jira/browse/KAFKA-16199
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] KAFKA-16278 Missing license for scala related dependencies [kafka]

2024-02-20 Thread via GitHub


anton-liauchuk commented on PR #15398:
URL: https://github.com/apache/kafka/pull/15398#issuecomment-1954961890

   hello @divijvaidya 
   
   Please take a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (KAFKA-16111) Implement tests for tricky rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16111:
--
Affects Version/s: 3.7.0

> Implement tests for tricky rebalance callback scenarios
> ---
>
> Key: KAFKA-16111
> URL: https://issues.apache.org/jira/browse/KAFKA-16111
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Major
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> There is justified concern that the new threading model may not play well 
> with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide 
> some assurance that it will support complicated patterns.
>  # Design and implement test scenarios
>  # Update and document any design changes with the callback sub-system where 
> needed
>  # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
> said design
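To make "tricky" concrete, here is a sketch of a listener that re-enters the
Consumer API from inside the rebalance callbacks, one of the patterns the new
threading model needs to support (the topic name and repositioning logic are
illustrative assumptions):
{code:java}
// Illustrative "tricky" listener: it calls back into the consumer
// (commitSync, seekToBeginning) while a rebalance is in progress, forcing
// the background and application threads to coordinate.
consumer.subscribe(List.of("input-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync(); // re-enters the consumer mid-rebalance
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        consumer.seekToBeginning(partitions); // reposition newly assigned partitions
    }
});
{code}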



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16287) Implement example test for common rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16287:
--
Affects Version/s: 3.7.0

> Implement example test for common rebalance callback scenarios
> --
>
> Key: KAFKA-16287
> URL: https://issues.apache.org/jira/browse/KAFKA-16287
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Blocker
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> We need to add an example test to the {{PlaintextConsumerTest}} that tests a 
> common {{ConsumerRebalanceListener}} use case. For example, create an 
> integration test that invokes the Consumer API to commit offsets in the 
> {{onPartitionsRevoked}} callback.
> Please develop this test in a reasonably general way with a view to using it 
> as a template from which other tests can be created later. Eventually we will 
> need to have a comprehensive set of tests that cover all the basic use cases.
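A minimal sketch of that use case, with illustrative topic, group, and
variable names (the actual test would use the cluster fixtures of
PlaintextConsumerTest rather than a standalone main method):
{code:java}
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

// Commit the offsets tracked so far from onPartitionsRevoked, so the next
// owner of the revoked partitions resumes from the right position.
public class CommitOnRevoke {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("input-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    consumer.commitSync(offsets); // commit before losing ownership
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
            });
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    // Track the next offset to consume for each partition.
                    offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                        new OffsetAndMetadata(rec.offset() + 1));
                }
            }
        }
    }
}
{code}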



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16111) Implement tests for tricky rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16111:
--
Priority: Major  (was: Blocker)

> Implement tests for tricky rebalance callback scenarios
> ---
>
> Key: KAFKA-16111
> URL: https://issues.apache.org/jira/browse/KAFKA-16111
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Major
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> There is justified concern that the new threading model may not play well 
> with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide 
> some assurance that it will support complicated patterns.
>  # Design and implement test scenarios
>  # Update and document any design changes with the callback sub-system where 
> needed
>  # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
> said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16287) Implement example test for common rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16287:
--
Description: 
We need to add an example test to the {{PlaintextConsumerTest}} that tests a 
common {{ConsumerRebalanceListener}} use case. For example, create an 
integration test that invokes the Consumer API to commit offsets in the 
{{onPartitionsRevoked}} callback.

Please develop this test in a reasonably general way with a view to using it as 
a template from which other tests can be created later. Eventually we will need 
to have a comprehensive set of tests that cover all the basic use cases.

  was:
There is justified concern that the new threading model may not play well with 
"tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide some 
assurance that it will support complicated patterns.
 # Design and implement test scenarios
 # Update and document any design changes with the callback sub-system where 
needed
 # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
said design


> Implement example test for common rebalance callback scenarios
> --
>
> Key: KAFKA-16287
> URL: https://issues.apache.org/jira/browse/KAFKA-16287
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Blocker
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> We need to add an example test to the {{PlaintextConsumerTest}} that tests a 
> common {{ConsumerRebalanceListener}} use case. For example, create an 
> integration test that invokes the Consumer API to commit offsets in the 
> {{onPartitionsRevoked}} callback.
> Please develop this test in a reasonably general way with a view to using it 
> as a template from which other tests can be created later. Eventually we will 
> need to have a comprehensive set of tests that cover all the basic use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] KAFKA-16278 Missing license for scala related dependencies [kafka]

2024-02-20 Thread via GitHub


anton-liauchuk opened a new pull request, #15398:
URL: https://github.com/apache/kafka/pull/15398

   KAFKA-16278 Missing license for scala related dependencies
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (KAFKA-16287) Implement example test for common rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)
Kirk True created KAFKA-16287:
-

 Summary: Implement example test for common rebalance callback 
scenarios
 Key: KAFKA-16287
 URL: https://issues.apache.org/jira/browse/KAFKA-16287
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer
Reporter: Kirk True
Assignee: Lucas Brutschy
 Fix For: 3.8.0


There is justified concern that the new threading model may not play well with 
"tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide some 
assurance that it will support complicated patterns.
 # Design and implement test scenarios
 # Update and document any design changes with the callback sub-system where 
needed
 # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16111) Implement tests for tricky rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16111:
--
Summary: Implement tests for tricky rebalance callback scenarios  (was: 
Implement example test for tricky rebalance callback scenarios)

> Implement tests for tricky rebalance callback scenarios
> ---
>
> Key: KAFKA-16111
> URL: https://issues.apache.org/jira/browse/KAFKA-16111
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Blocker
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> There is justified concern that the new threading model may not play well 
> with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide 
> some assurance that it will support complicated patterns.
>  # Design and implement test scenarios
>  # Update and document any design changes with the callback sub-system where 
> needed
>  # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
> said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16111) Implement example test for tricky rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16111:
--
Summary: Implement example test for tricky rebalance callback scenarios  
(was: Implement tests for tricky rebalance callback scenarios)

> Implement example test for tricky rebalance callback scenarios
> --
>
> Key: KAFKA-16111
> URL: https://issues.apache.org/jira/browse/KAFKA-16111
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Blocker
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> There is justified concern that the new threading model may not play well 
> with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide 
> some assurance that it will support complicated patterns.
>  # Design and implement test scenarios
>  # Update and document any design changes with the callback sub-system where 
> needed
>  # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
> said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16111) Implement tests for tricky rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16111:
--
Summary: Implement tests for tricky rebalance callback scenarios  (was: 
Implement tests for rebalance callback scenarios)

> Implement tests for tricky rebalance callback scenarios
> ---
>
> Key: KAFKA-16111
> URL: https://issues.apache.org/jira/browse/KAFKA-16111
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Blocker
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> There is justified concern that the new threading model may not play well 
> with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide 
> some assurance that it will support complicated patterns.
>  # Design and implement test scenarios
>  # Update and document any design changes with the callback sub-system where 
> needed
>  # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
> said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16111) Implement tests for rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16111:
--
Summary: Implement tests for rebalance callback scenarios  (was: Implement 
tests for tricky rebalance callback scenarios)

> Implement tests for rebalance callback scenarios
> 
>
> Key: KAFKA-16111
> URL: https://issues.apache.org/jira/browse/KAFKA-16111
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lucas Brutschy
>Priority: Blocker
>  Labels: callback, consumer-threading-refactor, integration-tests
> Fix For: 3.8.0
>
>
> There is justified concern that the new threading model may not play well 
> with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide 
> some assurance that it will support complicated patterns.
>  # Design and implement test scenarios
>  # Update and document any design changes with the callback sub-system where 
> needed
>  # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
> said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16285) Make group metadata available when a new assignment is set in async Kafka consumer

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16285:
--
Priority: Critical  (was: Major)

> Make group metadata available when a new assignment is set in async Kafka 
> consumer
> --
>
> Key: KAFKA-16285
> URL: https://issues.apache.org/jira/browse/KAFKA-16285
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Critical
>  Labels: kip-848-client-support
>
> Currently, the new async Kafka consumer sends an event from the background 
> thread to the application thread when the group metadata is updated. Group 
> metadata is updated when the background thread receives a new assignment. 
> More specifically, the member epoch is updated each time a new assignment is 
> received, and the member ID is updated with the first assignment. 
> In contrast to the group metadata update, the assignment is set directly in 
> the subscription without sending an update event from the background thread 
> to the application thread. That means there is a delay between the 
> application thread becoming aware of the updated assignment and becoming 
> aware of the updated group metadata. This behavior differs from the legacy 
> consumer, where the assignment and the group metadata are updated at the 
> same time.
> We should make the update to the group metadata available to the application 
> thread at the same time as the update to the assignment, so that assignment 
> and group metadata stay in sync.
> For example, {{producer.sendOffsetsToTransaction(offsetsToCommit, 
> groupMetadata);}} benefits from this improvement: if the offsets to commit 
> are consistent with the current assignment, the group metadata would be as 
> well. Currently, that is not guaranteed (see the sketch below). 
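A sketch of the consume-transform-produce loop that relies on this
consistency, assuming a transactional producer already initialized via
initTransactions(), a consumer with read_committed isolation, and an
illustrative "output-topic":
{code:java}
producer.beginTransaction();
try {
    Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();
    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
        producer.send(new ProducerRecord<>("output-topic", rec.key(), rec.value()));
        offsetsToCommit.put(new TopicPartition(rec.topic(), rec.partition()),
            new OffsetAndMetadata(rec.offset() + 1));
    }
    // If assignment and group metadata are updated at different times, the
    // metadata passed here can lag behind the assignment under which the
    // offsets were collected: exactly the inconsistency described above.
    producer.sendOffsetsToTransaction(offsetsToCommit, consumer.groupMetadata());
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}
{code}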



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16258) Stale member should trigger onPartitionsLost when leaving group

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16258:
--
Priority: Critical  (was: Major)

> Stale member should trigger onPartitionsLost when leaving group
> ---
>
> Key: KAFKA-16258
> URL: https://issues.apache.org/jira/browse/KAFKA-16258
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Critical
>  Labels: client-transitions-issues, kip-848-client-support
> Fix For: 3.8.0
>
>
> When the poll timer expires, the new consumer proactively leaves the group 
> and clears its assignments, but it should also invoke the onPartitionsLost 
> callback. On poll timer expiration, the legacy coordinator performs the 
> following sequence: send the leave group request 
> ([here|https://github.com/apache/kafka/blob/e8c70fce26626ed2ab90f2728a45f6e55e907ec1/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L1517]),
>  invoke onPartitionsLost, and, once that completes, clear the assignment 
> (onJoinPrepare 
> [here|https://github.com/apache/kafka/blob/e8c70fce26626ed2ab90f2728a45f6e55e907ec1/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L779]).
> This is most likely the cause of the integration test failures that expect 
> callbacks when the poll interval expires (e.g. 
> https://issues.apache.org/jira/browse/KAFKA-16008). A sketch of the expected 
> callback semantics follows.
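For reference, a sketch of the callback semantics the new consumer should
preserve; "stateByPartition" is an illustrative application-side map, not
part of the consumer API:
{code:java}
ConsumerRebalanceListener listener = new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync(); // ownership still held, so committing is safe
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }

    @Override
    public void onPartitionsLost(Collection<TopicPartition> partitions) {
        // Ownership is already gone (e.g. the poll timer expired): do not
        // commit, just release per-partition state. This is the callback a
        // stale member must trigger when it proactively leaves the group.
        partitions.forEach(stateByPartition::remove);
    }
};
{code}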



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

