[jira] [Commented] (KAFKA-13051) Require Principal Serde to be defined for 3.0

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377792#comment-17377792
 ] 

Konstantine Karantasis commented on KAFKA-13051:


[~rdielhenn] judging from the title, I'm targeting 3.0.0 for this issue. Is it 
also a release blocker (i.e. should its priority be set to Blocker)?

> Require Principal Serde to be defined for 3.0
> -
>
> Key: KAFKA-13051
> URL: https://issues.apache.org/jira/browse/KAFKA-13051
> Project: Kafka
>  Issue Type: Task
>Reporter: Ryan Dielhenn
>Assignee: Ryan Dielhenn
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13051) Require Principal Serde to be defined for 3.0

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-13051:
---
Fix Version/s: 3.0.0

> Require Principal Serde to be defined for 3.0
> -
>
> Key: KAFKA-13051
> URL: https://issues.apache.org/jira/browse/KAFKA-13051
> Project: Kafka
>  Issue Type: Task
>Reporter: Ryan Dielhenn
>Assignee: Ryan Dielhenn
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12487) Sink connectors do not work with the cooperative consumer rebalance protocol

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377785#comment-17377785
 ] 

Konstantine Karantasis commented on KAFKA-12487:


Just a note that for tickets that need to target a specific version, it's 
highly recommended (if not required) to set the Fix Version/s field. I've marked 
the issue as a blocker for 3.0 and will be taking a look before code freeze, 
which is approaching quickly.

> Sink connectors do not work with the cooperative consumer rebalance protocol
> 
>
> Key: KAFKA-12487
> URL: https://issues.apache.org/jira/browse/KAFKA-12487
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Blocker
> Fix For: 3.0.0
>
>
> The {{ConsumerRebalanceListener}} used by the framework to respond to 
> rebalance events in consumer groups for sink tasks is hard-coded with the 
> assumption that the consumer performs rebalances eagerly. In other words, it 
> assumes that whenever {{onPartitionsRevoked}} is called, all partitions have 
> been revoked from that consumer, and whenever {{onPartitionsAssigned}} is 
> called, the partitions passed in to that method comprise the complete set of 
> topic partitions assigned to that consumer.
> See the [WorkerSinkTask.HandleRebalance 
> class|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L669-L730]
>  for the specifics.
>  
> One issue this can cause is silently ignoring to-be-committed offsets 
> provided by sink tasks, since the framework ignores offsets provided by tasks 
> in their {{preCommit}} method if it does not believe that the consumer for 
> that task is currently assigned the topic partition for that offset. See 
> these lines in the [WorkerSinkTask::commitOffsets 
> method|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L429-L430]
>  for reference.
>  
> This may not be the only issue caused by configuring a sink connector's 
> consumer to use cooperative rebalancing. Rigorous unit and integration 
> testing should be added before claiming that the Connect framework supports 
> the use of cooperative consumers with sink connectors.
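For illustration, here is a minimal, hypothetical sketch (not the Connect framework's actual HandleRebalance class) of a rebalance listener that stays correct under both protocols by treating the callback arguments as increments to a tracked assignment rather than as the full assignment. Under the eager protocol the callbacks happen to carry the full sets, so the incremental bookkeeping degenerates to the current behavior.

{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class IncrementalAssignmentListener implements ConsumerRebalanceListener {

    private final Set<TopicPartition> currentAssignment = new HashSet<>();

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Cooperative protocol: only the revoked partitions are passed in,
        // so the rest of the assignment must be retained.
        currentAssignment.removeAll(partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Cooperative protocol: only the newly added partitions are passed in,
        // so they are merged into (not substituted for) the tracked assignment.
        currentAssignment.addAll(partitions);
    }

    @Override
    public void onPartitionsLost(Collection<TopicPartition> partitions) {
        // Partitions that were reassigned without a clean revocation.
        currentAssignment.removeAll(partitions);
    }

    public Set<TopicPartition> assignment() {
        return Collections.unmodifiableSet(currentAssignment);
    }
}
{code}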



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12487) Sink connectors do not work with the cooperative consumer rebalance protocol

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12487:
---
Fix Version/s: 3.0.0

> Sink connectors do not work with the cooperative consumer rebalance protocol
> 
>
> Key: KAFKA-12487
> URL: https://issues.apache.org/jira/browse/KAFKA-12487
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0
>
>
> The {{ConsumerRebalanceListener}} used by the framework to respond to 
> rebalance events in consumer groups for sink tasks is hard-coded with the 
> assumption that the consumer performs rebalances eagerly. In other words, it 
> assumes that whenever {{onPartitionsRevoked}} is called, all partitions have 
> been revoked from that consumer, and whenever {{onPartitionsAssigned}} is 
> called, the partitions passed in to that method comprise the complete set of 
> topic partitions assigned to that consumer.
> See the [WorkerSinkTask.HandleRebalance 
> class|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L669-L730]
>  for the specifics.
>  
> One issue this can cause is silently ignoring to-be-committed offsets 
> provided by sink tasks, since the framework ignores offsets provided by tasks 
> in their {{preCommit}} method if it does not believe that the consumer for 
> that task is currently assigned the topic partition for that offset. See 
> these lines in the [WorkerSinkTask::commitOffsets 
> method|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L429-L430]
>  for reference.
>  
> This may not be the only issue caused by configuring a sink connector's 
> consumer to use cooperative rebalancing. Rigorous unit and integration 
> testing should be added before claiming that the Connect framework supports 
> the use of cooperative consumers with sink connectors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12487) Sink connectors do not work with the cooperative consumer rebalance protocol

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12487:
---
Priority: Blocker  (was: Major)

> Sink connectors do not work with the cooperative consumer rebalance protocol
> 
>
> Key: KAFKA-12487
> URL: https://issues.apache.org/jira/browse/KAFKA-12487
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Blocker
> Fix For: 3.0.0
>
>
> The {{ConsumerRebalanceListener}} used by the framework to respond to 
> rebalance events in consumer groups for sink tasks is hard-coded with the 
> assumption that the consumer performs rebalances eagerly. In other words, it 
> assumes that whenever {{onPartitionsRevoked}} is called, all partitions have 
> been revoked from that consumer, and whenever {{onPartitionsAssigned}} is 
> called, the partitions passed in to that method comprise the complete set of 
> topic partitions assigned to that consumer.
> See the [WorkerSinkTask.HandleRebalance 
> class|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L669-L730]
>  for the specifics.
>  
> One issue this can cause is silently ignoring to-be-committed offsets 
> provided by sink tasks, since the framework ignores offsets provided by tasks 
> in their {{preCommit}} method if it does not believe that the consumer for 
> that task is currently assigned the topic partition for that offset. See 
> these lines in the [WorkerSinkTask::commitOffsets 
> method|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L429-L430]
>  for reference.
>  
> This may not be the only issue caused by configuring a sink connector's 
> consumer to use cooperative rebalancing. Rigorous unit and integration 
> testing should be added before claiming that the Connect framework supports 
> the use of cooperative consumers with sink connectors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4327) Move Reset Tool from core to streams

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1739#comment-1739
 ] 

Konstantine Karantasis commented on KAFKA-4327:
---

Based on your last comment, [~mjsax], I've updated the target version for this 
issue to 4.0.

> Move Reset Tool from core to streams
> 
>
> Key: KAFKA-4327
> URL: https://issues.apache.org/jira/browse/KAFKA-4327
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Blocker
>  Labels: kip
> Fix For: 4.0.0
>
>
> This is a follow up on https://issues.apache.org/jira/browse/KAFKA-4008
> Currently, Kafka Streams Application Reset Tool is part of {{core}} module 
> due to ZK dependency. After KIP-4 got merged, this dependency can be dropped 
> and the Reset Tool can be moved to {{streams}} module.
> This should also update {{InternalTopicManager#filterExistingTopics}}, which 
> refers to the ResetTool in an exception message:
>  {{"Use 'kafka.tools.StreamsResetter' tool"}}
>  -> {{"Use '" + kafka.tools.StreamsResetter.getClass().getName() + "' tool"}}
> Doing this JIRA also requires updating the docs with regard to broker 
> backward compatibility – not all brokers support the "topic delete request" and 
> thus the reset tool will not be backward compatible with all broker versions.
> KIP-756: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-756%3A+Move+StreamsResetter+tool+outside+of+core]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-4327) Move Reset Tool from core to streams

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-4327:
--
Fix Version/s: (was: 3.0.0)
   4.0.0

> Move Reset Tool from core to streams
> 
>
> Key: KAFKA-4327
> URL: https://issues.apache.org/jira/browse/KAFKA-4327
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Blocker
>  Labels: kip
> Fix For: 4.0.0
>
>
> This is a follow up on https://issues.apache.org/jira/browse/KAFKA-4008
> Currently, Kafka Streams Application Reset Tool is part of {{core}} module 
> due to ZK dependency. After KIP-4 got merged, this dependency can be dropped 
> and the Reset Tool can be moved to {{streams}} module.
> This should also update {{InternalTopicManager#filterExistingTopics}}, which 
> refers to the ResetTool in an exception message:
>  {{"Use 'kafka.tools.StreamsResetter' tool"}}
>  -> {{"Use '" + kafka.tools.StreamsResetter.getClass().getName() + "' tool"}}
> Doing this JIRA also requires updating the docs with regard to broker 
> backward compatibility – not all brokers support the "topic delete request" and 
> thus the reset tool will not be backward compatible with all broker versions.
> KIP-756: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-756%3A+Move+StreamsResetter+tool+outside+of+core]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9295) KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1738#comment-1738
 ] 

Konstantine Karantasis commented on KAFKA-9295:
---

Thanks for the update [~ableegoldman]. Based on that I've added 3.0 as a target 
fix version for https://issues.apache.org/jira/browse/KAFKA-13008 and have 
marked that ticket as blocker. I'll be looking for any updates on that issue or 
its severity. 

> KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable
> --
>
> Key: KAFKA-9295
> URL: https://issues.apache.org/jira/browse/KAFKA-9295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.4.0, 2.6.0
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/27106/testReport/junit/org.apache.kafka.streams.integration/KTableKTableForeignKeyInnerJoinMultiIntegrationTest/shouldInnerJoinMultiPartitionQueryable/]
> {quote}java.lang.AssertionError: Did not receive all 1 records from topic 
> output- within 6 ms Expected: is a value equal to or greater than <1> 
> but: <0> was less than <1> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:515)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:511)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:489)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.verifyKTableKTableJoin(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:200)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:183){quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13008) Stream will stop processing data for a long time while waiting for the partition lag

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-13008:
---
Fix Version/s: 3.0.0

> Stream will stop processing data for a long time while waiting for the 
> partition lag
> 
>
> Key: KAFKA-13008
> URL: https://issues.apache.org/jira/browse/KAFKA-13008
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Luke Chen
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2021-07-07-11-19-55-630.png
>
>
> In KIP-695, we improved the task idling mechanism by checking partition lag. 
> It's a good improvement for timestamp sync, but I found it can cause the 
> stream to stop processing data for a long time while waiting for the 
> partition metadata.
>  
> I've been investigating this case for a while, and found that the issue 
> happens in the following situation (or similar ones):
>  # start 2 streams (each with 1 thread) to consume from a topicA (with 3 
> partitions: A-0, A-1, A-2)
>  # After 2 streams started, the partitions assignment are: (I skipped some 
> other processing related partitions for simplicity)
>  stream1-thread1: A-0, A-1 
>  stream2-thread1: A-2
>  # start processing some data, assume now, the position and high watermark is:
>  A-0: offset: 2, highWM: 2
>  A-1: offset: 2, highWM: 2
>  A-2: offset: 2, highWM: 2
>  # Now, stream3 joined, so trigger rebalance with this assignment:
>  stream1-thread1: A-0 
>  stream2-thread1: A-2
>  stream3-thread1: A-1
>  # Suddenly, stream3 left, so now, rebalance again, with the step 2 
> assignment:
>  stream1-thread1: A-0, *A-1* 
>  stream2-thread1: A-2
>  (note: after initialization, the  position of A-1 will be: position: null, 
> highWM: null)
>  # Now, note that partition A-1 used to be assigned to stream1-thread1, and 
> now it's back. Also assume that partition A-1 has slow input (ex: 1 record 
> per 30 mins) while partition A-0 has fast input (ex: 10K records / sec). So 
> now stream1-thread1 won't process any data until it gets input from 
> partition A-1 (even if partition A-0 has a lot of buffered data and we have 
> `{{max.task.idle.ms}}` set to 0).
>  
> The reason stream1-thread1 won't process any data is that we can't get the 
> lag of partition A-1. And why can't we get the lag? Because:
>  # In KIP-695, we use the consumer's cache to get the partition lag, to avoid 
> a remote call
>  # The lag for a partition will be cleared if the assignment in this round 
> doesn't include this partition. Check 
> [here|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java#L272].
>  So, in the above example, the metadata cache for partition A-1 will be 
> cleared in step 4, and re-initialized (to null) in step 5
>  # In KIP-227, we introduced fetch sessions for incremental fetch 
> requests/responses. That is, if a session exists, the client (consumer) will 
> get an update only when a fetched partition has an update (ex: new data). 
> So, in the above case, since partition A-1 has slow input (ex: 1 record per 
> 30 mins), it won't get an update for the next 30 mins, or until the fetch 
> session has been inactive for (by default) 2 mins and is evicted. In either 
> case, the metadata won't be updated for a while.
>  
> In KIP-695, if we don't get the partition lag, we can't determine the 
> partition's data status for timestamp sync, so we'll keep waiting and not 
> process any data. That's why this issue happens.
>  
> *Proposed solution:*
>  # If we don't get the current lag for a partition, or the current lag is > 0, 
> start waiting for max.task.idle.ms, and reset the deadline when we get the 
> partition lag, like what we did previously in KIP-353
>  # Introduce a waiting time config for when there is no partition lag, or the 
> partition lag stays > 0 (needs a KIP)
> [~vvcephei] [~guozhang], any suggestions?
>  
> cc [~ableegoldman] [~mjsax], this is the root cause of what we discussed in 
> [https://github.com/apache/kafka/pull/10736], where we thought there was a 
> data loss situation. FYI.
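To make proposed solution #1 more concrete, here is a minimal, hypothetical sketch (not the actual Streams code) built on the consumer's cached lag from KIP-695 (Consumer#currentLag): when the lag is unknown, the task waits at most max.task.idle.ms instead of blocking until the fetch session refreshes the metadata.

{code:java}
import java.util.OptionalLong;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public class IdleDeadlineSketch {

    private static final long NO_DEADLINE = -1L;
    private long deadlineMs = NO_DEADLINE;

    /** Returns true if the task may enforce processing of its buffered records now. */
    public boolean readyToProcess(final Consumer<?, ?> consumer,
                                  final TopicPartition partition,
                                  final long nowMs,
                                  final long maxTaskIdleMs) {
        final OptionalLong lag = consumer.currentLag(partition); // cached lag from KIP-695
        if (lag.isPresent()) {
            deadlineMs = NO_DEADLINE;      // lag is known again: reset the idle deadline
            return lag.getAsLong() == 0;   // caught up -> process; data pending -> wait for the fetch
        }
        if (deadlineMs == NO_DEADLINE) {
            deadlineMs = nowMs + maxTaskIdleMs; // lag unknown: start the idle timer
            return false;
        }
        return nowMs >= deadlineMs;        // don't block forever waiting for lag metadata
    }
}
{code}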



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13008) Stream will stop processing data for a long time while waiting for the partition lag

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-13008:
---
Priority: Blocker  (was: Major)

> Stream will stop processing data for a long time while waiting for the 
> partition lag
> 
>
> Key: KAFKA-13008
> URL: https://issues.apache.org/jira/browse/KAFKA-13008
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: image-2021-07-07-11-19-55-630.png
>
>
> In KIP-695, we improved the task idling mechanism by checking partition lag. 
> It's a good improvement for timestamp sync, but I found it can cause the 
> stream to stop processing data for a long time while waiting for the 
> partition metadata.
>  
> I've been investigating this case for a while, and found that the issue 
> happens in the following situation (or similar ones):
>  # start 2 streams (each with 1 thread) to consume from a topicA (with 3 
> partitions: A-0, A-1, A-2)
>  # After 2 streams started, the partitions assignment are: (I skipped some 
> other processing related partitions for simplicity)
>  stream1-thread1: A-0, A-1 
>  stream2-thread1: A-2
>  # start processing some data, assume now, the position and high watermark is:
>  A-0: offset: 2, highWM: 2
>  A-1: offset: 2, highWM: 2
>  A-2: offset: 2, highWM: 2
>  # Now, stream3 joined, so trigger rebalance with this assignment:
>  stream1-thread1: A-0 
>  stream2-thread1: A-2
>  stream3-thread1: A-1
>  # Suddenly, stream3 left, so now, rebalance again, with the step 2 
> assignment:
>  stream1-thread1: A-0, *A-1* 
>  stream2-thread1: A-2
>  (note: after initialization, the  position of A-1 will be: position: null, 
> highWM: null)
>  # Now, note that partition A-1 used to be assigned to stream1-thread1, and 
> now it's back. Also assume that partition A-1 has slow input (ex: 1 record 
> per 30 mins) while partition A-0 has fast input (ex: 10K records / sec). So 
> now stream1-thread1 won't process any data until it gets input from 
> partition A-1 (even if partition A-0 has a lot of buffered data and we have 
> `{{max.task.idle.ms}}` set to 0).
>  
> The reason stream1-thread1 won't process any data is that we can't get the 
> lag of partition A-1. And why can't we get the lag? Because:
>  # In KIP-695, we use the consumer's cache to get the partition lag, to avoid 
> a remote call
>  # The lag for a partition will be cleared if the assignment in this round 
> doesn't include this partition. Check 
> [here|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java#L272].
>  So, in the above example, the metadata cache for partition A-1 will be 
> cleared in step 4, and re-initialized (to null) in step 5
>  # In KIP-227, we introduced fetch sessions for incremental fetch 
> requests/responses. That is, if a session exists, the client (consumer) will 
> get an update only when a fetched partition has an update (ex: new data). 
> So, in the above case, since partition A-1 has slow input (ex: 1 record per 
> 30 mins), it won't get an update for the next 30 mins, or until the fetch 
> session has been inactive for (by default) 2 mins and is evicted. In either 
> case, the metadata won't be updated for a while.
>  
> In KIP-695, if we don't get the partition lag, we can't determine the 
> partition's data status for timestamp sync, so we'll keep waiting and not 
> process any data. That's why this issue happens.
>  
> *Proposed solution:*
>  # If we don't get the current lag for a partition, or the current lag is > 0, 
> start waiting for max.task.idle.ms, and reset the deadline when we get the 
> partition lag, like what we did previously in KIP-353
>  # Introduce a waiting time config for when there is no partition lag, or the 
> partition lag stays > 0 (needs a KIP)
> [~vvcephei] [~guozhang], any suggestions?
>  
> cc [~ableegoldman] [~mjsax], this is the root cause of what we discussed in 
> [https://github.com/apache/kafka/pull/10736], where we thought there was a 
> data loss situation. FYI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10714) Save unnecessary end txn call when the transaction is confirmed to be done

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1735#comment-1735
 ] 

Konstantine Karantasis commented on KAFKA-10714:


[~bchen225242] the parent issue targets 3.0. Is there any chance this issue can 
be fixed before code freeze? If not, I'll postpone it to the next release.

> Save unnecessary end txn call when the transaction is confirmed to be done
> --
>
> Key: KAFKA-10714
> URL: https://issues.apache.org/jira/browse/KAFKA-10714
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>
> In certain error cases after KIP-588, we may skip the call of `EndTxn` to the 
> txn coordinator such as for the transaction timeout case, where we know the 
> transaction is already terminated on the broker side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1732#comment-1732
 ] 

Konstantine Karantasis commented on KAFKA-12493:


[~wenbing.shen] [~junrao] is this issue a blocker for 3.0? Code freeze is only 
a few days away. If not, I'll postpone the issue to the next release.

> The controller should handle the consistency between the controllerContext 
> and the partition replicas assignment on zookeeper
> -
>
> Key: KAFKA-12493
> URL: https://issues.apache.org/jira/browse/KAFKA-12493
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>Reporter: Wenbing Shen
>Assignee: Wenbing Shen
>Priority: Major
> Fix For: 3.0.0
>
>
> This question can be linked to this email: 
> [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]
>  
> This is a 100% recurring problem.
> Problem description:
> In the production environment at our customer's site, the existing partitions 
> of a topic were redistributed by code owned by another department and the new 
> assignment was written into zookeeper. When processing the resulting partition 
> modification event, the controller only considered the newly added partitions 
> of the allocation plan: it handled the new partitions and replicas in the 
> partition and replica state machines and issued LeaderAndIsr and other 
> control requests.
> However, the controller did not verify the existing partition replica 
> assignment in the controllerContext against the partition allocation stored on 
> the znode in zookeeper, i.e. whether the original allocation had changed. This 
> seems harmless, but when we have to restart a broker for some reason, such as 
> a configuration update or an upgrade, the affected topics become abnormal in 
> real-time production: the controller cannot complete the assignment of a new 
> leader, and the original leader cannot correctly identify the replicas 
> currently allocated in zookeeper. The real-time business in our customer's 
> on-site environment was interrupted and some data was lost.
> This problem can be reliably reproduced as follows:
> Adding partitions or modifying replicas of an existing topic through the 
> following code causes the original partition replicas to be reallocated and 
> written to zookeeper. The controller then fails to process this event 
> correctly; after restarting the brokers related to the topic, the topic can 
> no longer be produced to or consumed from.
>  
> {code:java}
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
>             CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0 &&
>                 kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the original broker metadata
>             Seq<BrokerMetadata> brokerMetadata =
>                     AdminUtils.getBrokerMetadatas(zkUtils,
>                             RackAwareMode.Enforced$.MODULE$,
>                             Option.apply(null));
>             // Generate a new partition replica allocation plan
>             scala.collection.Map<Object, Seq<Object>> replicaAssign =
>                     AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                             kafkaTopicVO.getPartitionNum(),   // number of partitions
>                             kafkaTopicVO.getReplicationNum(), // number of replicas per partition
>                             -1,
>                             -1);
>             // Write the new partition replica allocation plan to zookeeper
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null,
>                     true);
>         }
>     } catch (Exception e) {
>         System.out.println("Adjust partition abnormal");
>         System.exit(0);
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12176) Consider changing default log.message.timestamp.difference.max.ms

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12176:
---
Fix Version/s: (was: 3.0.0)
   3.1.0

> Consider changing default log.message.timestamp.difference.max.ms
> -
>
> Key: KAFKA-12176
> URL: https://issues.apache.org/jira/browse/KAFKA-12176
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.1.0
>
>
> The default `log.message.timestamp.difference.max.ms` is Long.MaxValue, which 
> means the broker will accept arbitrary timestamps. The broker relies on 
> timestamps internally for deciding when segments should be rolled and when 
> they should be deleted. A buggy client which is producing messages with 
> timestamps too far in the future or past can cause segments to roll 
> frequently which can lead to open file exhaustion, or it can cause segments 
> to be retained indefinitely which can lead to disk space exhaustion.
> In https://issues.apache.org/jira/browse/KAFKA-4340, it was proposed to set 
> the default equal to `log.retention.ms`, which still seems to make sense. 
> This was rejected for two reasons as far as I can tell. First was 
> compatibility, which just means we would need a major upgrade. The second is 
> that we previously did not have a way to tell the user which record had 
> violated the max timestamp difference, but now we have 
> [KIP-467|https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12176) Consider changing default log.message.timestamp.difference.max.ms

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1731#comment-1731
 ] 

Konstantine Karantasis commented on KAFKA-12176:


[~hachikuji] [~junrao] I don't see any evidence that this feature made it into 3.0 
before the KIP and Feature freezes respectively. I'm moving the target release 
to the next one, but let me know if I've missed something regarding this issue. 

> Consider changing default log.message.timestamp.difference.max.ms
> -
>
> Key: KAFKA-12176
> URL: https://issues.apache.org/jira/browse/KAFKA-12176
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> The default `log.message.timestamp.difference.max.ms` is Long.MaxValue, which 
> means the broker will accept arbitrary timestamps. The broker relies on 
> timestamps internally for deciding when segments should be rolled and when 
> they should be deleted. A buggy client which is producing messages with 
> timestamps too far in the future or past can cause segments to roll 
> frequently which can lead to open file exhaustion, or it can cause segments 
> to be retained indefinitely which can lead to disk space exhaustion.
> In https://issues.apache.org/jira/browse/KAFKA-4340, it was proposed to set 
> the default equal to `log.retention.ms`, which still seems to make sense. 
> This was rejected for two reasons as far as I can tell. First was 
> compatibility, which just means we would need a major upgrade. The second is 
> that we previously did not have a way to tell the user which record had 
> violated the max timestamp difference, but now we have 
> [KIP-467|https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7435) Consider standardizing the config object pattern on interface/implementation.

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377768#comment-17377768
 ] 

Konstantine Karantasis commented on KAFKA-7435:
---

[~vvcephei] [~guozhang] this feature doesn't seem to have made it before 
feature freeze for 3.0. Postponing to the next release, but let me know if we 
should do otherwise. 

> Consider standardizing the config object pattern on interface/implementation.
> -
>
> Key: KAFKA-7435
> URL: https://issues.apache.org/jira/browse/KAFKA-7435
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, the majority of Streams' config objects are structured as an 
> "external" builder class (with protected state) and an "internal" subclass 
> exposing getters to the state. This is serviceable, but there is an 
> alternative we can consider: to use an interface for the external API and the 
> implementation class for the internal one.
> Advantages:
>  * we could use private state, which improves maintainability
>  * the setters and getters would all be defined in the same class, improving 
> readability
>  * users browsing the public API would be able to look at an interface that 
> contains less extraneous internal details than the current class
>  * there is more flexibility in implementation
> Alternatives
>  * instead of external-class/internal-subclass, we could use an external 
> *final* class with package-protected state and an internal accessor class 
> (not a subclass, obviously). This would make it impossible for users to try 
> and create custom subclasses of our config objects, which is generally not 
> allowed already, but is currently a runtime class cast exception.
> Example implementation: [https://github.com/apache/kafka/pull/5677]
> This change would break binary, but not source, compatibility, so the 
> earliest we could consider it is 3.0.
> To be clear, I'm *not* saying this *should* be done, just calling for a 
> discussion. Otherwise, I'd make a KIP.
> Thoughts?
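A minimal sketch of the interface/implementation pattern under discussion, with purely illustrative names (these are not actual Streams classes):

{code:java}
// File: ExampleConfig.java -- illustrative only, not an actual Streams class.
public interface ExampleConfig {

    static ExampleConfig withName(final String name) {
        return new ExampleConfigInternal(name); // users only ever see the interface
    }

    String name();
}

// Internal implementation with private (not protected) state; the accessor is
// defined right next to the state it exposes.
class ExampleConfigInternal implements ExampleConfig {

    private final String name;

    ExampleConfigInternal(final String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}
{code}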



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7435) Consider standardizing the config object pattern on interface/implementation.

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-7435:
--
Fix Version/s: (was: 3.0.0)
   3.1.0

> Consider standardizing the config object pattern on interface/implementation.
> -
>
> Key: KAFKA-7435
> URL: https://issues.apache.org/jira/browse/KAFKA-7435
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, the majority of Streams' config objects are structured as an 
> "external" builder class (with protected state) and an "internal" subclass 
> exposing getters to the state. This is serviceable, but there is an 
> alternative we can consider: to use an interface for the external API and the 
> implementation class for the internal one.
> Advantages:
>  * we could use private state, which improves maintainability
>  * the setters and getters would all be defined in the same class, improving 
> readability
>  * users browsing the public API would be able to look at an interface that 
> contains less extraneous internal details than the current class
>  * there is more flexibility in implementation
> Alternatives
>  * instead of external-class/internal-subclass, we could use an external 
> *final* class with package-protected state and an internal accessor class 
> (not a subclass, obviously). This would make it impossible for users to try 
> and create custom subclasses of our config objects, which is generally not 
> allowed already, but is currently a runtime class cast exception.
> Example implementation: [https://github.com/apache/kafka/pull/5677]
> This change would break binary, but not source, compatibility, so the 
> earliest we could consider it is 3.0.
> To be clear, I'm *not* saying this *should* be done, just calling for a 
> discussion. Otherwise, I'd make a KIP.
> Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10675) Error message from ConnectSchema.validateValue() should include the name of the schema.

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377765#comment-17377765
 ] 

Konstantine Karantasis commented on KAFKA-10675:


[~iskuskov] [~rhauch] I've left a comment on the PR, which seems to have been 
approved. It'd be great if we could merge and resolve it before the 3.0 code 
freeze.

> Error message from ConnectSchema.validateValue() should include the name of 
> the schema.
> ---
>
> Key: KAFKA-10675
> URL: https://issues.apache.org/jira/browse/KAFKA-10675
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Alexander Iskuskov
>Priority: Minor
> Fix For: 3.0.0, 2.8.1
>
>
> The following error message
> {code:java}
> org.apache.kafka.connect.errors.DataException: Invalid Java object for schema 
> type INT64: class java.lang.Long for field: "moderate_time"
> {code}
> can be confusing because {{java.lang.Long}} is an acceptable type for schema 
> type {{INT64}}. In fact, in this case the {{org.apache.kafka.connect.data.Timestamp}} 
> logical type is used, but this info is not logged.
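A minimal, self-contained reproduction of this kind of message (assumed setup, not taken from the reporter's connector): the field uses the Timestamp logical type, which is backed by INT64 but expects a java.util.Date, so passing a raw Long fails while the error only mentions INT64.

{code:java}
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.data.Timestamp;

public class ValidateValueExample {
    public static void main(String[] args) {
        Schema schema = SchemaBuilder.struct()
                .field("moderate_time", Timestamp.SCHEMA) // logical type backed by INT64
                .build();
        // Putting a raw Long instead of a java.util.Date throws a DataException whose
        // message mentions only INT64 and java.lang.Long, not the Timestamp schema name.
        new Struct(schema).put("moderate_time", 1625731200000L);
    }
}
{code}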



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] kkonstantine commented on pull request #9541: KAFKA-10675: Add schema name to ConnectSchema.validateValue() error message

2021-07-08 Thread GitBox


kkonstantine commented on pull request #9541:
URL: https://github.com/apache/kafka/pull/9541#issuecomment-876887351


   @rhauch it seems that this fix was approved but we were blocked on release 
candidates of previous releases. To include it in 3.0 (as well as in previous 
releases) and avoid entering a new code freeze phase, we'd need to merge asap. 
Please take another look if possible. 
   
   @Iskuskov in the meantime a merge conflict has been introduced. Would you 
mind rebasing on top of the latest changes on `trunk`?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] kkonstantine commented on pull request #10827: [WIP] KAFKA-12899: Support --bootstrap-server in ReplicaVerificationTool

2021-07-08 Thread GitBox


kkonstantine commented on pull request #10827:
URL: https://github.com/apache/kafka/pull/10827#issuecomment-876886212


   @dongjinleekr @tombentley if we intend to include this fix in 3.0, we'll need 
to merge asap. Please let me know so I can update the status of 
https://issues.apache.org/jira/browse/KAFKA-12899 accordingly 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-12774) kafka-streams 2.8: logging in uncaught-exceptionhandler doesn't go through log4j

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377754#comment-17377754
 ] 

Konstantine Karantasis commented on KAFKA-12774:


Code freeze for 3.0 is in just a few days. I'll postpone to a subsequent 
release if we can't resolve the issue by then. 

> kafka-streams 2.8: logging in uncaught-exceptionhandler doesn't go through 
> log4j
> 
>
> Key: KAFKA-12774
> URL: https://issues.apache.org/jira/browse/KAFKA-12774
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Jørgen
>Priority: Minor
> Fix For: 3.0.0, 2.8.1
>
>
> When exceptions are handled in the uncaught-exception handler introduced in 
> KS 2.8, the logging of the stacktrace doesn't seem to go through the logging 
> framework configured by the application (log4j2 in our case), but gets 
> printed to the console "line-by-line".
> All other exceptions logged by kafka-streams go through log4j2 and get 
> formatted properly according to the log4j2 appender (json in our case). 
> Haven't tested this on other frameworks like logback.
> Application setup:
>  * Spring-boot 2.4.5
>  * Log4j 2.13.3
>  * Slf4j 1.7.30
> Log4j2 appender config:
> {code:java}
> <JsonLayout stacktraceAsString="true" properties="true">
>     <KeyValuePair key="..." value="$${date:-MM-dd'T'HH:mm:ss.SSSZ}"/>
> </JsonLayout>
>  {code}
> Uncaught exception handler config:
> {code:java}
> kafkaStreams.setUncaughtExceptionHandler { exception ->
> logger.warn("Uncaught exception handled - replacing thread", exception) 
> // logged properly
> 
> StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD
> } {code}
> Stacktrace that gets printed line-by-line:
> {code:java}
> Exception in thread "xxx-f5860dff-9a41-490e-8ab0-540b1a7f9ce4-StreamThread-2" 
> org.apache.kafka.streams.errors.StreamsException: Error encountered sending 
> record to topic xxx-repartition for task 3_2 due 
> to:org.apache.kafka.common.errors.InvalidPidMappingException: The producer 
> attempted to use a producer id which is not currently assigned to its 
> transactional id.Exception handler choose to FAIL the processing, no more 
> records would be sent.  at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:226)
>at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.lambda$send$0(RecordCollectorImpl.java:196)
>  at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1365)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:231)
>  at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.abort(ProducerBatch.java:159)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortUndrainedBatches(RecordAccumulator.java:783)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:430)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:315)  
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:242)
>   at java.base/java.lang.Thread.run(Unknown Source)Caused by: 
> org.apache.kafka.common.errors.InvalidPidMappingException: The producer 
> attempted to use a producer id which is not currently assigned to its 
> transactional id. {code}
>  
> It's a little bit hard to reproduce as I haven't found any way to trigger 
> uncaught-exception-handler through junit-tests.
> Link to discussion on slack: 
> https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1620389197436700



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12686) Race condition in AlterIsr response handling

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-12686.

Resolution: Fixed

> Race condition in AlterIsr response handling
> 
>
> Key: KAFKA-12686
> URL: https://issues.apache.org/jira/browse/KAFKA-12686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.7.0, 2.8.0
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
> Fix For: 3.0.0
>
>
> In Partition.scala, there is a race condition between the handling of an 
> AlterIsrResponse and a LeaderAndIsrRequest. This is a pretty rare scenario 
> and would involve the AlterIsrResponse being delayed for some time, but it is 
> possible. This was observed in a test environment when lots of ISR and 
> leadership changes were happening due to broker restarts.
> When the leader handles the LeaderAndIsr, it calls Partition#makeLeader which 
> overrides the {{isrState}} variable and clears the pending ISR items via 
> {{AlterIsrManager#clearPending(TopicPartition)}}. 
> The bug is that AlterIsrManager does not check its inflight state before 
> clearing pending items. The way AlterIsrManager is designed, it retains 
> inflight items in the pending items collection until the response is 
> processed (to allow for retries). The result is that an inflight item is 
> inadvertently removed from this collection.
> Since the inflight item is cleared from the collection, AlterIsrManager 
> allows for new AlterIsrItem-s to be enqueued for this partition even though 
> it has an inflight AlterIsrItem. By allowing an update to be enqueued, 
> Partition will transition its {{isrState}} to one of the inflight states 
> (PendingIsrExpand, PendingIsrShrink, etc). Once the inflight partition's 
> response is handled, it will fail to update the {{isrState}} due to detecting 
> changes since the request was sent (which is by design). However, after the 
> response callback is run, AlterIsrManager will clear the partitions that it 
> saw in the response from the unsent items collection. This includes the newly 
> added (and unsent) update.
> The result is that Partition has a "inflight" isrState but AlterIsrManager 
> does not have an unsent item for this partition. This prevents any further 
> ISR updates on the partition until the next leader election (when 
> {{isrState}} is reset).
> If this bug is encountered, the workaround is to force a leader election 
> which will reset the partition's state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12781) Improve the endOffsets accuracy in TaskMetadata

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377752#comment-17377752
 ] 

Konstantine Karantasis commented on KAFKA-12781:


[~ableegoldman] [~wcarlson5] we have passed feature freeze for 3.0 and this 
issue doesn't seem to be a bug. Postponing to the subsequent release.

> Improve the endOffsets accuracy in TaskMetadata 
> 
>
> Key: KAFKA-12781
> URL: https://issues.apache.org/jira/browse/KAFKA-12781
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Walker Carlson
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently `TaskMetadata#endOffsets()` returns the highest offset seen by the 
> main consumer in streams. It should be possible to get the highest offset in 
> the topic via the consumer instead.
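For reference, a minimal sketch (hypothetical helper, not the proposed implementation) of how the consumer itself can report the true log-end offsets:

{code:java}
import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public class EndOffsetsExample {
    // Fetches the actual log-end offsets from the brokers, rather than relying on the
    // highest offsets the main consumer happens to have polled so far.
    public static Map<TopicPartition, Long> trueEndOffsets(final Consumer<?, ?> consumer) {
        return consumer.endOffsets(consumer.assignment());
    }
}
{code}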



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12781) Improve the endOffsets accuracy in TaskMetadata

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12781:
---
Fix Version/s: (was: 3.0.0)
   3.1.0

> Improve the endOffsets accuracy in TaskMetadata 
> 
>
> Key: KAFKA-12781
> URL: https://issues.apache.org/jira/browse/KAFKA-12781
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Walker Carlson
>Priority: Minor
> Fix For: 3.1.0
>
>
> Currently `TaskMetadata#endOffsets()` returns the highest offset seen by the 
> main consumer in streams. It should be possible to get the highest offset in 
> the topic via the consumer instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12766) Consider Disabling WAL-related Options in RocksDB

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12766:
---
Fix Version/s: (was: 3.0.0)
   3.1.0

> Consider Disabling WAL-related Options in RocksDB
> -
>
> Key: KAFKA-12766
> URL: https://issues.apache.org/jira/browse/KAFKA-12766
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Minor
>  Labels: newbie, newbie++
> Fix For: 3.1.0
>
>
> Streams disables the write-ahead log (WAL) provided by RocksDB since it 
> replicates the data in changelog topics. Hence, it does not make much sense 
> to set WAL-related configs for RocksDB instances within Streams.
> Streams could:
> - disable WAL-related options
> - ignore WAL-related options
> - throw an exception when a WAL-related option is set.
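For illustration, a minimal, hypothetical RocksDBConfigSetter that sets the kind of WAL-related option in question; with the WAL disabled by Streams, such a setting currently has no effect and is a candidate for being ignored or rejected:

{code:java}
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class WalConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // WAL-related option: pointless while Streams keeps the WAL disabled.
        options.setMaxTotalWalSize(64 * 1024 * 1024L);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // nothing to clean up in this sketch
    }
}
{code}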



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12766) Consider Disabling WAL-related Options in RocksDB

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377751#comment-17377751
 ] 

Konstantine Karantasis commented on KAFKA-12766:


We are past feature freeze for 3.0 and this issue doesn't seem to be a bug. 
Postponing to the subsequent release. 

> Consider Disabling WAL-related Options in RocksDB
> -
>
> Key: KAFKA-12766
> URL: https://issues.apache.org/jira/browse/KAFKA-12766
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Minor
>  Labels: newbie, newbie++
> Fix For: 3.0.0
>
>
> Streams disables the write-ahead log (WAL) provided by RocksDB since it 
> replicates the data in changelog topics. Hence, it does not make much sense 
> to set WAL-related configs for RocksDB instances within Streams.
> Streams could:
> - disable WAL-related options
> - ignore WAL-related options
> - throw an exception when a WAL-related option is set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12699) Streams no longer overrides the java default uncaught exception handler

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377748#comment-17377748
 ] 

Konstantine Karantasis commented on KAFKA-12699:


[~ableegoldman] [~wcarlson5] is this issue a blocker for 3.0? Can we still make 
it in time before code freeze? Please advise whether we should postpone it to 
the next version or wait for a PR.

> Streams no longer overrides the java default uncaught exception handler  
> -
>
> Key: KAFKA-12699
> URL: https://issues.apache.org/jira/browse/KAFKA-12699
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Walker Carlson
>Priority: Minor
> Fix For: 3.0.0
>
>
> If a user called `Thread.setUncaughtExceptionHandler()` to set the handler for 
> all threads in the runtime, streams would override that with its own handler. 
> However, since streams no longer uses the `Thread` handler, it will no longer 
> do so. This can cause problems if the user does something like 
> `System.exit(1)` in the handler. 
>  
> If using the old handler in streams, it will still work as it used to.
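A minimal sketch of the two handler styles involved (assumed application id, bootstrap servers, and input topic; not taken from the ticket):

{code:java}
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler;

public class HandlerExample {
    public static void main(String[] args) {
        // JVM-level handler: before 2.8 Streams overrode the per-thread handler, so code
        // like this never ran for stream threads; since 2.8 Streams leaves it alone, so a
        // System.exit(1) here may now terminate the whole application.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> System.exit(1));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "handler-example");   // assumed
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic"); // assumed topic, just to have a non-empty topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // The Streams-level handler introduced in 2.8 is the one that actually runs now.
        streams.setUncaughtExceptionHandler(exception ->
                StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD);
        streams.start();
    }
}
{code}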



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10233) KafkaConsumer polls in a tight loop if group is not authorized

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377746#comment-17377746
 ] 

Konstantine Karantasis commented on KAFKA-10233:


The PR hasn't gotten any updates for a while. Postponing to the next release, but 
let me know if it can still be merged into 3.0 before code freeze.

> KafkaConsumer polls in a tight loop if group is not authorized
> --
>
> Key: KAFKA-10233
> URL: https://issues.apache.org/jira/browse/KAFKA-10233
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 3.0.0
>
>
> Consumer propagates GroupAuthorizationException from poll immediately when 
> trying to find coordinator even though it is a retriable exception. If the 
> application polls in a loop, ignoring retriable exceptions, the consumer 
> tries to find coordinator in a tight loop without any backoff. We should 
> apply retry backoff in this case to avoid overloading brokers.
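A minimal sketch (assumed consumer config and topic) of the application pattern described above, where the exception is treated as retriable and retried without backoff:

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.GroupAuthorizationException;

public class TightPollLoopExample {
    public static void run(final Properties config, final String topic) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config)) {
            consumer.subscribe(Collections.singletonList(topic));
            while (true) {
                try {
                    consumer.poll(Duration.ofMillis(100));
                } catch (GroupAuthorizationException e) {
                    // Treated as retriable by the application and retried immediately,
                    // so the coordinator lookup is repeated without any backoff.
                }
            }
        }
    }
}
{code}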



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10233) KafkaConsumer polls in a tight loop if group is not authorized

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10233:
---
Fix Version/s: (was: 3.0.0)
   3.1.0

> KafkaConsumer polls in a tight loop if group is not authorized
> --
>
> Key: KAFKA-10233
> URL: https://issues.apache.org/jira/browse/KAFKA-10233
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 3.1.0
>
>
> Consumer propagates GroupAuthorizationException from poll immediately when 
> trying to find coordinator even though it is a retriable exception. If the 
> application polls in a loop, ignoring retriable exceptions, the consumer 
> tries to find coordinator in a tight loop without any backoff. We should 
> apply retry backoff in this case to avoid overloading brokers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12884) Remove "--zookeeper" in system tests

2021-07-08 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-12884.
---
Resolution: Done

> Remove "--zookeeper" in system tests
> 
>
> Key: KAFKA-12884
> URL: https://issues.apache.org/jira/browse/KAFKA-12884
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
>
> After a quick scan, I found that we currently still use the "--zookeeper" option 
> in some cases. We need to revisit them to see if they should be removed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7025) Android client support

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-7025:
--
Fix Version/s: (was: 3.0.0)
   3.1.0

> Android client support
> --
>
> Key: KAFKA-7025
> URL: https://issues.apache.org/jira/browse/KAFKA-7025
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 1.0.1
>Reporter: Martin Vysny
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: image-2020-07-21-06-42-03-140.png
>
>
> Adding kafka-clients:1.0.0 (also 1.0.1 and 1.1.0) to an Android project would 
> make  the compilation fail: com.android.tools.r8.ApiLevelException: 
> MethodHandle.invoke and MethodHandle.invokeExact are only supported starting 
> with Android O (--min-api 26)
>  
> Would it be possible to make the kafka-clients backward compatible with 
> reasonable Android API (say, 4.4.x) please?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7025) Android client support

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377744#comment-17377744
 ] 

Konstantine Karantasis commented on KAFKA-7025:
---

I see this issue is still open but we are now past feature freeze for 3.0. I'm 
postponing this issue to the next version, but let me know if it can be 
resolved instead (now that https://issues.apache.org/jira/browse/KAFKA-12327 is 
fixed)

> Android client support
> --
>
> Key: KAFKA-7025
> URL: https://issues.apache.org/jira/browse/KAFKA-7025
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 1.0.1
>Reporter: Martin Vysny
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2020-07-21-06-42-03-140.png
>
>
> Adding kafka-clients:1.0.0 (also 1.0.1 and 1.1.0) to an Android project would 
> make the compilation fail: com.android.tools.r8.ApiLevelException: 
> MethodHandle.invoke and MethodHandle.invokeExact are only supported starting 
> with Android O (--min-api 26)
>  
> Would it be possible to make the kafka-clients backward compatible with 
> reasonable Android API (say, 4.4.x) please?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12884) Remove "--zookeeper" in system tests

2021-07-08 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377745#comment-17377745
 ] 

Luke Chen commented on KAFKA-12884:
---

[~kkonstantine], thanks to [~rndgstn], this is fixed in this PR: 
[https://github.com/apache/kafka/pull/10918]. So we can close this ticket now.

> Remove "--zookeeper" in system tests
> 
>
> Key: KAFKA-12884
> URL: https://issues.apache.org/jira/browse/KAFKA-12884
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
>
> After a quick scan, I found that we currently still use the "--zookeeper" option in some 
> cases. Need to revisit them to see whether they should be removed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12622) Automate LICENSE file validation

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12622:
---
Summary: Automate LICENSE file validation  (was: Automate LICENCSE file 
validation)

> Automate LICENSE file validation
> 
>
> Key: KAFKA-12622
> URL: https://issues.apache.org/jira/browse/KAFKA-12622
> Project: Kafka
>  Issue Type: Task
>Reporter: John Roesler
>Priority: Major
> Fix For: 3.0.0, 2.8.1
>
>
> In https://issues.apache.org/jira/browse/KAFKA-12602, we manually constructed 
> a correct license file for 2.8.0. This file will certainly become wrong again 
> in later releases, so we need to write some kind of script to automate a 
> check.
> It crossed my mind to automate the generation of the file, but it seems to be 
> an intractable problem, considering that each dependency may change licenses, 
> may package license files, link to them from their poms, link to them from 
> their repos, etc. I've also found multiple URLs listed with various 
> delimiters, broken links that I have to chase down, etc.
> Therefore, it seems like the solution to aim for is simply: list all the jars 
> that we package, and print out a report of each jar that's extra or missing 
> vs. the ones in our `LICENSE-binary` file.
> The check should be part of the release script at least, if not part of the 
> regular build (so we keep it up to date as dependencies change).
>  
> Here's how I do this manually right now:
> {code:java}
> // build the binary artifacts
> $ ./gradlewAll releaseTarGz
> // unpack the binary artifact 
> $ tar xf core/build/distributions/kafka_2.13-X.Y.Z.tgz
> $ cd kafka_2.13-X.Y.Z
> // list the packaged jars 
> // (you can ignore the jars for our own modules, like kafka, kafka-clients, 
> etc.)
> $ ls libs/
> // cross check the jars with the packaged LICENSE
> // make sure all dependencies are listed with the right versions
> $ cat LICENSE
> // also double check all the mentioned license files are present
> $ ls licenses {code}
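
As a starting point, a rough Scala sketch of the jar-vs-LICENSE cross-check described above (the directory layout and the naive line matching are assumptions; a real script would need to parse the actual LICENSE-binary entry format):

{code:scala}
import java.io.File
import scala.io.Source
import scala.util.Using

object LicenseCheckSketch {
  def main(args: Array[String]): Unit = {
    // Assumed layout: the unpacked kafka_2.13-X.Y.Z directory is passed as args(0).
    val dist = new File(args(0))
    val libsDir = new File(dist, "libs")

    // Jars we actually ship, minus our own modules (kafka, kafka-clients, ...).
    val shippedJars = Option(libsDir.listFiles()).getOrElse(Array.empty[File])
      .map(_.getName)
      .filter(_.endsWith(".jar"))
      .map(_.stripSuffix(".jar"))
      .filterNot(_.startsWith("kafka"))
      .toSet

    // Naive extraction of "artifact-version"-looking tokens from the packaged
    // LICENSE file; real entries may need a more careful parser.
    val licensed = Using.resource(Source.fromFile(new File(dist, "LICENSE"))) { src =>
      src.getLines().map(_.trim)
        .filter(_.matches("""[A-Za-z0-9._-]+-\d+(\.\d+)*([.-][A-Za-z0-9.]+)?"""))
        .toSet
    }

    shippedJars.diff(licensed).toSeq.sorted.foreach(j => println(s"MISSING FROM LICENSE: $j"))
    licensed.diff(shippedJars).toSeq.sorted.foreach(j => println(s"NOT SHIPPED (stale?): $j"))
    if (shippedJars.diff(licensed).nonEmpty) sys.exit(1)
  }
}
{code}

Hooking something like this into the release script (or the regular build) would surface new or dropped dependencies as they change.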



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12792) Fix metrics bug and introduce TimelineInteger

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-12792.

Resolution: Fixed

> Fix metrics bug and introduce TimelineInteger
> -
>
> Key: KAFKA-12792
> URL: https://issues.apache.org/jira/browse/KAFKA-12792
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.0.0
>
>
> Introduce a TimelineInteger class which represents a single integer value 
> which can be changed while maintaining snapshot consistency.  Fix a case 
> where a metric value would be corrupted after a snapshot restore.
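
For context, a rough illustration of how such a timeline-backed value is meant to behave; the method names below follow the existing timeline collections in the metadata module and are assumptions rather than the verified final TimelineInteger API:

{code:scala}
import org.apache.kafka.common.utils.LogContext
import org.apache.kafka.timeline.{SnapshotRegistry, TimelineInteger}

object TimelineIntegerSketch {
  def main(args: Array[String]): Unit = {
    val registry = new SnapshotRegistry(new LogContext())
    val fencedBrokerCount = new TimelineInteger(registry) // e.g. a metric value

    fencedBrokerCount.set(3)
    registry.getOrCreateSnapshot(10L) // snapshot the state at epoch 10
    fencedBrokerCount.increment()     // latest value is now 4

    println(fencedBrokerCount.get(10L)) // 3: value as of the snapshot
    println(fencedBrokerCount.get())    // 4: latest value

    // Reverting restores the snapshotted value, which is the consistency the
    // metrics fix relies on after a snapshot restore.
    registry.revertToSnapshot(10L)
    println(fencedBrokerCount.get())    // 3 again
  }
}
{code}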



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9910) Implement new transaction timed out error

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377741#comment-17377741
 ] 

Konstantine Karantasis commented on KAFKA-9910:
---

[~bchen225242] / [~zhaohaidao] similar question, but for 3.0 this time. I see 
there has been progress on the PR; however, code freeze for 3.0 is approaching 
quickly. 
Please consider postponing, or let me know if the issue can be resolved before 
code freeze. 

> Implement new transaction timed out error
> -
>
> Key: KAFKA-9910
> URL: https://issues.apache.org/jira/browse/KAFKA-9910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core
>Reporter: Boyang Chen
>Assignee: HaiyuanZhao
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10588) Rename kafka-console-consumer CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377738#comment-17377738
 ] 

Konstantine Karantasis commented on KAFKA-10588:


The parent issue targets 3.0.0 and this issue is one of the two remaining sub-tasks needed to 
resolve it. 
Added as a blocker for 3.0.0, but let me know if there's no chance for this issue 
to be completed before the code freeze on July 14th, 2021.

> Rename kafka-console-consumer CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10588
> URL: https://issues.apache.org/jira/browse/KAFKA-10588
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Blocker
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10589) Rename kafka-replica-verification CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10589:
---
Priority: Blocker  (was: Major)

> Rename kafka-replica-verification CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10589
> URL: https://issues.apache.org/jira/browse/KAFKA-10589
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Blocker
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10589) Rename kafka-replica-verification CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10589:
---
Fix Version/s: 3.0.0

> Rename kafka-replica-verification CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10589
> URL: https://issues.apache.org/jira/browse/KAFKA-10589
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10589) Rename kafka-replica-verification CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377739#comment-17377739
 ] 

Konstantine Karantasis commented on KAFKA-10589:


[~omnia_h_ibrahim] the parent issue targets 3.0.0 and this issue is one of the 
two remaining sub-tasks needed to resolve it. 
Added as a blocker for 3.0.0, but let me know if there's no chance for this issue 
to be completed before the code freeze on July 14th, 2021.

> Rename kafka-replica-verification CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10589
> URL: https://issues.apache.org/jira/browse/KAFKA-10589
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10588) Rename kafka-console-consumer CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10588:
---
Priority: Blocker  (was: Major)

> Rename kafka-console-consumer CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10588
> URL: https://issues.apache.org/jira/browse/KAFKA-10588
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Blocker
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10588) Rename kafka-console-consumer CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10588:
---
Fix Version/s: 3.0.0

> Rename kafka-console-consumer CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10588
> URL: https://issues.apache.org/jira/browse/KAFKA-10588
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10587) Rename kafka-mirror-maker CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10587:
---
Fix Version/s: 3.0.0

> Rename kafka-mirror-maker CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10587
> URL: https://issues.apache.org/jira/browse/KAFKA-10587
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10587) Rename kafka-mirror-maker CLI command line arguments for KIP-629

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-10587.

Resolution: Fixed

> Rename kafka-mirror-maker CLI command line arguments for KIP-629
> 
>
> Key: KAFKA-10587
> URL: https://issues.apache.org/jira/browse/KAFKA-10587
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Xavier Léauté
>Assignee: Omnia Ibrahim
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12886) Enable request forwarding by default in 3.0

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377735#comment-17377735
 ] 

Konstantine Karantasis commented on KAFKA-12886:


[~rdielhenn] is this improvement still targeting 3.0? Code freeze is 
approaching; please consider updating the status or the target version 
accordingly. 

> Enable request forwarding by default in 3.0
> ---
>
> Key: KAFKA-12886
> URL: https://issues.apache.org/jira/browse/KAFKA-12886
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Ryan Dielhenn
>Priority: Major
> Fix For: 3.0.0
>
>
> KIP-590 documents that request forwarding will be enabled in 3.0 by default: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller.
>  This makes it a requirement for users with custom principal implementations 
> to provide a `KafkaPrincipalSerde` implementation. We waited until 3.0 
> because we saw this as a compatibility break. 
> The KIP documents that use of forwarding will be controlled by the IBP. 
> Once the IBP has been configured to 3.0 or above, the brokers will begin 
> forwarding.
> (Note that forwarding has always been a requirement for kraft.)
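
For users with a custom principal builder, a minimal sketch of what providing the serde side looks like; the "type:name" wire format here is a toy encoding for illustration, not a recommendation:

{code:scala}
import java.nio.charset.StandardCharsets
import org.apache.kafka.common.errors.SerializationException
import org.apache.kafka.common.security.auth.{AuthenticationContext, KafkaPrincipal, KafkaPrincipalBuilder, KafkaPrincipalSerde}

// A custom builder that also implements KafkaPrincipalSerde, so the principal
// can be forwarded from brokers to the controller once forwarding is enabled.
class ExamplePrincipalBuilder extends KafkaPrincipalBuilder with KafkaPrincipalSerde {

  // A real builder would inspect the SSL/SASL details in `context`; this sketch
  // just returns a placeholder principal.
  override def build(context: AuthenticationContext): KafkaPrincipal =
    new KafkaPrincipal(KafkaPrincipal.USER_TYPE, "example-user")

  override def serialize(principal: KafkaPrincipal): Array[Byte] =
    s"${principal.getPrincipalType}:${principal.getName}".getBytes(StandardCharsets.UTF_8)

  override def deserialize(bytes: Array[Byte]): KafkaPrincipal =
    new String(bytes, StandardCharsets.UTF_8).split(":", 2) match {
      case Array(principalType, name) => new KafkaPrincipal(principalType, name)
      case _ => throw new SerializationException("Malformed principal bytes")
    }
}
{code}

Such a class is wired in through the existing principal.builder.class broker configuration.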



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] cmccabe commented on pull request #10997: MINOR: Fix flaky deleteSnapshots test

2021-07-08 Thread GitBox


cmccabe commented on pull request #10997:
URL: https://github.com/apache/kafka/pull/10997#issuecomment-876822894


   Committed to trunk and 3.0






[GitHub] [kafka] cmccabe closed pull request #10997: MINOR: Fix flaky deleteSnapshots test

2021-07-08 Thread GitBox


cmccabe closed pull request #10997:
URL: https://github.com/apache/kafka/pull/10997


   






[GitHub] [kafka] cmccabe commented on pull request #10991: MINOR: system test fix for 3 co-located KRaft controllers

2021-07-08 Thread GitBox


cmccabe commented on pull request #10991:
URL: https://github.com/apache/kafka/pull/10991#issuecomment-876822385


   > That's a good idea. I think the diff above has a problem in that we would 
emit the message too early for the case where process.roles=broker, but I think 
the idea in general is sound.
   
   How about something like this?
   ```
   diff --git a/core/src/main/scala/kafka/server/KafkaRaftServer.scala 
b/core/src/main/scala/kafka/server/KafkaRaftServer.scala
   index 8e77357383..310aed8253 100644
   --- a/core/src/main/scala/kafka/server/KafkaRaftServer.scala
   +++ b/core/src/main/scala/kafka/server/KafkaRaftServer.scala
   @@ -106,10 +106,15 @@ class KafkaRaftServer(
  override def startup(): Unit = {
Mx4jLoader.maybeLoad()
raftManager.startup()
   -controller.foreach(_.startup())
   -broker.foreach(_.startup())
   +controller.foreach { controller =>
   +  controller.startup()
   +  info(KafkaBroker.STARTED_MESSAGE + " controller")
   +}
   +broker.foreach { broker =>
   +  broker.startup()
   +  info(KafkaBroker.STARTED_MESSAGE + " broker")
   +}
AppInfoParser.registerAppInfo(Server.MetricsPrefix, 
config.brokerId.toString, metrics, time.milliseconds())
   -info(KafkaBroker.STARTED_MESSAGE)
  }

  override def shutdown(): Unit = {
   ```
   
   > Maybe we emit a different message when there is a controller role and that 
starts, and we emit the same message we emit now after we start the broker (if 
any). So it might look like this (Im typing this free-form, so ignore any typos 
-- it's just to get the point across) [...] And then in the system test code, 
where we check for the log message, we check for the CONTROLLER_STARTED_MESSAGE 
when the roles include the controller role -- otherwise we check for the 
standard message.
   
   Hmm, what's the advantage of checking for two different strings?
   
   It seems like if we have the broker emit "Kafka Server started broker" and 
have the controller emit "Kafka Server started controller" we can check for 
different messages if we want later, but still have simple code now.
   
   > Just realized that one potential problem with the log message approach is 
that the last broker in a 3-broker cluster with 3 co-located controllers does 
not shut down cleanly. The current approach deals with this by sending SIGKILL 
instead of SIGTERM. The log message approach doesn't deal with it, so the 
shutdown in the system test would end up timing out after 60 seconds and then 
sending SIGKILL later as part of the cleanup (the test would still pass). Not 
sure if this adds credence to the current approach?
   
   I think you do need to special case this and send SIGKILL to the last n / 2 
(round down) combined nodes. But at least we can avoid special-casing startup, 
right?






[GitHub] [kafka] jsancio opened a new pull request #11005: MINOR: Print the cached broker epoch

2021-07-08 Thread GitBox


jsancio opened a new pull request #11005:
URL: https://github.com/apache/kafka/pull/11005


   When the broker epochs do not match make sure to print both broker
   epochs.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   






[GitHub] [kafka] cmccabe commented on a change in pull request #10997: MINOR: Fix flaky deleteSnapshots test

2021-07-08 Thread GitBox


cmccabe commented on a change in pull request #10997:
URL: https://github.com/apache/kafka/pull/10997#discussion_r666583174



##
File path: core/src/main/scala/kafka/raft/KafkaMetadataLog.scala
##
@@ -404,13 +404,13 @@ final class KafkaMetadataLog private (
   return false
 
 var didClean = false
-snapshots.keys.toSeq.sliding(2).toSeq.takeWhile {
+snapshots.keys.toSeq.sliding(2).foreach {
   case Seq(snapshot: OffsetAndEpoch, nextSnapshot: OffsetAndEpoch) =>
 if (predicate(snapshot) && deleteBeforeSnapshot(nextSnapshot)) {
   didClean = true
   true

Review comment:
   This "true" is not needed now, right?








[jira] [Comment Edited] (KAFKA-13008) Stream will stop processing data for a long time while waiting for the partition lag

2021-07-08 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377644#comment-17377644
 ] 

Colin McCabe edited comment on KAFKA-13008 at 7/8/21, 11:42 PM:


Thanks, [~showuon]... this is a good find.

[~ableegoldman] wrote:
bq. Surely if a partition was removed from the assignment and then added back, 
this should constitute a new 'session', and thus it should get the metadata 
again on assignment

Sessions may be changed without creating a new session. They wouldn't be much 
use otherwise, since many consumers often change their subscriptions or mute 
partitions, etc.

bq. If so, then maybe we should consider allowing metadata to remain around 
after a partition is unassigned, in case it gets this same partition back 
within the session? Could there be other consequences of this lack of metadata, 
outside of Streams?

The issue is that the metadata would be incorrect. If the fetch session doesn't 
contain a partition, we really can't say anything about what its lag is, other 
than potentially caching a stale value. But we don't know how long the 
partition could be muted (it could be hours, for example).

Ultimately there is a tradeoff here between having the most up-to-date lag 
information for each partition, and efficiency. I'm not totally sure what the 
best way to resolve this is (we'd have to look at the Streams use-case more 
carefully).


was (Author: cmccabe):
bq. Surely if a partition was removed from the assignment and then added back, 
this should constitute a new 'session', and thus it should get the metadata 
again on assignment

No, sessions may be changed without creating a new session. They wouldn't be 
much use otherwise, since many consumers often change their subscriptions or 
mute partitions, etc.

bq. If so, then maybe we should consider allowing metadata to remain around 
after a partition is unassigned, in case it gets this same partition back 
within the session? Could there be other consequences of this lack of metadata, 
outside of Streams?

The issue is that the metadata would be incorrect. If the fetch session doesn't 
contain a partition, we really can't say anything about what its lag is, other 
than potentially caching a stale value. But we don't know how long the 
partition could be muted (it could be hours, for example).

Ultimately there is a tradeoff here between having the most up-to-date lag 
information for each partition, and efficiency. I'm not totally sure what the 
best way to resolve this is (we'd have to look at the Streams use-case more 
carefully).

> Stream will stop processing data for a long time while waiting for the 
> partition lag
> 
>
> Key: KAFKA-13008
> URL: https://issues.apache.org/jira/browse/KAFKA-13008
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Luke Chen
>Priority: Major
> Attachments: image-2021-07-07-11-19-55-630.png
>
>
> In KIP-695, we improved the task idling mechanism by checking partition lag. 
> It's a good improvement for timestamp sync, but I found it can cause the 
> stream to stop processing data for a long time while waiting for the 
> partition metadata.
>  
> I've been investigating this case for a while, and figured out the issue 
> happens in the situation below (or similar situations):
>  # start 2 streams (each with 1 thread) to consume from a topicA (with 3 
> partitions: A-0, A-1, A-2)
>  # After 2 streams started, the partitions assignment are: (I skipped some 
> other processing related partitions for simplicity)
>  stream1-thread1: A-0, A-1 
>  stream2-thread1: A-2
>  # start processing some data, assume now, the position and high watermark is:
>  A-0: offset: 2, highWM: 2
>  A-1: offset: 2, highWM: 2
>  A-2: offset: 2, highWM: 2
>  # Now, stream3 joined, so trigger rebalance with this assignment:
>  stream1-thread1: A-0 
>  stream2-thread1: A-2
>  stream3-thread1: A-1
>  # Suddenly, stream3 left, so now, rebalance again, with the step 2 
> assignment:
>  stream1-thread1: A-0, *A-1* 
>  stream2-thread1: A-2
>  (note: after initialization, the  position of A-1 will be: position: null, 
> highWM: null)
>  # Now, note that, the partition A-1 used to get assigned to stream1-thread1, 
> and now, it's back. And also, assume the partition A-1 has slow input (ex: 1 
> record per 30 mins), and partition A-0 has fast input (ex: 10K records / 
> sec). So, now, the stream1-thread1 won't process any data until we got input 
> from partition A-1 (even if partition A-0 is buffered a lot, and we have 
> `{{max.task.idle.ms}}` set to 0).
>  
> The reason stream1-thread1 won't process any data is that we can't 
> get the lag of partition A-1. And why can't we get the lag? It's because
>  # In 

[jira] [Commented] (KAFKA-13008) Stream will stop processing data for a long time while waiting for the partition lag

2021-07-08 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377644#comment-17377644
 ] 

Colin McCabe commented on KAFKA-13008:
--

> Surely if a partition was removed from the assignment and then added back, 
> this should constitute a new 'session', and thus it should get the metadata 
> again on assignment

No, sessions may be changed without creating a new session. They wouldn't be 
much use otherwise, since many consumers often change their subscriptions or 
mute partitions, etc.

> If so, then maybe we should consider allowing metadata to remain around after 
> a partition is unassigned, in case it gets this same partition back within 
> the session? Could there be other consequences of this lack of metadata, 
> outside of Streams?

The issue is that the metadata would be incorrect. If the fetch session doesn't 
contain a partition, we really can't say anything about what its lag is, other 
than potentially caching a stale value. But we don't know how long the 
partition could be muted (it could be hours, for example).

Ultimately there is a tradeoff here between having the most up-to-date lag 
information for each partition, and efficiency. I'm not totally sure what the 
best way to resolve this is (we'd have to look at the Streams use-case more 
carefully).
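
To make the client-side view concrete, a rough sketch (assuming the KIP-695 currentLag accessor on the consumer, which returns an empty OptionalLong when there is no cached value) of why a missing cached lag stalls the caller:

{code:scala}
import java.util.OptionalLong
import org.apache.kafka.clients.consumer.Consumer
import org.apache.kafka.common.TopicPartition

object LagCheckSketch {
  // Sketch only, not the actual Streams logic: a timestamp-ordered choice can
  // only be made when the cached lag is present. After the re-assignment in
  // this report, the cache for A-1 stays empty until the fetch session happens
  // to return data for that partition again.
  def readyToProcess(consumer: Consumer[_, _], partition: TopicPartition, buffered: Long): Boolean = {
    val lag: OptionalLong = consumer.currentLag(partition)
    if (lag.isPresent)
      lag.getAsLong == 0 || buffered > 0 // nothing left to fetch, or already buffered
    else
      false // unknown lag: keep waiting, which is the stall described here
  }
}
{code}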

> Stream will stop processing data for a long time while waiting for the 
> partition lag
> 
>
> Key: KAFKA-13008
> URL: https://issues.apache.org/jira/browse/KAFKA-13008
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Luke Chen
>Priority: Major
> Attachments: image-2021-07-07-11-19-55-630.png
>
>
> In KIP-695, we improved the task idling mechanism by checking partition lag. 
> It's a good improvement for timestamp sync, but I found it can cause the 
> stream to stop processing data for a long time while waiting for the 
> partition metadata.
>  
> I've been investigating this case for a while, and figured out the issue 
> happens in the situation below (or similar situations):
>  # start 2 streams (each with 1 thread) to consume from a topicA (with 3 
> partitions: A-0, A-1, A-2)
>  # After 2 streams started, the partitions assignment are: (I skipped some 
> other processing related partitions for simplicity)
>  stream1-thread1: A-0, A-1 
>  stream2-thread1: A-2
>  # start processing some data, assume now, the position and high watermark is:
>  A-0: offset: 2, highWM: 2
>  A-1: offset: 2, highWM: 2
>  A-2: offset: 2, highWM: 2
>  # Now, stream3 joined, so trigger rebalance with this assignment:
>  stream1-thread1: A-0 
>  stream2-thread1: A-2
>  stream3-thread1: A-1
>  # Suddenly, stream3 left, so now, rebalance again, with the step 2 
> assignment:
>  stream1-thread1: A-0, *A-1* 
>  stream2-thread1: A-2
>  (note: after initialization, the  position of A-1 will be: position: null, 
> highWM: null)
>  # Now, note that, the partition A-1 used to get assigned to stream1-thread1, 
> and now, it's back. And also, assume the partition A-1 has slow input (ex: 1 
> record per 30 mins), and partition A-0 has fast input (ex: 10K records / 
> sec). So, now, the stream1-thread1 won't process any data until we got input 
> from partition A-1 (even if partition A-0 is buffered a lot, and we have 
> `{{max.task.idle.ms}}` set to 0).
>  
> The reason stream1-thread1 won't process any data is that we can't 
> get the lag of partition A-1. And why can't we get the lag? It's because
>  # In KIP-695, we use consumer's cache to get the partition lag, to avoid 
> remote call
>  # The lag for a partition will be cleared if the assignment in this round 
> doesn't have this partition. check 
> [here|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java#L272].
>  So, in the above example, the metadata cache for partition A-1 will be 
> cleared in step 4, and re-initialized (to null) in step 5
>  # In KIP-227, we introduced a fetch session to have incremental fetch 
> request/response. That is, if the session existed, the client(consumer) will 
> get the update only when the fetched partition have update (ex: new data). 
> So, in the above case, the partition A-1 has slow input (ex: 1 record per 30 
> mins), it won't have update until next 30 mins, or wait for the fetch session 
> become inactive for (default) 2 mins to be evicted. Either case, the metadata 
> won't be updated for a while.
>  
> In KIP-695, if we don't get the partition lag, we can't determine the 
> partition data status to do timestamp sync, so we'll keep waiting and not 
> processing any data. That's why this issue will happen.
>  
> *Proposed solution:*
>  # If we don't get the current lag for a partition, or 

[jira] [Comment Edited] (KAFKA-13008) Stream will stop processing data for a long time while waiting for the partition lag

2021-07-08 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377644#comment-17377644
 ] 

Colin McCabe edited comment on KAFKA-13008 at 7/8/21, 11:41 PM:


bq. Surely if a partition was removed from the assignment and then added back, 
this should constitute a new 'session', and thus it should get the metadata 
again on assignment

No, sessions may be changed without creating a new session. They wouldn't be 
much use otherwise, since many consumers often change their subscriptions or 
mute partitions, etc.

bq. If so, then maybe we should consider allowing metadata to remain around 
after a partition is unassigned, in case it gets this same partition back 
within the session? Could there be other consequences of this lack of metadata, 
outside of Streams?

The issue is that the metadata would be incorrect. If the fetch session doesn't 
contain a partition, we really can't say anything about what its lag is, other 
than potentially caching a stale value. But we don't know how long the 
partition could be muted (it could be hours, for example).

Ultimately there is a tradeoff here between having the most up-to-date lag 
information for each partition, and efficiency. I'm not totally sure what the 
best way to resolve this is (we'd have to look at the Streams use-case more 
carefully).


was (Author: cmccabe):
> Surely if a partition was removed from the assignment and then added back, 
> this should constitute a new 'session', and thus it should get the metadata 
> again on assignment

No, sessions may be changed without creating a new session. They wouldn't be 
much use otherwise, since many consumers often change their subscriptions or 
mute partitions, etc.

> If so, then maybe we should consider allowing metadata to remain around after 
> a partition is unassigned, in case it gets this same partition back within 
> the session? Could there be other consequences of this lack of metadata, 
> outside of Streams?

The issue is that the metadata would be incorrect. If the fetch session doesn't 
contain a partition, we really can't say anything about what its lag is, other 
than potentially caching a stale value. But we don't know how long the 
partition could be muted (it could be hours, for example).

Ultimately there is a tradeoff here between having the most up-to-date lag 
information for each partition, and efficiency. I'm not totally sure what the 
best way to resolve this is (we'd have to look at the Streams use-case more 
carefully).

> Stream will stop processing data for a long time while waiting for the 
> partition lag
> 
>
> Key: KAFKA-13008
> URL: https://issues.apache.org/jira/browse/KAFKA-13008
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Luke Chen
>Priority: Major
> Attachments: image-2021-07-07-11-19-55-630.png
>
>
> In KIP-695, we improved the task idling mechanism by checking partition lag. 
> It's a good improvement for timestamp sync, but I found it can cause the 
> stream to stop processing data for a long time while waiting for the 
> partition metadata.
>  
> I've been investigating this case for a while, and figured out the issue 
> happens in the situation below (or similar situations):
>  # start 2 streams (each with 1 thread) to consume from a topicA (with 3 
> partitions: A-0, A-1, A-2)
>  # After 2 streams started, the partitions assignment are: (I skipped some 
> other processing related partitions for simplicity)
>  stream1-thread1: A-0, A-1 
>  stream2-thread1: A-2
>  # start processing some data, assume now, the position and high watermark is:
>  A-0: offset: 2, highWM: 2
>  A-1: offset: 2, highWM: 2
>  A-2: offset: 2, highWM: 2
>  # Now, stream3 joined, so trigger rebalance with this assignment:
>  stream1-thread1: A-0 
>  stream2-thread1: A-2
>  stream3-thread1: A-1
>  # Suddenly, stream3 left, so now, rebalance again, with the step 2 
> assignment:
>  stream1-thread1: A-0, *A-1* 
>  stream2-thread1: A-2
>  (note: after initialization, the  position of A-1 will be: position: null, 
> highWM: null)
>  # Now, note that, the partition A-1 used to get assigned to stream1-thread1, 
> and now, it's back. And also, assume the partition A-1 has slow input (ex: 1 
> record per 30 mins), and partition A-0 has fast input (ex: 10K records / 
> sec). So, now, the stream1-thread1 won't process any data until we got input 
> from partition A-1 (even if partition A-0 is buffered a lot, and we have 
> `{{max.task.idle.ms}}` set to 0).
>  
> The reason stream1-thread1 won't process any data is that we can't 
> get the lag of partition A-1. And why can't we get the lag? It's because
>  # In KIP-695, we use consumer's cache to get the partition lag, to 

[GitHub] [kafka] xvrl commented on pull request #9404: KAFKA-10589 replica verification tool changes for KIP-629

2021-07-08 Thread GitBox


xvrl commented on pull request #9404:
URL: https://github.com/apache/kafka/pull/9404#issuecomment-876810546


   Hi @dongjinleekr, I believe @OmniaGM is taking care of this already, so I'll 
close this out.






[GitHub] [kafka] xvrl closed pull request #9404: KAFKA-10589 replica verification tool changes for KIP-629

2021-07-08 Thread GitBox


xvrl closed pull request #9404:
URL: https://github.com/apache/kafka/pull/9404


   






[jira] [Commented] (KAFKA-12477) Smart rebalancing with dynamic protocol selection

2021-07-08 Thread A. Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377635#comment-17377635
 ] 

A. Sophie Blee-Goldman commented on KAFKA-12477:


No, we already decided to postpone this work to 3.1 so we can focus on some 
related things. Moved the Fix Version to 3.1.0
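
For reference, a minimal sketch of the protocol-selection rule described in the issue below (illustrative only, not the actual ConsumerCoordinator code):

{code:scala}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor
import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor.RebalanceProtocol

object ProtocolSelectionSketch {
  // The effective protocol is the highest protocol supported by *every*
  // configured assignor, so a single EAGER-only assignor (e.g. "range") keeps
  // the group on EAGER even if "cooperative-sticky" is listed first.
  def selectProtocol(assignors: List[ConsumerPartitionAssignor]): RebalanceProtocol =
    assignors
      .map(_.supportedProtocols.asScala.toSet)
      .reduce(_ intersect _)
      .maxBy(_.id)
}
{code}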

> Smart rebalancing with dynamic protocol selection
> -
>
> Key: KAFKA-12477
> URL: https://issues.apache.org/jira/browse/KAFKA-12477
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: A. Sophie Blee-Goldman
>Assignee: A. Sophie Blee-Goldman
>Priority: Major
> Fix For: 3.1.0
>
>
> Users who want to upgrade their applications and enable the COOPERATIVE 
> rebalancing protocol in their consumer apps are required to follow a double 
> rolling bounce upgrade path. The reason for this is laid out in the [Consumer 
> Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
>  section of KIP-429. Basically, the ConsumerCoordinator picks a rebalancing 
> protocol in its constructor based on the list of supported partition 
> assignors. The protocol is selected as the highest protocol that is commonly 
> supported by all assignors in the list, and never changes after that.
> This is a bit unfortunate because it may end up using an older protocol even 
> after every member in the group has been updated to support the newer 
> protocol. After the first rolling bounce of the upgrade, all members will 
> have two assignors: "cooperative-sticky" and "range" (or 
> sticky/round-robin/etc). At this point the EAGER protocol will still be 
> selected due to the presence of the "range" assignor, but it's the 
> "cooperative-sticky" assignor that will ultimately be selected for use in 
> rebalances if that assignor is preferred (ie positioned first in the list). 
> The only reason for the second rolling bounce is to strip off the "range" 
> assignor and allow the upgraded members to switch over to COOPERATIVE. We 
> can't allow them to use cooperative rebalancing until everyone has been 
> upgraded, but once they have it's safe to do so.
> And there is already a way for the client to detect that everyone is on the 
> new byte code: if the CooperativeStickyAssignor is selected by the group 
> coordinator, then that means it is supported by all consumers in the group 
> and therefore everyone must be upgraded. 
> We may be able to save the second rolling bounce by dynamically updating the 
> rebalancing protocol inside the ConsumerCoordinator as "the highest protocol 
> supported by the assignor chosen by the group coordinator". This means we'll 
> still be using EAGER at the first rebalance, since we of course need to wait 
> for this initial rebalance to get the response from the group coordinator. 
> But we should take the hint from the chosen assignor rather than dropping 
> this information on the floor and sticking with the original protocol.
> Concrete Proposal:
> This assumes we will change the default assignor to ["cooperative-sticky", 
> "range"] in KIP-726. It also acknowledges that users may attempt any kind of 
> upgrade without reading the docs, and so we need to put in safeguards against 
> data corruption rather than assume everyone will follow the safe upgrade path.
> With this proposal, 
> 1) New applications on 3.0 will enable cooperative rebalancing by default
> 2) Existing applications which don’t set an assignor can safely upgrade to 
> 3.0 using a single rolling bounce with no extra steps, and will automatically 
> transition to cooperative rebalancing
> 3) Existing applications which do set an assignor that uses EAGER can 
> likewise upgrade their applications to COOPERATIVE with a single rolling 
> bounce
> 4) Once on 3.0, applications can safely go back and forth between EAGER and 
> COOPERATIVE
> 5) Applications can safely downgrade away from 3.0
> The high-level idea for dynamic protocol upgrades is that the group will 
> leverage the assignor selected by the group coordinator to determine when 
> it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect the 
> group in case of rare events or user misconfiguration. The group coordinator 
> selects the most preferred assignor that’s supported by all members of the 
> group, so we know that all members will support COOPERATIVE once we receive 
> the “cooperative-sticky” assignor after a rebalance. At this point, each 
> member can upgrade their own protocol to COOPERATIVE. However, there may be 
> situations in which an EAGER member may join the group even after upgrading 
> to COOPERATIVE. For example, during a rolling upgrade if the last remaining 
> member on 

[jira] [Updated] (KAFKA-12477) Smart rebalancing with dynamic protocol selection

2021-07-08 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman updated KAFKA-12477:
---
Fix Version/s: (was: 3.0.0)
   3.1.0

> Smart rebalancing with dynamic protocol selection
> -
>
> Key: KAFKA-12477
> URL: https://issues.apache.org/jira/browse/KAFKA-12477
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: A. Sophie Blee-Goldman
>Assignee: A. Sophie Blee-Goldman
>Priority: Major
> Fix For: 3.1.0
>
>
> Users who want to upgrade their applications and enable the COOPERATIVE 
> rebalancing protocol in their consumer apps are required to follow a double 
> rolling bounce upgrade path. The reason for this is laid out in the [Consumer 
> Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
>  section of KIP-429. Basically, the ConsumerCoordinator picks a rebalancing 
> protocol in its constructor based on the list of supported partition 
> assignors. The protocol is selected as the highest protocol that is commonly 
> supported by all assignors in the list, and never changes after that.
> This is a bit unfortunate because it may end up using an older protocol even 
> after every member in the group has been updated to support the newer 
> protocol. After the first rolling bounce of the upgrade, all members will 
> have two assignors: "cooperative-sticky" and "range" (or 
> sticky/round-robin/etc). At this point the EAGER protocol will still be 
> selected due to the presence of the "range" assignor, but it's the 
> "cooperative-sticky" assignor that will ultimately be selected for use in 
> rebalances if that assignor is preferred (ie positioned first in the list). 
> The only reason for the second rolling bounce is to strip off the "range" 
> assignor and allow the upgraded members to switch over to COOPERATIVE. We 
> can't allow them to use cooperative rebalancing until everyone has been 
> upgraded, but once they have it's safe to do so.
> And there is already a way for the client to detect that everyone is on the 
> new byte code: if the CooperativeStickyAssignor is selected by the group 
> coordinator, then that means it is supported by all consumers in the group 
> and therefore everyone must be upgraded. 
> We may be able to save the second rolling bounce by dynamically updating the 
> rebalancing protocol inside the ConsumerCoordinator as "the highest protocol 
> supported by the assignor chosen by the group coordinator". This means we'll 
> still be using EAGER at the first rebalance, since we of course need to wait 
> for this initial rebalance to get the response from the group coordinator. 
> But we should take the hint from the chosen assignor rather than dropping 
> this information on the floor and sticking with the original protocol.
> Concrete Proposal:
> This assumes we will change the default assignor to ["cooperative-sticky", 
> "range"] in KIP-726. It also acknowledges that users may attempt any kind of 
> upgrade without reading the docs, and so we need to put in safeguards against 
> data corruption rather than assume everyone will follow the safe upgrade path.
> With this proposal, 
> 1) New applications on 3.0 will enable cooperative rebalancing by default
> 2) Existing applications which don’t set an assignor can safely upgrade to 
> 3.0 using a single rolling bounce with no extra steps, and will automatically 
> transition to cooperative rebalancing
> 3) Existing applications which do set an assignor that uses EAGER can 
> likewise upgrade their applications to COOPERATIVE with a single rolling 
> bounce
> 4) Once on 3.0, applications can safely go back and forth between EAGER and 
> COOPERATIVE
> 5) Applications can safely downgrade away from 3.0
> The high-level idea for dynamic protocol upgrades is that the group will 
> leverage the assignor selected by the group coordinator to determine when 
> it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect the 
> group in case of rare events or user misconfiguration. The group coordinator 
> selects the most preferred assignor that’s supported by all members of the 
> group, so we know that all members will support COOPERATIVE once we receive 
> the “cooperative-sticky” assignor after a rebalance. At this point, each 
> member can upgrade their own protocol to COOPERATIVE. However, there may be 
> situations in which an EAGER member may join the group even after upgrading 
> to COOPERATIVE. For example, during a rolling upgrade if the last remaining 
> member on the old bytecode misses a rebalance, the other members will be 
> allowed to upgrade to COOPERATIVE. If 

[jira] [Commented] (KAFKA-12884) Remove "--zookeeper" in system tests

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377634#comment-17377634
 ] 

Konstantine Karantasis commented on KAFKA-12884:


[~showuon] will this make it into 3.0? The parent issue targets 3.0. If not, I'd 
like to postpone this fix.

> Remove "--zookeeper" in system tests
> 
>
> Key: KAFKA-12884
> URL: https://issues.apache.org/jira/browse/KAFKA-12884
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
>
> After a quick scan, I found that we currently still use the "--zookeeper" option in some 
> cases. Need to revisit them to see whether they should be removed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12884) Remove "--zookeeper" in system tests

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12884:
---
Priority: Blocker  (was: Major)

> Remove "--zookeeper" in system tests
> 
>
> Key: KAFKA-12884
> URL: https://issues.apache.org/jira/browse/KAFKA-12884
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
>
> After a quick scan, I found that we currently still use the "--zookeeper" option in some 
> cases. Need to revisit them to see whether they should be removed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12884) Remove "--zookeeper" in system tests

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12884:
---
Fix Version/s: 3.0.0

> Remove "--zookeeper" in system tests
> 
>
> Key: KAFKA-12884
> URL: https://issues.apache.org/jira/browse/KAFKA-12884
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
>
> After a quick scan, I found that we currently still use the "--zookeeper" option in some 
> cases. Need to revisit them to see whether they should be removed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12588) Remove deprecated --zookeeper in shell commands

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12588:
---
Priority: Blocker  (was: Major)

> Remove deprecated --zookeeper in shell commands
> ---
>
> Key: KAFKA-12588
> URL: https://issues.apache.org/jira/browse/KAFKA-12588
> Project: Kafka
>  Issue Type: Task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
>
> At first check, there are still 4 commands that expose the *--zookeeper* option. 
> These should be removed in v3.0.0:
>  
> _preferredReplicaLeaderElectionCommand_
> _ConfigCommand_
> _ReassignPartitionsCommand_
> _TopicCommand_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10091) Improve task idling

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377633#comment-17377633
 ] 

Konstantine Karantasis commented on KAFKA-10091:


[~vvcephei] confirmed offline that we are only waiting for docs changes. 
Marking as a blocker in the meantime. Please mark this as resolved once all the 
relevant PRs are cherry-picked to 3.0.

> Improve task idling
> ---
>
> Key: KAFKA-10091
> URL: https://issues.apache.org/jira/browse/KAFKA-10091
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> When Streams is processing a task with multiple inputs, each time it is ready 
> to process a record, it has to choose which input to process next. It always 
> takes from the input for which the next record has the least timestamp. The 
> result of this is that Streams processes data in timestamp order. However, if 
> the buffer for one of the inputs is empty, Streams doesn't know what 
> timestamp the next record for that input will be.
> Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address 
> this issue.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]
> The config allows Streams to wait some amount of time for data to arrive on 
> the empty input, so that it can make a timestamp-ordered decision about which 
> input to pull from next.
> However, this config can be hard to use reliably and efficiently, since what 
> we're really waiting for is the next poll that _would_ return data from the 
> empty input's partition, and this guarantee is a function of the poll 
> interval, the max poll interval, and the internal logic that governs when 
> Streams will poll again.
> The ideal case is you'd be able to guarantee at a minimum that _any_ amount 
> of idling would guarantee you poll data from the empty partition if there's 
> data to fetch.
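
For context, the setting in question is an ordinary Streams config; a minimal sketch (application id and bootstrap address are placeholders):

{code:scala}
import java.util.Properties
import org.apache.kafka.streams.StreamsConfig

object TaskIdlingConfigSketch {
  def streamsProps(): Properties = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app")       // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
    // KIP-353 setting discussed above: how long a task with an empty input
    // buffer waits for that input before processing the non-empty ones anyway.
    props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, "1000")
    props // passed to new KafkaStreams(topology, props)
  }
}
{code}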



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7632) Support Compression Level

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377632#comment-17377632
 ] 

Konstantine Karantasis commented on KAFKA-7632:
---

The PR was rebased by [~dongjin]. Marking as a blocker in case we can get an 
approval very soon. If not, I'll postpone it to the next release. 

> Support Compression Level
> -
>
> Key: KAFKA-7632
> URL: https://issues.apache.org/jira/browse/KAFKA-7632
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 2.1.0
> Environment: all
>Reporter: Dave Waters
>Assignee: Dongjin Lee
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> The compression level for ZSTD is currently set to use the default level (3), 
> which is a conservative setting that in some use cases eliminates the value 
> that ZSTD provides with improved compression. Each use case will vary, so 
> exposing the level as a broker configuration setting will allow the user to 
> adjust the level.
> Since the same applies to the other compression codecs, we should add the same 
> functionality to them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7632) Support Compression Level

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-7632:
--
Priority: Blocker  (was: Major)

> Support Compression Level
> -
>
> Key: KAFKA-7632
> URL: https://issues.apache.org/jira/browse/KAFKA-7632
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 2.1.0
> Environment: all
>Reporter: Dave Waters
>Assignee: Dongjin Lee
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> The compression level for ZSTD is currently set to use the default level (3), 
> which is a conservative setting that in some use cases eliminates the value 
> that ZSTD provides with improved compression. Each use case will vary, so 
> exposing the level as a broker configuration setting will allow the user to 
> adjust the level.
> Since the same applies to the other compression codecs, we should add the same 
> functionality to them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10091) Improve task idling

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10091:
---
Priority: Blocker  (was: Major)

> Improve task idling
> ---
>
> Key: KAFKA-10091
> URL: https://issues.apache.org/jira/browse/KAFKA-10091
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> When Streams is processing a task with multiple inputs, each time it is ready 
> to process a record, it has to choose which input to process next. It always 
> takes from the input for which the next record has the least timestamp. The 
> result of this is that Streams processes data in timestamp order. However, if 
> the buffer for one of the inputs is empty, Streams doesn't know what 
> timestamp the next record for that input will be.
> Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address 
> this issue.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]
> The config allows Streams to wait some amount of time for data to arrive on 
> the empty input, so that it can make a timestamp-ordered decision about which 
> input to pull from next.
> However, this config can be hard to use reliably and efficiently, since what 
> we're really waiting for is the next poll that _would_ return data from the 
> empty input's partition, and this guarantee is a function of the poll 
> interval, the max poll interval, and the internal logic that governs when 
> Streams will poll again.
> The ideal case is you'd be able to guarantee at a minimum that _any_ amount 
> of idling would guarantee you poll data from the empty partition if there's 
> data to fetch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10292) fix flaky streams/streams_broker_bounce_test.py

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-10292.

Resolution: Fixed

> fix flaky streams/streams_broker_bounce_test.py
> ---
>
> Key: KAFKA-10292
> URL: https://issues.apache.org/jira/browse/KAFKA-10292
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, system tests
>Reporter: Chia-Ping Tsai
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 3.0.0
>
>
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "leader",
>   "failure_mode": "clean_bounce",
>   "num_threads": 1,
>   "sleep_time_secs": 120
> \}
> {quote}
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "controller",
>   "failure_mode": "hard_shutdown",
>   "num_threads": 3,
>   "sleep_time_secs": 120
> \}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10292) fix flaky streams/streams_broker_bounce_test.py

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377631#comment-17377631
 ] 

Konstantine Karantasis commented on KAFKA-10292:


The PR has been merged in time for 3.0. Not sure if the issue still persists. Will 
close this for now. 

> fix flaky streams/streams_broker_bounce_test.py
> ---
>
> Key: KAFKA-10292
> URL: https://issues.apache.org/jira/browse/KAFKA-10292
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, system tests
>Reporter: Chia-Ping Tsai
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 3.0.0
>
>
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "leader",
>   "failure_mode": "clean_bounce",
>   "num_threads": 1,
>   "sleep_time_secs": 120
> \}
> {quote}
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "controller",
>   "failure_mode": "hard_shutdown",
>   "num_threads": 3,
>   "sleep_time_secs": 120
> \}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-5905) Remove PrincipalBuilder and DefaultPrincipalBuilder

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-5905:
--
Priority: Blocker  (was: Major)

> Remove PrincipalBuilder and DefaultPrincipalBuilder
> ---
>
> Key: KAFKA-5905
> URL: https://issues.apache.org/jira/browse/KAFKA-5905
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Manikumar
>Priority: Blocker
> Fix For: 3.0.0
>
>
> These classes were deprecated after KIP-189: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-189%3A+Improve+principal+builder+interface+and+add+support+for+SASL,
>  which is part of 1.0.0.
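
For anyone migrating off these deprecated classes while the removal is pending, 
here is a minimal sketch of the KIP-189 replacement interface; the class name 
and the DN-to-principal mapping are illustrative, not a prescribed implementation:
{code:java}
import javax.net.ssl.SSLPeerUnverifiedException;
import org.apache.kafka.common.security.auth.AuthenticationContext;
import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.common.security.auth.KafkaPrincipalBuilder;
import org.apache.kafka.common.security.auth.SslAuthenticationContext;

// Maps the SSL peer's distinguished name to a User principal and falls back
// to ANONYMOUS for everything else.
public class SslDnPrincipalBuilder implements KafkaPrincipalBuilder {

    @Override
    public KafkaPrincipal build(AuthenticationContext context) {
        if (context instanceof SslAuthenticationContext) {
            try {
                String dn = ((SslAuthenticationContext) context).session()
                        .getPeerPrincipal().getName();
                return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, dn);
            } catch (SSLPeerUnverifiedException e) {
                return KafkaPrincipal.ANONYMOUS;
            }
        }
        return KafkaPrincipal.ANONYMOUS;
    }
}
{code}
On the broker, such a class is wired in via the principal.builder.class configuration.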



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5905) Remove PrincipalBuilder and DefaultPrincipalBuilder

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377629#comment-17377629
 ] 

Konstantine Karantasis commented on KAFKA-5905:
---

[~omkreddy] [~hachikuji] [~ijuma] if this issue needs to get into 3.0, we'll 
need to merge the PR very soon. I see the PR is still open. 
Can you confirm whether this is a blocker for 3.0? I'm marking it as such until 
I get confirmation from one of you. Thanks

> Remove PrincipalBuilder and DefaultPrincipalBuilder
> ---
>
> Key: KAFKA-5905
> URL: https://issues.apache.org/jira/browse/KAFKA-5905
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Manikumar
>Priority: Major
> Fix For: 3.0.0
>
>
> These classes were deprecated after KIP-189: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-189%3A+Improve+principal+builder+interface+and+add+support+for+SASL,
>  which is part of 1.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12477) Smart rebalancing with dynamic protocol selection

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377622#comment-17377622
 ] 

Konstantine Karantasis commented on KAFKA-12477:


[~ableegoldman] [~guozhang] is this issue a blocker for 3.0? If not, I'd like 
to postpone it to the next release.

> Smart rebalancing with dynamic protocol selection
> -
>
> Key: KAFKA-12477
> URL: https://issues.apache.org/jira/browse/KAFKA-12477
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: A. Sophie Blee-Goldman
>Assignee: A. Sophie Blee-Goldman
>Priority: Major
> Fix For: 3.0.0
>
>
> Users who want to upgrade their applications and enable the COOPERATIVE 
> rebalancing protocol in their consumer apps are required to follow a double 
> rolling bounce upgrade path. The reason for this is laid out in the [Consumer 
> Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
>  section of KIP-429. Basically, the ConsumerCoordinator picks a rebalancing 
> protocol in its constructor based on the list of supported partition 
> assignors. The protocol is selected as the highest protocol that is commonly 
> supported by all assignors in the list, and never changes after that.
> This is a bit unfortunate because it may end up using an older protocol even 
> after every member in the group has been updated to support the newer 
> protocol. After the first rolling bounce of the upgrade, all members will 
> have two assignors: "cooperative-sticky" and "range" (or 
> sticky/round-robin/etc). At this point the EAGER protocol will still be 
> selected due to the presence of the "range" assignor, but it's the 
> "cooperative-sticky" assignor that will ultimately be selected for use in 
> rebalances if that assignor is preferred (ie positioned first in the list). 
> The only reason for the second rolling bounce is to strip off the "range" 
> assignor and allow the upgraded members to switch over to COOPERATIVE. We 
> can't allow them to use cooperative rebalancing until everyone has been 
> upgraded, but once they have it's safe to do so.
> And there is already a way for the client to detect that everyone is on the 
> new byte code: if the CooperativeStickyAssignor is selected by the group 
> coordinator, then that means it is supported by all consumers in the group 
> and therefore everyone must be upgraded. 
> We may be able to save the second rolling bounce by dynamically updating the 
> rebalancing protocol inside the ConsumerCoordinator as "the highest protocol 
> supported by the assignor chosen by the group coordinator". This means we'll 
> still be using EAGER at the first rebalance, since we of course need to wait 
> for this initial rebalance to get the response from the group coordinator. 
> But we should take the hint from the chosen assignor rather than dropping 
> this information on the floor and sticking with the original protocol.
> Concrete Proposal:
> This assumes we will change the default assignor to ["cooperative-sticky", 
> "range"] in KIP-726. It also acknowledges that users may attempt any kind of 
> upgrade without reading the docs, and so we need to put in safeguards against 
> data corruption rather than assume everyone will follow the safe upgrade path.
> With this proposal, 
> 1) New applications on 3.0 will enable cooperative rebalancing by default
> 2) Existing applications which don’t set an assignor can safely upgrade to 
> 3.0 using a single rolling bounce with no extra steps, and will automatically 
> transition to cooperative rebalancing
> 3) Existing applications which do set an assignor that uses EAGER can 
> likewise upgrade their applications to COOPERATIVE with a single rolling 
> bounce
> 4) Once on 3.0, applications can safely go back and forth between EAGER and 
> COOPERATIVE
> 5) Applications can safely downgrade away from 3.0
> The high-level idea for dynamic protocol upgrades is that the group will 
> leverage the assignor selected by the group coordinator to determine when 
> it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect the 
> group in case of rare events or user misconfiguration. The group coordinator 
> selects the most preferred assignor that’s supported by all members of the 
> group, so we know that all members will support COOPERATIVE once we receive 
> the “cooperative-sticky” assignor after a rebalance. At this point, each 
> member can upgrade their own protocol to COOPERATIVE. However, there may be 
> situations in which an EAGER member may join the group even after upgrading 
> to COOPERATIVE. For example, during a rolling upgrade if the last remaining 
> member on the old 
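
To make the single-bounce end state above concrete, here is a minimal sketch of 
the mixed assignor list, assuming the standard consumer configs (bootstrap 
servers and group id are placeholders):
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.RangeAssignor;

// "cooperative-sticky" is preferred (listed first), while "range" keeps the
// group compatible with members that have not been upgraded yet.
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        CooperativeStickyAssignor.class.getName() + "," + RangeAssignor.class.getName());
{code}
Once the group coordinator selects "cooperative-sticky", every member is known 
to support it, which is the signal discussed above.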

[jira] [Commented] (KAFKA-10345) Add auto reloading for trust/key store paths

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377621#comment-17377621
 ] 

Konstantine Karantasis commented on KAFKA-10345:


[~ableegoldman] [~guozhang] is this issue a blocker for 3.0? If not, I'd like 
to postpone it to the next release.

> Add auto reloading for trust/key store paths
> 
>
> Key: KAFKA-10345
> URL: https://issues.apache.org/jira/browse/KAFKA-10345
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 3.0.0
>
>
> With forwarding enabled, per-broker alter-config doesn't go to the target 
> broker anymore, so we need a mechanism to propagate in-place file updates 
> through a file watch and time-based reloading.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9705) Zookeeper mutation protocols should be redirected to Controller only

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377620#comment-17377620
 ] 

Konstantine Karantasis commented on KAFKA-9705:
---

cc [~bchen225242]

> Zookeeper mutation protocols should be redirected to Controller only
> 
>
> Key: KAFKA-9705
> URL: https://issues.apache.org/jira/browse/KAFKA-9705
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 3.0.0
>
>
> In the bridge release, we need to restrict direct ZK access to the controller 
> only. This means the existing AlterConfig path should be migrated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9705) Zookeeper mutation protocols should be redirected to Controller only

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377619#comment-17377619
 ] 

Konstantine Karantasis commented on KAFKA-9705:
---

[~cmccabe] do you mind confirming whether all the subtasks required for 3.0 are 
in at this time? I'd like to move the target release for the ones that are not 
blockers for 3.0.

> Zookeeper mutation protocols should be redirected to Controller only
> 
>
> Key: KAFKA-9705
> URL: https://issues.apache.org/jira/browse/KAFKA-9705
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 3.0.0
>
>
> In the bridge release, we need to restrict direct ZK access to the controller 
> only. This means the existing AlterConfig path should be migrated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9295) KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable

2021-07-08 Thread A. Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377616#comment-17377616
 ] 

A. Sophie Blee-Goldman commented on KAFKA-9295:
---

[~kkonstantine] Luke was able to figure out the likely root cause and it 
actually does seem to be a potentially severe bug. See 
https://issues.apache.org/jira/browse/KAFKA-13008

> KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable
> --
>
> Key: KAFKA-9295
> URL: https://issues.apache.org/jira/browse/KAFKA-9295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.4.0, 2.6.0
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/27106/testReport/junit/org.apache.kafka.streams.integration/KTableKTableForeignKeyInnerJoinMultiIntegrationTest/shouldInnerJoinMultiPartitionQueryable/]
> {quote}java.lang.AssertionError: Did not receive all 1 records from topic 
> output- within 6 ms Expected: is a value equal to or greater than <1> 
> but: <0> was less than <1> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:515)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:511)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:489)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.verifyKTableKTableJoin(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:200)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:183){quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9295) KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-9295:
--
Priority: Blocker  (was: Critical)

> KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable
> --
>
> Key: KAFKA-9295
> URL: https://issues.apache.org/jira/browse/KAFKA-9295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.4.0, 2.6.0
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/27106/testReport/junit/org.apache.kafka.streams.integration/KTableKTableForeignKeyInnerJoinMultiIntegrationTest/shouldInnerJoinMultiPartitionQueryable/]
> {quote}java.lang.AssertionError: Did not receive all 1 records from topic 
> output- within 6 ms Expected: is a value equal to or greater than <1> 
> but: <0> was less than <1> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:515)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:511)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:489)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.verifyKTableKTableJoin(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:200)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:183){quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9295) KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377614#comment-17377614
 ] 

Konstantine Karantasis commented on KAFKA-9295:
---

I see it's a flaky test (missed this somehow). Have you noticed it still 
failing? Any chance we can investigate before code freeze for 3.0? Thanks

> KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable
> --
>
> Key: KAFKA-9295
> URL: https://issues.apache.org/jira/browse/KAFKA-9295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.4.0, 2.6.0
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Critical
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/27106/testReport/junit/org.apache.kafka.streams.integration/KTableKTableForeignKeyInnerJoinMultiIntegrationTest/shouldInnerJoinMultiPartitionQueryable/]
> {quote}java.lang.AssertionError: Did not receive all 1 records from topic 
> output- within 6 ms Expected: is a value equal to or greater than <1> 
> but: <0> was less than <1> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:515)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:511)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:489)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.verifyKTableKTableJoin(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:200)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:183){quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9295) KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377613#comment-17377613
 ] 

Konstantine Karantasis commented on KAFKA-9295:
---

[~showuon] [~ableegoldman] is this reopened issue a blocker for 3.0? I see all 
3 known PRs are merged, but this has been reopened. I'd suggest we postpone it 
if we can't get it in before code freeze. 

> KTableKTableForeignKeyInnerJoinMultiIntegrationTest#shouldInnerJoinMultiPartitionQueryable
> --
>
> Key: KAFKA-9295
> URL: https://issues.apache.org/jira/browse/KAFKA-9295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.4.0, 2.6.0
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Critical
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/27106/testReport/junit/org.apache.kafka.streams.integration/KTableKTableForeignKeyInnerJoinMultiIntegrationTest/shouldInnerJoinMultiPartitionQueryable/]
> {quote}java.lang.AssertionError: Did not receive all 1 records from topic 
> output- within 6 ms Expected: is a value equal to or greater than <1> 
> but: <0> was less than <1> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:515)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:511)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:489)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.verifyKTableKTableJoin(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:200)
>  at 
> org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable(KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java:183){quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12598) Remove deprecated --zookeeper in ConfigCommand

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377612#comment-17377612
 ] 

Konstantine Karantasis commented on KAFKA-12598:


Marking this issue as a Blocker and keeping the target at 3.0. The PR will need 
to be merged ASAP if this still needs to get into AK 3.0.
Otherwise, please let me know if we should postpone it.

> Remove deprecated --zookeeper in ConfigCommand
> --
>
> Key: KAFKA-12598
> URL: https://issues.apache.org/jira/browse/KAFKA-12598
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12598) Remove deprecated --zookeeper in ConfigCommand

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-12598:
---
Priority: Blocker  (was: Critical)

> Remove deprecated --zookeeper in ConfigCommand
> --
>
> Key: KAFKA-12598
> URL: https://issues.apache.org/jira/browse/KAFKA-12598
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] vvcephei commented on pull request #10661: MINOR: upgrade pip from 20.2.2 to 21.1.1

2021-07-08 Thread GitBox


vvcephei commented on pull request #10661:
URL: https://github.com/apache/kafka/pull/10661#issuecomment-876783256


   Ah, and now I noticed that we've added a message to this effect in the 
script today: https://github.com/apache/kafka/pull/10995
   
   Carry on :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] vvcephei commented on pull request #10661: MINOR: upgrade pip from 20.2.2 to 21.1.1

2021-07-08 Thread GitBox


vvcephei commented on pull request #10661:
URL: https://github.com/apache/kafka/pull/10661#issuecomment-876782447


   Hmm, I also hit that same error today.
   
   It seems to have happened because the image was running Debian 9, and the 
latest version of `python3` available in Debian 9 is 3.5, which only supports 
pip up to version 20. I confirmed this was the issue by getting a terminal on 
top of the errored image, downloading and compiling Python 3.6, and running 
`python3.6 -m pip install -U pip==21.1.1`.
   
   After running `docker system prune -a`, I see it's running Debian 10:
   ```
   root@826557ee9546:/# cat /etc/issue
   Debian GNU/Linux 10 \n \l
   ```
   
   Building on the recent discussion here, maybe what's happening is that we 
had an older cached version of the openjdk:8 image, which limited the python 
version to 3.5, which in turn made it impossible to upgrade to pip 21.
   
   Just wanted to provide more context. I assume others will hit the same 
thing. The solution is to run `docker system prune -a`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] kkonstantine commented on pull request #10826: KAFKA-7632: Support Compression Level

2021-07-08 Thread GitBox


kkonstantine commented on pull request #10826:
URL: https://github.com/apache/kafka/pull/10826#issuecomment-876779515


   @tombentley do you think you'll be able to finish the review soon? If we had 
an approval, I'd be willing to make an exception and consider this feature for 
inclusion in AK 3.0. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-10733) Enforce exception thrown for KafkaProducer txn APIs

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377611#comment-17377611
 ] 

Konstantine Karantasis commented on KAFKA-10733:


Thanks for the update [~bchen225242]. Moved the target release to the next one. 

> Enforce exception thrown for KafkaProducer txn APIs
> ---
>
> Key: KAFKA-10733
> URL: https://issues.apache.org/jira/browse/KAFKA-10733
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer , streams
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: need-kip
> Fix For: 3.1.0
>
>
> In general, KafkaProducer can throw both fatal and non-fatal errors as 
> KafkaException, which makes exception handling hard. Furthermore, as of 2.7, 
> not every fatal (checked) exception is declared on the method signatures yet.
> We should settle on a general long-term strategy for this, such as declaring 
> all non-fatal exceptions as wrapped KafkaException while surfacing the fatal 
> ones explicitly, or adding a flag to KafkaException indicating whether it is 
> fatal, while maintaining binary compatibility.
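
To illustrate the kind of exception handling the description refers to, here is 
a minimal sketch of the usual pattern around the transactional APIs, assuming a 
producer already configured with a transactional.id (topic, key, and value are 
placeholders):
{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionalSendExample {

    public static void sendInOneTransaction(KafkaProducer<String, String> producer) {
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // Fatal: this producer instance cannot recover and must be closed.
            producer.close();
        } catch (KafkaException e) {
            // Any other KafkaException is treated as abortable: abort and retry if desired.
            producer.abortTransaction();
        }
    }
}
{code}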



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10733) Enforce exception thrown for KafkaProducer txn APIs

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-10733:
---
Fix Version/s: (was: 3.0.0)
   3.1.0

> Enforce exception thrown for KafkaProducer txn APIs
> ---
>
> Key: KAFKA-10733
> URL: https://issues.apache.org/jira/browse/KAFKA-10733
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer , streams
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: need-kip
> Fix For: 3.1.0
>
>
> In general, KafkaProducer can throw both fatal and non-fatal errors as 
> KafkaException, which makes exception handling hard. Furthermore, as of 2.7, 
> not every fatal (checked) exception is declared on the method signatures yet.
> We should settle on a general long-term strategy for this, such as declaring 
> all non-fatal exceptions as wrapped KafkaException while surfacing the fatal 
> ones explicitly, or adding a flag to KafkaException indicating whether it is 
> fatal, while maintaining binary compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] emveee edited a comment on pull request #7898: KAFKA-9366: Change log4j dependency into log4j2

2021-07-08 Thread GitBox


emveee edited a comment on pull request #7898:
URL: https://github.com/apache/kafka/pull/7898#issuecomment-876773034


   > @priyavj08 I'm sorry that I can't be certain. But as far as I know, any 
project with log4j 1.2.7 is not safe. (It is why I have been working on this 
issue.)
   > 
   > +1. I also released 2.6.1 backport.
   > 
   > * working branch: 
[kafka-2.6+log4j2](https://github.com/dongjinleekr/kafka/tree/kafka-2.6%2Blog4j2)
   > * custom build distribution: 
[kafka_2.13-2.6.1+log4j2-0.tgz](https://drive.google.com/file/d/1PfFaUj0UAN9CpfBN52BCBF-W0UNB3EWT/view?usp=sharing)
   > * 2.6.1 patch: 
[kafka-2.6.1+log4j2-0.patch](https://drive.google.com/file/d/1KQgRqYtLaL65lrHz4SiuSiZ7FOAQ4Xxv/view?usp=sharing)
   
   @dongjinleekr Do you have a patch that will work with 2.8.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-6718:
--
Fix Version/s: (was: 3.0.0)
   3.1.0

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Assignee: Levani Kokhreidze
>Priority: Major
>  Labels: kip
> Fix For: 3.1.0
>
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, this 
> provides fault tolerance: if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented in Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we need something similar for task assignment at the Kafka Streams 
> application layer. 
>   
>  This feature enables replica tasks to be assigned to different racks for 
> fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack-aware, i.e. 
> each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks to 
> different racks; further tasks will be assigned to the least loaded node, 
> cluster-wide.
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps the StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on the same rack if possible.
>  After that it also helps to maintain stickiness within the rack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] emveee commented on pull request #7898: KAFKA-9366: Change log4j dependency into log4j2

2021-07-08 Thread GitBox


emveee commented on pull request #7898:
URL: https://github.com/apache/kafka/pull/7898#issuecomment-876773034


   > @priyavj08 I'm sorry that I can't be certain. But as far as I know, any 
project with log4j 1.2.7 is not safe. (It is why I have been working on this 
issue.)
   > 
   > +1. I also released 2.6.1 backport.
   > 
   > * working branch: 
[kafka-2.6+log4j2](https://github.com/dongjinleekr/kafka/tree/kafka-2.6%2Blog4j2)
   > * custom build distribution: 
[kafka_2.13-2.6.1+log4j2-0.tgz](https://drive.google.com/file/d/1PfFaUj0UAN9CpfBN52BCBF-W0UNB3EWT/view?usp=sharing)
   > * 2.6.1 patch: 
[kafka-2.6.1+log4j2-0.patch](https://drive.google.com/file/d/1KQgRqYtLaL65lrHz4SiuSiZ7FOAQ4Xxv/view?usp=sharing)
   
   Do you have a patch that will work with 2.8.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2021-07-08 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377607#comment-17377607
 ] 

Konstantine Karantasis commented on KAFKA-6718:
---

The main PR for this feature wasn't approved in time for the AK 3.0 feature 
freeze. I'm moving the target release to the next one. 

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Assignee: Levani Kokhreidze
>Priority: Major
>  Labels: kip
> Fix For: 3.0.0
>
>
> Machines in a data centre are sometimes grouped in racks. Racks provide 
> isolation, as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, this 
> provides fault tolerance: if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented in Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we need something similar for task assignment at the Kafka Streams 
> application layer. 
>   
>  This feature enables replica tasks to be assigned to different racks for 
> fault tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack-aware, i.e. 
> each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks to 
> different racks; further tasks will be assigned to the least loaded node, 
> cluster-wide.
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps the StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on the same rack if possible.
>  After that it also helps to maintain stickiness within the rack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (KAFKA-12362) Determine if a Task is idling

2021-07-08 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis closed KAFKA-12362.
--

> Determine if a Task is idling
> -
>
> Key: KAFKA-12362
> URL: https://issues.apache.org/jira/browse/KAFKA-12362
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Walker Carlson
>Priority: Major
> Fix For: 3.0.0
>
>
> Determine whether a task is idling, given the task ID.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-13054) Unexpected Polling Behavior

2021-07-08 Thread Gary Russell (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377606#comment-17377606
 ] 

Gary Russell edited comment on KAFKA-13054 at 7/8/21, 9:55 PM:
---

Yes, after increasing the partitions, I get 2000 on each poll and 1000 on the 
last.


was (Author: grussell):
Yes, after increasing the partitions, I get 2000 on each poll and 1000 on the 
last; you can close this.

> Unexpected Polling Behavior
> ---
>
> Key: KAFKA-13054
> URL: https://issues.apache.org/jira/browse/KAFKA-13054
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Gary Russell
>Priority: Major
>
> Given a topic with 10 partitions and 9000 records (evenly distributed) and
> {noformat}
> max.poll.records: 2000
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 1
> fetch.min.bytes = 1024
> {noformat}
> We see odd results
> {code:java}
> log.info("" + recs.count() + "\n"
>   + recs.partitions().stream()
>   .map(part -> "" + part.partition() 
>   + "(" + recs.records(part).size() + ")")
>   .collect(Collectors.toList()));
> {code}
> Result:
> {noformat}
> 2021-07-08 15:04:11.131  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 2000
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 
> 6(191)]
> 2021-07-08 15:04:11.137  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 10
> [6(10)]
> 2021-07-08 15:04:21.170  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1809
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
> 2021-07-08 15:04:21.214  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 2000
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 
> 6(191)]
> 2021-07-08 15:04:21.215  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 10
> [6(10)]
> 2021-07-08 15:04:31.248  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1809
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
> 2021-07-08 15:04:41.267  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1083
> [1(27), 0(87), 5(189), 4(93), 3(114), 2(129), 9(108), 8(93), 7(42), 6(201)]
> 2021-07-08 15:04:51.276  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 201
> [6(201)]
> 2021-07-08 15:05:01.279  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 78
> [6(78)]
> {noformat}
> I can understand the second poll returning immediately (presumably over-fetch 
> from the previous poll) but why does the third poll take 10 seconds 
> (fetch.max.wait)? The fourth and fifth polls are like the first and second 
> and return almost immediately, but the sixth once again takes 10 seconds and 
> returns a partial result.
> Is there a document that explains the fetch mechanism that might explain this 
> behavior?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13054) Unexpected Polling Behavior

2021-07-08 Thread Gary Russell (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377606#comment-17377606
 ] 

Gary Russell commented on KAFKA-13054:
--

Yes, after increasing the partitions, I get 2000 on each poll and 1000 on the 
last; you can close this.

> Unexpected Polling Behavior
> ---
>
> Key: KAFKA-13054
> URL: https://issues.apache.org/jira/browse/KAFKA-13054
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Gary Russell
>Priority: Major
>
> Given a topic with 10 partitions and 9000 records (evenly distributed) and
> {noformat}
> max.poll.records: 2000
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 1
> fetch.min.bytes = 1024
> {noformat}
> We see odd results
> {code:java}
> log.info("" + recs.count() + "\n"
>   + recs.partitions().stream()
>   .map(part -> "" + part.partition() 
>   + "(" + recs.records(part).size() + ")")
>   .collect(Collectors.toList()));
> {code}
> Result:
> {noformat}
> 2021-07-08 15:04:11.131  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 2000
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 
> 6(191)]
> 2021-07-08 15:04:11.137  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 10
> [6(10)]
> 2021-07-08 15:04:21.170  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1809
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
> 2021-07-08 15:04:21.214  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 2000
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 
> 6(191)]
> 2021-07-08 15:04:21.215  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 10
> [6(10)]
> 2021-07-08 15:04:31.248  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1809
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
> 2021-07-08 15:04:41.267  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1083
> [1(27), 0(87), 5(189), 4(93), 3(114), 2(129), 9(108), 8(93), 7(42), 6(201)]
> 2021-07-08 15:04:51.276  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 201
> [6(201)]
> 2021-07-08 15:05:01.279  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 78
> [6(78)]
> {noformat}
> I can understand the second poll returning immediately (presumably over-fetch 
> from the previous poll) but why does the third poll take 10 seconds 
> (fetch.max.wait)? The fourth and fifth polls are like the first and second 
> and return almost immediately, but the sixth once again takes 10 seconds and 
> returns a partial result.
> Is there a document that explains the fetch mechanism that might explain this 
> behavior?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13054) Unexpected Polling Behavior

2021-07-08 Thread Gary Russell (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377600#comment-17377600
 ] 

Gary Russell commented on KAFKA-13054:
--

I think I see a pattern; the 3rd poll does not fetch any records from the 
partition that had over-fetches last time (always 6 here).

> Unexpected Polling Behavior
> ---
>
> Key: KAFKA-13054
> URL: https://issues.apache.org/jira/browse/KAFKA-13054
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Gary Russell
>Priority: Major
>
> Given a topic with 10 partitions and 9000 records (evenly distributed) and
> {noformat}
> max.poll.records: 2000
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 1
> fetch.min.bytes = 1024
> {noformat}
> We see odd results
> {code:java}
> log.info("" + recs.count() + "\n"
>   + recs.partitions().stream()
>   .map(part -> "" + part.partition() 
>   + "(" + recs.records(part).size() + ")")
>   .collect(Collectors.toList()));
> {code}
> Result:
> {noformat}
> 2021-07-08 15:04:11.131  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 2000
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 
> 6(191)]
> 2021-07-08 15:04:11.137  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 10
> [6(10)]
> 2021-07-08 15:04:21.170  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1809
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
> 2021-07-08 15:04:21.214  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 2000
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 
> 6(191)]
> 2021-07-08 15:04:21.215  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 10
> [6(10)]
> 2021-07-08 15:04:31.248  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1809
> [1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
> 2021-07-08 15:04:41.267  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 1083
> [1(27), 0(87), 5(189), 4(93), 3(114), 2(129), 9(108), 8(93), 7(42), 6(201)]
> 2021-07-08 15:04:51.276  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 201
> [6(201)]
> 2021-07-08 15:05:01.279  INFO 45792 --- [o68201599-0-C-1] 
> com.example.demo.So68201599Application   : 78
> [6(78)]
> {noformat}
> I can understand the second poll returning immediately (presumably over-fetch 
> from the previous poll) but why does the third poll take 10 seconds 
> (fetch.max.wait)? The fourth and fifth polls are like the first and second 
> and return almost immediately, but the sixth once again takes 10 seconds and 
> returns a partial result.
> Is there a document that explains the fetch mechanism that might explain this 
> behavior?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] vvcephei edited a comment on pull request #10602: KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests.

2021-07-08 Thread GitBox


vvcephei edited a comment on pull request #10602:
URL: https://github.com/apache/kafka/pull/10602#issuecomment-876765383


   I haven't verified it, but I think your problem is that you needed to do 
`./gradlew clean systemTestLibs` instead of `:streams:testAll`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] vvcephei commented on pull request #10602: KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests.

2021-07-08 Thread GitBox


vvcephei commented on pull request #10602:
URL: https://github.com/apache/kafka/pull/10602#issuecomment-876765383


   Hmm, I had some trouble with the required pip version... I went ahead and 
kicked off: 
   
   I haven't verified it, but I think your problem is that you needed to do 
`./gradlew clean systemTestLibs` instead of `:streams:testAll`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] jolshan commented on a change in pull request #11004: KAFKA-12257: Consumer mishandles topics deleted and recreated with the same name (trunk)

2021-07-08 Thread GitBox


jolshan commented on a change in pull request #11004:
URL: https://github.com/apache/kafka/pull/11004#discussion_r666535365



##
File path: 
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
##
@@ -1206,7 +1205,8 @@ private void validatePositionsOnMetadataChange() {
 fetchable.put(node, builder);
 }
 
-builder.add(partition, 
topicIds.getOrDefault(partition.topic(), Uuid.ZERO_UUID), new 
FetchRequest.PartitionData(position.offset,
+Uuid topicId = metadata.topicId(partition.topic());

Review comment:
   One change we are making for this PR is to just get the topic ID for a 
single provided topic name. I want to double-check that the metadata (and 
underlying map) cannot change when adding these partitions to the builder, 
since the builder assumes IDs do not change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] jolshan opened a new pull request #11004: KAFKA-12257: Consumer mishandles topics deleted and recreated with the same name (trunk)

2021-07-08 Thread GitBox


jolshan opened a new pull request #11004:
URL: https://github.com/apache/kafka/pull/11004


   Trunk version of https://github.com/apache/kafka/pull/10952
   
   This PR slightly cleans up some of the changes made in 
https://github.com/apache/kafka/pull/9944
   
   Store topic ID info in consumer metadata. We will always take the topic ID 
from the latest metadata response and remove any topic IDs from the cache if 
the metadata response did not return a topic ID for the topic.
   
   With the addition of topic IDs, when we encounter a new topic ID (recreated 
topic) we can choose to fetch the topic's metadata even if the epoch is lower 
than the deleted topic's.
   
   The idea is that when we update from no topic IDs to using topic IDs, we 
will not count the topic as new (it could be the same topic but with a new ID). 
We will only take the update if the topic ID changed.
   
   Added tests for this scenario as well as some tests for storing the topic 
IDs. Also added tests for topic IDs in metadata cache.
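
   A rough sketch of the retention rule described above; the class, method, and 
parameter names here are hypothetical and do not mirror the real consumer 
Metadata internals:
   ```java
   import java.util.Map;
   import org.apache.kafka.common.Uuid;

   // Hypothetical helper, for illustration only.
   final class TopicMetadataUpdateRule {

       /** Decide whether new metadata for a topic should replace what is cached. */
       static boolean shouldAcceptUpdate(Map<String, Uuid> cachedIds, String topic,
                                         Uuid newId, int cachedEpoch, int newEpoch) {
           Uuid cachedId = cachedIds.get(topic);
           boolean idsComparable = cachedId != null && !Uuid.ZERO_UUID.equals(newId);
           if (idsComparable && !cachedId.equals(newId)) {
               // Topic was deleted and recreated: take the update even if the epoch went back.
               return true;
           }
           // Same topic (or IDs unavailable): keep the usual rule of never regressing the epoch.
           return newEpoch >= cachedEpoch;
       }
   }
   ```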
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-13054) Unexpected Polling Behavior

2021-07-08 Thread Gary Russell (Jira)
Gary Russell created KAFKA-13054:


 Summary: Unexpected Polling Behavior
 Key: KAFKA-13054
 URL: https://issues.apache.org/jira/browse/KAFKA-13054
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 2.8.0
Reporter: Gary Russell


Given a topic with 10 partitions and 9000 records (evenly distributed) and
{noformat}
max.poll.records: 2000
fetch.max.bytes = 52428800
fetch.max.wait.ms = 1
fetch.min.bytes = 1024
{noformat}

We see odd results
{code:java}
log.info("" + recs.count() + "\n"
+ recs.partitions().stream()
.map(part -> "" + part.partition() 
+ "(" + recs.records(part).size() + ")")
.collect(Collectors.toList()));
{code}

Result:

{noformat}
2021-07-08 15:04:11.131  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 2000
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 6(191)]
2021-07-08 15:04:11.137  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 10
[6(10)]
2021-07-08 15:04:21.170  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 1809
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
2021-07-08 15:04:21.214  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 2000
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 6(191)]
2021-07-08 15:04:21.215  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 10
[6(10)]
2021-07-08 15:04:31.248  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 1809
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
2021-07-08 15:04:41.267  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 1083
[1(27), 0(87), 5(189), 4(93), 3(114), 2(129), 9(108), 8(93), 7(42), 6(201)]
2021-07-08 15:04:51.276  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 201
[6(201)]
2021-07-08 15:05:01.279  INFO 45792 --- [o68201599-0-C-1] 
com.example.demo.So68201599Application   : 78
[6(78)]
{noformat}

I can understand the second poll returning immediately (presumably over-fetch 
from the previous poll) but why does the third poll take 10 seconds 
(fetch.max.wait)? The fourth and fifth polls are like the first and second and 
return almost immediately, but the sixth once again takes 10 seconds and 
returns a partial result.

Is there a document that explains the fetch mechanism that might explain this 
behavior?
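
For anyone trying to reproduce this, a minimal consumer setup with the 
properties above might look like the following sketch; bootstrap servers, group 
id, and topic are placeholders, and the fetch.max.wait.ms value is assumed 
because the quoted config above is truncated:
{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 2000);
props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 52428800);
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 10000); // assumed value
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("example-topic"));
    while (true) {
        ConsumerRecords<String, String> recs = consumer.poll(Duration.ofSeconds(30));
        System.out.println(recs.count() + "\n" + recs.partitions().stream()
                .map(part -> part.partition() + "(" + recs.records(part).size() + ")")
                .collect(Collectors.toList()));
    }
}
{code}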



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] vvcephei commented on pull request #10602: KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests.

2021-07-08 Thread GitBox


vvcephei commented on pull request #10602:
URL: https://github.com/apache/kafka/pull/10602#issuecomment-876758534


   Hmm, I had some trouble with the required pip version... I went ahead and 
kicked off 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4586/ while I 
debug my local env.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] vvcephei commented on pull request #10602: KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests.

2021-07-08 Thread GitBox


vvcephei commented on pull request #10602:
URL: https://github.com/apache/kafka/pull/10602#issuecomment-876748402


   Thanks, @kamalcph ! I'll take a look now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] vvcephei commented on pull request #11003: KAFKA-12360: Document new time semantics

2021-07-08 Thread GitBox


vvcephei commented on pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#issuecomment-876743838


   Thanks for the review, @JimGalasyn ! I've just reworded the main 
documentation section in response to your feedback. I also included a few other 
improvements I noticed on the second pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-13053) Bump frame version for KRaft records

2021-07-08 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13053:
---

 Summary: Bump frame version for KRaft records
 Key: KAFKA-13053
 URL: https://issues.apache.org/jira/browse/KAFKA-13053
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


We have at least one compatibility break to the record format of KRaft records 
(we added a version field to the LeaderChange schema). Since there was never an 
expectation of compatibility between 2.8 and 3.0, we want to make this explicit 
by bumping the frame version. At the same time, we will reset the record 
versions back to 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >