[jira] [Updated] (KAFKA-14174) Operation documentation for KRaft

2022-09-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Affects Version/s: (was: 3.3.0)

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14174) Operation documentation for KRaft

2022-09-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14174.

Resolution: Fixed

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs

2022-09-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14265.

Resolution: Fixed

> Prefix ACLs may shadow other prefix ACLs
> 
>
> Key: KAFKA-14265
> URL: https://issues.apache.org/jira/browse/KAFKA-14265
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.1
>
>
> Prefix ACLs may shadow other prefix ACLs. Consider the case where we have 
> prefix ACLs for foobar, fooa, and f. If we were matching a resource named 
> "foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop 
> -- missing the f ACL.
> To fix this, we should re-scan for ACLs at the first divergence point (in 
> this case, f) whenever we hit a mismatch of this kind.
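For illustration, the matching semantics the fix must preserve can be sketched as follows. This is not the StandardAuthorizer implementation (which keeps the ACLs in one sorted collection and re-scans from the divergence point); it simply probes every prefix of the resource name from longest to shortest, so an unrelated entry such as "fooa" can never hide a shorter match such as "f". All names below are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrefixAclMatch {
    private final Map<String, String> aclsByPrefix = new HashMap<>();

    void addPrefixAcl(String prefix, String rule) {
        aclsByPrefix.put(prefix, rule);
    }

    // Every stored prefix that is a prefix of resourceName must be reported.
    List<String> matching(String resourceName) {
        List<String> result = new ArrayList<>();
        for (int len = resourceName.length(); len > 0; len--) {
            String rule = aclsByPrefix.get(resourceName.substring(0, len));
            if (rule != null) {
                result.add(rule);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixAclMatch acls = new PrefixAclMatch();
        acls.addPrefixAcl("foobar", "allow-foobar");
        acls.addPrefixAcl("fooa", "allow-fooa");
        acls.addPrefixAcl("f", "allow-f");
        // Prints [allow-foobar, allow-f]; the buggy scan stopped at "fooa" and missed "f".
        System.out.println(acls.matching("foobar"));
    }
}
{code}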



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14259:
---
Fix Version/s: 3.3.1
   (was: 3.3.0)

> BrokerRegistration#toString throws an exception, terminating metadata replay
> 
>
> Key: KAFKA-14259
> URL: https://issues.apache.org/jira/browse/KAFKA-14259
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.1
>
>
> BrokerRegistration#toString throws an exception, terminating metadata replay, 
> because the sorted() method is used on an entry set rather than a key set.
> {noformat}
> Caused by:
>     java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>         at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
>         ... 147 more
> Caused by:
>     java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
>         at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>         at java.base/java.util.TimSort.sort(TimSort.java:220)
>         at java.base/java.util.Arrays.sort(Arrays.java:1307)
>         at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at 
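The failure mode is easy to reproduce outside of Kafka. The sketch below is not the BrokerRegistration code itself (the map contents are placeholders); it shows why a no-argument sorted() on an entry-set stream throws the same ClassCastException seen in the trace above, and the key-set variant that the described fix amounts to.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SortedEntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> features = new HashMap<>();
        features.put("metadata.version", 7);
        features.put("some.other.feature", 1);

        // Broken: Map.Entry is not Comparable, so sorted() throws
        // ClassCastException (HashMap$Node cannot be cast to Comparable)
        // when the stream is evaluated.
        try {
            features.entrySet().stream()
                .sorted()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", "));
        } catch (ClassCastException e) {
            System.out.println("entry-set variant failed: " + e.getMessage());
        }

        // Fixed: sort the keys (Strings are Comparable) and look the values up.
        String fixed = features.keySet().stream()
            .sorted()
            .map(k -> k + "=" + features.get(k))
            .collect(Collectors.joining(", "));
        System.out.println(fixed);
    }
}
{code}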

[jira] [Updated] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14259:
---
Affects Version/s: 3.3.0

> BrokerRegistration#toString throws an exception, terminating metadata replay
> 
>
> Key: KAFKA-14259
> URL: https://issues.apache.org/jira/browse/KAFKA-14259
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.1
>
>
> BrokerRegistration#toString throws an exception, terminating metadata replay, 
> because the sorted() method is used on an entry set rather than a key set.
> {noformat}
> Caused by:
>     java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>         at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
>         ... 147 more
> Caused by:
>     java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
>         at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>         at java.base/java.util.TimSort.sort(TimSort.java:220)
>         at java.base/java.util.Arrays.sort(Arrays.java:1307)
>         at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at 

[jira] [Updated] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs

2022-09-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14265:
---
Affects Version/s: 3.3.0

> Prefix ACLs may shadow other prefix ACLs
> 
>
> Key: KAFKA-14265
> URL: https://issues.apache.org/jira/browse/KAFKA-14265
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.1
>
>
> Prefix ACLs may shadow other prefix ACLs. Consider the case where we have 
> prefix ACLs for foobar, fooa, and f. If we were matching a resource named 
> "foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop 
> -- missing the f ACL.
> To fix this, we should re-scan for ACLs at the first divergence point (in 
> this case, f) whenever we hit a mismatch of this kind.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs

2022-09-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14265:
---
Fix Version/s: 3.3.1

> Prefix ACLs may shadow other prefix ACLs
> 
>
> Key: KAFKA-14265
> URL: https://issues.apache.org/jira/browse/KAFKA-14265
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.1
>
>
> Prefix ACLs may shadow other prefix ACLs. Consider the case where we have 
> prefix ACLs for foobar, fooa, and f. If we were matching a resource named 
> "foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop 
> -- missing the f ACL.
> To fix this, we should re-scan for ACLs at the first divergence point (in 
> this case, f) whenever we hit a mismatch of this kind.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13657) StandardAuthorizer should implement the early start listener logic described in KIP-801

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13657:
---
Fix Version/s: (was: 3.3)

> StandardAuthorizer should implement the early start listener logic described 
> in KIP-801
> ---
>
> Key: KAFKA-13657
> URL: https://issues.apache.org/jira/browse/KAFKA-13657
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Jason Gustafson
>Priority: Major
>  Labels: kip-500, kip-801
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13206) shutting down broker needs to stop fetching as a follower in KRaft mode

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13206:
---
Fix Version/s: (was: 3.3)

> shutting down broker needs to stop fetching as a follower in KRaft mode
> ---
>
> Key: KAFKA-13206
> URL: https://issues.apache.org/jira/browse/KAFKA-13206
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft, replication
>Affects Versions: 3.0.0
>Reporter: Jun Rao
>Assignee: HaiyuanZhao
>Priority: Major
>  Labels: kip-500
>
> In ZK mode, the controller sends a stopReplica request (with the deletion flag set to 
> false) to the shutting-down broker so that it stops its follower replicas from fetching. 
> In KRaft mode there is no corresponding logic, which means follower fetch requests are 
> unnecessarily rejected during controlled shutdown.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14259.

Resolution: Fixed

> BrokerRegistration#toString throws an exception, terminating metadata replay
> 
>
> Key: KAFKA-14259
> URL: https://issues.apache.org/jira/browse/KAFKA-14259
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> BrokerRegistration#toString throws an exception, terminating metadata replay, 
> because the sorted() method is used on an entry set rather than a key set.
> {noformat}
> Caused by:
>     java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>         at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
>         ... 147 more
> Caused by:
>     java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
>         at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>         at java.base/java.util.TimSort.sort(TimSort.java:220)
>         at java.base/java.util.Arrays.sort(Arrays.java:1307)
>         at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)

[jira] [Updated] (KAFKA-13806) Check CRC when reading snapshots

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13806:
---
Fix Version/s: (was: 3.3)

> Check CRC when reading snapshots
> 
>
> Key: KAFKA-13806
> URL: https://issues.apache.org/jira/browse/KAFKA-13806
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14259:
---
Fix Version/s: 3.3.0
   (was: 3.3)

> BrokerRegistration#toString throws an exception, terminating metadata replay
> 
>
> Key: KAFKA-14259
> URL: https://issues.apache.org/jira/browse/KAFKA-14259
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> BrokerRegistration#toString throws an exception, terminating metadata replay, 
> because the sorted() method is used on an entry set rather than a key set.
> {noformat}
> Caused by:
>     java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>         at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
>         ... 147 more
> Caused by:
>     java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
>         at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>         at java.base/java.util.TimSort.sort(TimSort.java:220)
>         at java.base/java.util.Arrays.sort(Arrays.java:1307)
>         at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at 

[jira] [Updated] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14259:
---
Affects Version/s: (was: 3.3)

> BrokerRegistration#toString throws an exception, terminating metadata replay
> 
>
> Key: KAFKA-14259
> URL: https://issues.apache.org/jira/browse/KAFKA-14259
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> BrokerRegistration#toString throws an exception, terminating metadata replay, 
> because the sorted() method is used on an entry set rather than a key set.
> {noformat}
> Caused by:
>     java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>         at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
>         ... 147 more
> Caused by:
>     java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>         at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
>         at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>         at java.base/java.util.TimSort.sort(TimSort.java:220)
>         at java.base/java.util.Arrays.sort(Arrays.java:1307)
>         at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)

[jira] [Updated] (KAFKA-13888) KIP-836: Expose replication information of the cluster metadata

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13888:
---
Summary: KIP-836: Expose replication information of the cluster metadata  
(was: KIP-836: Addition of Information in DescribeQuorumResponse about Voter 
Lag)

> KIP-836: Expose replication information of the cluster metadata
> ---
>
> Key: KAFKA-13888
> URL: https://issues.apache.org/jira/browse/KAFKA-13888
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Niket Goel
>Assignee: Niket Goel
>Priority: Blocker
> Fix For: 3.3.0
>
>
> Tracking issue for the implementation of KIP-836



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13410) KRaft to KRaft Upgrades

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13410:
---
Summary: KRaft to KRaft Upgrades  (was: KRaft Upgrades)

> KRaft to KRaft Upgrades
> ---
>
> Key: KAFKA-13410
> URL: https://issues.apache.org/jira/browse/KAFKA-13410
> Project: Kafka
>  Issue Type: New Feature
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.3.0
>
>
> This is the placeholder JIRA for KIP-778



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-26 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14207:
---
Fix Version/s: 3.3.0

> Add a 6.10 section for KRaft
> 
>
> Key: KAFKA-14207
> URL: https://issues.apache.org/jira/browse/KAFKA-14207
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> The section should talk about:
>  # Limitation
>  # Recommended deployment: external controller
>  # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-26 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14207.

Resolution: Fixed

> Add a 6.10 section for KRaft
> 
>
> Key: KAFKA-14207
> URL: https://issues.apache.org/jira/browse/KAFKA-14207
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: documentation, kraft
>
> The section should talk about:
>  # Limitation
>  # Recommended deployment: external controller
>  # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-10140) Incremental config api excludes plugin config changes

2022-09-26 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-10140:
---
Description: 
I was trying to alter the jmx metric filters using the incremental alter config 
api and hit this error:
{code:java}
java.util.NoSuchElementException: key not found: metrics.jmx.blacklist
at scala.collection.MapLike.default(MapLike.scala:235)
at scala.collection.MapLike.default$(MapLike.scala:234)
at scala.collection.AbstractMap.default(Map.scala:65)
at scala.collection.MapLike.apply(MapLike.scala:144)
at scala.collection.MapLike.apply$(MapLike.scala:143)
at scala.collection.AbstractMap.apply(Map.scala:65)
at kafka.server.AdminManager.listType$1(AdminManager.scala:681)
at 
kafka.server.AdminManager.$anonfun$prepareIncrementalConfigs$1(AdminManager.scala:693)
at kafka.server.AdminManager.prepareIncrementalConfigs(AdminManager.scala:687)
at 
kafka.server.AdminManager.$anonfun$incrementalAlterConfigs$1(AdminManager.scala:618)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:154)
at scala.collection.TraversableLike.map(TraversableLike.scala:273)
at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at kafka.server.AdminManager.incrementalAlterConfigs(AdminManager.scala:589)
at 
kafka.server.KafkaApis.handleIncrementalAlterConfigsRequest(KafkaApis.scala:2698)
at kafka.server.KafkaApis.handle(KafkaApis.scala:188)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:78)
at java.base/java.lang.Thread.run(Thread.java:834) {code}
It looks like we are only allowing changes to the keys defined in `KafkaConfig` 
through this API. This excludes config changes to any plugin components such as 
`JmxReporter`.

Note that I was able to use the regular `alterConfig` API to change this config.

  was:
I was trying to alter the jmx metric filters using the incremental alter config 
api and hit this error:
{code:java}
java.util.NoSuchElementException: key not found: metrics.jmx.blacklist
at scala.collection.MapLike.default(MapLike.scala:235)
at scala.collection.MapLike.default$(MapLike.scala:234)
at scala.collection.AbstractMap.default(Map.scala:65)
at scala.collection.MapLike.apply(MapLike.scala:144)
at scala.collection.MapLike.apply$(MapLike.scala:143)
at scala.collection.AbstractMap.apply(Map.scala:65)
at kafka.server.AdminManager.listType$1(AdminManager.scala:681)
at 
kafka.server.AdminManager.$anonfun$prepareIncrementalConfigs$1(AdminManager.scala:693)
at kafka.server.AdminManager.prepareIncrementalConfigs(AdminManager.scala:687)
at 
kafka.server.AdminManager.$anonfun$incrementalAlterConfigs$1(AdminManager.scala:618)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:154)
at scala.collection.TraversableLike.map(TraversableLike.scala:273)
at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at kafka.server.AdminManager.incrementalAlterConfigs(AdminManager.scala:589)
at 
kafka.server.KafkaApis.handleIncrementalAlterConfigsRequest(KafkaApis.scala:2698)
at kafka.server.KafkaApis.handle(KafkaApis.scala:188)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:78)
at java.base/java.lang.Thread.run(Thread.java:834) {code}
 

It looks like we are only allowing changes to the keys defined in `KafkaConfig` 
through this API. This excludes config changes to any plugin components such as 
`JmxReporter`.

Note that I was able to use the regular `alterConfig` API to change this config.
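For context, here is a minimal client-side sketch of the two calls involved. This is illustration only, not a reproduction from the report: the bootstrap address, broker node id, and filter value are assumptions.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class JmxFilterConfigChange {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0"); // assumed node id
        ConfigEntry entry = new ConfigEntry("metrics.jmx.blacklist", "kafka.network.*"); // arbitrary value

        try (Admin admin = Admin.create(props)) {
            // Incremental API: on the affected brokers this fails server-side because
            // the key is resolved against KafkaConfig only, which does not know
            // plugin configs such as the JmxReporter filters.
            admin.incrementalAlterConfigs(Map.of(
                    broker, List.of(new AlterConfigOp(entry, AlterConfigOp.OpType.SET))))
                .all().get();

            // Legacy API: per the report, this older path accepted the same change.
            admin.alterConfigs(Map.of(broker, new Config(List.of(entry)))).all().get();
        }
    }
}
{code}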


> Incremental config api excludes plugin config changes
> -
>
> Key: KAFKA-10140
> URL: https://issues.apache.org/jira/browse/KAFKA-10140
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Critical
>
> I was trying to alter the jmx metric filters using the incremental alter 
> config api and hit this error:
> {code:java}
> java.util.NoSuchElementException: key not found: metrics.jmx.blacklist
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:65)
> at scala.collection.MapLike.apply(MapLike.scala:144)
> at scala.collection.MapLike.apply$(MapLike.scala:143)
> at scala.collection.AbstractMap.apply(Map.scala:65)
> at kafka.server.AdminManager.listType$1(AdminManager.scala:681)
> at 
> kafka.server.AdminManager.$anonfun$prepareIncrementalConfigs$1(AdminManager.scala:693)
> at kafka.server.AdminManager.prepareIncrementalConfigs(AdminManager.scala:687)
> at 
> 

[jira] [Updated] (KAFKA-14243) Temporarily disable unsafe downgrade

2022-09-20 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14243:
---
Fix Version/s: 3.3.0

> Temporarily disable unsafe downgrade
> 
>
> Key: KAFKA-14243
> URL: https://issues.apache.org/jira/browse/KAFKA-14243
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> Temporarily disable unsafe downgrade since we haven't implemented reloading 
> snapshots on unsafe downgrade



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14174) Operation documentation for KRaft

2022-09-19 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Fix Version/s: (was: 3.3.0)

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-19 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14207:
---
Fix Version/s: (was: 3.3.0)

> Add a 6.10 section for KRaft
> 
>
> Key: KAFKA-14207
> URL: https://issues.apache.org/jira/browse/KAFKA-14207
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: documentation, kraft
>
> The section should talk about:
>  # Limitation
>  # Recommended deployment: external controller
>  # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-19 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14207:
---
Affects Version/s: 3.3.0

> Add a 6.10 section for KRaft
> 
>
> Key: KAFKA-14207
> URL: https://issues.apache.org/jira/browse/KAFKA-14207
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> The section should talk about:
>  # Limitation
>  # Recommended deployment: external controller
>  # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-19 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14207:
---
Labels: documentation kraft  (was: )

> Add a 6.10 section for KRaft
> 
>
> Key: KAFKA-14207
> URL: https://issues.apache.org/jira/browse/KAFKA-14207
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> The section should talk about:
>  # Limitation
>  # Recommended deployment: external controller
>  # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14241) Implement the snapshot cleanup policy

2022-09-19 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606669#comment-17606669
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14241:


Yes, and those topics are internal at the moment. So at a high level we need more 
complicated validation logic that can distinguish whether the affected topic is a 
KRaft topic or an ISR topic.

> Implement the snapshot cleanup policy
> -
>
> Key: KAFKA-14241
> URL: https://issues.apache.org/jira/browse/KAFKA-14241
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.4.0
>
>
> It looks like delete policy needs to be set to either delete or compact:
> {code:java}
>         .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, 
> ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc,
>           KafkaConfig.LogCleanupPolicyProp)
> {code}
> Neither is correct for KRaft topics. KIP-630 talks about adding a third 
> policy called snapshot:
> {code:java}
> The __cluster_metadata topic will have snapshot as the cleanup.policy. {code}
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges]
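Purely as an illustration of the direction (not the actual change this ticket tracks), a ConfigDef that accepts a third cleanup.policy value could look like the sketch below; the class name and default are made up.

{code:java}
import org.apache.kafka.common.config.ConfigDef;

public class CleanupPolicySketch {
    static final String CLEANUP_POLICY_CONFIG = "cleanup.policy";

    // "snapshot" added next to the existing "compact" and "delete", per KIP-630.
    static final ConfigDef CONFIG_DEF = new ConfigDef()
        .define(
            CLEANUP_POLICY_CONFIG,
            ConfigDef.Type.LIST,
            "delete",                                                 // simplified default
            ConfigDef.ValidList.in("compact", "delete", "snapshot"),
            ConfigDef.Importance.MEDIUM,
            "The cleanup policy for segments of this topic.");
}
{code}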



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14241) Implement the snapshot cleanup policy

2022-09-19 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606669#comment-17606669
 ] 

Jose Armando Garcia Sancio edited comment on KAFKA-14241 at 9/19/22 4:32 PM:
-

Yes, and that topic is internal at the moment. So at a high level we need more 
complicated validation logic that can distinguish whether the affected topic is a 
KRaft topic or an ISR topic.


was (Author: jagsancio):
Yes, and those topics are internal at the moment. So at a high level we need more 
complicated validation logic that can distinguish whether the affected topic is a 
KRaft topic or an ISR topic.

> Implement the snapshot cleanup policy
> -
>
> Key: KAFKA-14241
> URL: https://issues.apache.org/jira/browse/KAFKA-14241
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.4.0
>
>
> It looks like delete policy needs to be set to either delete or compact:
> {code:java}
>         .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, 
> ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc,
>           KafkaConfig.LogCleanupPolicyProp)
> {code}
> Neither is correct for KRaft topics. KIP-630 talks about adding a third 
> policy called snapshot:
> {code:java}
> The __cluster_metadata topic will have snapshot as the cleanup.policy. {code}
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606000#comment-17606000
 ] 

Jose Armando Garcia Sancio edited comment on KAFKA-14238 at 9/16/22 10:57 PM:
--

Was able to write a test that fails with the current implementation:
{code:java}
> Task :core:test FAILED
kafka.raft.KafkaMetadataLogTest.testSegmentLessThanLatestSnapshot() failed, log 
available in 
/home/jsancio/work/kafka/core/build/reports/testOutput/kafka.raft.KafkaMetadataLogTest.testSegmentLessThanLatestSnapshot().test.stdoutKafkaMetadataLogTest
 > testSegmentNotDeleteWithoutSnapshot() FAILED
    org.opentest4j.AssertionFailedError: latest snapshot offset (1440) must be 
>= log start offset (20010) ==> expected:  but was: 
        at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
        at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
        at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
        at 
kafka.raft.KafkaMetadataLogTest.testSegmentLessThanLatestSnapshot(KafkaMetadataLogTest.scala:921)
 {code}


was (Author: jagsancio):
Was able to write a test that fails with the current implementation:
{code:java}
> Task :core:test FAILED
kafka.raft.KafkaMetadataLogTest.testSegmentLessThanLatestSnapshot() failed, log 
available in 
/home/jsancio/work/kafka/core/build/reports/testOutput/kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot().test.stdoutKafkaMetadataLogTest
 > testSegmentNotDeleteWithoutSnapshot() FAILED
    org.opentest4j.AssertionFailedError: latest snapshot offset (1440) must be 
>= log start offset (20010) ==> expected:  but was: 
        at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
        at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
        at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
        at 
kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot(KafkaMetadataLogTest.scala:921)
 {code}

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Noticed that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot for them. For KRaft, Kafka should 
> only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   val 

[jira] [Commented] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606006#comment-17606006
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14238:


PR: [https://github.com/apache/kafka/pull/12655]

Created an issue to implement the snapshot cleanup policy as a follow-up after 
3.3.0.

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Noticed that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot for them. For KRaft, Kafka should 
> only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   val props = new Properties()
>   props.put(LogConfig.MaxMessageBytesProp, 
> config.maxBatchSizeInBytes.toString)
>   props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
>   props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
>   props.put(LogConfig.FileDeleteDelayMsProp, 
> Int.box(Defaults.FileDeleteDelayMs))
>   LogConfig.validateValues(props)
>   val defaultLogConfig = LogConfig(props){code}
> Segment deletion code:
> {code:java}
>  def deleteOldSegments(): Int = {
>   if (config.delete) {
> deleteLogStartOffsetBreachedSegments() +
>   deleteRetentionSizeBreachedSegments() +
>   deleteRetentionMsBreachedSegments()
>   } else {
> deleteLogStartOffsetBreachedSegments()
>   }
> }{code}
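One possible shape of the guard the description argues for, with hypothetical types and not the code from the eventual PR: a metadata-log segment only becomes deletable once a snapshot already covers it, i.e. its end offset is at or below the latest snapshot offset.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

public class MetadataLogRetentionSketch {
    record Segment(long baseOffset, long endOffset) { }

    // Only segments whose records are already captured by the latest snapshot may be
    // removed; with no snapshot at all, time/size retention must not delete anything.
    static List<Segment> deletableSegments(List<Segment> segments,
                                           OptionalLong latestSnapshotEndOffset) {
        List<Segment> deletable = new ArrayList<>();
        if (latestSnapshotEndOffset.isEmpty()) {
            return deletable;
        }
        long snapshotOffset = latestSnapshotEndOffset.getAsLong();
        for (Segment segment : segments) {
            if (segment.endOffset() <= snapshotOffset) {
                deletable.add(segment);
            }
        }
        return deletable;
    }
}
{code}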



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14241) Implement the snapshot cleanup policy

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14241:
--

 Summary: Implement the snapshot cleanup policy
 Key: KAFKA-14241
 URL: https://issues.apache.org/jira/browse/KAFKA-14241
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.4.0


It looks like delete policy needs to be set to either delete or compact:
{code:java}
        .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, 
ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc,
          KafkaConfig.LogCleanupPolicyProp)
{code}
Neither is correct for KRaft topics. KIP-630 talks about adding a third policy 
called snapshot:
{code:java}
The __cluster_metadata topic will have snapshot as the cleanup.policy. {code}
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14240) Ensure kraft metadata log dir is initialized with valid snapshot state

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14240:
---
Fix Version/s: 3.3.0

> Ensure kraft metadata log dir is initialized with valid snapshot state
> --
>
> Key: KAFKA-14240
> URL: https://issues.apache.org/jira/browse/KAFKA-14240
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.3.0
>
>
> If the first segment under __cluster_metadata has a base offset greater than 
> 0, then there must exist at least one snapshot which has a larger offset than 
> whatever the first segment starts at. We should check for this at startup to 
> prevent the controller from initializing with invalid state.
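A minimal sketch of the startup check described above, with hypothetical inputs (it is not the actual patch): refuse to start when the log no longer begins at offset 0 and no snapshot reaches at least the first segment's base offset.

{code:java}
import java.util.OptionalLong;

public class MetadataLogStartupCheck {
    static void validate(long firstSegmentBaseOffset, OptionalLong latestSnapshotEndOffset) {
        if (firstSegmentBaseOffset == 0) {
            return; // the log still starts at offset 0, nothing needs to be covered
        }
        boolean covered = latestSnapshotEndOffset.isPresent()
            && latestSnapshotEndOffset.getAsLong() >= firstSegmentBaseOffset;
        if (!covered) {
            throw new IllegalStateException("Metadata log starts at offset "
                + firstSegmentBaseOffset + " but no snapshot covers that offset");
        }
    }
}
{code}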



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606000#comment-17606000
 ] 

Jose Armando Garcia Sancio edited comment on KAFKA-14238 at 9/16/22 9:44 PM:
-

Was able to write a test that fails with the current implementation:
{code:java}
> Task :core:test FAILED
kafka.raft.KafkaMetadataLogTest.testSegmentLessThanLatestSnapshot() failed, log 
available in 
/home/jsancio/work/kafka/core/build/reports/testOutput/kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot().test.stdoutKafkaMetadataLogTest
 > testSegmentNotDeleteWithoutSnapshot() FAILED
    org.opentest4j.AssertionFailedError: latest snapshot offset (1440) must be 
>= log start offset (20010) ==> expected:  but was: 
        at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
        at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
        at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
        at 
kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot(KafkaMetadataLogTest.scala:921)
 {code}


was (Author: jagsancio):
Was able to write a test that fails with the current implementation:
{code:java}
> Task :core:test FAILED
kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot() failed, 
log available in 
/home/jsancio/work/kafka/core/build/reports/testOutput/kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot().test.stdoutKafkaMetadataLogTest
 > testSegmentNotDeleteWithoutSnapshot() FAILED
    org.opentest4j.AssertionFailedError: latest snapshot offset (1440) must be 
>= log start offset (20010) ==> expected:  but was: 
        at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
        at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
        at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
        at 
kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot(KafkaMetadataLogTest.scala:921)
 {code}

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Noticed that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot for them. For KRaft, Kafka should 
> only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   

[jira] [Commented] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606000#comment-17606000
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14238:


Was able to write a test that fails with the current implementation:
{code:java}
> Task :core:test FAILED
kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot() failed, 
log available in 
/home/jsancio/work/kafka/core/build/reports/testOutput/kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot().test.stdoutKafkaMetadataLogTest
 > testSegmentNotDeleteWithoutSnapshot() FAILED
    org.opentest4j.AssertionFailedError: latest snapshot offset (1440) must be 
>= log start offset (20010) ==> expected:  but was: 
        at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
        at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
        at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
        at 
kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot(KafkaMetadataLogTest.scala:921)
 {code}

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Noticed that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot for them. For KRaft, Kafka should 
> only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   val props = new Properties()
>   props.put(LogConfig.MaxMessageBytesProp, 
> config.maxBatchSizeInBytes.toString)
>   props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
>   props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
>   props.put(LogConfig.FileDeleteDelayMsProp, 
> Int.box(Defaults.FileDeleteDelayMs))
>   LogConfig.validateValues(props)
>   val defaultLogConfig = LogConfig(props){code}
> Segment deletion code:
> {code:java}
>  def deleteOldSegments(): Int = {
>   if (config.delete) {
> deleteLogStartOffsetBreachedSegments() +
>   deleteRetentionSizeBreachedSegments() +
>   deleteRetentionMsBreachedSegments()
>   } else {
> deleteLogStartOffsetBreachedSegments()
>   }
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605926#comment-17605926
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14238:


It looks like delete policy needs to be set to either delete or compact:
{code:java}
        .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, 
ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc,
          KafkaConfig.LogCleanupPolicyProp)
{code}
Neither is correct for KRaft topics. KIP-630 talks about adding a third policy 
called snapshot:
{code:java}
The __cluster_metadata topic will have snapshot as the cleanup.policy. {code}
https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Notice that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot covering them. For KRaft, Kafka 
> should only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   val props = new Properties()
>   props.put(LogConfig.MaxMessageBytesProp, 
> config.maxBatchSizeInBytes.toString)
>   props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
>   props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
>   props.put(LogConfig.FileDeleteDelayMsProp, 
> Int.box(Defaults.FileDeleteDelayMs))
>   LogConfig.validateValues(props)
>   val defaultLogConfig = LogConfig(props){code}
> Segment deletion code:
> {code:java}
>  def deleteOldSegments(): Int = {
>   if (config.delete) {
> deleteLogStartOffsetBreachedSegments() +
>   deleteRetentionSizeBreachedSegments() +
>   deleteRetentionMsBreachedSegments()
>   } else {
> deleteLogStartOffsetBreachedSegments()
>   }
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-14238:
--

Assignee: Jose Armando Garcia Sancio

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Notice that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot covering them. For KRaft, Kafka 
> should only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   val props = new Properties()
>   props.put(LogConfig.MaxMessageBytesProp, 
> config.maxBatchSizeInBytes.toString)
>   props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
>   props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
>   props.put(LogConfig.FileDeleteDelayMsProp, 
> Int.box(Defaults.FileDeleteDelayMs))
>   LogConfig.validateValues(props)
>   val defaultLogConfig = LogConfig(props){code}
> Segment deletion code:
> {code:java}
>  def deleteOldSegments(): Int = {
>   if (config.delete) {
> deleteLogStartOffsetBreachedSegments() +
>   deleteRetentionSizeBreachedSegments() +
>   deleteRetentionMsBreachedSegments()
>   } else {
> deleteLogStartOffsetBreachedSegments()
>   }
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14238:
--

 Summary: KRaft replicas can delete segments not included in a 
snapshot
 Key: KAFKA-14238
 URL: https://issues.apache.org/jira/browse/KAFKA-14238
 Project: Kafka
  Issue Type: Bug
  Components: core, kraft
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.3.0


We see this in the log
{code:java}
Deleting segment LogSegment(baseOffset=243864, size=9269150, 
lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) due 
to retention time 60480ms breach based on the largest record timestamp in 
the segment {code}
This then causes {{KafkaRaftClient}} to throw an exception when sending batches 
to the listener:
{code:java}
 java.lang.IllegalStateException: Snapshot expected since next offset of 
org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 is 
0, log start offset is 369668 and high-watermark is 547379
at 
org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
at java.base/java.util.Optional.orElseThrow(Optional.java:403)
at 
org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
at 
org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
The on disk state for the cluster metadata partition confirms this:
{code:java}
 ls __cluster_metadata-0/
00369668.index
00369668.log
00369668.timeindex
00503411.index
00503411.log
00503411.snapshot
00503411.timeindex
00548746.snapshot
leader-epoch-checkpoint
partition.metadata
quorum-state{code}
Notice that there are no {{checkpoint}} files and the log doesn't have a 
segment at base offset 0.

This is happening because the {{LogConfig}} used for KRaft sets the retention 
policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
old segments even if there is no snapshot covering them. For KRaft, Kafka 
should only delete segments that breach the log start offset.

Log configuration for KRaft:
{code:java}
  val props = new Properties()
  props.put(LogConfig.MaxMessageBytesProp, 
config.maxBatchSizeInBytes.toString)
  props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
  props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
  props.put(LogConfig.FileDeleteDelayMsProp, 
Int.box(Defaults.FileDeleteDelayMs))
  LogConfig.validateValues(props)
  val defaultLogConfig = LogConfig(props){code}
Segment deletion code:
{code:java}
 def deleteOldSegments(): Int = {
  if (config.delete) {
deleteLogStartOffsetBreachedSegments() +
  deleteRetentionSizeBreachedSegments() +
  deleteRetentionMsBreachedSegments()
  } else {
deleteLogStartOffsetBreachedSegments()
  }
}{code}
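For context, a minimal sketch of the behavior this issue argues for, assuming a 
hypothetical flag that identifies the KRaft metadata log (illustrative only, not 
the actual patch):
{code:scala}
// Illustrative sketch, not Kafka code. Idea: for the KRaft metadata log, skip
// time/size retention entirely and only honor the log start offset, which the
// raft layer advances once a snapshot covers the truncated offsets.
trait SegmentDeletion {
  def isKRaftMetadataLog: Boolean                 // hypothetical flag
  def retentionDeleteEnabled: Boolean             // stands in for config.delete
  def deleteLogStartOffsetBreachedSegments(): Int
  def deleteRetentionSizeBreachedSegments(): Int
  def deleteRetentionMsBreachedSegments(): Int

  def deleteOldSegments(): Int =
    if (isKRaftMetadataLog)
      deleteLogStartOffsetBreachedSegments()
    else if (retentionDeleteEnabled)
      deleteLogStartOffsetBreachedSegments() +
        deleteRetentionSizeBreachedSegments() +
        deleteRetentionMsBreachedSegments()
    else
      deleteLogStartOffsetBreachedSegments()
}
{code}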



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14206) Upgrade zookeeper to 3.7.1 to address security vulnerabilities

2022-09-13 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14206:
---
Fix Version/s: 3.4.0

> Upgrade zookeeper to 3.7.1 to address security vulnerabilities
> --
>
> Key: KAFKA-14206
> URL: https://issues.apache.org/jira/browse/KAFKA-14206
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Affects Versions: 3.2.1
>Reporter: Valeriy Kassenbayev
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.4.0
>
>
> Kafka 3.2.1 is using ZooKeeper, which is affected by 
> [CVE-2021-37136|https://security.snyk.io/vuln/SNYK-JAVA-IONETTY-1584064] and 
> [CVE-2021-37137|https://www.cve.org/CVERecord?id=CVE-2021-37137]:
> {code:java}
>   ✗ Denial of Service (DoS) [High 
> Severity][https://security.snyk.io/vuln/SNYK-JAVA-IONETTY-1584063] in 
> io.netty:netty-codec@4.1.63.Final
>     introduced by org.apache.kafka:kafka_2.13@3.2.1 > 
> org.apache.zookeeper:zookeeper@3.6.3 > io.netty:netty-handler@4.1.63.Final > 
> io.netty:netty-codec@4.1.63.Final
>   This issue was fixed in versions: 4.1.68.Final
>   ✗ Denial of Service (DoS) [High 
> Severity][https://security.snyk.io/vuln/SNYK-JAVA-IONETTY-1584064] in 
> io.netty:netty-codec@4.1.63.Final
>     introduced by org.apache.kafka:kafka_2.13@3.2.1 > 
> org.apache.zookeeper:zookeeper@3.6.3 > io.netty:netty-handler@4.1.63.Final > 
> io.netty:netty-codec@4.1.63.Final
>   This issue was fixed in versions: 4.1.68.Final {code}
> The issues were fixed in the next versions of ZooKeeper (starting from 
> 3.6.4). ZooKeeper 3.7.1 is the next stable 
> [release|https://zookeeper.apache.org/releases.html] at the moment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14073) Logging the reason for creating a snapshot

2022-09-13 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14073.

Resolution: Fixed

> Logging the reason for creating a snapshot
> --
>
> Key: KAFKA-14073
> URL: https://issues.apache.org/jira/browse/KAFKA-14073
> Project: Kafka
>  Issue Type: Improvement
>Reporter: dengziming
>Priority: Minor
>  Labels: kraft, newbie
>
> So far we have two reasons for creating a snapshot: 1. X bytes were applied; 
> 2. the metadata version changed. We should log the reason when creating a 
> snapshot on both the broker side and the controller side. See 
> https://github.com/apache/kafka/pull/12265#discussion_r915972383



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14222) Exhausted BatchMemoryPool

2022-09-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14222:
--

 Summary: Exhausted BatchMemoryPool
 Key: KAFKA-14222
 URL: https://issues.apache.org/jira/browse/KAFKA-14222
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0


For a large number of topics and partitions, the broker can encounter this issue:
{code:java}
[2022-09-12 14:14:42,114] ERROR [BrokerMetadataSnapshotter id=4] Unexpected 
error handling CreateSnapshotEvent 
(kafka.server.metadata.BrokerMetadataSnapshotter)
org.apache.kafka.raft.errors.BufferAllocationException: Append failed because 
we failed to allocate memory to write the batch
at 
org.apache.kafka.raft.internals.BatchAccumulator.append(BatchAccumulator.java:161)
at 
org.apache.kafka.raft.internals.BatchAccumulator.append(BatchAccumulator.java:112)
at 
org.apache.kafka.snapshot.RecordsSnapshotWriter.append(RecordsSnapshotWriter.java:167)
at 
kafka.server.metadata.RecordListConsumer.accept(BrokerMetadataSnapshotter.scala:49)
at 
kafka.server.metadata.RecordListConsumer.accept(BrokerMetadataSnapshotter.scala:42)
at org.apache.kafka.image.TopicImage.write(TopicImage.java:78)
at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:79)
at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:129)
at 
kafka.server.metadata.BrokerMetadataSnapshotter$CreateSnapshotEvent.run(BrokerMetadataSnapshotter.scala:116)
at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
at java.base/java.lang.Thread.run(Thread.java:829) {code}
This can happen because the snapshot is larger than {{5 * 8 MB}}.
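For reference, here is the arithmetic behind that limit as a minimal sketch, 
assuming a pool of 5 batches at an 8 MB max batch size (both numbers come from 
the sentence above, not from reading the code):
{code:scala}
// Back-of-the-envelope check of when the batch memory pool can be exhausted.
val maxBatchSizeBytes = 8 * 1024 * 1024                    // assumed 8 MB per batch
val pooledBatches     = 5                                  // assumed pool size
val poolCapacityBytes = pooledBatches.toLong * maxBatchSizeBytes

// A snapshot whose serialized records exceed roughly 40 MB needs more batches
// in flight than the pool provides, so BatchAccumulator.append cannot allocate
// a buffer and throws BufferAllocationException, as in the stack trace above.
println(s"Pool capacity: ${poolCapacityBytes / (1024 * 1024)} MB")
{code}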



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14215) KRaft forwarded requests have no quota enforcement

2022-09-09 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14215:
---
Fix Version/s: 3.3.0

> KRaft forwarded requests have no quota enforcement
> --
>
> Key: KAFKA-14215
> URL: https://issues.apache.org/jira/browse/KAFKA-14215
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Critical
> Fix For: 3.3.0
>
>
> On the broker, the `BrokerMetadataPublisher` is responsible for propagating 
> quota changes from `ClientQuota` records to `ClientQuotaManager`. On the 
> controller, there is no similar logic, so no client quotas are enforced on 
> the controller.
> On the broker side, there is no enforcement either, since the broker assumes 
> that the controller will be the one to do it. Basically, the broker looks at 
> the throttle time returned in the response from the controller. If it is 0, 
> then the response is sent immediately without any throttling. 
> So the consequence of both of these issues is that controller-bound requests 
> have no throttling today.
>  
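A rough sketch of the broker-side gap described above; the helper hooks are 
hypothetical and this is not the actual forwarding code:
{code:scala}
// Illustrative only: the broker completes a forwarded request by mirroring the
// controller's throttle time. If the controller never computes a quota (today's
// behavior), that value is always 0 and the response goes out unthrottled.
def completeForwardedRequest(controllerThrottleTimeMs: Int,
                             throttle: Int => Unit,   // hypothetical throttling hook
                             sendResponse: () => Unit): Unit =
  if (controllerThrottleTimeMs > 0) throttle(controllerThrottleTimeMs)
  else sendResponse() // controller-bound requests always take this branch today
{code}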



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14215) KRaft forwarded requests have no quota enforcement

2022-09-09 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-14215:
--

Assignee: Jason Gustafson

> KRaft forwarded requests have no quota enforcement
> --
>
> Key: KAFKA-14215
> URL: https://issues.apache.org/jira/browse/KAFKA-14215
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 3.3.0
>
>
> On the broker, the `BrokerMetadataPublisher` is responsible for propagating 
> quota changes from `ClientQuota` records to `ClientQuotaManager`. On the 
> controller, there is no similar logic, so no client quotas are enforced on 
> the controller.
> On the broker side, there is no enforcement either, since the broker assumes 
> that the controller will be the one to do it. Basically, the broker looks at 
> the throttle time returned in the response from the controller. If it is 0, 
> then the response is sent immediately without any throttling. 
> So the consequence of both of these issues is that controller-bound requests 
> have no throttling today.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14208) KafkaConsumer#commitAsync throws unexpected WakeupException

2022-09-09 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602575#comment-17602575
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14208:


Okay. Thanks for the update. We have a few other blockers for 3.3 so we have 
time to get this merged and cherry picked.

> KafkaConsumer#commitAsync throws unexpected WakeupException
> ---
>
> Key: KAFKA-14208
> URL: https://issues.apache.org/jira/browse/KAFKA-14208
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Qingsheng Ren
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We recently encountered a bug after upgrading Kafka client to 3.2.1 in Flink 
> Kafka connector (FLINK-29153). Here's the exception:
> {code:java}
> org.apache.kafka.common.errors.WakeupException
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:514)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:252)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:493)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:1055)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1573)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaConsumerThread.run(KafkaConsumerThread.java:226)
>  {code}
> As {{WakeupException}} is not listed in the JavaDoc of 
> {{KafkaConsumer#commitAsync}}, the Flink Kafka connector doesn't catch the 
> exception thrown directly from KafkaConsumer#commitAsync but handles all 
> exceptions in the callback.
> I checked the source code and suspect this is caused by KAFKA-13563. Also we 
> never had this exception in commitAsync when we used Kafka client 2.4.1 & 
> 2.8.1. 
> I'm wondering if this is kind of breaking the public API as the 
> WakeupException is not listed in JavaDoc, and maybe it's better to invoke the 
> callback to handle the {{WakeupException}} instead of throwing it directly 
> from the method itself. 
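A defensive pattern consistent with the report above, sketched with a placeholder 
handleCommitFailure handler (this is not the Flink connector's code): treat a 
WakeupException thrown synchronously from commitAsync the same way as one 
delivered to the callback.
{code:scala}
import java.util
import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.errors.WakeupException

// Sketch only: route the synchronous failure path and the callback failure path
// through the same handler, since commitAsync may now throw WakeupException
// directly instead of reporting it via the callback.
def commitSafely(consumer: KafkaConsumer[String, String],
                 handleCommitFailure: Exception => Unit): Unit = {
  val callback = new OffsetCommitCallback {
    override def onComplete(offsets: util.Map[TopicPartition, OffsetAndMetadata],
                            exception: Exception): Unit =
      if (exception != null) handleCommitFailure(exception)
  }
  try consumer.commitAsync(callback)
  catch {
    case e: WakeupException => handleCommitFailure(e)
  }
}
{code}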



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14214) StandardAuthorizer may transiently process ACLs out of write order

2022-09-09 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14214:
---
Fix Version/s: 3.3.0

> StandardAuthorizer may transiently process ACLs out of write order
> --
>
> Key: KAFKA-14214
> URL: https://issues.apache.org/jira/browse/KAFKA-14214
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3
>Reporter: Akhilesh Chaganti
>Priority: Blocker
> Fix For: 3.3.0
>
>
> The issue with StandardAuthorizer#authorize is that it looks up 
> aclsByResources (which is of type ConcurrentSkipListMap) twice for every 
> authorize call and uses an Iterator with weak consistency guarantees on top of 
> aclsByResources. This can cause the authorize function call to process 
> concurrent writes out of order.
> *Issue 1:*
> When StandardAuthorizer calls into a simple authorize function, we check the 
> ACLs for literal/prefix matches for the resource and then make one more call 
> to check the ACLs for matching wildcard entries. Between the two 
> (checkSection) calls, let’s assume we add a DENY for resource literal and add 
> an ALLOW ALL wildcard. The first call to check literal/prefix rules will SKIP 
> the DENY ACL since the writes are not yet processed, and the second call would 
> find the ALLOW wildcard entry, which results in ALLOW authorization for the 
> resource when it should actually be denied.
> *Issue 2:*
> For authorization, StandardAuthorizer depends on an iterator that iterates 
> through the ordered set of ACLs. The iterator has weak consistency 
> guarantees. So when writes for two ACLs occur, one of the ACLs might be 
> visible to the iterator while the other is not. 
> Let's say the two ACLs below are added to the set in the following order.
> Acl1 = StandardAcl(TOPIC, foobar, LITERAL, DENY, READ, user1)
> Acl2 = StandardAcl(TOPIC, foo, PREFIX, ALLOW, READ, user1)
> Depending on the position of the iterator on the ordered set during the write 
> call, the iterator might see only Acl2, which prompts it to ALLOW the topic to 
> be READ even though the DENY rule was written first.
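The iterator behavior can be reproduced in isolation with a plain 
ConcurrentSkipListMap; the sketch below is a generic demonstration, not the 
authorizer's code, and whether the late write is observed depends on timing:
{code:scala}
import java.util.concurrent.ConcurrentSkipListMap

// Generic illustration of weakly consistent iteration. An entry inserted
// before the iterator's current position, after the iterator was created,
// may never be observed by that iterator.
val acls = new ConcurrentSkipListMap[String, String]()
acls.put("b-allow-prefix-foo", "ALLOW READ")       // analogue of Acl2
val it = acls.entrySet().iterator()                // positioned at the ALLOW entry
acls.put("a-deny-literal-foobar", "DENY READ")     // analogue of Acl1, sorts earlier
while (it.hasNext) {
  val e = it.next()
  println(s"${e.getKey} -> ${e.getValue}")         // typically prints only the ALLOW entry
}
{code}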



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14194) NPE in Cluster.nodeIfOnline

2022-09-09 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14194:
---
Fix Version/s: 3.3.0

> NPE in Cluster.nodeIfOnline
> ---
>
> Key: KAFKA-14194
> URL: https://issues.apache.org/jira/browse/KAFKA-14194
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Andrew Dean
>Assignee: Andrew Dean
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> When rack-aware Kafka consumers are in use and the Kafka broker cluster is 
> restarted, an NPE can occur during transient metadata updates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-08 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14204.

Resolution: Fixed

> QuorumController must correctly handle overly large batches
> ---
>
> Key: KAFKA-14204
> URL: https://issues.apache.org/jira/browse/KAFKA-14204
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, kraft
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14201) Consumer should not send group instance ID if committing with empty member ID

2022-09-08 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14201:
---
Fix Version/s: 3.3.0

> Consumer should not send group instance ID if committing with empty member ID
> -
>
> Key: KAFKA-14201
> URL: https://issues.apache.org/jira/browse/KAFKA-14201
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.3.0
>
>
> The consumer group instance ID is used to support a notion of "static" 
> consumer groups. The idea is to be able to identify the same group instance 
> across restarts so that a rebalance is not needed. However, if the user sets 
> `group.instance.id` in the consumer configuration, but uses "simple" 
> assignment with `assign()`, then the instance ID nevertheless is sent in the 
> OffsetCommit request to the coordinator. This may result in a surprising 
> UNKNOWN_MEMBER_ID error. The consumer should probably be smart enough to only 
> send the instance ID when committing as part of a consumer group.
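A minimal reproduction sketch of that combination; the broker address, topic, 
and IDs below are placeholders:
{code:scala}
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer

// Sketch only: static membership config combined with manual assignment.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")           // placeholder
props.put("group.id", "example-group")                     // placeholder
props.put("group.instance.id", "example-instance-1")       // static membership setting
props.put("key.deserializer", classOf[StringDeserializer].getName)
props.put("value.deserializer", classOf[StringDeserializer].getName)

val consumer = new KafkaConsumer[String, String](props)
consumer.assign(java.util.Collections.singletonList(new TopicPartition("example-topic", 0)))

// The member ID stays empty with assign(), yet the OffsetCommit request still
// carries the instance ID, which can surface as UNKNOWN_MEMBER_ID.
consumer.poll(java.time.Duration.ofMillis(100))
consumer.commitSync()
{code}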



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14198) Release package contains snakeyaml 1.30

2022-09-08 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-14198:
--

Assignee: Ismael Juma

> Release package contains snakeyaml 1.30
> ---
>
> Key: KAFKA-14198
> URL: https://issues.apache.org/jira/browse/KAFKA-14198
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Mickael Maison
>Assignee: Ismael Juma
>Priority: Major
> Fix For: 3.3.0
>
>
> snakeyaml 1.30 is vulnerable to CVE-2022-25857: 
> https://security.snyk.io/vuln/SNYK-JAVA-ORGYAML-2806360
> It looks like we pull this dependency because of swagger. It's unclear how or 
> even if this can be exploited in Kafka but it's flagged by scanning tools. 
> I wonder if we could make the swagger dependencies compile time only and 
> avoid shipping them. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14203) KRaft broker should disable snapshot generation after error replaying the metadata log

2022-09-08 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-14203:
--

Assignee: David Arthur

> KRaft broker should disable snapshot generation after error replaying the 
> metadata log
> --
>
> Key: KAFKA-14203
> URL: https://issues.apache.org/jira/browse/KAFKA-14203
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.3.0
>
>
> The broker skips records for which there was an error when replaying the log. 
> This means that the MetadataImage has diverged from the state persisted in 
> the log. The broker should disable snapshot generation; otherwise, the next 
> time a snapshot is generated it will persist inconsistent data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-07 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14207:
--

 Summary: Add a 6.10 section for KRaft
 Key: KAFKA-14207
 URL: https://issues.apache.org/jira/browse/KAFKA-14207
 Project: Kafka
  Issue Type: Sub-task
  Components: documentation
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0


The section should talk about:
 # Limitation
 # Recommended deployment: external controller
 # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14198) Release package contains snakeyaml 1.30

2022-09-07 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14198:
---
Fix Version/s: 3.3.0

> Release package contains snakeyaml 1.30
> ---
>
> Key: KAFKA-14198
> URL: https://issues.apache.org/jira/browse/KAFKA-14198
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Mickael Maison
>Priority: Major
> Fix For: 3.3.0
>
>
> snakeyaml 1.30 is vulnerable to CVE-2022-25857: 
> https://security.snyk.io/vuln/SNYK-JAVA-ORGYAML-2806360
> It looks like we pull this dependency because of swagger. It's unclear how or 
> even if this can be exploited in Kafka but it's flagged by scanning tools. 
> I wonder if we could make the swagger dependencies compile time only and 
> avoid shipping them. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14198) Release package contains snakeyaml 1.30

2022-09-07 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601456#comment-17601456
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14198:


Thanks. I'll mark it as a blocker for 3.3.0 for now. I assume snakeyaml is only 
needed at compile time. [~mimaison] do you know how to make that change in 
Gradle?

> Release package contains snakeyaml 1.30
> ---
>
> Key: KAFKA-14198
> URL: https://issues.apache.org/jira/browse/KAFKA-14198
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Mickael Maison
>Priority: Major
>
> snakeyaml 1.30 is vulnerable to CVE-2022-25857: 
> https://security.snyk.io/vuln/SNYK-JAVA-ORGYAML-2806360
> It looks like we pull this dependency because of swagger. It's unclear how or 
> even if this can be exploited in Kafka but it's flagged by scanning tools. 
> I wonder if we could make the swagger dependencies compile time only and 
> avoid shipping them. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14205) Document how to recover from kraft controller disk failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14205:
--

 Summary: Document how to recover from kraft controller disk failure
 Key: KAFKA-14205
 URL: https://issues.apache.org/jira/browse/KAFKA-14205
 Project: Kafka
  Issue Type: Sub-task
  Components: documentation
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14204:
---
Affects Version/s: (was: 3.3.0)

> QuorumController must correctly handle overly large batches
> ---
>
> Key: KAFKA-14204
> URL: https://issues.apache.org/jira/browse/KAFKA-14204
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, kraft
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14197) Kraft broker fails to startup after topic creation failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14197:
---
Affects Version/s: 3.3

> Kraft broker fails to startup after topic creation failure
> --
>
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3
>Reporter: Luke Chen
>Priority: Blocker
>
> In KRaft's ControllerWriteEvent, we start by applying the record to the 
> controller's in-memory state and then send out the record via the raft client. 
> But if there is an error while sending the records, there's no way to revert 
> the change to the controller's in-memory state[1].
> The issue happens when creating topics: the controller state is updated with 
> topic and partition metadata (e.g. the broker to ISR map), but the record 
> isn't sent out successfully (e.g. RecordBatchTooLargeException). Then, when 
> shutting down the node, the controlled shutdown will try to remove the broker 
> from the ISR by[2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", 
> brokerId, NO_LEADER, records, 
> brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
>  
> After appending the partitionChangeRecords and sending them to the metadata 
> topic successfully, the brokers fail to "replay" these partition changes, 
> since the topics/partitions were never created successfully in the first 
> place.
> Even worse, after restarting the node, all the metadata records are replayed 
> again, the same error occurs again, and the broker cannot start up 
> successfully.
>  
> The error and call stack look like this; basically, it complains that the 
> topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error 
> replaying metadata log record at offset 81 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>     at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>     at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>     at 
> kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> [1] 
> [https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804]
>  
> [2] 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14204:
---
Affects Version/s: 3.3.0

> QuorumController must correctly handle overly large batches
> ---
>
> Key: KAFKA-14204
> URL: https://issues.apache.org/jira/browse/KAFKA-14204
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14204:
---
Component/s: controller
 kraft

> QuorumController must correctly handle overly large batches
> ---
>
> Key: KAFKA-14204
> URL: https://issues.apache.org/jira/browse/KAFKA-14204
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, kraft
>Affects Versions: 3.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14197) Kraft broker fails to startup after topic creation failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14197:
---
Fix Version/s: (was: 3.3.0)

> Kraft broker fails to startup after topic creation failure
> --
>
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Luke Chen
>Priority: Blocker
>
> In KRaft's ControllerWriteEvent, we start by applying the record to the 
> controller's in-memory state and then send out the record via the raft client. 
> But if there is an error while sending the records, there's no way to revert 
> the change to the controller's in-memory state[1].
> The issue happens when creating topics: the controller state is updated with 
> topic and partition metadata (e.g. the broker to ISR map), but the record 
> isn't sent out successfully (e.g. RecordBatchTooLargeException). Then, when 
> shutting down the node, the controlled shutdown will try to remove the broker 
> from the ISR by[2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", 
> brokerId, NO_LEADER, records, 
> brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
>  
> After appending the partitionChangeRecords and sending them to the metadata 
> topic successfully, the brokers fail to "replay" these partition changes, 
> since the topics/partitions were never created successfully in the first 
> place.
> Even worse, after restarting the node, all the metadata records are replayed 
> again, the same error occurs again, and the broker cannot start up 
> successfully.
>  
> The error and call stack look like this; basically, it complains that the 
> topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error 
> replaying metadata log record at offset 81 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>     at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>     at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>     at 
> kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> [1] 
> [https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804]
>  
> [2] 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14204:
---
Fix Version/s: 3.3.0

> QuorumController must correctly handle overly large batches
> ---
>
> Key: KAFKA-14204
> URL: https://issues.apache.org/jira/browse/KAFKA-14204
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14203) KRaft broker should disable snapshot generation after error replaying the metadata log

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14203:
--

 Summary: KRaft broker should disable snapshot generation after 
error replaying the metadata log
 Key: KAFKA-14203
 URL: https://issues.apache.org/jira/browse/KAFKA-14203
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.3.0
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.3.0


The broker skips records for which there was an error when replaying the log. 
This means that the MetadataImage has diverged from the state persisted in the 
log. The broker should disable snapshot generation; otherwise, the next time a 
snapshot is generated it will persist inconsistent data.
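One way to picture the guard being asked for here, as a sketch with hypothetical 
names rather than the actual fix:
{code:scala}
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative sketch: latch a fault flag on the first replay error and have
// the snapshot generator consult it before writing anything to disk.
class MetadataReplayGuard {
  private val faulted = new AtomicBoolean(false)

  def onReplayError(offset: Long, cause: Throwable): Unit = {
    System.err.println(s"Error replaying metadata log record at offset $offset: $cause")
    faulted.set(true)   // the in-memory image may have diverged from the log
  }

  // A snapshotter would skip generation once a fault has been observed, so a
  // divergent MetadataImage is never persisted as a snapshot.
  def snapshotsAllowed: Boolean = !faulted.get()
}
{code}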



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14200) kafka-features.sh must exit with non-zero error code on error

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14200:
---
Fix Version/s: 3.3.0

> kafka-features.sh must exit with non-zero error code on error
> -
>
> Key: KAFKA-14200
> URL: https://issues.apache.org/jira/browse/KAFKA-14200
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> kafka-features.sh must exit with a non-zero error code on error. We must do 
> this in order to catch regressions like KAFKA-13990.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14201) Consumer should not send group instance ID if committing with empty member ID

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600876#comment-17600876
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14201:


[~hachikuji] [~dajac] Is this a blocker for 3.3.0?

> Consumer should not send group instance ID if committing with empty member ID
> -
>
> Key: KAFKA-14201
> URL: https://issues.apache.org/jira/browse/KAFKA-14201
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: David Jacot
>Priority: Major
>
> The consumer group instance ID is used to support a notion of "static" 
> consumer groups. The idea is to be able to identify the same group instance 
> across restarts so that a rebalance is not needed. However, if the user sets 
> `group.instance.id` in the consumer configuration, but uses "simple" 
> assignment with `assign()`, then the instance ID nevertheless is sent in the 
> OffsetCommit request to the coordinator. This may result in a surprising 
> UNKNOWN_MEMBER_ID error. The consumer should probably be smart enough to only 
> send the instance ID when committing as part of a consumer group.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14197) Kraft broker fails to startup after topic creation failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600874#comment-17600874
 ] 

Jose Armando Garcia Sancio commented on KAFKA-14197:


We should get this fixed in 3.3.0. Added it as a blocker for 3.3.0.

> Kraft broker fails to startup after topic creation failure
> --
>
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Luke Chen
>Priority: Blocker
> Fix For: 3.3.0
>
>
> In KRaft's ControllerWriteEvent, we start by applying the record to the 
> controller's in-memory state and then send out the record via the raft client. 
> But if there is an error while sending the records, there's no way to revert 
> the change to the controller's in-memory state[1].
> The issue happens when creating topics: the controller state is updated with 
> topic and partition metadata (e.g. the broker to ISR map), but the record 
> isn't sent out successfully (e.g. RecordBatchTooLargeException). Then, when 
> shutting down the node, the controlled shutdown will try to remove the broker 
> from the ISR by[2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", 
> brokerId, NO_LEADER, records, 
> brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
>  
> After appending the partitionChangeRecords and sending them to the metadata 
> topic successfully, the brokers fail to "replay" these partition changes, 
> since the topics/partitions were never created successfully in the first 
> place.
> Even worse, after restarting the node, all the metadata records are replayed 
> again, the same error occurs again, and the broker cannot start up 
> successfully.
>  
> The error and call stack look like this; basically, it complains that the 
> topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error 
> replaying metadata log record at offset 81 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>     at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>     at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>     at 
> kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> [1] 
> [https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804]
>  
> [2] 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14179.

Fix Version/s: (was: 3.3.0)
   Resolution: Duplicate

> Improve docs/upgrade.html to talk about metadata.version upgrades
> -
>
> Key: KAFKA-14179
> URL: https://issues.apache.org/jira/browse/KAFKA-14179
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Blocker
>  Labels: documentation, kraft
>
> The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
> upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14197) Kraft broker fails to startup after topic creation failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14197:
---
Fix Version/s: 3.3.0

> Kraft broker fails to startup after topic creation failure
> --
>
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Luke Chen
>Priority: Major
> Fix For: 3.3.0
>
>
> In KRaft's ControllerWriteEvent, we start by applying the record to the 
> controller's in-memory state and then send out the record via the raft client. 
> But if there is an error while sending the records, there's no way to revert 
> the change to the controller's in-memory state[1].
> The issue happens when creating topics: the controller state is updated with 
> topic and partition metadata (e.g. the broker to ISR map), but the record 
> isn't sent out successfully (e.g. RecordBatchTooLargeException). Then, when 
> shutting down the node, the controlled shutdown will try to remove the broker 
> from the ISR by[2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", 
> brokerId, NO_LEADER, records, 
> brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
>  
> After appending the partitionChangeRecords and sending them to the metadata 
> topic successfully, the brokers fail to "replay" these partition changes, 
> since the topics/partitions were never created successfully in the first 
> place.
> Even worse, after restarting the node, all the metadata records are replayed 
> again, the same error occurs again, and the broker cannot start up 
> successfully.
>  
> The error and call stack look like this; basically, it complains that the 
> topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error 
> replaying metadata log record at offset 81 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>     at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>     at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>     at 
> kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> [1] 
> [https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804]
>  
> [2] 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14197) Kraft broker fails to startup after topic creation failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14197:
---
Priority: Blocker  (was: Major)

> Kraft broker fails to startup after topic creation failure
> --
>
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Luke Chen
>Priority: Blocker
> Fix For: 3.3.0
>
>
> In KRaft's ControllerWriteEvent, we start by applying the record to the 
> controller's in-memory state and then send out the record via the raft client. 
> But if there is an error while sending the records, there's no way to revert 
> the change to the controller's in-memory state[1].
> The issue happens when creating topics: the controller state is updated with 
> topic and partition metadata (e.g. the broker to ISR map), but the record 
> isn't sent out successfully (e.g. RecordBatchTooLargeException). Then, when 
> shutting down the node, the controlled shutdown will try to remove the broker 
> from the ISR by[2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", 
> brokerId, NO_LEADER, records, 
> brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
>  
> After appending the partitionChangeRecords and sending them to the metadata 
> topic successfully, the brokers fail to "replay" these partition changes, 
> since the topics/partitions were never created successfully in the first 
> place.
> Even worse, after restarting the node, all the metadata records are replayed 
> again, the same error occurs again, and the broker cannot start up 
> successfully.
>  
> The error and call stack look like this; basically, it complains that the 
> topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error 
> replaying metadata log record at offset 81 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>     at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>     at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>     at 
> kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> [1] 
> [https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804]
>  
> [2] 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-12622) Automate LICENSE file validation

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598637#comment-17598637
 ] 

Jose Armando Garcia Sancio commented on KAFKA-12622:


Thanks for the bash script Mickael. This is what I did for 3.3:
{code:java}
$ ./gradlewAll clean releaseTarGz
$ tar xzf core/build/distributions/kafka_2.13-3.3.0-SNAPSHOT.tgz
$ cd kafka_2.13-3.3.0-SNAPSHOT/
$ for f in $(ls libs | grep -v "^kafka\|connect\|trogdor"); do if ! grep -q 
${f%.*} LICENSE; then echo "${f%.*} is missing in license file"; fi; done
{code}
PR to make it pass for 3.3.0: https://github.com/apache/kafka/pull/12579

> Automate LICENSE file validation
> 
>
> Key: KAFKA-12622
> URL: https://issues.apache.org/jira/browse/KAFKA-12622
> Project: Kafka
>  Issue Type: Task
>Reporter: John Roesler
>Priority: Major
> Fix For: 3.4.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-12602, we manually constructed 
> a correct license file for 2.8.0. This file will certainly become wrong again 
> in later releases, so we need to write some kind of script to automate a 
> check.
> It crossed my mind to automate the generation of the file, but it seems to be 
> an intractable problem, considering that each dependency may change licenses, 
> may package license files, link to them from their poms, link to them from 
> their repos, etc. I've also found multiple URLs listed with various 
> delimiters, broken links that I have to chase down, etc.
> Therefore, it seems like the solution to aim for is simply: list all the jars 
> that we package, and print out a report of each jar that's extra or missing 
> vs. the ones in our `LICENSE-binary` file.
> The check should be part of the release script at least, if not part of the 
> regular build (so we keep it up to date as dependencies change).
>  
> Here's how I do this manually right now:
> {code:java}
> // build the binary artifacts
> $ ./gradlewAll releaseTarGz
> // unpack the binary artifact 
> $ tar xf core/build/distributions/kafka_2.13-X.Y.Z.tgz
> $ cd kafka_2.13-X.Y.Z
> // list the packaged jars 
> // (you can ignore the jars for our own modules, like kafka, kafka-clients, 
> etc.)
> $ ls libs/
> // cross check the jars with the packaged LICENSE
> // make sure all dependencies are listed with the right versions
> $ cat LICENSE
> // also double check all the mentioned license files are present
> $ ls licenses {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14193) Connect system test ConnectRestApiTest is failing

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14193:
---
Affects Version/s: 3.3.0

> Connect system test ConnectRestApiTest is failing
> -
>
> Key: KAFKA-14193
> URL: https://issues.apache.org/jira/browse/KAFKA-14193
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.3.0
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Major
> Fix For: 3.3.0
>
>
> [ConnectRestApiTest|https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/connect/connect_rest_test.py]
>  is currently failing on `trunk` and `3.3` with the following assertion error:
>  
>  
> {code:java}
> AssertionError()
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 183, in _do_run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 243, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/connect/connect_rest_test.py", 
> line 106, in test_rest_api
>     self.verify_config(self.FILE_SOURCE_CONNECTOR, self.FILE_SOURCE_CONFIGS, 
> configs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/connect/connect_rest_test.py", 
> line 219, in verify_config
>     assert config_def == set(config_names){code}
> On closer inspection, this is because of the new source connector EOS related 
> configs added in [https://github.com/apache/kafka/pull/11775.] Adding the 
> following new configs - 
> {code:java}
> offsets.storage.topic, transaction.boundary, exactly.once.support, 
> transaction.boundary.interval.ms{code}
> in the expected config defs 
> [here|https://github.com/apache/kafka/blob/6f4778301b1fcac1e2750cc697043d674eaa230d/tests/kafkatest/tests/connect/connect_rest_test.py#L35]
>  fixes the tests on the 3.3 branch. However, the tests still fail on trunk 
> due to the changes from [https://github.com/apache/kafka/pull/12450.]
>  
> The plan to fix this is to raise two PRs against trunk patching 
> connect_rest_test.py - the first one fixing the EOS configs related issue 
> which can be backported to 3.3 and the second one fixing the issue related to 
> propagation of full connector configs to tasks which shouldn't be backported 
> to 3.3 (because the commit from https://github.com/apache/kafka/pull/12450 is 
> only on trunk and not on 3.3)
>  
>  
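
For illustration, a sketch of the first fix described above: the set of expected config names in connect_rest_test.py gains the four new EOS-related source connector configs introduced by KIP-618. The pre-existing names shown here are placeholders, not copied from the test file; only the four new names come from this ticket.
{code:python}
# Sketch only: the existing entries below are illustrative placeholders, while
# the four EOS-related names are the ones listed in this ticket.
EXISTING_FILE_SOURCE_CONFIGS = {
    "name", "connector.class", "tasks.max", "topic", "file",  # placeholder subset
}

NEW_EOS_SOURCE_CONFIGS = {
    "offsets.storage.topic",
    "transaction.boundary",
    "exactly.once.support",
    "transaction.boundary.interval.ms",
}

# The test's expected config def then includes both, so the assertion
# `config_def == set(config_names)` in verify_config can pass again.
FILE_SOURCE_CONFIGS = EXISTING_FILE_SOURCE_CONFIGS | NEW_EOS_SOURCE_CONFIGS
{code}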



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14193) Connect system test ConnectRestApiTest is failing

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14193:
---
Fix Version/s: 3.3.0

> Connect system test ConnectRestApiTest is failing
> -
>
> Key: KAFKA-14193
> URL: https://issues.apache.org/jira/browse/KAFKA-14193
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Major
> Fix For: 3.3.0
>
>
> [ConnectRestApiTest|https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/connect/connect_rest_test.py]
>  is currently failing on `trunk` and `3.3` with the following assertion error:
>  
>  
> {code:java}
> AssertionError()
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 183, in _do_run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 243, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/connect/connect_rest_test.py", 
> line 106, in test_rest_api
>     self.verify_config(self.FILE_SOURCE_CONNECTOR, self.FILE_SOURCE_CONFIGS, 
> configs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/connect/connect_rest_test.py", 
> line 219, in verify_config
>     assert config_def == set(config_names){code}
> On closer inspection, this is because of the new source connector EOS related 
> configs added in [https://github.com/apache/kafka/pull/11775.] Adding the 
> following new configs - 
> {code:java}
> offsets.storage.topic, transaction.boundary, exactly.once.support, 
> transaction.boundary.interval.ms{code}
> in the expected config defs 
> [here|https://github.com/apache/kafka/blob/6f4778301b1fcac1e2750cc697043d674eaa230d/tests/kafkatest/tests/connect/connect_rest_test.py#L35]
>  fixes the tests on the 3.3 branch. However, the tests still fail on trunk 
> due to the changes from [https://github.com/apache/kafka/pull/12450.]
>  
> The plan to fix this is to raise two PRs against trunk patching 
> connect_rest_test.py - the first one fixing the EOS configs related issue 
> which can be backported to 3.3 and the second one fixing the issue related to 
> propagation of full connector configs to tasks which shouldn't be backported 
> to 3.3 (because the commit from https://github.com/apache/kafka/pull/12450 is 
> only on trunk and not on 3.3)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14143) Exactly-once source system tests

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14143:
---
Affects Version/s: 3.3.0

> Exactly-once source system tests
> 
>
> Key: KAFKA-14143
> URL: https://issues.apache.org/jira/browse/KAFKA-14143
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Affects Versions: 3.3.0
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.3.0
>
>
> System tests for the exactly-once source connector support introduced in 
> [KIP-618|https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors]
>  / KAFKA-1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14143) Exactly-once source system tests

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14143:
---
Fix Version/s: 3.3.0

> Exactly-once source system tests
> 
>
> Key: KAFKA-14143
> URL: https://issues.apache.org/jira/browse/KAFKA-14143
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.3.0
>
>
> System tests for the exactly-once source connector support introduced in 
> [KIP-618|https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors]
>  / KAFKA-1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13990) Update features will fail in KRaft mode

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13990:
---
Affects Version/s: 3.3.0

> Update features will fail in KRaft mode
> ---
>
> Key: KAFKA-13990
> URL: https://issues.apache.org/jira/browse/KAFKA-13990
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
> Fix For: 3.3.0
>
>
> We return empty supported features in Controller ApiVersionResponse, so the 
> {{quorumSupportedFeature}} will always return empty; we should return 
> Map(metadata.version -> latest)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13990) Update features will fail in KRaft mode

2022-08-31 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13990:
---
Fix Version/s: 3.3.0

> Update features will fail in KRaft mode
> ---
>
> Key: KAFKA-13990
> URL: https://issues.apache.org/jira/browse/KAFKA-13990
> Project: Kafka
>  Issue Type: Bug
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
> Fix For: 3.3.0
>
>
> We return empty supported features in Controller ApiVersionResponse, so the 
> {{quorumSupportedFeature}} will always return empty; we should return 
> Map(metadata.version -> latest)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14179:
---
Affects Version/s: 3.3.0

> Improve docs/upgrade.html to talk about metadata.version upgrades
> -
>
> Key: KAFKA-14179
> URL: https://issues.apache.org/jira/browse/KAFKA-14179
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Blocker
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
> upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14188) Quickstart for KRaft

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14188:
---
Affects Version/s: 3.3.0

> Quickstart for KRaft
> 
>
> Key: KAFKA-14188
> URL: https://issues.apache.org/jira/browse/KAFKA-14188
> Project: Kafka
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> Either:
>  # Improve the quick start documentation to talk about both KRaft and ZK
>  # Create a KRaft quick start that is very similar to the ZK quick start but 
> uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14179:
---
Fix Version/s: 3.3.0

> Improve docs/upgrade.html to talk about metadata.version upgrades
> -
>
> Key: KAFKA-14179
> URL: https://issues.apache.org/jira/browse/KAFKA-14179
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Blocker
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
> upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14188) Quickstart for KRaft

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14188:
---
Fix Version/s: 3.3.0

> Quickstart for KRaft
> 
>
> Key: KAFKA-14188
> URL: https://issues.apache.org/jira/browse/KAFKA-14188
> Project: Kafka
>  Issue Type: Task
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> Either:
>  # Improve the quick start documentation to talk about both KRaft and ZK
>  # Create a KRaft quick start that is very similar to the ZK quick start but 
> uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14174) Operation documentation for KRaft

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Fix Version/s: 3.3.0

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14174) Operation documentation for KRaft

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Affects Version/s: 3.3.0

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
> Fix For: 3.3.0
>
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14187) kafka-features.sh: add support for --metadata

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14187:
---
Fix Version/s: 3.3.0

> kafka-features.sh: add support for --metadata
> -
>
> Key: KAFKA-14187
> URL: https://issues.apache.org/jira/browse/KAFKA-14187
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> Fix the kafka-features.sh command so that we can upgrade to the new version 
> as expected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14156) Built-in partitioner may create suboptimal batches with large linger.ms

2022-08-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14156:
---
Fix Version/s: 3.3.0

> Built-in partitioner may create suboptimal batches with large linger.ms
> ---
>
> Key: KAFKA-14156
> URL: https://issues.apache.org/jira/browse/KAFKA-14156
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 3.3.0
>Reporter: Artem Livshits
>Priority: Blocker
> Fix For: 3.3.0
>
>
> The new built-in "sticky" partitioner switches partitions based on the amount 
> of bytes produced to a partition.  It doesn't use batch creation as a switch 
> trigger.  The previous "sticky" DefaultPartitioner switched partition when a 
> new batch was created and with small linger.ms (default is 0) could result in 
> sending larger batches to slower brokers potentially overloading them.  See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
>  for more detail.
> However, with large linger.ms, the new built-in partitioner may create 
> suboptimal batches.  Let's consider an example: suppose linger.ms=500, 
> batch.size=16KB (default) and we produce 24KB / sec, i.e. every 500ms we 
> produce 12KB worth of data.  The new built-in partitioner would switch 
> partition on every 16KB, so we could get into the following batching pattern:
>  * produce 12KB to one partition in 500ms, hit linger, send 12KB batch
>  * produce 4KB more to the same partition, now we've produced 16KB of data, 
> switch partition
>  * produce 12KB to the second partition in 500ms, hit linger, send 12KB batch
>  * in the mean time the 4KB produced to the first partition would hit linger 
> as well, sending 4KB batch
>  * produce 4KB more to the second partition, now we've produced 16KB of data 
> to the second partition, switch to 3rd partition
> so in this scenario the new built-in partitioner produces a mix of 12KB and 
> 4KB batches, while the previous DefaultPartitioner would produce only 12KB 
> batches -- it switches on new batch creation, so there is no "mid-linger" 
> leftover batches.
> To avoid creation of batch fragmentation on partition switch, we can wait 
> until the batch is ready before switching the partition, i.e. the condition 
> to switch to a new partition would be "produced batch.size bytes" AND "batch 
> is not lingering".  This may potentially introduce some non-uniformity into 
> data distribution, but unlike the previous DefaultPartitioner, the 
> non-uniformity would not be based on broker performance and won't 
> re-introduce the bad pattern of sending more data to slower brokers.
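
A small arithmetic sketch of the pattern described above, under the ticket's assumptions (linger.ms=500, batch.size=16KB, a steady 24KB/s so 12KB arrives per linger window). It is not the producer's actual code; the byte-count check stands in for the linger timer, which only works because the rate is constant.
{code:python}
# Sketch of the batching pattern for one 16 KB "stickiness" window of the
# built-in partitioner, under the assumptions stated above.
LINGER_BYTES = 12 * 1024   # 24 KB/s * 0.5 s arrives during each linger window
BATCH_SIZE = 16 * 1024     # bytes produced to a partition before switching

def batches_until_partition_switch():
    """Yield the batch sizes flushed while one partition accumulates 16 KB."""
    produced = 0      # bytes sent to the current partition since the last switch
    open_batch = 0    # bytes sitting in the not-yet-flushed batch
    while produced < BATCH_SIZE:
        take = min(LINGER_BYTES - open_batch, BATCH_SIZE - produced)
        open_batch += take
        produced += take
        if open_batch == LINGER_BYTES:  # a full linger window elapsed: flush
            yield open_batch
            open_batch = 0
    if open_batch:                      # leftover flushed after the switch
        yield open_batch

print([b // 1024 for b in batches_until_partition_switch()])  # -> [12, 4]
{code}
Each 16 KB window flushes a 12 KB batch plus a 4 KB leftover, matching the 12KB/4KB mix the ticket describes, whereas switching only when the batch is full and no longer lingering would keep the batches at 12 KB.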



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14174) Operation documentation for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Component/s: documentation

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14188) Quickstart for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14188:
---
Component/s: (was: kraft)

> Quickstart for KRaft
> 
>
> Key: KAFKA-14188
> URL: https://issues.apache.org/jira/browse/KAFKA-14188
> Project: Kafka
>  Issue Type: Task
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> Either:
>  # Improve the quick start documentation to talk about both KRaft and ZK
>  # Create a KRaft quick start that is very similar to the ZK quick start but 
> uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14188) Quickstart for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14188:
---
Labels: documentation kraft  (was: documentation)

> Quickstart for KRaft
> 
>
> Key: KAFKA-14188
> URL: https://issues.apache.org/jira/browse/KAFKA-14188
> Project: Kafka
>  Issue Type: Task
>  Components: documentation, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> Either:
>  # Improve the quick start documentation to talk about both KRaft and ZK
>  # Create a KRaft quick start that is very similar to the ZK quick start but 
> uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14188) Quickstart for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14188:
---
Labels: documentation  (was: )

> Quickstart for KRaft
> 
>
> Key: KAFKA-14188
> URL: https://issues.apache.org/jira/browse/KAFKA-14188
> Project: Kafka
>  Issue Type: Task
>  Components: documentation, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation
>
> Either:
>  # Improve the quick start documentation to talk about both KRaft and ZK
>  # Create a KRaft quick start that is very similar to the ZK quick start but 
> uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14174) Operation documentation for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Summary: Operation documentation for KRaft  (was: Documentation for KRaft)

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14188) Quickstart for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14188:
--

 Summary: Quickstart for KRaft
 Key: KAFKA-14188
 URL: https://issues.apache.org/jira/browse/KAFKA-14188
 Project: Kafka
  Issue Type: Task
  Components: documentation, kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


Either:
 # Improve the quick start documentation to talk about both KRaft and ZK
 # Create a KRaft quick start that is very similar to the ZK quick start but 
uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13909) Restart Kafka in KRaft mode with ACLs ends in a RuntimeException

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13909:
---
Fix Version/s: (was: 3.3.0)

> Restart Kafka in KRaft mode with ACLs ends in a RuntimeException
> 
>
> Key: KAFKA-13909
> URL: https://issues.apache.org/jira/browse/KAFKA-13909
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.2.0
> Environment: Running Kafka in a Docker container
>Reporter: Florian Blumenstein
>Assignee: Luke Chen
>Priority: Major
> Attachments: kafka.log, server.properties
>
>
> Running Kafka in KRaft mode works for the initial startup. When restarting 
> Kafka, it ends in a RuntimeException:
> [2022-05-17 08:26:40,959] ERROR [BrokerServer id=1] Fatal error during broker 
> startup. Prepare to shutdown (kafka.server.BrokerServer)
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: An ACL 
> with ID toAvM0TbTfWRmS1kjknRaA already exists.
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown 
> Source)
>         at java.base/java.util.concurrent.CompletableFuture.get(Unknown 
> Source)
>         at kafka.server.BrokerServer.startup(BrokerServer.scala:426)
>         at 
> kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:114)
>         at 
> kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:114)
>         at scala.Option.foreach(Option.scala:437)
>         at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:114)
>         at kafka.Kafka$.main(Kafka.scala:109)
>         at kafka.Kafka.main(Kafka.scala)
> Caused by: java.lang.RuntimeException: An ACL with ID toAvM0TbTfWRmS1kjknRaA 
> already exists.
>         at 
> org.apache.kafka.metadata.authorizer.StandardAuthorizerData.addAcl(StandardAuthorizerData.java:169)
>         at 
> org.apache.kafka.metadata.authorizer.StandardAuthorizer.addAcl(StandardAuthorizer.java:83)
>         at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$19(BrokerMetadataPublisher.scala:234)
>         at java.base/java.util.LinkedHashMap$LinkedEntrySet.forEach(Unknown 
> Source)
>         at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$18(BrokerMetadataPublisher.scala:232)
>         at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$18$adapted(BrokerMetadataPublisher.scala:221)
>         at scala.Option.foreach(Option.scala:437)
>         at 
> kafka.server.metadata.BrokerMetadataPublisher.publish(BrokerMetadataPublisher.scala:221)
>         at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$publish(BrokerMetadataListener.scala:258)
>         at 
> kafka.server.metadata.BrokerMetadataListener$StartPublishingEvent.run(BrokerMetadataListener.scala:241)
>         at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>         at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>         at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>         at java.base/java.lang.Thread.run(Unknown Source)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13937) StandardAuthorizer throws "ID 5t1jQ3zWSfeVLMYkN3uong not found in aclsById" exceptions into broker logs

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13937:
---
Fix Version/s: (was: 3.3.0)

> StandardAuthorizer throws "ID 5t1jQ3zWSfeVLMYkN3uong not found in aclsById" 
> exceptions into broker logs
> ---
>
> Key: KAFKA-13937
> URL: https://issues.apache.org/jira/browse/KAFKA-13937
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Jakub Scholz
>Assignee: Luke Chen
>Priority: Major
>
> I'm trying to use the new {{StandardAuthorizer}} in a Kafka cluster running 
> in KRaft mode. When managing the ACLs using the Admin API, the authorizer 
> seems to throw a lot of runtime exceptions in the log. For example ...
> When creating an ACL rule, it seems to create it just fine. But it throws the 
> following exception:
> {code:java}
> 2022-05-25 11:09:18,074 ERROR [StandardAuthorizer 0] addAcl error 
> (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [EventHandler]
> java.lang.RuntimeException: An ACL with ID 5t1jQ3zWSfeVLMYkN3uong already 
> exists.
>     at 
> org.apache.kafka.metadata.authorizer.StandardAuthorizerData.addAcl(StandardAuthorizerData.java:169)
>     at 
> org.apache.kafka.metadata.authorizer.StandardAuthorizer.addAcl(StandardAuthorizer.java:83)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$19(BrokerMetadataPublisher.scala:234)
>     at 
> java.base/java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$18(BrokerMetadataPublisher.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$18$adapted(BrokerMetadataPublisher.scala:221)
>     at scala.Option.foreach(Option.scala:437)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.publish(BrokerMetadataPublisher.scala:221)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$publish(BrokerMetadataListener.scala:258)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.$anonfun$run$2(BrokerMetadataListener.scala:119)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.$anonfun$run$2$adapted(BrokerMetadataListener.scala:119)
>     at scala.Option.foreach(Option.scala:437)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:119)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-05-25 11:09:18,076 ERROR [BrokerMetadataPublisher id=0] Error publishing 
> broker metadata at OffsetAndEpoch(offset=3, epoch=1) 
> (kafka.server.metadata.BrokerMetadataPublisher) [EventHandler]
> java.lang.RuntimeException: An ACL with ID 5t1jQ3zWSfeVLMYkN3uong already 
> exists.
>     at 
> org.apache.kafka.metadata.authorizer.StandardAuthorizerData.addAcl(StandardAuthorizerData.java:169)
>     at 
> org.apache.kafka.metadata.authorizer.StandardAuthorizer.addAcl(StandardAuthorizer.java:83)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$19(BrokerMetadataPublisher.scala:234)
>     at 
> java.base/java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$18(BrokerMetadataPublisher.scala:232)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$publish$18$adapted(BrokerMetadataPublisher.scala:221)
>     at scala.Option.foreach(Option.scala:437)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.publish(BrokerMetadataPublisher.scala:221)
>     at 
> kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$publish(BrokerMetadataListener.scala:258)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.$anonfun$run$2(BrokerMetadataListener.scala:119)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.$anonfun$run$2$adapted(BrokerMetadataListener.scala:119)
>     at scala.Option.foreach(Option.scala:437)
>     at 
> kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:119)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at 
> 

[jira] [Updated] (KAFKA-13897) Add 3.1.1 to system tests and streams upgrade tests

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-13897:
---
Fix Version/s: (was: 3.3.0)
   (was: 3.1.2)

> Add 3.1.1 to system tests and streams upgrade tests
> ---
>
> Key: KAFKA-13897
> URL: https://issues.apache.org/jira/browse/KAFKA-13897
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Reporter: Tom Bentley
>Priority: Blocker
>
> Per the penultimate bullet on the [release 
> checklist|https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses],
>  Kafka v3.1.1 is released. We should add this version to the system tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14142) Improve information returned about the cluster metadata partition

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14142:
---
Fix Version/s: (was: 3.3.0)

> Improve information returned about the cluster metadata partition
> -
>
> Key: KAFKA-14142
> URL: https://issues.apache.org/jira/browse/KAFKA-14142
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jason Gustafson
>Priority: Blocker
>
> The Apache Kafka operator needs to know when it is safe to format and start a 
> KRaft Controller that had a disk failure of the metadata log dir.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14050) Older clients cannot deserialize ApiVersions response with finalized feature epoch

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14050:
---
Fix Version/s: (was: 3.3.0)

> Older clients cannot deserialize ApiVersions response with finalized feature 
> epoch
> --
>
> Key: KAFKA-14050
> URL: https://issues.apache.org/jira/browse/KAFKA-14050
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
>
> When testing kraft locally, we encountered this exception from an older 
> client:
> {code:java}
> [ERROR] 2022-07-05 16:45:01,165 [kafka-admin-client-thread | 
> adminclient-1394] org.apache.kafka.common.utils.KafkaThread 
> lambda$configureThread$0 - Uncaught exception in thread 
> 'kafka-admin-client-thread | adminclient-1394':
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'api_keys': Error reading array of size 1207959552, only 579 bytes available
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:118)
> at 
> org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:378)
> at 
> org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:187)
> at 
> org.apache.kafka.clients.NetworkClient$DefaultClientInterceptor.parseResponse(NetworkClient.java:1333)
> at 
> org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:752)
> at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:888)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:577)
> at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1329)
> at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1260)
> at java.base/java.lang.Thread.run(Thread.java:832) {code}
> The cause appears to be from a change to the type of the 
> `FinalizedFeaturesEpoch` field in the `ApiVersions` response from int32 to 
> int64: 
> [https://github.com/apache/kafka/pull/9001/files#diff-32006e8becae918416debdb9ac76bf8a1ad12b83aaaf5f8819b6ecc00c1fb56bR58.]
> Fortunately, `FinalizedFeaturesEpoch` is a tagged field, so we can fix this 
> by creating a new field. We will have to leave the existing tag in the 
> protocol spec and consider it dead.
> Credit for this find goes to [~dajac] .
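
The real field is a tagged field with its own framing, so the failure path in Kafka is more involved than this, but a generic sketch may help show the failure mode: when a writer widens a field from int32 to int64 and an older reader still assumes int32, every later offset shifts, and an array length ends up being read from the wrong bytes, producing exactly this kind of "array of size N, only M bytes available" error. Everything in the snippet is hypothetical; it is not the Kafka wire format.
{code:python}
# Generic illustration of a writer/reader width mismatch; not Kafka's protocol.
import struct

def write_new(epoch, entries):
    # Newer writer: 8-byte epoch, 4-byte entry count, 2-byte entries.
    return (struct.pack(">q", epoch)
            + struct.pack(">i", len(entries))
            + b"".join(struct.pack(">h", e) for e in entries))

def read_old(buf):
    # Older reader: still expects a 4-byte epoch followed by the entry count.
    epoch, count = struct.unpack_from(">ii", buf, 0)
    offset = 8
    remaining = len(buf) - offset
    if count * 2 > remaining:
        raise ValueError(
            f"Error reading array of size {count}, only {remaining} bytes available")
    return epoch, [struct.unpack_from(">h", buf, offset + 2 * i)[0] for i in range(count)]

buf = write_new(epoch=10, entries=[1, 2])
read_old(buf)  # raises: Error reading array of size 10, only 8 bytes available
{code}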



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14183) Kraft bootstrap metadata file should use snapshot header/footer

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14183.

Resolution: Fixed

> Kraft bootstrap metadata file should use snapshot header/footer
> ---
>
> Key: KAFKA-14183
> URL: https://issues.apache.org/jira/browse/KAFKA-14183
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.3.0
>
>
> The bootstrap checkpoint file that we use in kraft is intended to follow the 
> usual snapshot format, but currently it does not include the header/footer 
> control records. The main purpose of these at the moment is to set a version 
> for the checkpoint file itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-12622) Automate LICENSE file validation

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-12622:
---
Fix Version/s: 3.4.0
   (was: 3.3.0)

> Automate LICENSE file validation
> 
>
> Key: KAFKA-12622
> URL: https://issues.apache.org/jira/browse/KAFKA-12622
> Project: Kafka
>  Issue Type: Task
>Reporter: John Roesler
>Priority: Major
> Fix For: 3.4.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-12602, we manually constructed 
> a correct license file for 2.8.0. This file will certainly become wrong again 
> in later releases, so we need to write some kind of script to automate a 
> check.
> It crossed my mind to automate the generation of the file, but it seems to be 
> an intractable problem, considering that each dependency may change licenses, 
> may package license files, link to them from their poms, link to them from 
> their repos, etc. I've also found multiple URLs listed with various 
> delimiters, broken links that I have to chase down, etc.
> Therefore, it seems like the solution to aim for is simply: list all the jars 
> that we package, and print out a report of each jar that's extra or missing 
> vs. the ones in our `LICENSE-binary` file.
> The check should be part of the release script at least, if not part of the 
> regular build (so we keep it up to date as dependencies change).
>  
> Here's how I do this manually right now:
> {code:java}
> // build the binary artifacts
> $ ./gradlewAll releaseTarGz
> // unpack the binary artifact 
> $ tar xf core/build/distributions/kafka_2.13-X.Y.Z.tgz
> $ cd kafka_2.13-X.Y.Z
> // list the packaged jars 
> // (you can ignore the jars for our own modules, like kafka, kafka-clients, 
> etc.)
> $ ls libs/
> // cross check the jars with the packaged LICENSE
> // make sure all dependencies are listed with the right versions
> $ cat LICENSE
> // also double check all the mentioned license files are present
> $ ls licenses {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-08-26 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-14179:
--

Assignee: Colin McCabe

> Improve docs/upgrade.html to talk about metadata.version upgrades
> -
>
> Key: KAFKA-14179
> URL: https://issues.apache.org/jira/browse/KAFKA-14179
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Blocker
>  Labels: documentation, kraft
>
> The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
> upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14183) Kraft bootstrap metadata file should use snapshot header/footer

2022-08-26 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-14183:
--

Assignee: Jose Armando Garcia Sancio  (was: Jason Gustafson)

> Kraft bootstrap metadata file should use snapshot header/footer
> ---
>
> Key: KAFKA-14183
> URL: https://issues.apache.org/jira/browse/KAFKA-14183
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.3.0
>
>
> The bootstrap checkpoint file that we use in kraft is intended to follow the 
> usual snapshot format, but currently it does not include the header/footer 
> control records. The main purpose of these at the moment is to set a version 
> for the checkpoint file itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14142) Improve information returned about the cluster metadata partition

2022-08-25 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14142.

Resolution: Won't Fix

We discussed this and we decided that the kafka-metadata-quorum tool already 
returns enough information to determine this.

> Improve information returned about the cluster metadata partition
> -
>
> Key: KAFKA-14142
> URL: https://issues.apache.org/jira/browse/KAFKA-14142
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0
>
>
> The Apache Kafka operator needs to know when it is safe to format and start a 
> KRaft Controller that had a disk failure of the metadata log dir.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14156) Built-in partitioner may create suboptimal batches with large linger.ms

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14156:
---
Fix Version/s: (was: 3.3.0)

> Built-in partitioner may create suboptimal batches with large linger.ms
> ---
>
> Key: KAFKA-14156
> URL: https://issues.apache.org/jira/browse/KAFKA-14156
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 3.3.0
>Reporter: Artem Livshits
>Priority: Blocker
>
> The new built-in "sticky" partitioner switches partitions based on the amount 
> of bytes produced to a partition.  It doesn't use batch creation as a switch 
> trigger.  The previous "sticky" DefaultPartitioner switched partition when a 
> new batch was created and with small linger.ms (default is 0) could result in 
> sending larger batches to slower brokers potentially overloading them.  See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
>  for more detail.
> However, with large linger.ms, the new built-in partitioner may create 
> suboptimal batches.  Let's consider an example: suppose linger.ms=500, 
> batch.size=16KB (default) and we produce 24KB / sec, i.e. every 500ms we 
> produce 12KB worth of data.  The new built-in partitioner would switch 
> partition on every 16KB, so we could get into the following batching pattern:
>  * produce 12KB to one partition in 500ms, hit linger, send 12KB batch
>  * produce 4KB more to the same partition, now we've produced 16KB of data, 
> switch partition
>  * produce 12KB to the second partition in 500ms, hit linger, send 12KB batch
>  * in the mean time the 4KB produced to the first partition would hit linger 
> as well, sending 4KB batch
>  * produce 4KB more to the second partition, now we've produced 16KB of data 
> to the second partition, switch to 3rd partition
> so in this scenario the new built-in partitioner produces a mix of 12KB and 
> 4KB batches, while the previous DefaultPartitioner would produce only 12KB 
> batches -- it switches on new batch creation, so there is no "mid-linger" 
> leftover batches.
> To avoid creation of batch fragmentation on partition switch, we can wait 
> until the batch is ready before switching the partition, i.e. the condition 
> to switch to a new partition would be "produced batch.size bytes" AND "batch 
> is not lingering".  This may potentially introduce some non-uniformity into 
> data distribution, but unlike the previous DefaultPartitioner, the 
> non-uniformity would not be based on broker performance and won't 
> re-introduce the bad pattern of sending more data to slower brokers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14156) Built-in partitioner may create suboptimal batches with large linger.ms

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14156:
---
Priority: Blocker  (was: Major)

> Built-in partitioner may create suboptimal batches with large linger.ms
> ---
>
> Key: KAFKA-14156
> URL: https://issues.apache.org/jira/browse/KAFKA-14156
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 3.3.0
>Reporter: Artem Livshits
>Priority: Blocker
> Fix For: 3.3.0
>
>
> The new built-in "sticky" partitioner switches partitions based on the amount 
> of bytes produced to a partition.  It doesn't use batch creation as a switch 
> trigger.  The previous "sticky" DefaultPartitioner switched partition when a 
> new batch was created and with small linger.ms (default is 0) could result in 
> sending larger batches to slower brokers potentially overloading them.  See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
>  for more detail.
> However, with large linger.ms, the new built-in partitioner may create 
> suboptimal batches.  Let's consider an example: suppose linger.ms=500, 
> batch.size=16KB (default) and we produce 24KB / sec, i.e. every 500ms we 
> produce 12KB worth of data.  The new built-in partitioner would switch 
> partition on every 16KB, so we could get into the following batching pattern:
>  * produce 12KB to one partition in 500ms, hit linger, send 12KB batch
>  * produce 4KB more to the same partition, now we've produced 16KB of data, 
> switch partition
>  * produce 12KB to the second partition in 500ms, hit linger, send 12KB batch
>  * in the mean time the 4KB produced to the first partition would hit linger 
> as well, sending 4KB batch
>  * produce 4KB more to the second partition, now we've produced 16KB of data 
> to the second partition, switch to 3rd partition
> so in this scenario the new built-in partitioner produces a mix of 12KB and 
> 4KB batches, while the previous DefaultPartitioner would produce only 12KB 
> batches -- it switches on new batch creation, so there is no "mid-linger" 
> leftover batches.
> To avoid creation of batch fragmentation on partition switch, we can wait 
> until the batch is ready before switching the partition, i.e. the condition 
> to switch to a new partition would be "produced batch.size bytes" AND "batch 
> is not lingering".  This may potentially introduce some non-uniformity into 
> data distribution, but unlike the previous DefaultPartitioner, the 
> non-uniformity would not be based on broker performance and won't 
> re-introduce the bad pattern of sending more data to slower brokers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14156) Built-in partitioner may create suboptimal batches with large linger.ms

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14156:
---
Issue Type: Bug  (was: Improvement)

> Built-in partitioner may create suboptimal batches with large linger.ms
> ---
>
> Key: KAFKA-14156
> URL: https://issues.apache.org/jira/browse/KAFKA-14156
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 3.3.0
>Reporter: Artem Livshits
>Priority: Blocker
> Fix For: 3.3.0
>
>
> The new built-in "sticky" partitioner switches partitions based on the amount 
> of bytes produced to a partition.  It doesn't use batch creation as a switch 
> trigger.  The previous "sticky" DefaultPartitioner switched partition when a 
> new batch was created and with small linger.ms (default is 0) could result in 
> sending larger batches to slower brokers potentially overloading them.  See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
>  for more detail.
> However, with large linger.ms, the new built-in partitioner may create 
> suboptimal batches.  Let's consider an example: suppose linger.ms=500, 
> batch.size=16KB (default) and we produce 24KB / sec, i.e. every 500ms we 
> produce 12KB worth of data.  The new built-in partitioner would switch 
> partition on every 16KB, so we could get into the following batching pattern:
>  * produce 12KB to one partition in 500ms, hit linger, send 12KB batch
>  * produce 4KB more to the same partition, now we've produced 16KB of data, 
> switch partition
>  * produce 12KB to the second partition in 500ms, hit linger, send 12KB batch
>  * in the mean time the 4KB produced to the first partition would hit linger 
> as well, sending 4KB batch
>  * produce 4KB more to the second partition, now we've produced 16KB of data 
> to the second partition, switch to 3rd partition
> so in this scenario the new built-in partitioner produces a mix of 12KB and 
> 4KB batches, while the previous DefaultPartitioner would produce only 12KB 
> batches -- it switches on new batch creation, so there is no "mid-linger" 
> leftover batches.
> To avoid creation of batch fragmentation on partition switch, we can wait 
> until the batch is ready before switching the partition, i.e. the condition 
> to switch to a new partition would be "produced batch.size bytes" AND "batch 
> is not lingering".  This may potentially introduce some non-uniformity into 
> data distribution, but unlike the previous DefaultPartitioner, the 
> non-uniformity would not be based on broker performance and won't 
> re-introduce the bad pattern of sending more data to slower brokers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14156) Built-in partitioner may create suboptimal batches with large linger.ms

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14156:
---
Fix Version/s: 3.3.0

> Built-in partitioner may create suboptimal batches with large linger.ms
> ---
>
> Key: KAFKA-14156
> URL: https://issues.apache.org/jira/browse/KAFKA-14156
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 3.3.0
>Reporter: Artem Livshits
>Priority: Major
> Fix For: 3.3.0
>
>
> The new built-in "sticky" partitioner switches partitions based on the amount 
> of bytes produced to a partition.  It doesn't use batch creation as a switch 
> trigger.  The previous "sticky" DefaultPartitioner switched partition when a 
> new batch was created and with small linger.ms (default is 0) could result in 
> sending larger batches to slower brokers potentially overloading them.  See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
>  for more detail.
> However, with large linger.ms, the new built-in partitioner may create 
> suboptimal batches.  Let's consider an example: suppose linger.ms=500, 
> batch.size=16KB (default) and we produce 24KB / sec, i.e. every 500ms we 
> produce 12KB worth of data.  The new built-in partitioner would switch 
> partition on every 16KB, so we could get into the following batching pattern:
>  * produce 12KB to one partition in 500ms, hit linger, send 12KB batch
>  * produce 4KB more to the same partition, now we've produced 16KB of data, 
> switch partition
>  * produce 12KB to the second partition in 500ms, hit linger, send 12KB batch
>  * in the mean time the 4KB produced to the first partition would hit linger 
> as well, sending 4KB batch
>  * produce 4KB more to the second partition, now we've produced 16KB of data 
> to the second partition, switch to 3rd partition
> so in this scenario the new built-in partitioner produces a mix of 12KB and 
> 4KB batches, while the previous DefaultPartitioner would produce only 12KB 
> batches -- it switches on new batch creation, so there is no "mid-linger" 
> leftover batches.
> To avoid creation of batch fragmentation on partition switch, we can wait 
> until the batch is ready before switching the partition, i.e. the condition 
> to switch to a new partition would be "produced batch.size bytes" AND "batch 
> is not lingering".  This may potentially introduce some non-uniformity into 
> data distribution, but unlike the previous DefaultPartitioner, the 
> non-uniformity would not be based on broker performance and won't 
> re-introduce the bad pattern of sending more data to slower brokers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14174) Documentation for KRaft

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14174:
---
Description: 
KRaft documentation for 3.3
 # Disk recovery
 # External controller is the recommended configuration. The majority of 
integration tests don't run against co-located mode.
 # Talk about KRaft operation

  was:
KRaft documentation for 3.3
 # Disk recovery
 # Talk about KRaft operation


> Documentation for KRaft
> ---
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio updated KAFKA-14179:
---
Labels: documentation kraft  (was: docuentation kraft)

> Improve docs/upgrade.html to talk about metadata.version upgrades
> -
>
> Key: KAFKA-14179
> URL: https://issues.apache.org/jira/browse/KAFKA-14179
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
> upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

