[GitHub] [kafka] dengziming commented on a change in pull request #10786: KAFKA-12787: Integrate controller snapshoting with raft client

2021-06-06 Thread GitBox


dengziming commented on a change in pull request #10786:
URL: https://github.com/apache/kafka/pull/10786#discussion_r646282221



##
File path: core/src/main/scala/kafka/raft/KafkaMetadataLog.scala
##
@@ -247,6 +247,34 @@ final class KafkaMetadataLog private (
     FileRawSnapshotWriter.create(log.dir.toPath, snapshotId, Optional.of(this))
   }
 
+  override def createSnapshotFromEndOffset(endOffset: Long): RawSnapshotWriter = {
+    val highWatermarkOffset = highWatermark.offset
+    if (endOffset > highWatermarkOffset) {
+      throw new IllegalArgumentException(
+        s"Cannot create a snapshot for an end offset ($endOffset) greater than the high-watermark ($highWatermarkOffset)"
+      )
+    }
+
+    if (endOffset < startOffset) {
+      throw new IllegalArgumentException(
+        s"Cannot create a snapshot for an end offset ($endOffset) less or equal to the log start offset ($startOffset)"

Review comment:
   nit: the code checks `endOffset < startOffset`, but the message says `less or equal to`.
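
   For illustration only (not part of the PR): a minimal, standalone sketch of the guard with its message aligned to the strict `<` check; the helper name is hypothetical.

```scala
// Hypothetical standalone form of the guard; the message now matches the strict "<" comparison.
def validateSnapshotEndOffset(endOffset: Long, startOffset: Long, highWatermarkOffset: Long): Unit = {
  if (endOffset > highWatermarkOffset)
    throw new IllegalArgumentException(
      s"Cannot create a snapshot for an end offset ($endOffset) greater than the high-watermark ($highWatermarkOffset)")
  if (endOffset < startOffset)
    throw new IllegalArgumentException(
      s"Cannot create a snapshot for an end offset ($endOffset) less than the log start offset ($startOffset)")
}
```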




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-12901) Metadata not updated after broker restart.

2021-06-06 Thread Suriya Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358350#comment-17358350
 ] 

Suriya Vijayaraghavan commented on KAFKA-12901:
---

I did have doubts regarding the [KAFKA-12677|https://issues.apache.org/jira/browse/KAFKA-12677] issue, but I'm not sure whether I should focus on why the server Java process did not stop, or on why the metadata was not updated after the broker restart.

[Logs|https://docs.google.com/document/d/1K8mdN4R59oR6SkI5d4FMAZlUNU2hmLidm1rxnTVtvJw/edit?usp=sharing] during the shutdown due to ZooKeeper session expiration.

[Logs|https://docs.google.com/document/d/1cUS4rwMI0CLx02lvdzM_kmVfH209wctHjsxCApZur-8/edit?usp=sharing] after the restart.

> Metadata not updated after broker restart.
> --
>
> Key: KAFKA-12901
> URL: https://issues.apache.org/jira/browse/KAFKA-12901
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Suriya Vijayaraghavan
>Priority: Major
>
> We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.
> When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] satishd commented on pull request #10829: MINOR Removed unused ConfigProvider from raft resources module.

2021-06-06 Thread GitBox


satishd commented on pull request #10829:
URL: https://github.com/apache/kafka/pull/10829#issuecomment-855561739


   @junrao Please take a look at this minor PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-12487) Sink connectors do not work with the cooperative consumer rebalance protocol

2021-06-06 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358311#comment-17358311
 ] 

Luke Chen commented on KAFKA-12487:
---

[~ChrisEgerton], is there any update on this ticket? Do you think we can complete it by v3.0? (No pressure, just want to know the status.) Thank you.

> Sink connectors do not work with the cooperative consumer rebalance protocol
> 
>
> Key: KAFKA-12487
> URL: https://issues.apache.org/jira/browse/KAFKA-12487
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>
> The {{ConsumerRebalanceListener}} used by the framework to respond to 
> rebalance events in consumer groups for sink tasks is hard-coded with the 
> assumption that the consumer performs rebalances eagerly. In other words, it 
> assumes that whenever {{onPartitionsRevoked}} is called, all partitions have 
> been revoked from that consumer, and whenever {{onPartitionsAssigned}} is 
> called, the partitions passed in to that method comprise the complete set of 
> topic partitions assigned to that consumer.
> See the [WorkerSinkTask.HandleRebalance 
> class|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L669-L730]
>  for the specifics.
>  
> One issue this can cause is silently ignoring to-be-committed offsets 
> provided by sink tasks, since the framework ignores offsets provided by tasks 
> in their {{preCommit}} method if it does not believe that the consumer for 
> that task is currently assigned the topic partition for that offset. See 
> these lines in the [WorkerSinkTask::commitOffsets 
> method|https://github.com/apache/kafka/blob/b96fc7892f1e885239d3290cf509e1d1bb41e7db/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L429-L430]
>  for reference.
>  
> This may not be the only issue caused by configuring a sink connector's 
> consumer to use cooperative rebalancing. Rigorous unit and integration 
> testing should be added before claiming that the Connect framework supports 
> the use of cooperative consumers with sink connectors.
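
For illustration only (this is not the Connect framework's WorkerSinkTask code): a minimal sketch, assuming Scala 2.13 and the standard Java consumer API, of a rebalance listener that tracks its assignment incrementally and therefore stays correct under both protocols — under cooperative rebalancing the callbacks carry only the partitions that changed.

```scala
import java.util.{Collection => JCollection}
import scala.collection.mutable
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener
import org.apache.kafka.common.TopicPartition

// Illustrative only: track the assignment incrementally instead of assuming each callback
// carries the consumer's complete partition set (which is only true under eager rebalancing).
class IncrementalAssignmentTracker extends ConsumerRebalanceListener {
  private val assigned = mutable.Set.empty[TopicPartition]

  override def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit =
    assigned --= partitions.asScala   // under cooperative rebalancing this may be only a subset

  override def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit =
    assigned ++= partitions.asScala   // under cooperative rebalancing only newly added partitions arrive

  def currentAssignment: Set[TopicPartition] = assigned.toSet
}
```

Under the eager protocol, onPartitionsRevoked still receives the full previous assignment, so the same bookkeeping remains correct in both modes.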



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12901) Metadata not updated after broker restart.

2021-06-06 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358299#comment-17358299
 ] 

Luke Chen commented on KAFKA-12901:
---

[~suriyav], do you have the complete log for this issue? Is it possible that this is related to KAFKA-12677, even though this is in ZooKeeper mode, not KRaft mode?

> Metadata not updated after broker restart.
> --
>
> Key: KAFKA-12901
> URL: https://issues.apache.org/jira/browse/KAFKA-12901
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Suriya Vijayaraghavan
>Priority: Major
>
> We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.
> When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12889) log clean group consider empty log segment to avoid empty log left

2021-06-06 Thread qiang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiang Liu updated KAFKA-12889:
--
Issue Type: Bug  (was: Improvement)
  Priority: Trivial  (was: Minor)

> log clean group consider empty log segment to avoid empty log left
> --
>
> Key: KAFKA-12889
> URL: https://issues.apache.org/jira/browse/KAFKA-12889
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 0.10.1.1
>Reporter: qiang Liu
>Priority: Trivial
>
> To avoid overflowing the 4-byte relative offsets in the log index, the log cleaner checks segment offsets when grouping segments so that a group's offset range does not exceed Int.MaxValue.
> This check currently does not consider whether the next log segment is empty, so an empty log file is left behind roughly every 2^31 messages.
> The leftover empty logs are reprocessed on every clean cycle, which rewrites them with the same empty content and causes a small amount of unnecessary I/O.
> For the __consumer_offsets topic, we can normally set cleanup.policy to compact,delete to get rid of this.
> My cluster is on 0.10.1.1, but after analyzing the trunk code, it should have the same problem too.
>  
> Some of the leftover empty log files (output of ls -l):
> -rw-r- 1 u g 0 Dec 16 2017 .index
> -rw-r- 1 u g 0 Dec 16 2017 .log
> -rw-r- 1 u g 0 Dec 16 2017 .timeindex
> -rw-r- 1 u g 0 Jan  15 2018 002148249632.index
> -rw-r- 1 u g 0 Jan  15 2018 002148249632.log
> -rw-r- 1 u g 0 Jan  15 2018 002148249632.timeindex
> -rw-r- 1 u g 0 Jan  27 2018 004295766494.index
> -rw-r- 1 u g 0 Jan  27 2018 004295766494.log
> -rw-r- 1 u g 0 Jan  27 2018 004295766494.timeindex
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] satishd opened a new pull request #10829: MINOR Removed unused ConfigProvider from raft resources module.

2021-06-06 Thread GitBox


satishd opened a new pull request #10829:
URL: https://github.com/apache/kafka/pull/10829


   MINOR Removed unused ConfigProvider from raft resources module.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] mimaison commented on pull request #10743: KIP-699: Update FindCoordinator to resolve multiple Coordinators at a time

2021-06-06 Thread GitBox


mimaison commented on pull request #10743:
URL: https://github.com/apache/kafka/pull/10743#issuecomment-855465767


   @dajac Right, I've updated [the 
description](https://github.com/apache/kafka/pull/10743#issue-650159295) to 
give some context


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (KAFKA-12901) Metadata not updated after broker restart.

2021-06-06 Thread Suriya Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suriya Vijayaraghavan updated KAFKA-12901:
--
Description: 
We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.

When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.



  was:
We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.

When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.

So how is metadata loading handled? Are there any alternative ways for us to automate monitoring for metadata updates?



> Metadata not updated after broker restart.
> --
>
> Key: KAFKA-12901
> URL: https://issues.apache.org/jira/browse/KAFKA-12901
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Suriya Vijayaraghavan
>Priority: Major
>
> We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.
> When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12901) Metadata not updated after broker restart.

2021-06-06 Thread Suriya Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suriya Vijayaraghavan updated KAFKA-12901:
--
Description: 
We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.

When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.

So how is metadata loading handled? Are there any alternative ways for us to automate monitoring for metadata updates?


  was:

We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.

When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.

So how is metadata loading handled? Are there any alternative ways for us to automate monitoring for metadata updates?



> Metadata not updated after broker restart.
> --
>
> Key: KAFKA-12901
> URL: https://issues.apache.org/jira/browse/KAFKA-12901
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Suriya Vijayaraghavan
>Priority: Major
>
> We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.
> When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors, which caused timeouts as well. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.
> So how is metadata loading handled? Are there any alternative ways for us to automate monitoring for metadata updates?

[jira] [Created] (KAFKA-12901) Metadata not updated after broker restart.

2021-06-06 Thread Suriya Vijayaraghavan (Jira)
Suriya Vijayaraghavan created KAFKA-12901:
-

 Summary: Metadata not updated after broker restart.
 Key: KAFKA-12901
 URL: https://issues.apache.org/jira/browse/KAFKA-12901
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.8.0
Reporter: Suriya Vijayaraghavan



We upgraded from version 2.7 to 2.8. After monitoring it for a few weeks we rolled the upgrade out to our production setup (since we had not enabled KRaft, we went ahead). A few weeks later our clients started failing with TimeoutException. We listed all active brokers using the admin client API and all brokers were listed properly. So we logged into the affected broker and tried to describe a topic with localhost as the bootstrap server, but that timed out as well.

When checking the logs, we noticed a shutdown message from the kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had three retry failures). The controlled shutdown failed (we got an unknown server error response from the controller) and the broker proceeded to an unclean shutdown. The process never exited, but it did not process any other operations either. The broker was not removed from the alive list for hours (it was still visible in the list of brokers), and our clients kept trying to contact it and failing with timeout exceptions. So we restarted the problematic broker, but after the restart our clients hit unknown-topic-or-partition errors. We noticed that the metadata had not been loaded, so we had to restart our controller. After restarting the controller, everything went back to normal.

So how is metadata loading handled? Are there any alternative ways for us to automate monitoring for metadata updates?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] rondagostino commented on a change in pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

2021-06-06 Thread GitBox


rondagostino commented on a change in pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#discussion_r646148894



##
File path: metadata/src/main/java/org/apache/kafka/controller/StripedReplicaPlacer.java
##
@@ -340,14 +340,14 @@ int numUnfencedBrokers() {
         }
 
         List place(int replicationFactor) {
-            if (replicationFactor <= 0) {
-                throw new InvalidReplicationFactorException("Invalid replication factor " +
-                    replicationFactor + ": the replication factor must be positive.");
-            }
+            throwInvalidReplicationFactorIfNonPositive(replicationFactor);
+            throwInvalidReplicationFactorIfTooFewBrokers(replicationFactor, numTotalBrokers());
+            throwInvalidReplicationFactorIfZero(numUnfencedBrokers());
             // If we have returned as many assignments as there are unfenced brokers in
             // the cluster, shuffle the rack list and broker lists to try to avoid
             // repeating the same assignments again.
-            if (epoch == numUnfencedBrokers) {
+            // But don't reset the iteration epoch for a single unfenced broker -- otherwise we would loop forever
+            if (epoch == numUnfencedBrokers && numUnfencedBrokers > 1) {

Review comment:
   There was no test covering this case, but I added one: 
`testMultiPartitionTopicPlacementOnSingleUnfencedBroker()` will never finish 
without this fix.  The `while (true)` loop in `RackList.place()` will never 
exit without this change when placing multiple partitions on a cluster with 
just a single unfenced broker.  The issue is that the iteration epoch will 
start at 0 for the first partition but (without the change) will be reset back 
to 0 for the second partition; the `Rack` instance associated with the broker 
will see the same iteration epoch for the second partition and therefore says 
it has no more unfenced brokers available.  The loop moves to the next rack, 
but there is no next rack -- there's only the one -- so around we go again 
asking the same question, ad infinitum.
   
   One might wonder about the validity of resetting the iteration epoch 
backwards to zero at all -- if it is possible that a rack with a single broker 
could see some iteration epoch and then be asked to place another partition 
just at the moment when the epoch loops back to the same value.  I think this 
is not possible because the racks are shuffled once every broker gets an 
assignment (and hence every rack gets at least one assignment); no rack will 
see the same iteration epoch again without it seeing a different iteration 
epoch in between.
   
   The degenerate case of just 1 broker is the one we are fixing here: we can't 
reset the epoch because shuffling has no effect.
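
   For illustration only (a toy model, not the actual `StripedReplicaPlacer`/`RackList` code): a minimal Scala sketch of the reasoning above — with a single unfenced broker, resetting the iteration epoch back to a value the rack has already seen makes the rack report itself as exhausted forever.

```scala
// Toy model only -- not the real StripedReplicaPlacer/RackList code.
object SingleBrokerEpochResetToy {
  // A "rack" with a single broker: it hands the broker out at most once per distinct epoch it sees.
  final class ToyRack {
    private var lastSeenEpoch = -1
    def offerBroker(epoch: Int): Option[Int] =
      if (epoch == lastSeenEpoch) None            // already used for this epoch -> "no brokers left"
      else { lastSeenEpoch = epoch; Some(0) }     // hand out broker id 0
  }

  def place(numPartitions: Int, numUnfencedBrokers: Int): Seq[Int] = {
    val rack = new ToyRack
    var epoch = 0
    (0 until numPartitions).map { _ =>
      // The "numUnfencedBrokers > 1" guard is the fix: without it, the reset sends epoch back to a
      // value the single rack has already seen, so the loop below never gets a broker again.
      if (epoch == numUnfencedBrokers && numUnfencedBrokers > 1) epoch = 0
      var choice: Option[Int] = None
      while (choice.isEmpty) choice = rack.offerBroker(epoch)
      epoch += 1
      choice.get
    }
  }

  def main(args: Array[String]): Unit =
    println(place(numPartitions = 2, numUnfencedBrokers = 1)) // Vector(0, 0); drop the guard and this never returns
}
```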




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] rondagostino commented on a change in pull request #10823: KAFKA-12897: KRaft multi-partition placement on single broker

2021-06-06 Thread GitBox


rondagostino commented on a change in pull request #10823:
URL: https://github.com/apache/kafka/pull/10823#discussion_r646142461



##
File path: metadata/src/main/java/org/apache/kafka/controller/StripedReplicaPlacer.java
##
@@ -412,14 +433,9 @@ public StripedReplicaPlacer(Random random) {
                       short replicationFactor,
                       Iterator iterator) {
         RackList rackList = new RackList(random, iterator);
-        if (rackList.numUnfencedBrokers() == 0) {
-            throw new InvalidReplicationFactorException("All brokers are currently fenced.");
-        }
-        if (replicationFactor > rackList.numTotalBrokers()) {
-            throw new InvalidReplicationFactorException("The target replication factor " +
-                "of " + replicationFactor + " cannot be reached because only " +
-                rackList.numTotalBrokers() + " broker(s) are registered.");
-        }
+        throwInvalidReplicationFactorIfNonPositive(replicationFactor);
+        throwInvalidReplicationFactorIfZero(rackList.numUnfencedBrokers());
+        throwInvalidReplicationFactorIfTooFewBrokers(replicationFactor, rackList.numTotalBrokers());

Review comment:
   We have three separate argument checks to perform, each with its own error message that the tests check for:
   
   a) the replication factor can't be non-positive
   b) the unfenced broker count can't be 0
   c) the unfenced broker count must not be less than the replication factor
   
   `StripedReplicaPlacer.place()` was performing checks (b) and (c).
   `RackList.place()` was performing check (a).
   
   Given that `RackList` is publicly accessible within the package, I felt it was important to perform all the sanity checks there. But `StripedReplicaPlacer.place()` also has a loop and allocates an array, so while we could allow the first iteration of the loop to raise the exception, I felt it was clearer to perform the checks there as well.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] socutes commented on pull request #10815: KAFKA-12885: Add the --timeout property to kafka-leader-election.sh

2021-06-06 Thread GitBox


socutes commented on pull request #10815:
URL: https://github.com/apache/kafka/pull/10815#issuecomment-855410045


   Got it! Thank you very much.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dengziming commented on pull request #10793: KAFKA-12338: Remove useless MetadataParser

2021-06-06 Thread GitBox


dengziming commented on pull request #10793:
URL: https://github.com/apache/kafka/pull/10793#issuecomment-855409242


   @hachikuji I updated the code according to your suggestions and also made the test more accurate. Please take a look again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] ijuma commented on pull request #10828: MINOR: Only log overridden topic configs during topic creation

2021-06-06 Thread GitBox


ijuma commented on pull request #10828:
URL: https://github.com/apache/kafka/pull/10828#issuecomment-855406145


   > Also, I noticed you opened this PR in the Kafka repo, not your personal repo. It's probably an oversight. FYI.
   
   We typically use our forks as a convention, but there is no rule against 
using the main repo if you are a committer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] ijuma commented on a change in pull request #10828: MINOR: Only log overridden topic configs during topic creation

2021-06-06 Thread GitBox


ijuma commented on a change in pull request #10828:
URL: https://github.com/apache/kafka/pull/10828#discussion_r646138828



##
File path: core/src/test/scala/unit/kafka/log/LogConfigTest.scala
##
@@ -162,6 +162,21 @@ class LogConfigTest {
     assertNull(nullServerDefault)
   }
 
+  @Test
+  def testOverriddenConfigsAsLoggableString(): Unit = {
+    val kafkaProps = TestUtils.createBrokerConfig(nodeId = 0, zkConnect = "")
+    kafkaProps.put("unknown.broker.password.config", "a")
+    kafkaProps.put(KafkaConfig.SslKeyPasswordProp, "somekeypassword")

Review comment:
   Yes, we can. The goal of this test is not to verify that, but no harm in 
covering it too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] showuon commented on a change in pull request #10828: MINOR: Only log overridden topic configs during topic creation

2021-06-06 Thread GitBox


showuon commented on a change in pull request #10828:
URL: https://github.com/apache/kafka/pull/10828#discussion_r646095921



##
File path: core/src/test/scala/unit/kafka/log/LogConfigTest.scala
##
@@ -162,6 +162,21 @@ class LogConfigTest {
     assertNull(nullServerDefault)
   }
 
+  @Test
+  def testOverriddenConfigsAsLoggableString(): Unit = {
+    val kafkaProps = TestUtils.createBrokerConfig(nodeId = 0, zkConnect = "")
+    kafkaProps.put("unknown.broker.password.config", "a")
+    kafkaProps.put(KafkaConfig.SslKeyPasswordProp, "somekeypassword")

Review comment:
   Could we add a line with the same setting as an overridden one, but with a different value, so that we can verify that the overridden value is the one output? For example, add this line:
   `kafkaProps.put(LogConfig.MinInSyncReplicasProp, "1")`
   Then, since we already have
   `topicOverrides.setProperty(LogConfig.MinInSyncReplicasProp, "2")`
   we can assert that the log output contains `min.insync.replicas=2`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] showuon commented on pull request #10815: KAFKA-12885: Add the --timeout property to kafka-leader-election.sh

2021-06-06 Thread GitBox


showuon commented on pull request #10815:
URL: https://github.com/apache/kafka/pull/10815#issuecomment-855357174


   Yes, a KIP is necessary for this change. Ref: https://cwiki.apache.org/confluence/display/kafka/kafka+improvement+proposals


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] showuon commented on a change in pull request #10817: MINOR: Log member id of the leader when assignment are received

2021-06-06 Thread GitBox


showuon commented on a change in pull request #10817:
URL: https://github.com/apache/kafka/pull/10817#discussion_r646090260



##
File path: core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala
##
@@ -567,7 +567,7 @@ class GroupCoordinator(val brokerId: Int,
 
         // if this is the leader, then we can attempt to persist state and transition to stable
         if (group.isLeader(memberId)) {
-          info(s"Assignment received from leader for group ${group.groupId} for generation ${group.generationId}. " +
+          info(s"Assignment received from leader $memberId for group ${group.groupId} for generation ${group.generationId}. " +

Review comment:
   nit: Should we wrap the variables with single/double quotes to make them clearer? Ex:
   `info(s"Assignment received from leader '$memberId' for group '${group.groupId}' for generation '${group.generationId}'. `




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] showuon commented on a change in pull request #10827: KAFKA-12899: Support --bootstrap-server in ReplicaVerificationTool

2021-06-06 Thread GitBox


showuon commented on a change in pull request #10827:
URL: https://github.com/apache/kafka/pull/10827#discussion_r646089490



##
File path: core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala
##
@@ -116,7 +121,13 @@ object ReplicaVerificationTool extends Logging {
     if (options.has(versionOpt)) {
       CommandLineUtils.printVersionAndDie()
     }
-    CommandLineUtils.checkRequiredArgs(parser, options, brokerListOpt)
+
+    val effectiveBrokerListOpt = if (options.has(bootstrapServerOpt))
+      bootstrapServerOpt
+    else
+      brokerListOpt

Review comment:
   Is it possible that the user provides both `bootstrapServerOpt` and `brokerListOpt` at the same time?
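
   For illustration only (not part of the PR): a minimal, self-contained sketch of one way a tool could reject the two options when both are given and otherwise prefer `--bootstrap-server`; the object name is hypothetical, and only the jopt-simple calls already used in this file are assumed.

```scala
import joptsimple.OptionParser

// Hypothetical standalone example: reject mutually exclusive flags, otherwise prefer --bootstrap-server.
object MutuallyExclusiveOptsExample {
  def main(args: Array[String]): Unit = {
    val parser = new OptionParser(false)
    val brokerListOpt = parser.accepts("broker-list").withRequiredArg.ofType(classOf[String])
    val bootstrapServerOpt = parser.accepts("bootstrap-server").withRequiredArg.ofType(classOf[String])
    val options = parser.parse(args: _*)

    if (options.has(bootstrapServerOpt) && options.has(brokerListOpt)) {
      Console.err.println("Only one of --bootstrap-server or --broker-list can be specified.")
      sys.exit(1)
    }
    // Prefer --bootstrap-server when present, otherwise fall back to the deprecated --broker-list.
    val effectiveOpt = if (options.has(bootstrapServerOpt)) bootstrapServerOpt else brokerListOpt
    println(s"Using brokers: ${options.valueOf(effectiveOpt)}")
  }
}
```

   In the actual tool, the subsequent `CommandLineUtils.checkRequiredArgs(parser, options, effectiveBrokerListOpt)` call shown in the diff would still enforce that the chosen option was supplied.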

##
File path: core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala
##
@@ -116,7 +121,13 @@ object ReplicaVerificationTool extends Logging {
     if (options.has(versionOpt)) {
       CommandLineUtils.printVersionAndDie()
     }
-    CommandLineUtils.checkRequiredArgs(parser, options, brokerListOpt)
+
+    val effectiveBrokerListOpt = if (options.has(bootstrapServerOpt))
+      bootstrapServerOpt
+    else
+      brokerListOpt
+
+    CommandLineUtils.checkRequiredArgs(parser, options, effectiveBrokerListOpt)
 
     val regex = options.valueOf(topicWhiteListOpt)
     val topicWhiteListFiler = new IncludeList(regex)

Review comment:
   typo: `topicWhiteListFiler` ->‵topicWhiteListFilter` 

##
File path: core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala
##
@@ -75,9 +75,14 @@ object ReplicaVerificationTool extends Logging {
 
   def main(args: Array[String]): Unit = {
     val parser = new OptionParser(false)
-    val brokerListOpt = parser.accepts("broker-list", "REQUIRED: The list of hostname and port of the server to connect to.")
+    val brokerListOpt = parser.accepts("broker-list", "DEPRECATED, use --bootstrap-server instead; ignored if --bootstrap-server is specified. The list of hostname and port of the server to connect to.")
       .withRequiredArg
-      .describedAs("hostname:port,...,hostname:port")
+      .describedAs("HOST1:PORT1,...,HOST3:PORT3")
+      .ofType(classOf[String])
+    val bootstrapServerOpt = parser.accepts("bootstrap-server", "REQUIRED. The list of hostname and port of the server to connect to.")

Review comment:
   add tests +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] socutes edited a comment on pull request #10749: KAFKA-12773: Use UncheckedIOException when wrapping IOException

2021-06-06 Thread GitBox


socutes edited a comment on pull request #10749:
URL: https://github.com/apache/kafka/pull/10749#issuecomment-855340683


   @jsancio @showuon Can this PR be merged into the trunk? Is there anything 
else I need to do?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org