Hongten opened a new pull request, #21388: URL: https://github.com/apache/kafka/pull/21388
This PR resolves the timeout exception raised when triggering a partition reassignment with the `--bootstrap-server` option. More details can be found at https://issues.apache.org/jira/browse/KAFKA-13392.

**Root cause**

When we run a reassignment using a plan file (e.g. xxx.json), the plan may still include replicas on a broker that is down. During execution, we try to apply throttling by calling `adminClient.incrementalAlterConfigs(configs)`. The issue is that this API needs to connect to the target broker to set the broker-level throttle configs. If the broker is down, it is unreachable, so the client keeps retrying and eventually fails with a TimeoutException.

**My proposed solution**

Add a new parameter: `--broker-list-without-throttle`

Description: Optional. Comma-separated list of broker IDs (e.g. 1,2) that should be excluded from broker-level throttle config updates during partition reassignment execution. When `--execute` and `--throttle` are used, the tool normally applies throttle configs on all brokers involved in the reassignment. If any of those brokers are known to be down or unreachable, adding them to `--broker-list-without-throttle` makes the tool skip the throttle-setting step for those brokers, avoiding retries/timeouts, while still throttling the remaining reachable brokers.

Value: a comma-separated list of broker IDs

Example: 1001 or 1001,1002

```
/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server xxx.xxx.xxx.xxx:9092 \
  --reassignment-json-file reassignment-test.json \
  --throttle 209715200 \
  --execute \
  --broker-list-without-throttle 1001
```

If broker 1001 is known to be down and the reassignment plan includes it, then we exclude 1001 from throttle config changes.

**Why this is needed**

- If we don’t use `--throttle` at all, then Kafka won’t set a throttle on any broker (including the down one). But that’s risky: migrations can easily saturate network bandwidth or disk IO.
- If we only skip throttling for the known down broker, the reassignment logic itself does not change, and the timeout is avoided. Meanwhile, healthy brokers still get throttled properly.
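
The exclusion step described above can be sketched as a small filter over the brokers named in the plan. This is a minimal illustration in Python, not the PR's actual implementation in the Kafka tooling: the helper names `parse_broker_list` and `brokers_to_throttle` are hypothetical, and it assumes the tool would apply throttle configs only to the brokers this filter returns.

```python
from typing import Iterable, List, Set


def parse_broker_list(value: str) -> Set[int]:
    """Parse a comma-separated broker ID list such as "1001,1002".

    Hypothetical helper mirroring the value format described for
    --broker-list-without-throttle. Empty input yields an empty set.
    """
    ids: Set[int] = set()
    for token in value.split(","):
        token = token.strip()
        if token:  # tolerate empty tokens, e.g. a trailing comma
            ids.add(int(token))
    return ids


def brokers_to_throttle(plan_brokers: Iterable[int], excluded: Set[int]) -> List[int]:
    """Return the plan's brokers that should still receive throttle configs, sorted."""
    return sorted(set(plan_brokers) - excluded)


if __name__ == "__main__":
    excluded = parse_broker_list("1001")
    # Broker 1001 is assumed down, so only the remaining brokers are throttled.
    print(brokers_to_throttle([1001, 1002, 1003], excluded))  # prints [1002, 1003]
```

Filtering before the config call, rather than catching the timeout afterwards, matches the motivation given above: the unreachable broker is never contacted, while reachable brokers are still throttled.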
