nileshkumar3 opened a new pull request, #21760:
URL: https://github.com/apache/kafka/pull/21760

   Description:
   
   This PR fixes a potential NullPointerException in 
OffsetFetcherUtils.regroupPartitionMapByNode when regrouping partitions by 
leader during offset reset / list-offsets.
   
   Background
   
   Partitions are grouped by leader via metadata.fetch().leaderFor(tp). If 
metadata changes between the initial leader lookup and the regroup step (e.g. 
leadership change or stale metadata), leaderFor(tp) can return null. The 
previous implementation used Collectors.groupingBy(..., leaderFor(...)), which 
throws an NPE when the classifier returns null.
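
   The failure mode can be reproduced outside Kafka. The sketch below is not 
the Kafka code: partitions and nodes are plain strings, and leaderFor is a 
hypothetical stand-in for metadata.fetch().leaderFor(tp) that returns null for 
a partition with no known leader.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingByNullKeyDemo {
    // Hypothetical stand-in for metadata.fetch().leaderFor(tp):
    // returns null when the partition has no known leader.
    public static String leaderFor(Map<String, String> leaders, String tp) {
        return leaders.get(tp);
    }

    // Returns true if grouping the partitions by leader throws an NPE.
    public static boolean groupingThrowsNpe(Map<String, String> leaders, List<String> partitions) {
        try {
            partitions.stream()
                    .collect(Collectors.groupingBy(tp -> leaderFor(leaders, tp)));
            return false;
        } catch (NullPointerException e) {
            // Collectors.groupingBy rejects a null classifier result.
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> leaders = new HashMap<>();
        leaders.put("topic-0", "node-1"); // "topic-1" has no leader entry
        // prints "true"
        System.out.println(groupingThrowsNpe(leaders, Arrays.asList("topic-0", "topic-1")));
    }
}
```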
   
   Fix
   
   OffsetFetcherUtils.regroupPartitionMapByNode
   Replaced the stream-based grouping with a loop that skips partitions whose 
leader is null and adds them to a caller-provided partitionsToRetry set. The 
helper itself does not trigger a metadata refresh; callers are responsible for 
retry and metadata updates.
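
   A minimal sketch of the loop-based shape described above, assuming 
simplified types. The class name RegroupSketch, the method regroupByLeader, 
and the use of String for partitions and nodes are illustrative only and do 
not match the actual signature of regroupPartitionMapByNode.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class RegroupSketch {
    // Groups entries by leader. Partitions whose leader lookup returns null
    // are skipped and recorded in partitionsToRetry; no metadata refresh
    // is triggered here (that is left to the caller).
    public static <V> Map<String, Map<String, V>> regroupByLeader(
            Map<String, V> partitionMap,
            Function<String, String> leaderLookup,   // hypothetical stand-in for leaderFor(tp)
            Set<String> partitionsToRetry) {
        Map<String, Map<String, V>> byNode = new HashMap<>();
        for (Map.Entry<String, V> entry : partitionMap.entrySet()) {
            String leader = leaderLookup.apply(entry.getKey());
            if (leader == null) {
                partitionsToRetry.add(entry.getKey());
                continue;
            }
            byNode.computeIfAbsent(leader, node -> new HashMap<>())
                  .put(entry.getKey(), entry.getValue());
        }
        return byNode;
    }

    public static void main(String[] args) {
        Map<String, Long> partitionMap = new LinkedHashMap<>();
        partitionMap.put("topic-0", 10L);
        partitionMap.put("topic-1", 20L);
        Map<String, String> leaders = new HashMap<>();
        leaders.put("topic-0", "node-1"); // "topic-1" has no leader

        Set<String> partitionsToRetry = new HashSet<>();
        Map<String, Map<String, Long>> grouped =
                regroupByLeader(partitionMap, leaders::get, partitionsToRetry);
        System.out.println(grouped);           // grouped entries for node-1 only
        System.out.println(partitionsToRetry); // the skipped partition
    }
}
```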
   
   Callers
   
   OffsetFetcher (classic consumer): passes partitionsToRetry into the helper; 
in resetPositionsAsync, when the set is non-empty, calls 
setNextAllowedRetry(partitionsToRetry, now + retryBackoffMs) and 
metadata.requestUpdate(false).
   OffsetsRequestManager (new consumer): passes a local retry set into the 
helper, then adds skipped partitions to state.remainingToSearch (with 
timestamp) and calls metadata.requestUpdate(false) when the set is non-empty.
   This keeps existing retry semantics and avoids the NPE.
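
   The caller-side handling can be sketched roughly as follows. This is an 
illustration under stated assumptions, not the code in OffsetFetcher or 
OffsetsRequestManager: the class RetryHandlingSketch, the nextAllowedRetryMs 
map, and the metadataUpdateRequested flag are hypothetical stand-ins for 
setNextAllowedRetry(...) and metadata.requestUpdate(false).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RetryHandlingSketch {
    // Hypothetical stand-in for per-partition retry state.
    public final Map<String, Long> nextAllowedRetryMs = new HashMap<>();
    // Hypothetical stand-in for the effect of metadata.requestUpdate(false).
    public boolean metadataUpdateRequested = false;

    // Called after regrouping: only acts when some partitions were skipped.
    public void handleSkipped(Set<String> partitionsToRetry, long nowMs, long retryBackoffMs) {
        if (partitionsToRetry.isEmpty()) {
            return;
        }
        for (String tp : partitionsToRetry) {
            nextAllowedRetryMs.put(tp, nowMs + retryBackoffMs);
        }
        metadataUpdateRequested = true;
    }

    // A partition may be retried once its backoff deadline has passed.
    public boolean canRetry(String tp, long nowMs) {
        Long next = nextAllowedRetryMs.get(tp);
        return next == null || nowMs >= next;
    }

    public static void main(String[] args) {
        RetryHandlingSketch sketch = new RetryHandlingSketch();
        Set<String> skipped = new HashSet<>();
        skipped.add("topic-1");
        sketch.handleSkipped(skipped, 1000L, 100L);
        System.out.println(sketch.canRetry("topic-1", 1050L)); // prints "false"
        System.out.println(sketch.canRetry("topic-1", 1100L)); // prints "true"
    }
}
```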
   
   Tests
   
   
OffsetFetcherTest.testResetPositionsMetadataRefreshWhenLeaderBecomesUnknownDuringRegroup
   Simulates leaderFor(tp) returning null during regroup (the first 
metadata.fetch() call is stubbed to return a cluster with no partition; later 
calls use the real method). Asserts that no exception is thrown, that the 
partition stays pending reset, and that after the backoff a second attempt 
with valid metadata completes the offset reset.
   
   
OffsetsRequestManagerTest.testFetchOffsetsRegroupSkipsNullLeaderPartition_NoNPE
   Simulates the same scenario in the fetch-offsets path: currentLeader has a 
leader but metadata.fetch() returns a cluster where one partition has no 
leader. Asserts no NPE, one request sent (for the partition with a leader), and 
that the skipped partition is retried after metadata update and completes 
successfully.


