[ 
https://issues.apache.org/jira/browse/KAFKA-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ömer Şiar Baysal updated KAFKA-16028:
-------------------------------------
    Description: 
Dear Team,

We have been investigating some odd behavior in the Java AdminClient. Here are 
our findings:
 - Due to a bug (or a feature unknown to us), AdminClient (both 2.8 and 3.6) 
fails to describe one of our consumer groups, which has no other known 
problems.
 - A pure Go admin client (github.com/twmb/franz-go) does not have the problem 
and is able to describe the same consumer group.

We tried to understand what might cause the issue. First, the Java client 2.8 
reported:
{quote}kafka-consumer-groups --bootstrap-server broker:9092 --describe --group 
'problematic-consumer'
Error: Executing consumer group command failed due to 
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader 
for this topic-partition as we are in the middle of a leadership election.
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader 
for this topic-partition as we are in the middle of a leadership election.
{quote}
We waited to see whether this was a transient error, but it was not: there was 
no leader election in progress for the given topic.

It was also not clear which topic the admin client was referring to, so we 
looked at the TRACE log, which revealed more information:
{quote}[2023-12-18 10:36:38,434] DEBUG [AdminClient clientId=adminclient-1] 
Sending LIST_OFFSETS request with header RequestHeader(apiKey=LIST_OFFSETS, 
apiVersion=6, clientId=adminclient-1, correlationId=30) and timeout 4997 to 
node 40: ListOffsetsRequestData(replicaId=-1, isolationLevel=0, 
topics=[ListOffsetsTopic(name='problematic-topic', 
partitions=[ListOffsetsPartition(partitionIndex=4, currentLeaderEpoch=-1, 
timestamp=-1, maxNumOffsets=1), ListOffsetsPartition(partitionIndex=5, 
currentLeaderEpoch=-1, timestamp=-1, maxNumOffsets=1)])]) 
(org.apache.kafka.clients.NetworkClient)
[2023-12-18 10:36:38,434] TRACE [AdminClient clientId=adminclient-1] Entering 
KafkaClient#poll(timeout=4997) (org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,435] TRACE [AdminClient clientId=adminclient-1] 
KafkaClient#poll retrieved 0 response(s) 
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,435] TRACE [AdminClient clientId=adminclient-1] Trying to 
choose nodes for [] at 1702884998435 
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,435] TRACE [AdminClient clientId=adminclient-1] Entering 
KafkaClient#poll(timeout=4995) (org.apache.kafka.clients.admin.KafkaAdminClient)

Error: Executing consumer group command failed due to 
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader 
for this topic-partition as we are in the middle of a leadership election.
[2023-12-18 10:36:38,436] DEBUG [AdminClient clientId=adminclient-1] Received 
LIST_OFFSETS response from node 40 for request with header 
RequestHeader(apiKey=LIST_OFFSETS, apiVersion=6, clientId=adminclient-1, 
correlationId=30): ListOffsetsResponseData(throttleTimeMs=0, 
topics=[ListOffsetsTopicResponse(name='problematic-topic', 
partitions=[ListOffsetsPartitionResponse(partitionIndex=5, errorCode=0, 
oldStyleOffsets=[], timestamp=-1, offset=822516, leaderEpoch=113, 
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA, 
followerRestorePointEpoch=0), ListOffsetsPartitionResponse(partitionIndex=4, 
errorCode=0, oldStyleOffsets=[], timestamp=-1, offset=827297, leaderEpoch=93, 
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA, 
followerRestorePointEpoch=0)])]) (org.apache.kafka.clients.NetworkClient)
[2023-12-18 10:36:38,436] TRACE [AdminClient clientId=adminclient-1] 
KafkaClient#poll retrieved 1 response(s) 
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,437] TRACE [AdminClient clientId=adminclient-1] 
Call(callName=listOffsets on broker 40, deadlineMs=1702885003430, tries=0, 
nextAllowedTryMs=0) got response ListOffsetsResponseData(throttleTimeMs=0, 
topics=[ListOffsetsTopicResponse(name='problematic-topic', 
partitions=[ListOffsetsPartitionResponse(partitionIndex=5, errorCode=0, 
oldStyleOffsets=[], timestamp=-1, offset=822516, leaderEpoch=113, 
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA, 
followerRestorePointEpoch=0), ListOffsetsPartitionResponse(partitionIndex=4, 
errorCode=0, oldStyleOffsets=[], timestamp=-1, offset=827297, leaderEpoch=93, 
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA, 
followerRestorePointEpoch=0)])]) 
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,437] TRACE [AdminClient clientId=adminclient-1] Trying to 
choose nodes for [] at 1702884998436 
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,437] TRACE [AdminClient clientId=adminclient-1] Entering 
KafkaClient#poll(timeout=299161) 
(org.apache.kafka.clients.admin.KafkaAdminClient)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader 
for this topic-partition as we are in the middle of a leadership election.
{quote}
AdminClient 3.6 does not return this error; instead, it fails with a timeout 
after its retries are exhausted.
We also took a look at "problematic-topic": we reassigned its replicas to other 
brokers and ran kafka-leader-election over all partitions, but neither helped.
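
As a rough cross-check of the trace above, the following Python sketch pulls the per-partition fields out of the logged LIST_OFFSETS response and lists any partition whose errorCode is non-zero. The helper names (parse_partitions, failed_partitions) are hypothetical, and the regular expression assumes the exact field layout printed in this log; it is not part of any Kafka tooling.

```python
import re

# The LIST_OFFSETS response from the trace above, with line wrapping removed.
RESPONSE = (
    "ListOffsetsResponseData(throttleTimeMs=0, "
    "topics=[ListOffsetsTopicResponse(name='problematic-topic', "
    "partitions=[ListOffsetsPartitionResponse(partitionIndex=5, errorCode=0, "
    "oldStyleOffsets=[], timestamp=-1, offset=822516, leaderEpoch=113, "
    "followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA, "
    "followerRestorePointEpoch=0), ListOffsetsPartitionResponse(partitionIndex=4, "
    "errorCode=0, oldStyleOffsets=[], timestamp=-1, offset=827297, leaderEpoch=93, "
    "followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA, "
    "followerRestorePointEpoch=0)])])"
)

# Assumption: fields appear in the order seen in this particular log output.
PARTITION_RE = re.compile(
    r"ListOffsetsPartitionResponse\(partitionIndex=(\d+), errorCode=(-?\d+), "
    r".*?offset=(-?\d+), leaderEpoch=(-?\d+)"
)


def parse_partitions(text):
    """Return one dict per ListOffsetsPartitionResponse entry found in text."""
    return [
        {
            "partition": int(partition),
            "error_code": int(error_code),
            "offset": int(offset),
            "leader_epoch": int(leader_epoch),
        }
        for partition, error_code, offset, leader_epoch in PARTITION_RE.findall(text)
    ]


def failed_partitions(text):
    """Return only the entries whose errorCode is non-zero."""
    return [entry for entry in parse_partitions(text) if entry["error_code"] != 0]


if __name__ == "__main__":
    for entry in parse_partitions(RESPONSE):
        print(entry)
    print("partitions with errors:", failed_partitions(RESPONSE))
```

For the response logged here, both partitions (4 and 5) come back with errorCode=0, so the LeaderNotAvailableException does not appear to originate from this particular LIST_OFFSETS response.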


> AdminClient fails to describe consumer group
> --------------------------------------------
>
>                 Key: KAFKA-16028
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16028
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin, clients, consumer, log
>    Affects Versions: 2.8.2, 3.6.1
>            Reporter: Ömer Şiar Baysal
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
