Pranav Rathi created KAFKA-20533:
------------------------------------

             Summary: ShareFetch returns UNKNOWN_SERVER_ERROR when topic is 
deleted during active share consumption
                 Key: KAFKA-20533
                 URL: https://issues.apache.org/jira/browse/KAFKA-20533
             Project: Kafka
          Issue Type: Bug
            Reporter: Pranav Rathi


While working on the librdkafka KIP-932 share consumer implementation, I found 
that when a topic is deleted while a share consumer is actively fetching from 
it, the broker returns {{UNKNOWN_SERVER_ERROR}} (-1) as the top-level error in 
the ShareFetchResponse.

I enabled DEBUG logging on the broker for {{kafka.server.KafkaApis}} and 
{{org.apache.kafka.server.share}} and found that the broker internally 
identifies the correct exception — {{UnknownTopicOrPartitionException}} — but 
by the time the response reaches the wire, the error code has been replaced 
with {{UNKNOWN_SERVER_ERROR}}. The expected behavior is for the broker to 
return the per-partition error code it already identifies internally 
({{UNKNOWN_TOPIC_OR_PARTITION}} or {{UNKNOWN_TOPIC_ID}}) so that 
clients can identify the cause and respond accordingly.
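
To illustrate why the specific error code matters to a client, here is a minimal sketch of how a share consumer might classify ShareFetch error codes. The class, enum, and method names are hypothetical and are not taken from librdkafka or the Java client. The numeric codes are the ones from the Kafka protocol error table. The chosen reactions (refresh metadata versus back off and retry) are assumptions about reasonable client behavior, not a description of any existing implementation.

```java
/** Hypothetical client-side classification of ShareFetch error codes. */
final class ShareFetchErrorSketch {

    /** Illustrative reactions only; a real client may behave differently. */
    enum Action { CONTINUE, DROP_PARTITION_AND_REFRESH_METADATA, RETRY_WITH_BACKOFF }

    // Numeric error codes as defined by the Kafka protocol.
    static final short NO_ERROR = 0;
    static final short UNKNOWN_SERVER_ERROR = -1;
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;
    static final short UNKNOWN_TOPIC_ID = 100;

    static Action classify(short errorCode) {
        switch (errorCode) {
            case NO_ERROR:
                return Action.CONTINUE;
            case UNKNOWN_TOPIC_OR_PARTITION:
            case UNKNOWN_TOPIC_ID:
                // The cause is known: the topic or partition is gone,
                // so the client can stop fetching it and refresh metadata.
                return Action.DROP_PARTITION_AND_REFRESH_METADATA;
            default:
                // UNKNOWN_SERVER_ERROR and anything unrecognized: the client
                // cannot tell what went wrong, so all it can do is retry.
                return Action.RETRY_WITH_BACKOFF;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(UNKNOWN_SERVER_ERROR));       // RETRY_WITH_BACKOFF
        System.out.println(classify(UNKNOWN_TOPIC_OR_PARTITION)); // DROP_PARTITION_AND_REFRESH_METADATA
    }
}
```

With {{UNKNOWN_SERVER_ERROR}} the client falls into the generic retry branch, which is consistent with the tight error loop described below.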
h3. Reproduction
 # Create topic {{demo-1}} with 1 partition
 # Start a producer sending messages to {{demo-1}} at 1 msg/s
 # Start a share consumer subscribed to {{demo-1}} — it receives messages 
normally
 # Stop the producer
 # Delete topic {{demo-1}}

The share consumer immediately starts receiving {{UNKNOWN_SERVER_ERROR}} at a 
very high rate (3,187 responses in about 2 seconds in the run below).
h3. Observed Timeline
||Event||Timestamp||
|Topic deleted (log renamed and scheduled for deletion)|15:37:31.800|
|First {{UNKNOWN_SERVER_ERROR}} received by client|15:37:32.150|
|Last {{UNKNOWN_SERVER_ERROR}} received by client|15:37:34.231|
|Total {{UNKNOWN_SERVER_ERROR}} responses|3,187|

The errors lasted approximately 2 seconds until the share session stopped 
including the partition.
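
As a quick check of the figures in the timeline, the following sketch derives the delay before the first error, the length of the error burst, and the average response rate from the timestamps in the table. The class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Illustrative arithmetic on the timestamps from the observed timeline. */
final class TimelineSketch {

    /** Milliseconds between two HH:mm:ss.SSS timestamps on the same day. */
    static long millisBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).toMillis();
    }

    /** Average number of events per second over the given interval. */
    static double perSecond(long count, long millis) {
        return count * 1000.0 / millis;
    }

    public static void main(String[] args) {
        long untilFirstError = millisBetween("15:37:31.800", "15:37:32.150"); // 350 ms
        long burst = millisBetween("15:37:32.150", "15:37:34.231");           // 2081 ms
        System.out.println("first error after " + untilFirstError + " ms");
        System.out.println("burst lasted " + burst + " ms");
        System.out.println("average rate " + Math.round(perSecond(3187, burst)) + " responses/s");
    }
}
```

That works out to roughly 1,500 error responses per second, starting 350 ms after the topic was deleted.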
h3. Broker vs Client Error Mismatch

The broker DEBUG logs show all 3,187 occurrences identified as 
{{UnknownTopicOrPartitionException}} (error code 3):
{code:java}
[KafkaApi-1] Share Fetch request with correlation id 273 from client rdkafka
  on partition 38WyjFvWQeeprA7A2i8blg:null-0 failed due to
  org.apache.kafka.common.errors.UnknownTopicOrPartitionException
... (3,187 identical entries)
{code}
But the client receives {{UNKNOWN_SERVER_ERROR}} (error code -1) for all 3,187 
responses:
{code:java}
ShareFetch response error UNKNOWN: ''
... (3,187 identical errors)
{code}
The counts match exactly — every request where the broker internally identifies 
{{UnknownTopicOrPartitionException}} results in the client receiving 
{{UNKNOWN_SERVER_ERROR}}.

Full broker and client logs from the latest reproduction are attached.

 

------------------------------------------------------------------------------

In a separate earlier test run with a multi-broker cluster and a similar 
topic-deletion-while-subscribed scenario, I also observed this ERROR in the 
broker logs:
{code:java}
[2026-04-08 19:01:28,379] ERROR Unable to perform write state RPC for key
  SharePartitionKey{groupId=share-topic-deletion-while-subscribed,
  topicIdPartition=aR9iBS_7SyKidYbzhwJf_g:null-0}:
  Write operation on uninitialized share partition not allowed.

org.apache.kafka.common.errors.UnknownServerException:
  Error in write state RPC. Write operation on uninitialized share partition 
not allowed.
{code}
 



