[
https://issues.apache.org/jira/browse/KAFKA-20533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apoorv Mittal resolved KAFKA-20533.
-----------------------------------
Fix Version/s: 4.4.0
Resolution: Fixed
> ShareFetch returns UNKNOWN_SERVER_ERROR when topic is deleted during active
> share consumption
> ---------------------------------------------------------------------------------------------
>
> Key: KAFKA-20533
> URL: https://issues.apache.org/jira/browse/KAFKA-20533
> Project: Kafka
> Issue Type: Bug
> Reporter: Pranav Rathi
> Assignee: Apoorv Mittal
> Priority: Major
> Fix For: 4.4.0
>
> Attachments: broker-server.log, share-consumer.log
>
>
> While working on the librdkafka KIP-932 share consumer implementation, I
> found that when a topic is deleted while a share consumer is actively
> fetching from it, the broker returns {{UNKNOWN_SERVER_ERROR}} (-1) as the
> top-level error in the ShareFetchResponse.
> I enabled DEBUG logging on the broker for {{kafka.server.KafkaApis}} and
> {{org.apache.kafka.server.share}} and found that the broker internally
> identifies the correct exception — {{UnknownTopicOrPartitionException}} — but
> by the time the response reaches the wire, the error code has been replaced
> with {{UNKNOWN_SERVER_ERROR}}. The expected behavior is for the broker to
> return the per-partition error code it already identifies internally
> ({{UNKNOWN_TOPIC_OR_PARTITION}} or {{UNKNOWN_TOPIC_ID}}) so that
> clients can identify the cause and respond accordingly.
> h3. Reproduction
> # Create topic {{demo-1}} with 1 partition
> # Start a producer sending messages to {{demo-1}} at 1 msg/s
> # Start a share consumer subscribed to {{demo-1}} — it receives messages
> normally
> # Stop the producer
> # Delete topic {{demo-1}}
> The share consumer immediately starts receiving {{UNKNOWN_SERVER_ERROR}} at a
> very high rate (3,187 responses in about 2 seconds in the run timed below).
> h3. Observed Timeline
> ||Event||Timestamp||
> |Topic deleted (log renamed and scheduled for deletion)|15:37:31.800|
> |First {{UNKNOWN_SERVER_ERROR}} received by client|15:37:32.150|
> |Last {{UNKNOWN_SERVER_ERROR}} received by client|15:37:34.231|
> |Total {{UNKNOWN_SERVER_ERROR}} responses|3,187|
> The errors lasted approximately 2 seconds until the share session stopped
> including the partition.
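As a rough check on the "very high rate" above, the timeline works out to roughly 1,500 error responses per second. The following is a minimal sketch of that arithmetic, using only the first/last timestamps and the total count from the table; the class and method names are illustrative and are not part of Kafka or librdkafka.

```java
import java.time.Duration;
import java.time.LocalTime;

public class ErrorRate {
    // Average responses per second between the first and last error timestamps.
    static double responsesPerSecond(LocalTime first, LocalTime last, long count) {
        double seconds = Duration.between(first, last).toMillis() / 1000.0;
        return count / seconds;
    }

    public static void main(String[] args) {
        // Values copied from the "Observed Timeline" table.
        LocalTime first = LocalTime.parse("15:37:32.150");
        LocalTime last = LocalTime.parse("15:37:34.231");
        double rate = responsesPerSecond(first, last, 3187);
        System.out.printf("%.0f responses/s%n", rate);
    }
}
```

The window between the first and last error is 2.081 seconds, so the average is a little over 1,500 responses per second, which suggests the client was re-issuing ShareFetch requests with essentially no delay between them.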
> h3. Broker vs Client Error Mismatch
> The broker DEBUG logs show all 3,187 occurrences identified as
> {{UnknownTopicOrPartitionException}} (error code 3):
> {code:java}
> [KafkaApi-1] Share Fetch request with correlation id 273 from client rdkafka
> on partition 38WyjFvWQeeprA7A2i8blg:null-0 failed due to
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException
> ... (3,187 identical entries)
> {code}
> But the client receives {{UNKNOWN_SERVER_ERROR}} (error code -1) for all
> 3,187 responses:
> {code:java}
> ShareFetch response error UNKNOWN: ''
> ... (3,187 identical errors)
> {code}
> The counts match exactly — every request where the broker internally
> identifies {{UnknownTopicOrPartitionException}} results in the client
> receiving {{UNKNOWN_SERVER_ERROR}}.
> Full broker and client logs from the latest reproduction are attached.
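To illustrate why the error code matters to a client, here is a hedged sketch of how a share consumer might react to the codes discussed above. The `Action` enum and `actionFor` method are hypothetical and are not taken from librdkafka or the Java client. The numeric values for `UNKNOWN_SERVER_ERROR` (-1) and `UNKNOWN_TOPIC_OR_PARTITION` (3) come from the report; `UNKNOWN_TOPIC_ID` (100) is the value defined in the Kafka protocol.

```java
public class ShareFetchErrorSketch {
    // Hypothetical client-side reactions; not taken from any real client.
    enum Action { CONTINUE, REFRESH_METADATA, BACK_OFF_AND_RETRY }

    // Kafka protocol error codes relevant to this report.
    static final short NO_ERROR = 0;
    static final short UNKNOWN_SERVER_ERROR = -1;
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;
    static final short UNKNOWN_TOPIC_ID = 100;

    static Action actionFor(short errorCode) {
        switch (errorCode) {
            case NO_ERROR:
                return Action.CONTINUE;
            case UNKNOWN_TOPIC_OR_PARTITION:
            case UNKNOWN_TOPIC_ID:
                // The cause is known: the topic is gone or metadata is stale.
                return Action.REFRESH_METADATA;
            default:
                // UNKNOWN_SERVER_ERROR and anything else carry no usable cause,
                // so the only safe generic reaction is to back off and retry.
                return Action.BACK_OFF_AND_RETRY;
        }
    }

    public static void main(String[] args) {
        System.out.println(actionFor(UNKNOWN_SERVER_ERROR));
        System.out.println(actionFor(UNKNOWN_TOPIC_OR_PARTITION));
    }
}
```

Under these assumptions, a client that receives -1 cannot tell a deleted topic apart from any other broker failure, whereas code 3 or 100 lets it refresh metadata and stop fetching the partition.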
>
> ------------------------------------------------------------------------------
> In a separate earlier test run with a multi-broker cluster and a similar
> topic-deletion-while-subscribed scenario, I also observed this ERROR in the
> broker logs:
> {code:java}
> [2026-04-08 19:01:28,379] ERROR Unable to perform write state RPC for key
> SharePartitionKey{groupId=share-topic-deletion-while-subscribed,
> topicIdPartition=aR9iBS_7SyKidYbzhwJf_g:null-0}:
> Write operation on uninitialized share partition not allowed.
> org.apache.kafka.common.errors.UnknownServerException:
> Error in write state RPC. Write operation on uninitialized share partition
> not allowed.
> {code}
>