[
https://issues.apache.org/jira/browse/KAFKA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691875#comment-16691875
]
ASF GitHub Bot commented on KAFKA-7655:
---------------------------------------
Pasvaz opened a new pull request #5929: KAFKA-7655 Metadata spamming requests
from Kafka Streams under some circumstances, potential DOS
URL: https://github.com/apache/kafka/pull/5929
Re-validate, using a delay, to make sure the topic either exists or is gone.
There is a bug in the InternalTopicManager that makes the client believe
that a topic exists even though it doesn't. It occurs mostly in the few
seconds between when a topic is marked for deletion and when it is actually
deleted. In that timespan, the broker gives inconsistent information: first it
hides the topic, but then it refuses to create a new one, so the client
believes the topic already exists and starts polling for metadata.
The consequence is that the client goes into a loop where it polls for topic
metadata; if this is done by many threads, it can take down a small cluster
or severely degrade its performance.
The real-life scenario is probably a reset gone wrong. Reproducing the issue
is fairly simple; these are the steps:
* Stop a Kafka Streams application
* Delete one of its changelog topics and the local store
* Restart the application immediately after the topic deletion
* You will see the Kafka Streams application hanging after the bootstrap,
logging something like: INFO Metadata - Cluster ID: xxxx
I am attaching a patch that fixes the issue client-side, but in my opinion
this should be tackled on the broker as well: metadata requests seem
expensive, and it would be easy to craft a DDoS that could take down an
entire cluster in seconds just by flooding the brokers with metadata
requests.
The patch kicks in only when a topic that did not exist on the first call
to getNumPartitions triggers a TopicExistsException. When this happens, it
forces re-validation of the topic, and if the topic still looks like it
doesn't exist, it schedules a retry with some delay, to give the broker the
time it needs to sort things out.
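The retry described above can be sketched as a small self-contained loop. This is not the actual patch: the names (TopicRevalidator, MAX_RETRIES, RETRY_DELAY_MS) and values are illustrative, and the probe stands in for the real getNumPartitions call against the broker.

```java
import java.util.Optional;
import java.util.function.Supplier;

class TopicRevalidator {
    // Illustrative values, not the ones used in the patch
    static final int MAX_RETRIES = 5;
    static final long RETRY_DELAY_MS = 50;

    // probe returns the topic's partition count, or empty if the topic
    // does not (yet) exist -- standing in for getNumPartitions()
    static Optional<Integer> revalidate(Supplier<Optional<Integer>> probe) {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            Optional<Integer> partitions = probe.get();
            if (partitions.isPresent()) {
                return partitions;   // topic is visible again: trust it
            }
            // Topic still hidden: give the broker time to finish the
            // deletion instead of polling metadata in a tight loop
            try {
                Thread.sleep(RETRY_DELAY_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
        return Optional.empty();     // give up: topic really is gone
    }
}
```

The point of the delay is to replace the unbounded metadata polling loop with a bounded, paced one, so a cluster mid-deletion is probed a handful of times rather than hammered.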
I think this patch makes sense beyond the above-mentioned use case where a
topic does not exist, because even if the topic was actually created, the
client should not blindly trust it and should still re-validate it by
checking the number of partitions. E.g., a topic can be created automatically
by the first request, and it would then have the default number of partitions
rather than the expected ones.
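That extra check amounts to comparing the partition count the broker reports against the one Streams expects. A minimal sketch (hypothetical names; the real logic lives in InternalTopicManager):

```java
class PartitionValidator {
    // Validate that a topic the broker claims "already exists" actually
    // has the partition count Streams expects. A mismatch usually means
    // the topic was auto-created with the broker's default partition
    // count (e.g. 1) instead of the intended one.
    static void validatePartitions(String topic, int actual, int expected) {
        if (actual != expected) {
            throw new IllegalStateException(
                "Topic " + topic + " exists with " + actual
                + " partitions, expected " + expected);
        }
    }
}
```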
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Metadata spamming requests from Kafka Streams under some circumstances,
> potential DOS
> -------------------------------------------------------------------------------------
>
> Key: KAFKA-7655
> URL: https://issues.apache.org/jira/browse/KAFKA-7655
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.0.1
> Reporter: Pasquale Vazzana
> Priority: Major
> Labels: performance, pull-request-available, security
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)