GitHub user moesol opened a pull request:
https://github.com/apache/storm/pull/1287
Addresses network flood from KafkaSpout to kafka server.
* Allows minBytes in fetch request to be configured
from KafkaConfig.fetchMinBytes.
* Defaults new configuration KafkaConfig.fetchMinBytes to 1.
* Defaults fetchMaxWait to 100ms instead of 10000ms.
Discovered 30 megabits of traffic flowing between a set of KafkaSpouts
and our kafka servers even though no Kafka messages were moving.
Using the Wireshark Kafka dissector, we were able to see that
each FetchRequest had maxWait set to 10000
and minBytes set to 0. When minBytes is set to 0, the kafka server
responds immediately even when there are no messages. In turn, the KafkaSpout
polls again without any delay, causing a constant stream of FetchRequest/
FetchResponse messages. A non-KafkaSpout client showed a similar
traffic pattern, with two key differences:
1) minBytes was 1
2) maxWait was 100
With these FetchRequest parameters and no messages flowing,
the kafka server delays each FetchResponse by 100 ms. This reduced
the network traffic from megabits to the low kilobits. It also
reduced the CPU utilization of our kafka server from 140% to 2%.
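
To make the effect of the two parameters concrete, here is a back-of-the-envelope model of an idle fetch loop. It is only a sketch, not storm-kafka code: the class name, the method names, and the 1 ms round-trip time are assumptions made for illustration. It assumes the client sends the next FetchRequest as soon as the previous FetchResponse arrives.

```java
// Rough model of an idle fetch loop; NOT storm-kafka code.
// Assumptions: no messages are available, and the client issues the next
// FetchRequest as soon as the previous FetchResponse arrives.
final class IdleFetchModel {

    /**
     * Milliseconds the broker holds a FetchRequest when no data is available.
     * With minBytes == 0 the request is already satisfied, so the broker
     * answers at once; otherwise it waits until maxWaitMs expires.
     */
    static long brokerHoldMs(int minBytes, int maxWaitMs) {
        return minBytes <= 0 ? 0 : maxWaitMs;
    }

    /** Approximate request/response pairs per second for one idle fetch loop. */
    static double idleRequestsPerSecond(int minBytes, int maxWaitMs, double roundTripMs) {
        return 1000.0 / (roundTripMs + brokerHoldMs(minBytes, maxWaitMs));
    }

    public static void main(String[] args) {
        double roundTripMs = 1.0; // hypothetical network round trip, not a measured value
        System.out.println("minBytes=0, maxWait=10000: "
                + idleRequestsPerSecond(0, 10000, roundTripMs) + " req/s");
        System.out.println("minBytes=1, maxWait=100:   "
                + idleRequestsPerSecond(1, 100, roundTripMs) + " req/s");
    }
}
```

Under these assumptions the old settings give on the order of a thousand idle round trips per second per fetch loop, while minBytes=1 with maxWait=100 gives roughly ten. The real rates depend on the actual round-trip time and on how many partitions each spout polls.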
Hopefully the risk of this change is low, because
the old behavior can be restored through the API by setting
KafkaConfig.fetchMaxWait to 10000 and
KafkaConfig.fetchMinBytes to 0
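
A minimal sketch of that rollback follows. To keep it compilable on its own, it uses a stand-in class (KafkaConfigStandIn, a hypothetical name) that models only the two fields named above as public int fields; the real KafkaConfig has more settings and is constructed differently.

```java
// Sketch of restoring the pre-change fetch settings.
final class RestoreOldFetchBehavior {

    /** Applies the old values described above: maxWait 10000 ms, minBytes 0. */
    static KafkaConfigStandIn restoreOldBehavior(KafkaConfigStandIn config) {
        config.fetchMaxWait = 10000;
        config.fetchMinBytes = 0;
        return config;
    }

    public static void main(String[] args) {
        KafkaConfigStandIn config = restoreOldBehavior(new KafkaConfigStandIn());
        System.out.println("fetchMaxWait=" + config.fetchMaxWait
                + " fetchMinBytes=" + config.fetchMinBytes);
    }
}

// Hypothetical stand-in for storm-kafka's KafkaConfig, limited to the two
// fields this pull request discusses. Field shapes here are an assumption.
final class KafkaConfigStandIn {
    public int fetchMaxWait = 100; // new default in ms proposed by this pull request
    public int fetchMinBytes = 1;  // new setting and default proposed by this pull request
}
```

In a real topology the same two assignments would be made on the spout's own configuration object before the KafkaSpout is created.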
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MoebiusSolutions/storm 0.9.3-branch
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1287.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1287
----
commit 66c38ad28ce597a4df6c05db3264ff94960d6764
Author: Robert Hastings <[email protected]>
Date: 2016-03-31T23:14:47Z
Addresses network flood from KafkaSpout to kafka server.
* Allows minBytes in fetch request to be configured
from KafkaConfig.fetchMinBytes.
* Defaults new configuration KafkaConfig.fetchMinBytes to 1.
* Defaults fetchMaxWait to 100ms instead of 10000ms.
Discovered 30 megabits of traffic flowing between a set of KafkaSpouts
and our kafka servers even though no Kafka messages were moving.
Using the Wireshark Kafka dissector, we were able to see that
each FetchRequest had maxWait set to 10000
and minBytes set to 0. When minBytes is set to 0, the kafka server
responds immediately even when there are no messages. In turn, the KafkaSpout
polls again without any delay, causing a constant stream of FetchRequest/
FetchResponse messages. A non-KafkaSpout client showed a similar
traffic pattern, with two key differences:
1) minBytes was 1
2) maxWait was 100
With these FetchRequest parameters and no messages flowing,
the kafka server delays each FetchResponse by 100 ms. This reduced
the network traffic from megabits to the low kilobits. It also
reduced the CPU utilization of our kafka server from 140% to 2%.
Hopefully the risk of this change is low, because
the old behavior can be restored through the API by setting
KafkaConfig.fetchMaxWait to 10000 and
KafkaConfig.fetchMinBytes to 0
----