GitHub user moesol opened a pull request: https://github.com/apache/storm/pull/1393
    (STORM-1674) Idle KafkaSpout consumes more bandwidth than needed

    * Allows minBytes in fetch request to be configured from KafkaConfig.fetchMinBytes.
    * Defaults new configuration KafkaConfig.fetchMinBytes to 1.
    * Defaults fetchMaxWait to 100ms instead of 10000ms.

    Discovered 30 megabits of traffic flowing between a set of KafkaSpouts
    and our Kafka servers even though no Kafka messages were moving.
    Using the Wireshark Kafka dissector, we were able to see that
    each FetchRequest had maxWait set to 10000
    and minBytes set to 0. When minBytes is set to 0 the Kafka server
    responds immediately when there are no messages. In turn the KafkaSpout
    polls without any delay, causing a constant stream of FetchRequest/
    FetchResponse messages. A non-KafkaSpout client had a similar
    traffic pattern with two key differences:
    1) minBytes was 1
    2) maxWait was 100
    With these FetchRequest parameters and no messages flowing,
    the Kafka server delays the FetchResponse by 100 ms. This reduces
    the network traffic from megabits to the low kilobits. It also
    reduced the CPU utilization of our Kafka server from 140% to 2%.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MoebiusSolutions/storm 1.x-branch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1393

----
commit ff43309e39fb1db2bf2ae5a3d7d0972440880dca
Author: Robert Hastings <rhasti...@moesol.com>
Date:   2016-05-03T20:40:56Z

    Addresses network flood from KafkaSpout to kafka server.

    * Allows minBytes in fetch request to be configured from KafkaConfig.fetchMinBytes.
    * Defaults new configuration KafkaConfig.fetchMinBytes to 1.
    * Defaults fetchMaxWait to 100ms instead of 10000ms.

    Discovered 30 megabits of traffic flowing between a set of KafkaSpouts
    and our Kafka servers even though no Kafka messages were moving.
    Using the Wireshark Kafka dissector, we were able to see that
    each FetchRequest had maxWait set to 10000
    and minBytes set to 0. When minBytes is set to 0 the Kafka server
    responds immediately when there are no messages. In turn the KafkaSpout
    polls without any delay, causing a constant stream of FetchRequest/
    FetchResponse messages. A non-KafkaSpout client had a similar
    traffic pattern with two key differences:
    1) minBytes was 1
    2) maxWait was 100
    With these FetchRequest parameters and no messages flowing,
    the Kafka server delays the FetchResponse by 100 ms. This reduces
    the network traffic from megabits to the low kilobits. It also
    reduced the CPU utilization of our Kafka server from 140% to 2%.

    Hopefully the risk of this change is low because
    the old behavior can be restored using the API by setting
    KafkaConfig.fetchMaxWait to 10000
    KafkaConfig.fetchMinBytes to 0

    Conflicts:
        external/storm-kafka/src/jvm/storm/kafka/KafkaConfig.java
        external/storm-kafka/src/jvm/storm/kafka/KafkaUtils.java

commit d936ad78169283fcd8e0b8923eefc4063b59f074
Author: Robert Hastings <rhasti...@moesol.com>
Date:   2016-05-03T21:19:21Z

    Merge remote-tracking branch 'apache/1.x-branch' into 1.x-branch

    Conflicts:
        external/storm-kafka/src/jvm/org/apache/storm/kafka/KafkaUtils.java

----
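
To illustrate why the two fetch settings matter on an idle topic, here is a rough Java sketch. It is only a simplified model, not code from this pull request: the class FetchSettingsSketch, its method names, and the 1 ms round-trip time are all assumptions made for illustration, and it does not use the real KafkaConfig class. It assumes, as described above, that the broker replies immediately when minBytes is 0 and otherwise holds an empty response for maxWait.

```java
// Hypothetical stand-in for the two fetch settings discussed above.
// This is NOT the real KafkaConfig class from storm-kafka.
class FetchSettingsSketch {
    final int fetchMinBytes;   // minBytes sent in each FetchRequest
    final int fetchMaxWaitMs;  // maxWait sent in each FetchRequest, in milliseconds

    FetchSettingsSketch(int fetchMinBytes, int fetchMaxWaitMs) {
        this.fetchMinBytes = fetchMinBytes;
        this.fetchMaxWaitMs = fetchMaxWaitMs;
    }

    // Values observed on the wire before the change.
    static FetchSettingsSketch oldBehavior() {
        return new FetchSettingsSketch(0, 10000);
    }

    // Defaults proposed by the pull request.
    static FetchSettingsSketch proposedDefaults() {
        return new FetchSettingsSketch(1, 100);
    }

    // Simplified model of one polling loop against a topic with no messages:
    // with minBytes == 0 the broker answers at once, so only the round trip
    // limits the loop; with minBytes > 0 the broker holds the empty response
    // for maxWait before answering.
    double idleFetchesPerSecond(double roundTripMs) {
        double brokerHoldMs = (fetchMinBytes > 0) ? fetchMaxWaitMs : 0.0;
        return 1000.0 / (roundTripMs + brokerHoldMs);
    }

    public static void main(String[] args) {
        double assumedRoundTripMs = 1.0; // assumption, not a measured value
        System.out.printf("old behavior:      %.1f fetches/s%n",
                oldBehavior().idleFetchesPerSecond(assumedRoundTripMs));
        System.out.printf("proposed defaults: %.1f fetches/s%n",
                proposedDefaults().idleFetchesPerSecond(assumedRoundTripMs));
    }
}
```

Under this model the idle request rate per polling loop drops by roughly two orders of magnitude, which is in line with the reported drop from megabits to low kilobits. The real figures depend on the number of partitions, the request sizes, and the actual network latency.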