GitHub user moesol opened a pull request:

    https://github.com/apache/storm/pull/1393

    (STORM-1674) Idle KafkaSpout consumes more bandwidth than needed

    * Allows minBytes in the fetch request to be configured
      from KafkaConfig.fetchMinBytes.
    * Defaults the new configuration KafkaConfig.fetchMinBytes to 1.
    * Defaults fetchMaxWait to 100ms instead of 10000ms.

    Discovered 30 megabits of traffic flowing between a set of KafkaSpouts
    and our Kafka servers even though no Kafka messages were moving.
    Using the Wireshark Kafka dissector, we saw that each FetchRequest had
    maxWait set to 10000 and minBytes set to 0. When minBytes is 0, the
    Kafka server responds immediately even when there are no messages. In
    turn, the KafkaSpout polls again without any delay, causing a constant
    stream of FetchRequest/FetchResponse messages. A non-KafkaSpout client
    showed a similar traffic pattern, with two key differences:
    1) minBytes was 1
    2) maxWait was 100
    With these FetchRequest parameters and no messages flowing, the Kafka
    server delays each FetchResponse by 100 ms. This reduced the network
    traffic from megabits to low kilobits and reduced the CPU utilization
    of our Kafka server from 140% to 2%.
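A back-of-the-envelope model makes the idle behavior described above concrete. The sketch below is a hypothetical Java model, not Kafka or storm-kafka code: it encodes the rule that a broker answers a fetch as soon as at least minBytes are available and otherwise holds it for up to maxWait, and it assumes a client that re-polls the moment a response arrives over a hypothetical 1 ms network round trip.

```java
public class IdleFetchModel {

    /**
     * Hypothetical model of the broker's decision for one FetchRequest:
     * returns how long (ms) the response is held, given the bytes available.
     */
    static long responseDelayMs(int availableBytes, int minBytes, int maxWaitMs) {
        // Enough data is ready (trivially true when minBytes is 0): answer now.
        if (availableBytes >= minBytes) {
            return 0;
        }
        // Not enough data: hold the request until maxWait expires.
        return maxWaitMs;
    }

    /**
     * Fetch round trips per second on an idle topic for a client that
     * re-polls immediately. rttMs is an assumed network round-trip time.
     */
    static double idleRequestsPerSecond(int minBytes, int maxWaitMs, double rttMs) {
        double cycleMs = rttMs + responseDelayMs(0, minBytes, maxWaitMs);
        return 1000.0 / cycleMs;
    }

    public static void main(String[] args) {
        double assumedRttMs = 1.0; // hypothetical LAN round trip
        System.out.printf("minBytes=0, maxWait=10000: %.0f fetches/s%n",
                idleRequestsPerSecond(0, 10000, assumedRttMs));
        System.out.printf("minBytes=1, maxWait=100:   %.1f fetches/s%n",
                idleRequestsPerSecond(1, 100, assumedRttMs));
    }
}
```

Under these assumptions the old settings print 1000 fetches/s and the new ones 9.9 fetches/s. Note that with minBytes set to 0 the maxWait value never comes into play, which is why a 10000 ms maxWait did not slow the spout down.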

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MoebiusSolutions/storm 1.x-branch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1393
    
----
commit ff43309e39fb1db2bf2ae5a3d7d0972440880dca
Author: Robert Hastings <rhasti...@moesol.com>
Date:   2016-05-03T20:40:56Z

    Addresses network flood from KafkaSpout to Kafka server.
    
    * Allows minBytes in fetch request to be configured
      from KafkaConfig.fetchMinBytes.
    * Defaults new configuration KafkaConfig.fetchMinBytes to 1.
    * Defaults fetchMaxWait to 100ms instead of 10000ms.
    
    Discovered 30 megabits of traffic flowing between a set of KafkaSpouts
    and our Kafka servers even though no Kafka messages were moving.
    Using the Wireshark Kafka dissector, we saw that each FetchRequest had
    maxWait set to 10000 and minBytes set to 0. When minBytes is 0, the
    Kafka server responds immediately even when there are no messages. In
    turn, the KafkaSpout polls again without any delay, causing a constant
    stream of FetchRequest/FetchResponse messages. A non-KafkaSpout client
    showed a similar traffic pattern, with two key differences:
    1) minBytes was 1
    2) maxWait was 100
    With these FetchRequest parameters and no messages flowing, the Kafka
    server delays each FetchResponse by 100 ms. This reduced the network
    traffic from megabits to low kilobits and reduced the CPU utilization
    of our Kafka server from 140% to 2%.
    
    The risk of this change should be low, because the old behavior
    can be restored through the API by setting
    KafkaConfig.fetchMaxWait to 10000 and
    KafkaConfig.fetchMinBytes to 0.
    
    Conflicts:
        external/storm-kafka/src/jvm/storm/kafka/KafkaConfig.java
        external/storm-kafka/src/jvm/storm/kafka/KafkaUtils.java
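The rollback described in the first commit message amounts to two field assignments. The sketch below uses a hypothetical nested stand-in class, FetchSettings, that carries only the two fields named in the PR, initialized to the new defaults; it is not the real KafkaConfig. With storm-kafka itself, the equivalent would be assigning the same values to the fetchMaxWait and fetchMinBytes fields of the spout's KafkaConfig (or SpoutConfig) instance, assuming those fields are public as the PR text implies.

```java
public class RestoreLegacyFetch {

    /**
     * Hypothetical stand-in for storm-kafka's KafkaConfig, reduced to the two
     * fetch fields named in this pull request and set to the new defaults
     * it proposes. The real class has many more settings.
     */
    static class FetchSettings {
        int fetchMaxWait = 100; // ms the broker may hold an idle fetch
        int fetchMinBytes = 1;  // bytes that let the broker answer before maxWait
    }

    /** Restores the pre-change values: a long maxWait, but minBytes of 0. */
    static FetchSettings restoreLegacy(FetchSettings config) {
        config.fetchMaxWait = 10000;
        // With minBytes 0 the broker replies at once, so maxWait never applies.
        config.fetchMinBytes = 0;
        return config;
    }

    public static void main(String[] args) {
        FetchSettings config = restoreLegacy(new FetchSettings());
        System.out.println("fetchMaxWait=" + config.fetchMaxWait
                + " fetchMinBytes=" + config.fetchMinBytes);
    }
}
```

Restoring these values also restores the tight idle polling loop the PR describes, so it is mainly useful for comparing the two behaviors.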

commit d936ad78169283fcd8e0b8923eefc4063b59f074
Author: Robert Hastings <rhasti...@moesol.com>
Date:   2016-05-03T21:19:21Z

    Merge remote-tracking branch 'apache/1.x-branch' into 1.x-branch
    
    Conflicts:
        external/storm-kafka/src/jvm/org/apache/storm/kafka/KafkaUtils.java

----

