GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/2704

    STORM-1038: Upgrade to Netty 4

    This is a continuation of the work done at 
https://github.com/apache/storm/pull/728.
    
    ### Important changes:
    * Split MessageEncoder into multiple encoders depending on the message type, and made the stateless ones singletons (see the sketch after this list)
    * Moved worker context termination to after the transfer thread has shut 
down. Shutting the context down first looks racy to me, since the transfer 
thread uses it.
    * ChannelGroups automatically remove closed Channels, so Server/PacemakerServer no longer remove closed channels manually
    * Made some changes to TransferDrainer for readability. I didn't see any 
performance impact from them, and they aren't necessary to upgrade to Netty 4. 
Let me know if they should go in another PR.
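    
    To illustrate the encoder split, here is a minimal sketch of the pattern used for the stateless encoders. The class and message names below are hypothetical, not the actual classes in this PR: a Netty 4 `MessageToByteEncoder` marked `@Sharable` and exposed as a singleton, so one instance can be installed in every channel pipeline instead of allocating one per connection.
    
    ```java
    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelHandler.Sharable;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.handler.codec.MessageToByteEncoder;
    
    // Hypothetical message type, only here to make the sketch self-contained.
    final class ExampleControlMessage {
        private final short code;
        ExampleControlMessage(short code) { this.code = code; }
        short code() { return code; }
    }
    
    // Stateless: the handler keeps no per-channel fields, which is what makes
    // @Sharable (and the singleton) safe.
    @Sharable
    final class ExampleControlMessageEncoder extends MessageToByteEncoder<ExampleControlMessage> {
        static final ExampleControlMessageEncoder INSTANCE = new ExampleControlMessageEncoder();
    
        private ExampleControlMessageEncoder() {}
    
        @Override
        protected void encode(ChannelHandlerContext ctx, ExampleControlMessage msg, ByteBuf out) {
            // Only the message and the outbound buffer are touched here.
            out.writeShort(msg.code());
        }
    }
    ```
    
    A pipeline would then add `ExampleControlMessageEncoder.INSTANCE` via `pipeline.addLast(...)` rather than constructing a new encoder per channel.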
    
    # Benchmarking
    I benchmarked on a single-node install. Results are accessible at 
https://drive.google.com/open?id=1OuYHusyshQbzx38q5jchj8W98WG6An0L.
    
    ## Speed of light
    Ran `./storm jar .\storm-perf-2.0.0-SNAPSHOT.jar 
org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo 600 
.\ConstSpoutIdBoltNullBoltTopo.yaml` with the following config
    
    ```
    spout.count : 1
    bolt1.count : 1  # IdBolt instances
    bolt2.count : 1  # DevNullBolt instances
    
    topology.workers : 3
    ```
    This should force every message to cross a worker boundary, which should expose any throughput regression in the changed messaging code.
    
    ## TVL
    I didn't put much thought into the number of workers or component counts here; let me know if I should rerun with other numbers.
    
    Ran `./storm jar .\storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency --rate 90000 --spouts 4 
--splitters 4 --counters 4 --reporter 'tsv:test.csv' -c topology.workers=4 
--test-time 10`.
    
    A rate of 90k tuples/s is more than my machine can handle, so this is mostly to demonstrate that Storm doesn't choke any worse than it did before.
    
    Also ran `./storm jar .\storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency --rate 75000 --spouts 4 
--splitters 4 --counters 4 --reporter 'tsv:test.csv' -c topology.workers=4 
--test-time 10`
    
    75k tuples/s is right at the edge of what my machine can push through, so it shows how latencies compare to master.
    
    Performance looks essentially the same to me before and after these changes. Let me know if there are other tests I should run to validate them.
    
    # To do at some point
    * ByteBufs are reference counted, so we should validate that we don't leak references during testing. Netty can log when a leak occurs (https://netty.io/wiki/reference-counted-objects.html#leak-detection-levels), but I haven't set anything up yet to fail the build if those logs appear during testing (see the sketch after this list). I did run some tests with the detection level set to paranoid and didn't see any leaks.
    * Didn't really test Pacemaker
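    
    For reference on the leak-detection follow-up, a minimal sketch of turning on the strictest detection level for a test run could look like the following (the test class name is hypothetical; nothing like this is wired into the build yet):
    
    ```java
    import io.netty.util.ResourceLeakDetector;
    import org.junit.BeforeClass;
    
    public class NettyLeakDetectionTest {
        @BeforeClass
        public static void enableParanoidLeakDetection() {
            // Tracks every allocated ByteBuf and logs a LEAK error with a stack
            // trace if one is garbage-collected before release() was called.
            ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        }
    
        // Tests exercising the Netty client/server would go here.
    }
    ```
    
    Failing the build on top of that would still need something that scans the test logs for Netty's leak reports, which is the part that's missing.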

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-1038-clean

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2704
    
----
commit 20db8c339cbabefab1fe4cccb74d68c87cc36264
Author: Hang Sun <hsun@...>
Date:   2018-05-11T13:57:47Z

    STORM-1038: Upgrade to Netty 4.x. See 
https://github.com/apache/storm/pull/728 for the original contribution.

commit 2d0f96102e20ca1c89fb65d05a860456c9d10a96
Author: Stig Rohde Døssing <srdo@...>
Date:   2018-05-11T14:55:35Z

    STORM-1038: Upgrade to latest Netty

commit 5fbe93339061c539ae20e06c7158f892ba4abb3c
Author: Stig Rohde Døssing <srdo@...>
Date:   2018-06-05T14:32:06Z

    Refactor of transfer drainer

----

