GitHub user danny0405 opened a pull request:
https://github.com/apache/storm/pull/2389
Storm heartbeats promotion
Storm currently does not handle large clusters [for example, thousands of
supervisors] very well. In our production environment, topology submission and
killing become very slow as the cluster grows. I looked into the heartbeat
strategy and found that it can be improved.
The heartbeat improvements are:
1. Nimbus no longer collects heartbeat info from ZooKeeper on every scheduling
round. Instead, it reads directly from a cache that is updated whenever a
heartbeat is reported.
2. Heartbeats are reported through supervisor RPC. [The supervisor collects the
heartbeats of its local workers from the state they write to the local state
store.]
3. Metrics data is separated from heartbeats, so the new heartbeat no longer
contains metrics info, which makes it very lightweight and efficient.
4. Metrics data is still reported to ZooKeeper, but it is only used for
collecting UI stats. [In the old mode, UI stats were read from the heartbeat
cache; the new mode fetches them from ZooKeeper directly.]
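
To make point 1 concrete, here is a minimal sketch in Python of a cache that is
updated when a heartbeat is reported and read directly by the scheduler. It is
not the code in this PR: the class name HeartbeatCache, its methods, and the
timeout value are all hypothetical and only illustrate the idea.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatCache:
    """Hypothetical in-memory stand-in for the heartbeat cache kept by Nimbus."""

    timeout_secs: float
    # (supervisor_id, port) -> time of the last heartbeat seen for that worker
    last_seen: dict = field(default_factory=dict)

    def report(self, supervisor_id, ports, now=None):
        """Update the cache when a supervisor reports its local workers."""
        now = time.time() if now is None else now
        for port in ports:
            self.last_seen[(supervisor_id, port)] = now

    def alive_workers(self, now=None):
        """Read path used per scheduling round; no ZooKeeper access here."""
        now = time.time() if now is None else now
        return {
            worker
            for worker, seen in self.last_seen.items()
            if now - seen <= self.timeout_secs
        }


cache = HeartbeatCache(timeout_secs=30)
cache.report("sup-1", [6700, 6701], now=100.0)
cache.report("sup-2", [6700], now=120.0)
# sup-1 last reported 35 s ago and has timed out; sup-2 is still alive.
print(cache.alive_workers(now=135.0))
```

The write path costs one dictionary update per worker, and the read path never
leaves the process, which is the property the list above relies on.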

With this new heartbeat mode, heartbeats are reported very efficiently. In our
production environment we have about 30 workers per node/supervisor, so I
mocked the data and ran a stress test of Nimbus heartbeat response time:

We can see that with a 1-second heartbeat report interval, Nimbus can
support at least 2000 nodes. In production we set the worker heartbeat
reporting interval to 5 seconds, which means a single cluster can have about
5 * 2000 = 10000 nodes.
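
The capacity estimate above can be written out as a short calculation. This is
only a sketch: it assumes that the load on Nimbus scales linearly with the
heartbeat report rate and that each supervisor sends one report per interval.
Neither assumption is measured here, and the function names are hypothetical.

```python
def max_nodes(nodes_at_one_second, interval_secs):
    """Nodes supportable at a given interval, assuming linear scaling."""
    return nodes_at_one_second * interval_secs


def reports_per_second(nodes, interval_secs):
    """Supervisor reports per second, assuming one report per interval."""
    return nodes / interval_secs


nodes = max_nodes(2000, 5)             # 2000 nodes at 1 s, scaled to 5 s
workers = nodes * 30                   # about 30 workers per node/supervisor
rate = reports_per_second(nodes, 5)    # same report rate as 2000 nodes at 1 s

print(nodes, workers, rate)
```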
Because we no longer need to collect all heartbeat data and compute
alive/free slots on every scheduling round [we use a computed cache directly],
topologies are scheduled very efficiently [only 2 minutes for 5000 topologies].
About robustness:
1. When Nimbus goes down, workers keep working fine [as before]. When the
leader starts up, it waits for a complete set of heartbeats from all nodes
before it starts working again. I also made this strategy pluggable, so users
can override the default one.
2. When a supervisor goes down, its workers still work fine [they report
heartbeats directly to Nimbus through RPC]. When the supervisor comes back up,
it simply collects the heartbeats and reports them to Nimbus again.
3. When ZooKeeper is unstable, heartbeats are no longer affected [in the old
mode this caused all workers to collapse].
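
As an illustration of the pluggable leader startup behavior in point 1, here is
a minimal Python sketch. The interface and class names are hypothetical and are
not the ones used in this PR. The default shown waits until every expected node
has reported at least once.

```python
from abc import ABC, abstractmethod


class HeartbeatRecoveryStrategy(ABC):
    """Hypothetical plug-in point: when may a new leader resume scheduling?"""

    @abstractmethod
    def report(self, node_id):
        """Record that a node has reported its heartbeats to the new leader."""

    @abstractmethod
    def is_ready(self):
        """Return True once the leader may start working again."""


class WaitForAllNodes(HeartbeatRecoveryStrategy):
    """Default sketched here: wait for heartbeats from every expected node."""

    def __init__(self, expected_nodes):
        self.pending = set(expected_nodes)

    def report(self, node_id):
        # discard() ignores nodes that were not expected or already reported
        self.pending.discard(node_id)

    def is_ready(self):
        return not self.pending


strategy = WaitForAllNodes(["node-a", "node-b"])
strategy.report("node-a")
print(strategy.is_ready())  # node-b has not reported yet
strategy.report("node-b")
print(strategy.is_ready())
```

A user-supplied strategy could, for example, stop waiting after a timeout
instead of blocking on nodes that never come back.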
This is my JIRA task: https://issues.apache.org/jira/browse/STORM-2693
This is the assignments promotion PR:
https://github.com/apache/storm/pull/2319
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/danny0405/storm heartbeats-promotion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2389.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2389
----
commit 9e06883b9c253ae71bab052fcbc7753f838d61b3
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-08T05:03:09Z
STORM-2320: CHANGELOG
commit b621da98562db58062ec87a46238c0f0ccafb96e
Author: Aaron Dossett <[email protected]>
Date: 2016-03-21T18:06:44Z
STORM-1464: storm-hdfs support for multiple output files and partitioning
commit 657dd8815b5e91d1163e302d6be96510715a4fd7
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-08T20:07:37Z
add STORM-1464 to changelog
commit e372489c0fea259a5b2de4d42bc665593326ed8e
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-08T20:14:43Z
Merge branch 'cyz-dev' of github.com:danny0405/storm into 1.x-branch
commit 2a7e6dc0543c8069efed21b7ced9472eb46b5237
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-08T20:15:33Z
add STORM-2270 to changelog
commit bb5c6b84876da10d842889bb1729ccfab02af7b5
Author: Tibor Kiss <[email protected]>
Date: 2017-02-07T05:11:32Z
STORM-2350: Storm-HDFS's listFilesByModificationTime is broken
commit c417c8ee28384b392ddcbd96366047037164280d
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-08T21:28:06Z
add STORM-2350 to changelog
commit 1e40b02655072270c829057c14a3570a3d6005b9
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-08T04:59:39Z
Convert NoNodeException to KeyNotFoundException in
getNimbodesWithLatestSequenceNumberOfBlob
* since callers are able to handle KeyNotFoundException but not
NoNodeException
commit f91166cf6ccd721201eac114923879fa2c9a4ba6
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-09T06:09:35Z
Merge branch
'fix-nonodeexception-getNimbodesWithLatestSequenceNumberOfBlob-1.x' into
1.x-branch
commit 8b49350cca7bb113a3cdb308cf94f3bbf6a08946
Author: ambud <[email protected]>
Date: 2017-02-04T21:32:16Z
STORM-2344 Adding Flux File Viewer to Nimbus UI
Adding apache license and link to Storm Homepage
Adding links from storm nimbus homepage
Adding License for Javascript libraries. Using min js for esprima
Adding license files
commit ea1c50e2cc68187883abb7222efcaefd7420e947
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-10T03:40:46Z
Merge branch 'STORM-2344-1.x-merge' into 1.x-branch
commit 2128fc34a8a217c9a1b55edec666f25a2646bee6
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-10T03:41:06Z
STORM-2344: CHANGELOG
commit f5a1cf0b25be68a2b188f888a419bb14d270e2bc
Author: mingmxu <[email protected]>
Date: 2017-02-03T20:03:37Z
STORM-2340 fix AutoCommitMode issue in KafkaSpout
* Closes #1919
* fix: KafkaSpout is blocked in AutoCommitMode
* add comments for impacts of AutoCommitMode
* add doc about how to use KafkaSpout with at-most-once.
* remove at-most-once for better describe the changes; emit null msgId when
AutoCommitMode;
* update sample code in storm-kafka-client to use inline setProp()
commit f90d17c9715b6329938f2bd41442da5250a76bdc
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-14T02:53:49Z
Merge branch 'STORM-2340-1.x-merge' into 1.x-branch
commit 191a806de71d3e7526206b7cb6be7fad8f7da0bd
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-14T02:55:07Z
STORM-2340: CHANGELOG
commit a03137ed70a3edf155fc2c06355e12f2d4fb38f6
Author: Stig Rohde Døssing <[email protected]>
Date: 2017-02-14T20:31:45Z
STORM-2250: Kafka spout refactoring to increase modularity and testability.
Also support nanoseconds in Storm time simulation
commit d14c2935effa914ede12e0e038ebb5b732a1ef62
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-15T21:31:25Z
Merge branch 'STORM-2250-1.x' of github.com:srdo/storm into 1.x-branch
commit 8b69d43828532646d3e87d95daa250a05fc8a0be
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-15T21:32:27Z
add STORM-2250 to changelog
commit 17a2017fb644e353fb2a0f5bf50d400ee28036ba
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-16T18:57:43Z
[maven-release-plugin] prepare release v1.1.0
commit ff80b098b5e2110d326d041b73014f5e9fbff395
Author: P. Taylor Goetz <[email protected]>
Date: 2017-02-16T19:01:11Z
[maven-release-plugin] prepare for next development iteration
commit 2f69242d0b3557feb5dc710b9dcb302abbd72aae
Author: Arun Mahadevan <[email protected]>
Date: 2017-02-15T18:20:49Z
STORM-2365: Support for specifying output stream in event hubs spout
commit bdb557dd1c40d4a90d036ff5063df2c51ec90863
Author: Satish Duggana <[email protected]>
Date: 2017-02-17T10:38:22Z
Added STORM-2365 to CHANGELOG.md
commit ee1309d2a9b8cdbe4f5266327d7c62c4f9222781
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-20T00:52:04Z
Fix RAT issue from newly added js files
commit 593d523f874b70ceddcf67fe5dd4fa9af6c8436b
Author: Julien Nioche <[email protected]>
Date: 2017-02-20T17:32:06Z
STORM-2326 Upgrade log4j and slf4j
commit ebed1c8b01397b09f4083e66f574a25f9b7c585d
Author: Kyle Nusbaum <[email protected]>
Date: 2017-02-21T20:18:31Z
Fixing pacemaker delete-path bug.
commit 187d08bf45bf424f3963a604d72e076b00d594c7
Author: Sachin Pasalkar <[email protected]>
Date: 2017-02-14T10:24:23Z
STORM-1363: TridentKafkaState should handle null values from
TridentTupleToKafkaMapper.getMessageFromTuple()
Incase null value comes from the mapper it will print warning messages also
added the time take to emit number od messages in logs
commit 4fc55b33504d446ef192a25e5164f861e1495291
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-22T07:35:53Z
Merge branch 'STORM-1363-1.x' into 1.x-branch
commit d5f4c4021984bad9044654284f7d43ce03d24f41
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-22T07:39:16Z
STORM-1363: CHANGELOG
commit d4ff6b51f206e85e65bb5c4d06b4c28c828df174
Author: Hugo Louro <[email protected]>
Date: 2017-02-22T23:03:16Z
STORM-2374: Storm Kafka Client Func Interface Must be Serializable
commit 5de6e1dd46dadc5451fa0e3669120617aa8bbf8e
Author: Jungtaek Lim <[email protected]>
Date: 2017-02-22T04:03:45Z
Add Storm SQL docs to index page for 1.x branch
----