Jungtaek Lim created STORM-872:
----------------------------------

             Summary: Expose interfaces to give users opportunity to customize "executor" heartbeat module
                 Key: STORM-872
                 URL: https://issues.apache.org/jira/browse/STORM-872
             Project: Apache Storm
          Issue Type: Improvement
            Reporter: Jungtaek Lim

I have seen many papers and slides addressing heartbeat timeouts, and most of them point to ZK as the cause.
- Storm@Twitter, SIGMOD 2014 : http://db.cs.berkeley.edu/cs286/papers/storm-sigmod2014.pdf
- Scaling Storm, Hadoop Summit 2015 : https://github.com/revans2/storm-presentations/blob/master/Haddop_Summit_2015_Scaling_Storm.pptx
- and so on.

ZK has a hard throughput limit, and reading from / writing to disk is the bottleneck. Throughput would be far better if we handled worker heartbeats with in-memory storage directly, or with a heartbeat daemon that can scale well. (Trade-offs could be made.)

If we can open up the interface of the worker heartbeat module and give users the opportunity to customize it, it would be really great.

- Why am I narrowing heartbeat to "worker" only?
-- I was thinking about the "supervisor" heartbeat too, but it uses ZK ephemeral nodes, which other storage systems normally do not support.
-- And in Scaling Storm, p15, about 99.2% of the ZK workload is worker heartbeats. I think ZK can take care of supervisor heartbeats without problems.
--- default of "supervisor.heartbeat.frequency.secs" is 5
--- default of "task.heartbeat.frequency.secs" is 3
--- I didn't mention the "worker to supervisor" heartbeat because it uses the local file system.
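
To make the idea a bit more concrete, a pluggable worker heartbeat backend might look roughly like the sketch below. This is only an illustration under assumptions: the names WorkerHeartbeatStore and InMemoryWorkerHeartbeatStore are hypothetical and are not existing Storm interfaces, and a real design would also need to cover configuration, serialization of heartbeat payloads, and failover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extension point; not part of Storm's actual API.
interface WorkerHeartbeatStore {
    /** Record that the given worker was alive at timestampMs. */
    void report(String workerId, long timestampMs);

    /** Ids of workers whose last heartbeat is more than timeoutMs older than nowMs. */
    List<String> timedOut(long nowMs, long timeoutMs);
}

// In-memory backend: no disk I/O per heartbeat, but all state is lost
// if the process holding the map restarts (this is the trade-off).
class InMemoryWorkerHeartbeatStore implements WorkerHeartbeatStore {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    @Override
    public void report(String workerId, long timestampMs) {
        // Keep the newest timestamp even if reports arrive out of order.
        lastSeen.merge(workerId, timestampMs, Long::max);
    }

    @Override
    public List<String> timedOut(long nowMs, long timeoutMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMs - entry.getValue() > timeoutMs) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

With an interface of this shape, the existing ZK-based behavior could remain the default implementation, and users could swap in another backend where they accept the durability trade-off.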