Github user HeartSaVioR commented on the pull request:
https://github.com/apache/storm/pull/1406#issuecomment-217765582
@abhishekagarwal87
Thanks for the improvement. :)
By the way, I have some thoughts on this change.
1. some concerns about adding payload to the task heartbeat
As you may know, the reason Storm needs the Pacemaker daemon on large
clusters is that Storm includes task metrics in the heartbeat message and
stores it in ZK at a short interval (task.heartbeat.frequency.secs, default
value 3), which puts heavy pressure on ZK.
So we should discuss whether to keep expanding the heartbeat message in the
current way, or to change the way metrics are sent to Nimbus (as JStorm does).
If we can make more room for metrics, we can come up with ideas for new
metrics and add them. For example, spout tasks could have optional metrics,
such as partition information and lag for KafkaSpout.
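
To make the pressure concrete, here is a rough back-of-the-envelope sketch of the write load that heartbeats put on ZK. The 3-second interval is the default mentioned above; the number of heartbeat writers and the payload size are hypothetical values chosen only for illustration, and the class and method names are not part of Storm.

```java
// Rough estimate of ZK write load caused by periodic heartbeats.
// All names here are hypothetical; this is not Storm code.
final class HeartbeatLoadEstimate {

    // Writes per second when each writer sends one heartbeat every intervalSecs.
    static double writesPerSecond(int writers, int intervalSecs) {
        return (double) writers / intervalSecs;
    }

    // Bytes per second written, given the serialized heartbeat size.
    static double bytesPerSecond(int writers, int intervalSecs, int payloadBytes) {
        return writesPerSecond(writers, intervalSecs) * payloadBytes;
    }

    public static void main(String[] args) {
        int writers = 3000;      // hypothetical number of heartbeat writers
        int intervalSecs = 3;    // default of task.heartbeat.frequency.secs quoted above
        int payloadBytes = 2048; // hypothetical serialized heartbeat size

        System.out.printf("writes/sec: %.1f%n", writesPerSecond(writers, intervalSecs));
        System.out.printf("bytes/sec:  %.1f%n",
                bytesPerSecond(writers, intervalSecs, payloadBytes));
    }
}
```

Since the load grows linearly with the payload size, every metric added to the heartbeat is paid for on each interval by each writer.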
2. metrics for queue
I guess queue sojourn time is one of the most wanted queue metrics, since
many users have said that they see very short execute/process latencies for
each task but a very high complete latency.
(@wangli1426 addressed sojourn time for the disruptor queue, but [as he stated
in a code
comment](https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/utils/DisruptorQueue.java#L324),
it's based on a precondition which sometimes doesn't hold for a problematic
task. If we can make it stable, it would be really helpful.)
STORM-1742 covers the accuracy of 'complete latency', but many parts of the
tuple lifecycle are still hidden, for example average queue sojourn time,
serde latency, transfer latency, etc. I don't think we want to measure things
in a way that affects overall performance, but they're meaningful information
indeed, so I would like to add them if measuring them doesn't hurt performance
at all.
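
As an illustration of the precondition issue, here is a minimal sketch of a sojourn time estimate based on Little's law (time in queue is roughly queue population divided by arrival rate). It assumes the queue is stable, meaning the arrival rate equals the consumption rate, which I believe is the precondition the linked comment refers to. The class and method names are hypothetical and this is not the DisruptorQueue code.

```java
// Sojourn time estimate via Little's law. Hypothetical names, not Storm code.
final class SojournTimeEstimate {

    // Estimated time (ms) a tuple spends in the queue, assuming a stable queue
    // where the arrival rate equals the consumption rate.
    static double sojournTimeMs(long population, double arrivalRatePerSec) {
        // Guard against division by zero when nothing has arrived recently.
        double rate = Math.max(arrivalRatePerSec, 1e-5);
        return population / rate * 1000.0;
    }

    public static void main(String[] args) {
        // Stable queue: 1000 tuples waiting, 500 tuples/sec arriving and leaving.
        System.out.println(sojournTimeMs(1000, 500.0));

        // Problematic task: the consumer has stalled and the producers are being
        // throttled, so the measured arrival rate no longer reflects how fast
        // tuples leave the queue and the estimate stops being meaningful.
        System.out.println(sojournTimeMs(1000, 0.0));
    }
}
```

When the consumer is the bottleneck, the departure rate would need to be taken into account too, or the time would have to be measured directly per tuple, which is where the performance cost comes in.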