GitHub user d2r commented on the pull request:
https://github.com/apache/storm/pull/392#issuecomment-71714975
> Not to derail the discussion, but personally I would much rather not
> store errors in zk at all if it's just for rendering the errors in the UI.
> If the spouts/bolts could just store this in memory with some expiration,
> that should suffice, and we could expose an API at the worker layer to get
> this information directly from it. If the host dies you lose some errors,
> but that does not seem like a big deal. The only downside is that the UI
> would now have to make requests against the worker hosts to get errors, but
> that seems ok to me; you would also get parallelism, as all these worker
> calls can be made in parallel. I haven't thought this through completely,
> and it's probably much more work, but I would love to hear your opinion.

Yeah, we were thinking about distributing things this way too. We figured
that the bigger problem is the heartbeats, and if we could get an improvement
with less effort here, it would be worth it. Moving the errors out of ZK
would be a much bigger change, yet maybe it is not a bad idea. (Also, I
think it is good to persist the errors anyway, not just keep them in memory.
Users would like to see errors in the UI even if there was some issue that
brought the supervisor down, like a rolling upgrade of the cluster.) Maybe
we could file a JIRA for better gathering of errors.

This change was intended to be small in scope and just give a way to get
errors more efficiently when a topology has many, many components. It was
prompted by seeing topology page load times of minutes from one of our
customers. This may also be less of a problem once heartbeats (and their
metrics) are no longer sent around, but still it may not be a bad idea to
use a more distributed model like you suggest.
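The in-memory, expiring error store suggested in the quoted comment could be
sketched roughly as below. This is only an illustration, not Storm code: the
class and method names (`ComponentErrorLog`, `report`, `recent`), the TTL and
capacity values, and the idea of passing the clock in explicitly are all
assumptions made for the example. A worker could hold one such log per
component and serve it from an RPC or HTTP endpoint for the UI to query.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a bounded, time-expiring in-memory error log that a
// worker could expose via an API instead of writing errors to ZooKeeper.
// If the worker host dies, these entries are lost, as the quoted comment
// accepts. The current time is passed in to keep the sketch testable.
class ComponentErrorLog {
    private static final class Entry {
        final long timestampMs;
        final String message;
        Entry(long timestampMs, String message) {
            this.timestampMs = timestampMs;
            this.message = message;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long ttlMs;      // drop entries older than this
    private final int maxEntries;  // bound memory per component

    ComponentErrorLog(long ttlMs, int maxEntries) {
        this.ttlMs = ttlMs;
        this.maxEntries = maxEntries;
    }

    // Record an error, evicting expired entries and, if still full,
    // the oldest entry.
    synchronized void report(String message, long nowMs) {
        expire(nowMs);
        if (entries.size() == maxEntries) {
            entries.removeFirst();
        }
        entries.addLast(new Entry(nowMs, message));
    }

    // What the UI-facing worker endpoint would return: the still-live
    // error messages, oldest first.
    synchronized List<String> recent(long nowMs) {
        expire(nowMs);
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            out.add(e.message);
        }
        return out;
    }

    private void expire(long nowMs) {
        while (!entries.isEmpty()
                && nowMs - entries.peekFirst().timestampMs > ttlMs) {
            entries.removeFirst();
        }
    }
}

public class ErrorLogDemo {
    public static void main(String[] args) {
        // Keep at most 2 errors, each for at most 60 seconds.
        ComponentErrorLog log = new ComponentErrorLog(60_000, 2);
        log.report("bolt-1: NullPointerException", 0);
        log.report("bolt-1: tuple timeout", 1_000);
        log.report("bolt-1: retry failed", 2_000); // evicts the oldest
        System.out.println(log.recent(2_000));
        System.out.println(log.recent(61_500));    // first survivor expired
    }
}
```

The UI would then fan out one such `recent` call per worker host in parallel,
which is where the parallelism mentioned above comes from.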