[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

revans2 Mon, 26 Oct 2015 13:46:30 -0700

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/797#issuecomment-151279210
  
    @HeartSaVioR I am seeing similar things to what @harshach is seeing.  I 
really want to track this down and fix it.  How many nodes do you have?  Which 
daemons are running on which nodes?  What version of Java are you running? 
What OS are you running on?  Can you share some information about the hardware? 
I know they are VMs, but the number of cores and the clock frequency would be 
good to know.  What is the network connection between the nodes?
    
    The failures you are seeing look like what I would see when ZK or the 
network got overloaded: the heartbeats could not make it to ZK, so some of the 
time the data there did not show any change.  But with only 3 workers and none 
of them getting rescheduled, I find that hard to believe.  Can you share any of 
the logs?  Have you tried running 
[zktop](https://github.com/phunt/zktop/blob/master/zktop.py) to see if any of 
the nodes in the ensemble are showing signs of slowness?  Have you looked at 
the network and disk utilization?



