Please check the supervisor log on that node, and also check the worker log for 
the affected worker.  If the supervisor prints a message about ":disallowed", 
then nimbus rescheduled the worker somewhere else.  If it prints a message about 
timed-out, then the worker was not heartbeating, and the supervisor relaunched 
it thinking it was dead.  There are usually two causes for this: (1) the worker 
really did die, in which case you will probably see a message in the worker log 
with the stack trace for the exception that killed it; or (2) GC was going crazy 
on that worker and it didn't get enough time to actually heartbeat.  If it is 
the latter, you really are going to need to do some profiling.  You can test 
this by increasing the heap size and seeing if that fixes it, or preferably by 
shutting off your supervisor (so it does not relaunch the worker) and attaching 
a debugger or taking a heap dump to see where the memory is being used.  If you 
have a memory leak, increasing the heap size will only delay the problem; it 
will not fix it.
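
As a rough aid for the first step, here is a minimal sketch of a script that 
counts how often each of the two markers mentioned above shows up in a 
supervisor log.  It only matches the substrings ":disallowed" and "timed-out"; 
the exact wording of the surrounding log line depends on your Storm version, 
and the function name and the log path you pass in are assumptions for 
illustration, not anything Storm provides.

```python
import sys
from collections import Counter

# Markers taken from the advice above. Only the substrings are matched,
# because the rest of the log line differs between Storm versions.
MARKERS = {
    ":disallowed": "rescheduled by nimbus",
    "timed-out": "heartbeat timeout, worker relaunched",
}


def classify_supervisor_log(lines):
    """Count supervisor log lines containing each marker (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for marker, meaning in MARKERS.items():
            if marker in line:
                counts[meaning] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python classify_log.py <path to the supervisor log on that node>
    with open(sys.argv[1], errors="replace") as fh:
        for meaning, n in classify_supervisor_log(fh).most_common():
            print(f"{n:6d}  {meaning}")
```

If the timeout count dominates, the worker log for the same time window is 
the next place to look: a stack trace points to cause (1), and its absence 
makes GC pressure, cause (2), more likely.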
 - Bobby 


     On Friday, October 2, 2015 2:14 PM, abe oppenheim 
<[email protected]> wrote:
   

 Hi,

I'm seeing weird behavior in my topologies and was hoping for some advice
on how to troubleshoot the issue.

This behavior occurs throughout my topology, but it is easiest to explain
it as the behavior of one bolt. This bolt has 20 executors. When I submit
the topology, the executors are evenly split between 2 hosts. The executors
on one host seem stable, but the Uptime for the executors on the other host
never grows above roughly 10 minutes; they are constantly being re-prepared.

I don't know what this is symptomatic of or how to diagnose it. All the
Executors have the same Uptime, so I assume this indicates that their
Worker is dying.

Any advice on how to troubleshoot this? Possibly a way to tap into the
Worker lifecycle so I can confirm it is dying every few minutes? Possibly
an explanation of why a Worker would die so consistently, and suggestions
about how to approach this?

Also, any input on how "bad" this is? My topology still processes stuff,
but I assume this constant recreation of Executors has a significant
performance impact?

thanks,
Abe


  
