Thanks for the response, Bobby!

I think I might have failed to sufficiently emphasize & explain something
in my earlier description of the issue:  this is happening *only* in the
worker process that hosts the bolt implementing the *IMetricsConsumer*
interface.  The other 24 worker processes are working just fine; their
netty queues do not grow forever.  Every worker process has the same
number and type of executors, except for the one worker that is hosting
the metrics consumer bolt.

So the netty queue is growing unbounded because of an influx of metrics.
The acking and max spout pending configs wouldn't seem to directly
influence how the netty queue fills up with custom metrics.
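
(For reference, those are the two settings Bobby mentions below.  Here's a
minimal sketch of how they're typically wired up on a 0.9.x topology; the
class name and the values are just placeholders:)

    import backtype.storm.Config;

    public class AckingConfigSketch {
        public static Config buildConf() {
            Config conf = new Config();
            // Acking has to be enabled (at least one acker executor) for
            // max spout pending to take effect at all.
            conf.setNumAckers(1);
            // Cap the number of un-acked tuples in flight per spout task.
            conf.setMaxSpoutPending(1000);
            return conf;
        }
    }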

Notably, this "choking" behavior happens even with a "NoOpMetricsConsumer"
bolt, which is the same as Storm's LoggingMetricsConsumer but with
handleDataPoints() doing *nothing*.  Interesting, right?
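
(To be concrete, that NoOpMetricsConsumer boils down to the following; this
is a sketch written against the 0.9.x
backtype.storm.metric.api.IMetricsConsumer interface:)

    import java.util.Collection;
    import java.util.Map;

    import backtype.storm.metric.api.IMetricsConsumer;
    import backtype.storm.task.IErrorReporter;
    import backtype.storm.task.TopologyContext;

    // Same shape as LoggingMetricsConsumer, except every data point is dropped.
    public class NoOpMetricsConsumer implements IMetricsConsumer {
        @Override
        public void prepare(Map stormConf, Object registrationArgument,
                            TopologyContext context, IErrorReporter errorReporter) {
            // nothing to set up
        }

        @Override
        public void handleDataPoints(TaskInfo taskInfo,
                                     Collection<DataPoint> dataPoints) {
            // intentionally empty: metrics arrive here and are discarded
        }

        @Override
        public void cleanup() {
            // nothing to tear down
        }
    }

It's registered like the stock consumer, e.g.
conf.registerMetricsConsumer(NoOpMetricsConsumer.class, 1); so the metrics
tuples are still routed to that worker's netty queue even though they're
immediately thrown away.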

- Erik

On Tue, Jan 3, 2017 at 7:06 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> Storm does not have back pressure by default.  Also, because storm supports
> loops in a topology, the message queues can grow unbounded.  We have put in
> a number of fixes in newer versions of storm, also for the messaging side
> of things.  But the simplest way to avoid this is to have acking enabled
> and have max spout pending set to a reasonable number.  This will typically
> be caused by one of the executors in your worker not being able to keep up
> with the load coming in.  There is also the possibility that a single
> thread cannot keep up with the incoming message load.  In the former case
> you should be able to see the capacity go very high on some of the
> executors.  In the latter case you will not see that, and may need to add
> more workers to your topology.  - Bobby
>
>     On Thursday, December 22, 2016 10:01 PM, Erik Weathers
> <eweath...@groupon.com.INVALID> wrote:
>
>
>  We're debugging a topology's infinite memory growth for a worker process
> that is running a metrics consumer bolt, and we just noticed that the netty
> Server.java's message_queue
> <https://github.com/apache/storm/blob/v0.9.6/storm-core/src/jvm/backtype/storm/messaging/netty/Server.java#L97>
> is growing forever (at least it goes up to ~5GB before it hits heap limits
> and leads to heavy GCing).  (We found this by using Eclipse's Memory
> Analysis Tool on a heap dump obtained via jmap.)
>
> We're running storm-0.9.6, and this is happening with a topology that is
> processing 200K+ tuples per second, and producing a lot of metrics.
>
> I'm a bit surprised that this queue would grow forever; I assumed there
> would be some sort of limit.  I'm pretty naive about how netty's message
> receiving system is tied into the Storm executors at this point, though.  I'm
> kind of assuming the behavior could be a result of backpressure / slowness
> from our downstream monitoring system, but there's no visibility provided
> by Storm into what's happening with these messages in the netty queues
> (that I have been able to ferret out, at least!).
>
> Thanks for any input you might be able to provide!
>
> - Erik
>
