[ 
https://issues.apache.org/jira/browse/STORM-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175744#comment-15175744
 ] 

Robert Joseph Evans commented on STORM-1593:
--------------------------------------------

Watching the queue population would work except for system tuples that never 
come from a spout.  We have several different "tick" tuples that can cause 
other tuples to be emitted, like metrics tuples.  Queue population is still a 
very useful metric that we should have, and it should probably be used if 
someone wants to know for sure whether things are done.  In local mode, a lot 
of our tests wait for the topology to be idle before taking the next step.  We 
had to put in random waits instead of waits at set intervals because we would 
always somehow hit a box where the tick tuples and the polling for idle hit 
the same cadence and the test would time out or hang.  In this case we will be 
polling less frequently but over many more machines, so it feels like we would 
run into this situation regularly. 
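
To illustrate the random-wait idea, here is a minimal sketch of polling with a 
jittered interval.  It is not the code from our tests; the function name, the 
parameters, and the is_idle callback are all hypothetical.

```python
import random
import time


def wait_until_idle(is_idle, base_interval=1.0, jitter=0.5, timeout=30.0,
                    sleep=time.sleep, clock=time.monotonic, rng=random.random):
    """Poll is_idle() until it returns True or the timeout elapses.

    is_idle is a hypothetical callback that reports whether the topology
    currently looks idle.  Each wait is base_interval plus a random offset
    in [0, jitter), so the polling cadence cannot stay in lockstep with a
    periodic tick tuple.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if is_idle():
            return True
        # Randomize the wait so polls drift relative to any fixed tick period.
        sleep(base_interval + rng() * jitter)
    return False
```

The sleep, clock and rng parameters are only there so the loop can be driven 
by a fake clock in a test.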

Unanchored tuples of all kinds are not a part of the pending count for a spout, 
so you probably would want to combine the two metrics as a guess for how free a 
topology is.  But there can also be situations where an unanchored tuple is 
sitting in a bolt and not on a queue.  Think about an async web service bolt, 
of which we have lots.  In that case we could have lots of tuples outstanding 
in a bolt.  None of these are silver bullets, but they are better than just 
waiting.

I would say that your green light for doing an upgrade would be something like 
{code}topology_deactivated && sum(max_spout_pending) <= pending_threshold && 
sum(queue_length) <= queue_threshold{code} where pending_threshold would 
probably be 0 and queue_threshold would probably be {code}count(bolt_instances) 
+ count(spout_instances){code}.
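
As a rough sketch, the condition above could be written as a plain function.  
This assumes the per-spout pending counts and per-queue lengths have already 
been collected from metrics somehow; the function and parameter names are 
hypothetical and not an existing Storm API.  Here spout_pending is read as the 
current pending count of each spout.

```python
def ready_for_upgrade(topology_deactivated, spout_pending, queue_lengths,
                      bolt_instances, spout_instances,
                      pending_threshold=0, queue_threshold=None):
    """Return True when the topology looks drained enough to upgrade.

    spout_pending:  iterable of current pending counts, one per spout (assumed input)
    queue_lengths:  iterable of queue populations, one per queue (assumed input)
    bolt_instances, spout_instances: instance counts used for the default
    queue threshold of count(bolt_instances) + count(spout_instances).
    """
    if queue_threshold is None:
        # Default threshold: one tuple of slack per bolt or spout instance.
        queue_threshold = bolt_instances + spout_instances
    return bool(topology_deactivated
                and sum(spout_pending) <= pending_threshold
                and sum(queue_lengths) <= queue_threshold)
```

For example, ready_for_upgrade(True, [0, 0], [1, 0, 1], bolt_instances=2, 
spout_instances=2) passes because nothing is pending and the two queued tuples 
are under the threshold of 4.  As noted above, this is still only a guess, 
since unanchored tuples held inside a bolt are invisible to both metrics.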

> Nimbus indicator for when a Topology finished processing all tuples
> -------------------------------------------------------------------
>
>                 Key: STORM-1593
>                 URL: https://issues.apache.org/jira/browse/STORM-1593
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: Michael Schonfeld
>            Priority: Minor
>
> Every time we want to update topologies, we routinely find ourselves waiting 
> aimlessly for topologies to "fully finish" processing. We never truly know 
> when a topology is actually still processing tuples, and when it's really 
> done... Unless of course we wait for a full 10m window showing zeros in 
> Nimbus's topology stats table.
> I think it'd be beneficial to add some sort of a "Green" indicator in Nimbus, 
> showing when a deactivated topology has ~0 tuples ringing through it. Would 
> using the queue send/rcv population metric be correct for this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)