Re: Sending periodic statistics to Spout from Bolts

Bobby Evans Tue, 14 Feb 2017 06:32:35 -0800

That makes a lot of sense.


- Bobby

On Monday, February 13, 2017, 5:02:53 PM CST, Anis Nasir <aadi.a...@gmail.com>
wrote:

Dear Bobby,

Thank you for the feedback. I will start looking at the source code now.

I would prefer the downstream operators to take care of these parameters
locally and only send a message to the upstream operator to *increase load*
or *decrease load.*

Given this feature, the upstream operator will be able to react to each
request (i.e., *increase load* or *decrease load*) rather than collecting all
the stats and solving the optimal assignment problem.

Therefore, I just need an interface to interact with the upstream operator.
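
As a rough illustration of the idea above (not Storm's API, and not an
existing grouping), here is a minimal plain-Java sketch of how an upstream
operator could react to individual *increase load* / *decrease load* requests
without collecting global statistics. The class name LoadFeedbackRouter, the
Request enum, and the STEP / MIN_WEIGHT / MAX_WEIGHT constants are all
hypothetical choices made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: per-task routing weights adjusted by downstream requests. */
class LoadFeedbackRouter {
    /** The only two messages a downstream task sends upstream in this sketch. */
    enum Request { INCREASE_LOAD, DECREASE_LOAD }

    // Assumed tuning constants; real values would need experimentation.
    private static final double STEP = 1.25;       // multiplicative adjustment per request
    private static final double MIN_WEIGHT = 0.05; // keeps every task reachable
    private static final double MAX_WEIGHT = 20.0; // bounds the skew toward one task

    private final Map<Integer, Double> weights = new LinkedHashMap<>();
    private final Random random;

    LoadFeedbackRouter(List<Integer> taskIds, long seed) {
        for (int id : taskIds) {
            weights.put(id, 1.0); // start from a uniform assignment
        }
        random = new Random(seed);
    }

    /** React to a single request; requests from unknown tasks are ignored. */
    synchronized void onRequest(int taskId, Request request) {
        Double w = weights.get(taskId);
        if (w == null) {
            return;
        }
        double updated = (request == Request.INCREASE_LOAD) ? w * STEP : w / STEP;
        weights.put(taskId, Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, updated)));
    }

    /** Current weight of a known task. */
    synchronized double weightOf(int taskId) {
        return weights.get(taskId);
    }

    /** Pick a task with probability proportional to its weight (-1 if there are no tasks). */
    synchronized int chooseTask() {
        double total = 0;
        for (double w : weights.values()) {
            total += w;
        }
        double r = random.nextDouble() * total;
        int last = -1;
        for (Map.Entry<Integer, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0) {
                return last;
            }
        }
        return last; // fallback for floating-point rounding
    }
}
```

In this sketch each request only nudges one weight, so the upstream side never
has to solve the assignment problem globally. How the request actually travels
from the downstream task to the upstream one is left open here.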

Regards,
Anis





On Tue, Feb 14, 2017 at 7:48 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> First off, if you don't know Clojure, you are in luck. On the master branch,
> all of the core code, except for the UI, shell submission, and a few classes
> needed to support them, is in Java.  There are still several tests that
> also need to move over to Java, but it should not be too big of an issue for
> you.
>
> q-length is fairly straightforward to collect.
>
> CPU utilization is hard. Java does expose this through JMX on a per-thread
> basis, so you might be able to get this for the executor thread of a given
> bolt/spout.
>
> Memory utilization is on a per-worker basis, but is fairly simple to get
> through JMX.
>
> Service time vs. idle time are things you will probably need to write
> yourself, but are probably not too difficult to do.  Be careful though: this
> is on the data path and can impact the performance of all topologies.
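
For reference, a minimal sketch of the measurements described above, using
only the standard java.lang.management API. The LocalStats class and its
method names are hypothetical; nothing here is Storm code, and it assumes all
calls to timed() and busyFractionAndReset() come from the same executor
thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper collecting local stats; not thread-safe by design. */
class LocalStats {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

    private long busyNanos;                            // service time in the current window
    private long windowStartNanos = System.nanoTime();

    /** CPU time used so far by one thread, or -1 if the JVM cannot report it. */
    long threadCpuNanos(long threadId) {
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1;
        }
        return threads.getThreadCpuTime(threadId); // also -1 if the thread is not alive
    }

    /** Heap currently in use; this is per JVM (per worker), not per executor. */
    long heapUsedBytes() {
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Wrap the processing of one tuple to accumulate service time. */
    void timed(Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            busyNanos += System.nanoTime() - start;
        }
    }

    /** Fraction of the window spent processing (idle = 1 - busy); starts a new window. */
    double busyFractionAndReset() {
        long now = System.nanoTime();
        long window = Math.max(1, now - windowStartNanos);
        double fraction = Math.min(1.0, (double) busyNanos / window);
        busyNanos = 0;
        windowStartNanos = now;
        return fraction;
    }
}
```

An executor thread could call threadCpuNanos(Thread.currentThread().getId())
at the start and end of a window and divide the difference by the window
length to estimate its CPU utilization. Since timed() sits on the data path,
it may be worth timing only a sample of tuples rather than every one.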
>
>
> "on-need basis" is the hard part.  This is because the downstream
> components need to be able to know that an upstream component needs
> specific metrics.  I think the best way would be to broadcast it at a low
> frequency, but have thresholds where it would send it again if something
> changed drastically.
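
The "low frequency plus thresholds" idea above could look roughly like the
following plain-Java sketch. ThresholdReporter, periodMillis, and
relativeChange are hypothetical names chosen for this example, and the
relative-change test is just one possible reading of "changed drastically".

```java
/** Hypothetical sketch: send periodically, or early when the value jumps. */
class ThresholdReporter {
    private final long periodMillis;     // low-frequency broadcast interval
    private final double relativeChange; // e.g. 0.5 means a 50% change triggers a resend

    private long lastSentMillis;
    private double lastSentValue;
    private boolean sentOnce;

    ThresholdReporter(long periodMillis, double relativeChange) {
        this.periodMillis = periodMillis;
        this.relativeChange = relativeChange;
    }

    /**
     * Returns true if the value should be sent now. When it returns true, the
     * value is recorded as the last one sent.
     */
    boolean shouldSend(double value, long nowMillis) {
        boolean due = !sentOnce || nowMillis - lastSentMillis >= periodMillis;
        // Use at least 1.0 as the base so values near zero do not trigger constantly.
        double base = Math.max(Math.abs(lastSentValue), 1.0);
        boolean drastic = sentOnce
                && Math.abs(value - lastSentValue) / base >= relativeChange;
        if (due || drastic) {
            lastSentMillis = nowMillis;
            lastSentValue = value;
            sentOnce = true;
            return true;
        }
        return false;
    }
}
```

For example, with a 60-second period and a 0.5 threshold, a queue length that
moves from 100 to 110 is not re-sent, but a jump from 100 to 200 is sent
immediately instead of waiting for the next period.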
>
>
> - Bobby
>
> On Monday, February 13, 2017, 4:25:50 PM CST, Anis Nasir
> <aadi.a...@gmail.com> wrote:
>
> Dear Bobby,
>
> In this case, how can we enable such configuration?
>
> I am not very familiar with Clojure. However, I would like the downstream
> operators to report various parameters to the upstream operators on an
> on-need basis, such as service time, queue length, CPU utilization, memory
> utilization, idle time, etc.
>
> Regards,
> Anis
>
>
>
> On Tue, Feb 14, 2017 at 12:36 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
> wrote:
>
> > Yes, makes perfect sense.
> >
> >
> > - Bobby
> >
> > On Friday, February 10, 2017, 4:36:22 PM CST, Anis Nasir
> > <aadi.a...@gmail.com> wrote:
> >
> > Dear Bobby,
> >
> > Thank you very much for your reply.
> >
> > In real deployments, it is often the case that executors are heterogeneous
> > and execution time per tuple is non-uniform (as discussed in the JIRA). In
> > such cases, the workload and capacity (of executors) distributions are
> > often unknown at the upstream operator, and it is required to infer the
> > capacity of each worker and the assigned workload.
> >
> > For such scenarios, I would like to design a grouping scheme that allows
> > upstream operators to change the assignments by knowing both the workload
> > and the capacities of the machines.
> >
> > Also, I would prefer that each downstream operator can send this message
> > on an on-need basis, rather than broadcasting it across the whole set of
> > operators.
> >
> > Does it make sense?
> >
> > Regards,
> > Anis
> >
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Feb 10, 2017 at 11:54 PM, Bobby Evans <ev...@yahoo-inc.com.invalid>
> > wrote:
> >
> > > Anis,
> > > We already have the q-length being reported upstream.
> > > https://issues.apache.org/jira/browse/STORM-162
> > > It works well, except that when a topology gets really big, the amount
> > > of metrics being collected can negatively impact the performance of the
> > > topology.  By really big I mean several thousand workers.
> > > There has also been a push to redo the metrics system in Storm so it is
> > > more scalable and so that Nimbus can query it.  That is what I personally
> > > think would be a good long-term solution for features like elasticity.
> > > But I am not really sure what you mean by load-aware scheduling.
> > >
> > > - Bobby
> > >
> > > On Thursday, February 9, 2017, 10:34:29 PM CST, Anis Nasir
> > > <aadi.a...@gmail.com> wrote:
> > >
> > > Dear All,
> > >
> > > I have been trying to implement load aware scheduling for Apache Storm.
> > >
> > > For this purpose, I need to send periodic statistics from downstream
> > > operators to upstream operators.
> > >
> > > Is there a standard way of sending such statistics to the upstream
> > > operator, e.g., a bolt periodically reporting its local queue length to
> > > the upstream spout?
> > >
> > > Thanking you in advance.
> > >
> > > Regards,
> > > Anis
> > >
> >
>
