Oops, I didn't know that. Happy Thanksgiving.

Thanks, Otto and Simon.

As you are aware from our use cases, with the current limitations of
multi-tenancy support we are creating one feed per tenant per device.
Sometimes the amount of traffic we receive for a given tenant and device is
far less than what would justify dedicating a whole Storm slot to it.
Therefore, I was hoping to make it at least theoretically possible to tune
resources more sensibly, but it is not going to be easy at all. This is
probably a use case for which a Storm auto-scaling mechanism would be very
nice to have.

https://issues.apache.org/jira/browse/STORM-594
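For reference, my rough understanding of what enabling Storm's Resource Aware
Scheduler would involve is sketched below (property names are from memory and
should be checked against the Storm release in use; the capacity and limit
values are made-up examples):

```yaml
# storm.yaml on Nimbus: switch to the Resource Aware Scheduler
storm.scheduler: "org.apache.storm.scheduler.resource.ResourceAwareScheduler"

# Per-supervisor capacity hints (example values)
supervisor.memory.capacity.mb: 4096.0
supervisor.cpu.capacity: 400.0

# Per-topology resource estimates (example values; would need to be
# set for every parser topology individually)
topology.component.resources.onheap.memory.mb: 128.0
topology.component.cpu.pcore.percent: 10.0
topology.worker.max.heap.size.mb: 768.0
```

As Simon points out below, this means every topology would need its own set of
such estimates, which is exactly the micro-tuning burden in question.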

On the other hand, I recall there was a PR to address multi-tenancy by
adding metadata to the Kafka topic. However, I have lost track of that
feature, so maybe this situation can be tackled at another level by merging
different parsers.

I will create a Jira ticket to add the ability to tune Metron parser feeds
at the Storm level from the UI. Right now it is a little hard to maintain
tuning configuration per parser, and as soon as somebody restarts a parser
from the Management UI/Ambari, that configuration gets overwritten.
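To illustrate, the kind of per-parser tuning I have in mind lives in the
sensor parser JSON config, something like the sketch below (field names are
from memory and should be checked against the Metron version in use; the
values are just examples):

```json
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "numWorkers": 1,
  "numAckers": 1,
  "spoutParallelism": 1,
  "parserParallelism": 2,
  "stormConfig": {
    "topology.worker.max.heap.size.mb": 256
  }
}
```

Keeping these settings in sync per feed, and surviving restarts from the UI,
is the part that is painful today.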


Cheers,
Ali

On Sat, Nov 25, 2017 at 3:36 AM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Implementing the resource aware scheduler would be decidedly non-trivial.
> Every topology will need additional configuration to tune for things like
> memory sizes, which is not going to buy you much. So, at the level of
> micro-tuning an individual parser this doesn’t make a lot of sense.
>
> However, it may be relevant to consider separate tuning for parsers in
> general vs the core enrichment and indexing topologies (potentially also
> for separate indexing topologies when this comes in) and the resource
> scheduler could provide a theoretical benefit there.
>
> Specifying resource requirements per parser topology might sound like a
> good idea, but if your parsers are working the way they should, they should
> be using a small amount of memory as their default size, achieving
> additional resource use by multiplying workers and executors (to get higher
> usage per slot) and balancing the load that way. To be honest, the only
> difference you’re going to get from the RAS is to add a bunch of tuning
> parameters which allow slightly different granularity of units for things
> like memory.
>
> The other RAS feature which might be a good add is prioritisation of
> different parser topologies, but again, this is probably not something you
> want to push hard on unless you are severely limited in resources (in which
> case, why not just add another node, it will be cheaper than spending all
> that time micro-tuning the resource requirements for each data feed).
>
> Right now we do allow a lot of micro-tuning of parallelism around things
> like the count of executor threads, which achieves roughly the
> equivalent of the CPU-based limits in the RAS.
>
> TL;DR:
>
> If you’re not using resource pools for different users and using the idea
> that prioritisation can lead to arbitrary kills, all you’re getting is a
> slightly different way of tuning knobs that already exist, but you would
> get a slightly different granularity. Also, we would have to rewrite all
> the topology code to add the config endpoints for CPU and memory estimates.
>
> Simon
>
> > On 24 Nov 2017, at 07:56, Ali Nazemian <alinazem...@gmail.com> wrote:
> >
> > Any help regarding this question would be appreciated.
> >
> >
> > On Thu, Nov 23, 2017 at 8:57 AM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
> >
> >> 30 mins average of CPU load by checking Ambari.
> >>
> >> On 23 Nov. 2017 00:51, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
> >>
> >> How are you measuring the utilization?
> >>
> >>
> >> On November 22, 2017 at 08:12:51, Ali Nazemian (alinazem...@gmail.com)
> >> wrote:
> >>
> >> Hi all,
> >>
> >>
> >> One of the issues that we are dealing with is the fact that not all of
> >> the Metron feeds have the same type of resource requirements. For
> example,
> >> we have some feeds that even a single Storm slot is way more than what
> it
> >> needs. We thought we could make it more utilised in total by limiting at
> >> least the amount of available heap space per feed to the parser topology
> >> worker. However, since Storm scheduler relies on available slots, it is
> >> very hard and almost impossible to utilise the cluster in the scenario
> >> that
> >> there will be lots of different topologies with different requirements
> >> running at the same time. Therefore, on a daily basis, we can see that
> for
> >> example one of the Storm hosts is 120% utilised and another is 20%
> >> utilised! I was wondering whether we can address this situation by using
> >> Storm Resource Aware scheduler or not.
> >>
> >> P.S: it would be very nice to have a functionality to tune Storm
> >> topology-related parameters per feed in the GUI (for example in
> Management
> >> UI).
> >>
> >>
> >> Regards,
> >> Ali
> >>
> >>
> >>
> >
> >
> > --
> > A.Nazemian
>
>


-- 
A.Nazemian
