Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

Prabhu Kasinathan Wed, 16 Aug 2017 16:18:24 -0700

As Meisam highlighted, in our case we run Livy in a multi-node HA setup,
i.e., Livy runs on 6 servers for each cluster, load-balanced, sharing Livy
metadata on ZooKeeper and running thousands of applications. With the changes
below, we are seeing good improvements from batching the requests (one per
Livy node) instead of having each Livy node make multiple requests. Please
review the changes and let us know if improvements are needed; we are also
open to exploring alternatives if they work better.
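
To make the request-count difference concrete, here is a minimal sketch in
Python. It is not the Livy code: FakeResourceManager, poll_individually and
poll_batched are hypothetical names, and the "resource manager" is just an
in-memory dict that counts how many round trips it receives.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class FakeResourceManager:
    """Hypothetical stand-in for YARN that counts the requests it receives."""
    states: Dict[str, str]
    request_count: int = 0

    def get_report(self, app_id: str) -> str:
        # One round trip per application.
        self.request_count += 1
        return self.states[app_id]

    def get_all_reports(self) -> Dict[str, str]:
        # One round trip that returns every known application at once.
        self.request_count += 1
        return dict(self.states)


def poll_individually(rm: FakeResourceManager,
                      app_ids: Iterable[str]) -> Dict[str, str]:
    # N applications -> N requests per polling cycle.
    return {app_id: rm.get_report(app_id) for app_id in app_ids}


def poll_batched(rm: FakeResourceManager,
                 app_ids: Iterable[str]) -> Dict[str, str]:
    # N applications -> 1 request per polling cycle; the node then keeps
    # only the applications it is responsible for.
    reports = rm.get_all_reports()
    return {app_id: reports[app_id] for app_id in app_ids if app_id in reports}
```

With 1,000 tracked applications, one polling cycle costs 1,000 requests in
the first variant and a single request in the second. The trade-off is that
the single response carries every application the resource manager knows
about, not just the ones this node tracks.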


> We are making one big request to get ApplicationReports. Then we make an
> individual + thread pool request to get the tracking URL, Spark UI URL,
> YARN diagnostics, etc for each application separately. For our cluster
> settings and our workloads, one big request turned out to be a better
> solution. But we were limited to the API provided in YarnClient. With the
> home-made REST client a separate request is not needed and that can change
> the whole equation.
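
As a rough illustration of why a REST client could remove the second round
of requests: the ResourceManager's Cluster Applications REST endpoint
(/ws/v1/cluster/apps) returns state, tracking URL and diagnostics for each
application in the same JSON document. The sketch below only parses such a
payload; the sample data, host name and the index_apps helper are made up
for illustration, and no network call is made.

```python
import json
from typing import Dict

# Made-up sample shaped like a /ws/v1/cluster/apps response.
SAMPLE_RESPONSE = """
{"apps": {"app": [
  {"id": "application_1502900000000_0001",
   "state": "RUNNING",
   "finalStatus": "UNDEFINED",
   "trackingUrl": "http://rm.example.com:8088/proxy/application_1502900000000_0001/",
   "diagnostics": ""},
  {"id": "application_1502900000000_0002",
   "state": "FAILED",
   "finalStatus": "FAILED",
   "trackingUrl": "http://rm.example.com:8088/cluster/app/application_1502900000000_0002",
   "diagnostics": "Application failed 2 times"}
]}}
"""


def index_apps(payload: str) -> Dict[str, dict]:
    """Index one batched response by application id (hypothetical helper)."""
    body = json.loads(payload)
    # Be defensive about a missing or null "apps" element.
    apps = (body.get("apps") or {}).get("app") or []
    return {
        app["id"]: {
            "state": app.get("state"),
            "finalStatus": app.get("finalStatus"),
            "trackingUrl": app.get("trackingUrl"),
            "diagnostics": app.get("diagnostics", ""),
        }
        for app in apps
    }


apps_by_id = index_apps(SAMPLE_RESPONSE)
```

Each Livy node could then look up its own sessions in apps_by_id instead
of issuing a follow-up request per application.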



On Wed, Aug 16, 2017 at 3:33 PM, Meisam Fathi <meisam.fa...@gmail.com>
wrote:

>
> On Wed, Aug 16, 2017 at 2:09 PM Nan Zhu <zhunanmcg...@gmail.com> wrote:
>
>> As time goes on, the reply from YARN can only grow larger. Given
>> a consistent workload pattern, the cost of one large query can
>> eventually exceed that of individual requests
>>
>
> I am under the impression that there is a limit to the number of reports
> that YARN retains, which is set by 
> yarn.resourcemanager.max-completed-applications
> in yarn-site.xml and defaults to 10,000. But I could be wrong about the
> semantics of yarn.resourcemanager.max-completed-applications.
>
>> I would say go with individual requests + thread pool or one large batch
>> for all first; if any performance issue is observed, add the optimization
>> on top of it
>>
>
> We are making one big request to get ApplicationReports. Then we make an
> individual + thread pool request to get the tracking URL, Spark UI URL,
> YARN diagnostics, etc for each application separately. For our cluster
> settings and our workloads, one big request turned out to be a better
> solution. But we were limited to the API provided in YarnClient. With the
> home-made REST client a separate request is not needed and that can change
> the whole equation.
>
> @Prabhu, can you chime in?
>
>
>> However, even with the REST API, there are some corner cases, e.g. a
>> long running app lasting for days (training some models), and some short
>> ones which last only for minutes
>>
>
> We are running Spark streaming jobs on Livy that run virtually forever.
>
> Thanks,
> Meisam
>
