Re: PR to enable actions on YARN

Dascalita Dragos Mon, 25 Feb 2019 16:16:08 -0800

Hi Samuel,
This is an interesting contribution. Do you happen to have any performance
numbers with YARN? I'd be particularly interested in the cold start
latencies.


Thanks,
dragos

On Fri, Feb 22, 2019 at 5:21 PM Samuel Hjelmfelt
<samhjelmf...@yahoo.com.invalid> wrote:

>
> Hi Rodric and Carlos,
>
>
> Apache Hadoop has three major components: HDFS (distributed filesystem),
> MapReduce (distributed batch processing engine), YARN (Yet Another Resource
> Negotiator) (container engine). While MapReduce has been largely replaced by
> Apache Tez, Apache Spark, and Apache Flink, HDFS and YARN are still widely
> used for data analytics use cases.
>
>
>
> YARN is unique as a container engine because, unlike Mesos and Kubernetes,
> it was designed for ephemeral, short-lived containers rather than for long
> running micro-services. The jobs and queries that run on YARN are split
> into small tasks that run to completion and generally only last for seconds
> or maybe minutes. Over the last couple years, YARN has been expanding its
> support for long running use cases, but is still focused on data-driven use
> cases over more generic micro-service use cases (like web apps). The primary
> long running technologies on YARN are currently Spark Streaming and
> TensorFlow. Here is an article from LinkedIn about why they created a
> project for TensorFlow on YARN. A similar case could be made for OpenWhisk:
> https://engineering.linkedin.com/blog/2018/09/open-sourcing-tony--native-support-of-tensorflow-on-hadoop.
>
>
>
>
> Bringing OpenWhisk onto YARN makes FaaS more accessible to the thousands of
> organizations with existing Hadoop clusters. Between Cloudera’s 2,000+
> customers; Azure, AWS, and GCP cloud customers; and the organizations
> self-supporting like Netflix, the install base of YARN is very high and
> still growing.
>
>
>
> This PR is a first level of integration, but YARN’s focus on ephemeral
> containers could be more fully leveraged by OpenWhisk to improve scalability
> and performance. Here is an interesting article on the scalability of YARN
> from Microsoft:
> https://azure.microsoft.com/en-us/blog/how-microsoft-drives-exabyte-analytics-on-the-world-s-largest-yarn-cluster/
>
> Thanks,
> Sam Hjelmfelt
>
