Hi Samuel,

This is an interesting contribution. Do you happen to have any performance numbers with YARN? I'd be particularly interested in the cold start latencies.
Thanks,
dragos

On Fri, Feb 22, 2019 at 5:21 PM Samuel Hjelmfelt <[email protected]> wrote:

> Hi Rodric and Carlos,
>
> Apache Hadoop has three major components: HDFS (distributed filesystem),
> MapReduce (distributed batch processing engine), and YARN (Yet Another
> Resource Negotiator) (container engine). While MapReduce has been largely
> replaced by Apache Tez, Apache Spark, and Apache Flink, HDFS and YARN are
> still widely used for data analytics use cases.
>
> YARN is unique as a container engine because, unlike Mesos and Kubernetes,
> it was designed for ephemeral, short-lived containers rather than for
> long-running micro-services. The jobs and queries that run on YARN are
> split into small tasks that run to completion and generally only last for
> seconds or maybe minutes. Over the last couple of years, YARN has been
> expanding its support for long-running use cases, but is still focused on
> data-driven use cases over more generic micro-service use cases (like web
> apps). The primary long-running technologies on YARN are currently Spark
> Streaming and TensorFlow. Here is an article from LinkedIn about why they
> created a project for TensorFlow on YARN. A similar case could be made for
> OpenWhisk:
> https://engineering.linkedin.com/blog/2018/09/open-sourcing-tony--native-support-of-tensorflow-on-hadoop
>
> Bringing OpenWhisk onto YARN makes FaaS more accessible to the thousands of
> organizations with existing Hadoop clusters. Between Cloudera's 2,000+
> customers; Azure, AWS, and GCP cloud customers; and self-supporting
> organizations like Netflix, the install base of YARN is very large and
> still growing.
>
> This PR is a first level of integration, but YARN's focus on ephemeral
> containers could be more fully leveraged by OpenWhisk to improve
> scalability and performance.
>
> Here is an interesting article on the scalability of YARN from Microsoft:
> https://azure.microsoft.com/en-us/blog/how-microsoft-drives-exabyte-analytics-on-the-world-s-largest-yarn-cluster/
>
> Thanks,
> Sam Hjelmfelt
