[DISCUSS] YuniKorn Proposal

Weiwei Yang Wed, 11 Dec 2019 11:50:55 -0800

Greetings folks:

Please consider the following proposal, which is also on the wiki
<https://cwiki.apache.org/confluence/display/INCUBATOR/YuniKornProposal>
 [1].
I look forward to hearing feedback from you.



*YuniKorn - A Unified Resource Scheduler*

1. Abstract
YuniKorn is a standalone resource scheduler responsible for scheduling
batch jobs and long-running services on large scale distributed systems
running in on-premises environments as well as different public clouds.

2. Proposal
YuniKorn ['ju:nikɔ:n] is a unified resource scheduler aiming to achieve
efficient, fine-grained resource sharing for various workloads in
large-scale, multi-tenant, and cloud-native environments. YuniKorn brings a
unified, cross-platform scheduling experience for mixed workloads, with
support for, but not limited to, Apache™ Hadoop® YARN and Kubernetes.

Currently, YuniKorn is an open-source project under the Apache 2.0 license.
The source code is hosted in Git repositories under github.com/cloudera.
We would like to share it with the ASF and expand the community to a wider
range of users and contributors.

2.1 Background
Enterprise users run their workloads on different platforms such as Apache™
Hadoop® YARN and Kubernetes. They need to work with different resource
schedulers in order to plan their workloads to run on these platforms
efficiently. The scheduler implementations are fragmented, and not
optimized to balance existing use-cases like batch workloads along with new
needs such as cloud-native architecture, autoscaling, etc. We need a single
resource planning/management framework to manage resources on different
platforms using the same semantics, in order to address all the important
resource management requirements.

2.2 Rationale
No existing solution addresses the need for a unified resource scheduling
experience across platforms. That makes it difficult to manage workloads
running in different environments, from on-premises to cloud. YuniKorn aims
to satisfy this need. YuniKorn is
designed around the following principles:

1) Support different environments
As compute platforms evolve quickly, more and more challenges appear in
on-premises, cloud, and hybrid environments. YuniKorn aims to bring a
unified scheduling experience across multiple environments with enhanced
scheduling capabilities.

2) Support extensive types of workloads
To improve the efficiency of the computing platform, a key idea is to run
different types of applications, like long-running services and batch jobs,
on shared resources. YuniKorn is an effort to address all the scheduling
features needed for such mixed workload environments.

3) Benefit both big-data and cloud-native communities
A resource scheduler needs to be capable of supporting mixed workloads,
both batch jobs and long-running services. This is the key to improving
cluster utilization and reducing the complexity of DevOps. A common
scheduler that is decoupled from the underlying container platforms can
benefit both the Apache™ Hadoop® YARN and the Kubernetes communities.

2.3 Initial Goals
Initial goals are:
 - Move the existing codebase and documentation to Apache-hosted repos
 - Set up mailing lists, a website, and a CI/CD pipeline under Apache
infrastructure
 - Set up JIRA for issue tracking
 - Incremental development and releases according to Apache guidelines
 - Expand the community and bring more diversified contributors/users to
the community

2.4 Current Status

2.4.1 Meritocracy
Many of the initial developers of YuniKorn are already Apache committers
and PMC members from other Apache projects, such as Apache Hadoop and
Apache Submarine. Many of us have worked in the Apache Hadoop community for
years and know the Apache way well. We believe strongly in meritocracy in
electing committers and PMC members. We believe that contributions can come
in forms other than just code: for example, one of our initial proposed
committers has contributed solely in the area of project documentation. We
will encourage contributions and participation of all types, and ensure
that contributors are appropriately recognized.

2.4.2 Community
YuniKorn is a relatively new open-source project, and Cloudera is the
original development sponsor. From the beginning, we aimed to make it an
open-source project, so we started building the community at a very early
stage. We received a lot of feedback and valuable suggestions from other
community members while the project was hosted on GitHub, and this feedback
has greatly influenced some of our designs. For example, developers from
Alibaba have been involved since the very early stages of development and
contributed a lot of the work related to performance/throughput
enhancement. Many other organizations showed interest in joining the
community once we started talking about the project at meetups,
conferences, etc.

2.4.3 Core developers
The project was initiated at Cloudera, so the core developers are mostly
from this organization. Tao Yang from Alibaba joined the
development at a very early stage. The core developers of YuniKorn are
(listed in alphabetical order):

 - Akhil PB (Cloudera)
 - Sunil Govindan (Cloudera)
 - Tao Yang (Alibaba)
 - Vinod Vavilapalli (Cloudera)
 - Wangda Tan (Cloudera)
 - Weiwei Yang (Cloudera)
 - Wilfred Spiegelenburg (Cloudera)

Given the project's origins, the core development team so far has not been
very diverse, but we’ve been working to grow that diversity. We fully
intend to continue building a diverse and sustainable community if the
project is accepted into Apache.

2.4.4 Alignment
The motivation of the YuniKorn project is to resolve common resource
scheduling problems for various workloads on large-scale distributed
systems. Apache is home to one of these systems in the form of Apache
Hadoop YARN. Many of the workloads that we expect to leverage YuniKorn are
computing engines like Apache Spark and Apache Flink, whether they run on
top of YARN or on Kubernetes.

2.5 Known Risks

2.5.1 Orphaned products
The core developers of the YuniKorn project, who come from different
companies, plan to work full time on this project. The initial team intends
to continue investing in YuniKorn, and it will be integrated into the
solutions offered to customers. Several other organizations (like Alibaba)
have also started to evaluate the project and plan to adopt it in their
production environments. We anticipate that adoption will increase further
once it becomes an Apache project.

We have also received support from core-platform developers and Apache
committers at different companies, such as Microsoft, Nvidia, and Tencent,
who are interested in contributing to the YuniKorn project. We expect to
see more contributions from these committers and usage by their internal
platforms. Overall, the risk of YuniKorn becoming an orphaned project is
low.

2.5.2 Inexperience with Open source
Most of the core developers in the YuniKorn project are experienced open
source veterans, and several are Apache committers and PMC members of other
projects, such as Apache™ Hadoop®. The development style is already very
much like the Apache way:
 - We have open community meetings to discuss designs, problems and roadmaps
 - We publish all patches and issue-related discussions on GitHub
 - We enforce code review and log all comments in GitHub issues

2.5.3 Length of Incubation
We started the work 10 months ago. So far, the groundwork for YuniKorn is
done and the initial version works with K8s seamlessly. Based on the
initial contributors’ experience in ASF projects, we don’t expect huge gaps
with regard to ASF’s policies on software and releases before YuniKorn can
graduate. The goal is to grow the community quickly and increase the user
base within a few months while making releases that adhere to the ASF
standards. When the project reaches a reasonable level of adoption and has
a strong community with a good number of committers/PMC members, we can
pursue graduation. We expect the length of incubation to be approximately 6
to 12 months.

2.5.4 Homogenous Development
The initial proposed list of committers and contributors includes
developers from several institutions and industry participants. The
developers are also from different regions, such as the U.S., Australia,
and India. The development team uses Slack, a community mailing list, and
weekly community calls to collaborate efficiently.

2.5.5 Reliance on Salaried Developers
Clearly, Cloudera has contributed most of the initial development through
salaried developers. But since the very beginning, YuniKorn has been built
as a community effort. People from other organizations are already
collaborating with us on GitHub, both by contributing source code and by
participating in designs and providing feedback through community calls. We
expect our reliance on salaried developers to decrease drastically during
the incubation process itself.

2.5.6 Relationship to Other Apache Products

YuniKorn is very closely related to other Big-Data projects in Apache, such
as Hadoop YARN, Spark, Hive, Flink, etc.

YuniKorn’s core idea is to support both long-running and batch workloads
like Spark, Hive, Flink, etc., and provide a consistent, unified way to
manage and schedule resources for Big Data workloads across resource
managers like Apache™ Hadoop® YARN and Kubernetes, in both on-premises and
cloud environments.

Many of the core ideas for YuniKorn come from the experience of the initial
team building Apache Hadoop YARN’s schedulers - Capacity Scheduler and Fair
Scheduler.

2.5.7 An Excessive Fascination with the Apache Brand
Many of the initial developers in the YuniKorn project are already
experienced Apache committers and PMC members. We understand the value of
the Apache way and how to operate project development on a day-to-day
basis. The reason for proposing YuniKorn as an Apache project is to build a
healthy community and to increase adoption and the number of contributors
and end-users, because we believe the only way to build highly valuable
infrastructure software is to have wide adoption and cater to common use
cases.

2.6 Documentation
Project summary:
https://github.com/cloudera/yunikorn-core/blob/master/README.md
User guides:
https://github.com/cloudera/yunikorn-core/blob/master/docs/user-guide.md
Developer guides:
https://github.com/cloudera/yunikorn-core/blob/master/docs/developer-guide.md
Roadmap:
https://github.com/cloudera/yunikorn-core/blob/master/docs/roadmap.md

2.7 Initial Source
Currently, YuniKorn source code is hosted in several GitHub repositories
 - Scheduler interface:
https://github.com/cloudera/yunikorn-scheduler-interface
 - Scheduler core: https://github.com/cloudera/yunikorn-core
 - K8s Shim: https://github.com/cloudera/yunikorn-k8shim
 - Scheduler Web UI: https://github.com/cloudera/yunikorn-web

2.8 Source and Intellectual Property Submission Plan

2.8.1 External Dependencies
External dependencies are listed below:
 - k8s.io/api, K8s API, Apache License 2.0
 - k8s.io/apimachinery, K8s API, Apache License 2.0
 - k8s.io/client-go, K8s client library, Apache License 2.0
 - github.com/looplab/fsm, Go state machine library, MIT License
 - github.com/satori/go.uuid, Go UUID library, MIT License
 - github.com/uber-go/zap, Go logging library, MIT License
 - github.com/golang/protobuf, Go protobuf library, BSD 3-Clause License
 - github.com/gorilla/mux, Go network library, BSD 3-Clause License
 - google.golang.org/grpc, Go RPC library, Apache License 2.0
 - gopkg.in/yaml.v2, Go YAML library, Apache License 2.0
 - github.com/prometheus/client_golang, Prometheus Client Library, Apache
License 2.0
 - Angular v6.1.x, Angular UI Framework Libraries, MIT License
 - TypeScript, TypeScript Language Compiler, Apache License 2.0
 - Chart.js, JavaScript Charting Library, MIT License
 - Moment.js, JavaScript Date & Time Library, MIT License

Build and test only:
 - gotest.tools, Test library, Apache License 2.0
 - github.com/stretchr/testify, Test library, MIT License
 - Karma, Unit test library, MIT License
 - Protractor, End2End test library, MIT License
 - Json-server, Test server, MIT License
 - Yarn, Dependency manager, BSD 2-Clause License

2.8.2 Cryptography
YuniKorn does not currently include any cryptography-related code.

2.9 Required Resources

2.9.1 Mailing lists:
 - priv...@yunikorn.incubator.apache.org (PMC list)
 - comm...@yunikorn.incubator.apache.org (git push emails)
 - iss...@yunikorn.incubator.apache.org (JIRA issue feed)
 - d...@yunikorn.incubator.apache.org (Dev discussion)
 - u...@yunikorn.incubator.apache.org (User questions)

2.9.2 Git Repositories
Git is the preferred source control system
 - git://git.apache.org/yunikorn-* (We have multiple git repositories)

2.9.3 Issue Tracking
JIRA YuniKorn (*YUNIKORN-*)

2.9.4 Other Resources
We have published a series of demo videos on our YouTube channel:
https://www.youtube.com/channel/UCDSJ2z-lEZcjdK27tTj_hGw

2.10 Initial Committers and Affinities
Initial committers and affinities are listed below:
 - Akhil PB (a...@cloudera.com) (Cloudera)
 - Sunil Govindan (sun...@apache.org) (Cloudera)
 - Vinod Kumar Vavilapalli (vino...@apache.org) (Cloudera)
 - Wangda Tan (wan...@apache.org) (Cloudera)
 - Weiwei Yang (w...@apache.org) (Cloudera)
 - Wilfred Spiegelenburg (wspiegelenb...@cloudera.com) (Cloudera)
 - Carlo Curino (cur...@apache.org) (Microsoft)
 - Subramaniam Krishnan (su...@apache.org) (Microsoft)
 - Arun Suresh (asur...@apache.org) (Microsoft)
 - Konstantinos Karanasos (kkarana...@apache.org) (Microsoft)
 - Jonathan Hung (jh...@apache.org) (LinkedIn)
 - DB Tsai (dbt...@apache.org) (Apple)
 - Junping Du (junping...@apache.org) (Tencent)
 - Tao Yang (taoy...@apache.org) (Alibaba)
 - Jason Lowe (jl...@apache.org) (Nvidia)

2.11 Sponsors
Champion
 - Vinod Kumar Vavilapalli (vino...@apache.org)

Nominated Mentors
 - Junping Du (Tencent), (junping...@apache.org)
 - Felix Cheung (Uber), (felixche...@apache.org)
 - Jason Lowe (Nvidia), (jl...@apache.org)
 - Holden Karau (Apple), (hol...@apache.org)

Sponsoring Entity
 - The Apache Incubator

[1] https://cwiki.apache.org/confluence/display/INCUBATOR/YuniKornProposal

------------------------ END OF THE PROPOSAL ------------------------

Thanks
Weiwei
