Hi all,

After the discussion of DolphinScheduler(was EasyScheduler) proposal
(discussion thread:
https://lists.apache.org/thread.html/d3ac53bddf91391e54f63d042a0b3d60f2aecfbb99780bcc00b4db6e@%3Cgeneral.incubator.apache.org%3E
),
I would like to call a VOTE to accept it into the Apache Incubator.

Please cast your vote:

  [ ] +1, bring DolphinScheduler into Incubator
  [ ] +0, I don't care either way
  [ ] -1, do not bring DolphinScheduler into Incubator, because...

The vote will open at least for 72 hours and only votes from the Incubator
PMC are binding.

======
Abstract

DolphinScheduler is a distributed ETL scheduling engine with powerful DAG
visualization interface. DolphinScheduler focuses on solving the problem of
'complex task dependencies & triggers' in data processing. Just like its
name, we dedicated to making the scheduling system out of the box.

*Current project name of DolphinScheduler is EasyScheduler, will change it
after it is accepted by Incubator.*
Proposal

DolphinScheduler provides many easy-to-use features to accelerate
the engineering efficiency on data ETL workflow job. We propose a new
concept of 'instance of process' and 'instance of task' to let developers
to tuning their jobs on the running state of workflow instead of changing
the task's template. Its main objectives are as follows:

   - Define the complex tasks' dependencies & triggers in a DAG graph by
   dragging and dropping.
   - Support cluster HA.
   - Support multi-tenant and parallel or serial backfilling data.
   - Support automatical failure job retry and recovery.
   - Support many data task types and process priority, task priority and
   relative task timeout alarm.

For now, DolphinScheduler has a fairly huge community in China. It is also
widely adopted by many companies and organizations
<https://github.com/analysys/EasyScheduler/issues/57> as its ETL scheduling
tool.

We believe that bringing DolphinScheduler into ASF could advance
development of a much more stronger and more diverse open source community.

Analysys submits this proposal to donate DolphinScheduler's source codes
and all related documentations to Apache Software Foundation. The codes are
already under Apache License Version 2.0.

   - Code base: https://www.github.com/analysys/easyscheduler
   - English Documentations: https://analysys.github.io/easyscheduler_docs
   - Chinese Documentations:
   https://analysys.github.io/easyscheduler_docs_cn

Background

We want to find a data processing tool with the following features:

   - Easy to use,developers can build a ETL process with a very simple drag
   and drop operation. not only for ETL developers,people who can't write code
   also can use this tool for ETL operation such as system adminitrator.
   - Solving the problem of "complex task dependencies" , and it can
   monitor the ETL running status.
   - Support multi-tenant.
   - Support many task types: Shell, MR, Spark, SQL (mysql, postgresql,
   hive, sparksql), Python, Sub_Process, Procedure, etc.
   - Support HA and linear scalability.

For the above reasons, we realized that no existing product met our
requirements, so we decided to develop this tool ourselves. We designed
DolphinScheduler at the end of 2017. The first internal use version was
completed in May 2018. We then iterated several internal versions and the
system gradually became stabilized.

Then we open the source code of DolphinScheduler on March 2019. It soon
gained lot's of ETL developers interest and stars on github.
Rationale

Many organizations (>30) (refer to Who is using DolphinScheduler
<https://github.com/analysys/EasyScheduler/issues/57> ) already benefit
from running DolphinScheduler to make data process pipelines more easier.
More than 100 feature ideas
<https://github.com/analysys/EasyScheduler/projects/1> come from
DolphinScheduler community. Some 3rd-party projects also plan to integrate
with DolphinScheduler through task plugin, such as Scriptis
<https://github.com/WeBankFinTech/Scriptis>, waterdrop
<https://github.com/InterestingLab/waterdrop>. These will strengthen the
features of DolphinScheduler.
Current StatusMeritocracy

DolphinScheduler was incubated at Analysys in 2017 and open sourced on
GitHub in March 2019. Once open sourced, we have been quickly adopted by
multiple organizations,DolphinScheduler has contributors and users from
many companies; we have set up the Committer Team. New contributors are
guided and reviewed by existed committer members. Contributions are always
welcomed and highly valued.
Community

Now we have set development teams for DolphinScheduler in Analysys, and we
already have external developers who contributed the code. We already have
a user group of more than 1,000 people. We hope to grow the base of
contributors by inviting all those who offer contributions through The
Apache Way. Right now, we make use of github as code hosting as well as
gitter for community communication.
Core Developers

The core developers, including experienced senior developers, are often
guided by mentors.
Known RisksOrphaned products

DolphinScheduler is widely adopted in China by many companies and
organizations <https://github.com/analysys/EasyScheduler/issues/57>. The
core developers of DolphinScheduler team plan to work full time on this
project. Currently there are 10 use cases with more that 1000 activity
tasks per day using DolphinScheduler in the user's production environment.
There is very little risk of DolphinScheduler getting orphaned as at least
two large companies (xueqiu、fengjr) are widely using it in their
production, and developers from these companies have also joined Easy
Scheduler's team of contributors, DolphinScheduler has eight major releases
so far, and and received 373 pull requests from contributors, which further
demonstrates DolphinScheduler as a very active project. We also plan to
extend and diversify this community further through Apache.

Thus, it is very unlikely that DolphinScheduler becomes orphaned.
Inexperience with Open Source

DolphinScheduler's core developers have been running it as a
community-oriented open source project for some time, several of them
already have experience working with open source communities, they are also
active in presto, alluxio and other projects. At the same time, we will
learn more open source experiences by following the Apache way in our
incubator journey.
Homogenous Developers

The current developers work across a variety of organizations including
Analysys, guandata and hydee; some individual developers are accepted as
developers of DolphinScheduler as well. Considering that fengjr and
sefonsoft have shown great interests in DolphinScheduler, we plan to
encourage them to contribute and invite them as contributors to work
together.
Reliance on Salaried Developers

At present, eight of the core developers are paid by their employer to
contribute to DolphinScheduler project. we also have some other developers
and researchers taking part in the project, and we will make efforts to
increase the diversity of the contributors and actively lobby for Domain
experts in the workflow space to contribute.
Relationships with Other Apache Products

DolphinScheduler integrates Apache Zookeeper as one of the service
registration/discovery mechanisms. DolphinScheduler is deeply integrated
with Apache products. It currently support many task types like Apache
Hive, Apache Spark, Apache Hadoop, and so on
A Excessive Fascination with the Apache Brand

We recognize the value and reputation that the Apache brand will bring to
DolphinScheduler. However, we prefer that the community provided by the
Apache Software Foundation will enable the project to achieve long-term
stable development. so DolphinScheduler is proposing to enter incubation at
Apache in order to help efforts to diversify the community, not so much to
capitalize on the Apache brand.
Documentation

A complete set of DolphinScheduler documentations is provided on github in
both English and Simplified Chinese.

   - English <https://github.com/analysys/easyscheduler_docs>
   - Chinese <https://github.com/analysys/easyscheduler_docs_cn>

Initial Source

The project consists of three distinct codebases: core and document. The
address of two existed git repositories are as follows:

   - https://github.com/analysys/easyscheduler
   - https://github.com/analysys/easyscheduler_docs
   - https://github.com/analysys/easyscheduler_docs_cn

Source and Intellectual Property Submission Plan

As soon as DolphinScheduler is approved to join Apache Incubator, Analysys
will provide the Software Grant Agreement(SGA) and intial committers will
submit ICLA(s). The code is already licensed under the Apache Software
License, version 2.0.


External Dependencies

As all backend code dependencies are managed using Apache Maven, none of
the external libraries need to be packaged in a source distribution.

Most of dependencies have Apache compatible licenses,and the detail as
follows:

Most of dependencies have Apache compatible licenses,and the core
dependencies are as follows:
Backend Dependency
Dependency
License
Comments
bonecp-0.8.0.RELEASE.jar Apache v2.0
byte-buddy-1.9.10.jar Apache V2.0
c3p0-0.9.1.1.jar GNU LESSER GENERAL PUBLIC LICENSE will remove
curator-*-2.12.0.jar Apache V2.0
druid-1.1.14.jar Apache V2.0
fastjson-1.2.29.jar Apache V2.0
fastutil-6.5.6.jar Apache V2.0
grpc-*-1.9.0.jar Apache V2.0
gson-2.8.5.jar Apache V2.0
guava-20.0.jar Apache V2.0
guice-*3.0.jar Apache V2.0
hadoop-*-2.7.3.jar Apache V2.0
hbase-*-1.1.1.jar Apache V2.0
hive-*-2.1.0.jar Apache V2.0
instrumentation-api-0.4.3.jar Apache V2.0
jackson-*-2.9.8.jar Apache V2.0
jackson-jaxrs-1.8.3.jar LGPL Version 2.1 Apache V2.0 will remove
jackson-xc-1.8.3.jar LGPL Version 2.1 Apache V2.0 will remove
javax.activation-api-1.2.0.jar CDDL/GPLv2+CE will remove
javax.annotation-api-1.3.2.jar CDDL + GPLv2 with classpath exception will
remove
javax.servlet-api-3.1.0.jar CDDL + GPLv2 with classpath exception will
remove
jaxb-*.jar (CDDL 1.1) (GPL2 w/ CPE) will remove
jersey-*-1.9.jar CDDL+GPLv2 will remove
jetty-*-9.4.14.v20181114.jar Apache V2.0,EPL 1.0
jna-4.5.2.jar Apache V2.0,LGPL 2.1 will remove
jna-platform-4.5.2.jar Apache V2.0,LGPL 2.1 will remove
jsp-api-2.x.jar CDDL,GPL 2.0 will remove
log4j-1.2.17.jar Apache V2.0
log4j-*-2.11.2.jar Apache V2.0
logback-x.jar dual-license EPL 1.0,LGPL 2.1
mail-1.4.5.jar CDDL+GPLv2 will remove
mybatis-3.5.1.jar Apache V2.0
mybatis-spring-*2.0.1.jar Apache V2.0
mysql-connector-java-5.1.34.jar GPL 2.0 will remove
netty-*-4.1.33.Final.jar Apache V2.0
oshi-core-3.5.0.jar EPL 1.0
parquet-hadoop-bundle-1.8.1.jar Apache V2.0
postgresql-42.1.4.jar BSD 2-clause
protobuf-java-*3.5.1.jar BSD 3-clause
quartz-2.2.3.jar Apache V2.0
quartz-jobs-2.2.3.jar Apache V2.0
slf4j-api-1.7.5.jar MIT
spring-*-5.1.5.RELEASE.jar Apache V2.0
spring-beans-5.1.5.RELEASE.jar Apache V2.0
spring-boot-*2.1.3.RELEASE.jar Apache V2.0
springfox-*-2.9.2.jar Apache V2.0
stringtemplate-3.2.1.jar BSD
swagger-annotations-1.5.20.jar Apache V2.0
swagger-bootstrap-ui-1.9.3.jar Apache V2.0
swagger-models-1.5.20.jar Apache V2.0
zookeeper-3.4.8.jar Apache

The front-end UI currently relies on many components, and the core
dependencies are as follows:
UI Dependency
Dependency
License
Comments
autoprefixer MIT
babel-core MIT
babel-eslint MIT
babel-helper-* MIT
babel-helpers MIT
babel-loader MIT
babel-plugin-syntax-* MIT
babel-plugin-transform-* MIT
babel-preset-env MIT
babel-runtime MIT
bootstrap MIT
canvg MIT
clipboard MIT
codemirror MIT
copy-webpack-plugin MIT
cross-env MIT
css-loader MIT
cssnano MIT
cyclist MIT
d3 BSD-3-Clause
dayjs MIT
echarts Apache V2.0
env-parse ISC
extract-text-webpack-plugin MIT
file-loader MIT
globby MIT
html-loader MIT
html-webpack-ext-plugin MIT
html-webpack-plugin MIT
html2canvas MIT
jsplumb (MIT OR GPL-2.0)
lodash MIT
node-sass MIT
optimize-css-assets-webpack-plugin MIT
postcss-loader MIT
rimraf ISC
sass-loader MIT
uglifyjs-webpack-plugin MIT
url-loader MIT
util.promisify MIT
vue MIT
vue-loader MIT
vue-style-loader MIT
vue-template-compiler MIT
vuex-router-sync MIT
watchpack MIT
webpack MIT
webpack-dev-server MIT
webpack-merge MIT
xmldom MIT,LGPL will remove

Required ResourcesGit Repositories

   - https://github.com/analysys/EasyScheduler.git
   - https://github.com/analysys/easyscheduler_docs.git
   - https://github.com/analysys/easyscheduler_docs_cn.git

Issue Tracking

The community would like to continue using GitHub Issues.
Continuous Integration tool

Jenkins
Mailing Lists

   - DolphinScheduler-dev: for development discussions
   - DolphinScheduler-private: for PPMC discussions
   - DolphinScheduler-notifications: for users notifications


Initial Committers

   - William-GuoWei(guowei...@outlook.com)
   - Lidong Dai(lidong....@outlook.com)
   - Zhanwei Qiao(qiaozhan...@outlook.com)
   - Liang Bao(baoliang.l...@gmail.com)
   - Gang Li(lgcareer2...@outlook.com)
   - Zijian Gong(quanqua...@gmail.com)
   - Jun Gao(gaojun2...@gmail.com)
   - Baoqi Wu(wuba...@gmail.com)

Affiliations

   -

   Analysys Inc: William-GuoWei,Zhanwei Qiao,Liang Bao,Gang Li,Jun
   Gao,Lidong Dai
   -

   Hydee Inc: Zijian Gong
   -

   Guandata Inc: Baoqi Wu

SponsorsChampion

   - Sheng Wu ( Apache Incubator PMC, wush...@apache.org)

Mentors

   -

   Sheng Wu ( Apache Incubator PMC, wush...@apache.org)
   -

   ShaoFeng Shi ( Apache Incubator PMC, shaofeng...@apache.org
   <wush...@apache.org>)
   -

   Liang Chen ( Apache Incubator PMC, Apache member, chenliang...@apache.org
   )
   - Furkan KAMACI ( Apache Incubator PMC, kam...@apache.org)
   - Kevin Ratnasekera ( Apache Incubator PMC, Apache member,
   djkevi...@apache.org)

Sponsoring Entity

We are expecting the Apache Incubator could sponsor this project.

Sheng Wu 吴晟

Apache SkyWalking, ShardingSphere, Zipkin
Twitter, wusheng1108

Reply via email to