Hi all,
following the discussion thread, I think we can start the vote on accepting
Amaterasu into the Apache Incubator.
The ASF voting rules are described:
http://www.apache.org/foundation/voting.html
A vote for accepting a new Apache Incubator podling is a majority vote for which
only Incubator PMC member votes are binding.
This vote will run for at least 72 hours. Please VOTE as follows
[] +1 Accept Amaterasu into the Apache Incubator
[] +0 Abstain.
[] -1 Do not accept Amaterasu into the Apache Incubator because ...
The proposal is listed below, but you can also access it on the wiki:
https://wiki.apache.org/incubator/AmaterasuProposal
Note that we are looking for one more mentor on the proposal.
One concern has been raised about the "Amaterasu" name. We want to move forward
with Amaterasu name but open to change if needed.
Thanks
Regards
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
= Apache Amaterasu =
== Abstract ==
Apache Amaterasu is a framework providing continuous deployment for Big Data
pipelines.
It provides the following capabilities:
* '''Continuous integration''' tools to '''package pipelines and run tests'''.
* A repository to store those packaged applications: the '''applications
repository'''.
* A repository to store the pipelines, and engine configuration (for instance,
location of the Spark master, etc.): per environment - the '''configuration
repository'''.
* A '''dashboard''' to monitor the pipelines.
* A '''DSL and integration hooks''' allowing third parties to easily integrate.
== Proposal ==
Amaterasu is a simple and powerful framework to build and dispense pipelines.
It aims to help data engineers and data scientists to compose, configure, test,
package, deploy and execute data pipelines written using multiple tools,
languages and frameworks. Amaterasu provides a standard repo structure to
package big data pipelines, a YAML based Domain Specific Languages (DSL) for
data engineers, data scientists and operations engineers to manage complex
pipelines throughout their entire lifecycle (Dev, UAT, Prod, etc.).
== Background ==
Amaterasu is a relatively new project that was created to deal with some of the
issues that as Consultants, we have seen recurring at different client sites.
Mainly the need to continuously deploy complex pipelines built in multiple tools
and languages.
Amaterasu started as a pet project and is currently being evaluated by a couple
of organizations, supported by the contributors, on a personal time and
voluntary bases.
== Rational ==
As software engineers working on big data projects we have straggled for a long
time to apply the same CI/CD practices that have become the standard in the
software industry for the last few years. While some of them are possible, for
example Apache Spark is easy to unit test. However large scale pipelines are
more complex and often use data, which might be un-structured as integration
point, which requires heavy integration tests.
To automate such tests and complex deployments, we have found the need to often
handcraft scripts and use a mixture tools, so we have decided to finally build a
tool we can apply in a general way and not on a project by project basis.
Another issue Amaterasu is trying to tackle is the Integrating between the work
of software engineers, data scientists, and sometimes operations engineers. The
approach Amaterasu takes to integrate between those three schools of thought it
to provide a simple YAML based DSL that provides a simple way to integrate
different pipeline written in the native tools for each task (R, Spark in
different languages, etc.).
== Initial Goals ==
Our initial goals are to bring Amaterasu into the ASF, transition internal
engineering processes into the open, and foster a collaborative development
model according to the "Apache Way".
In addition, we intend to continue the development of Amaterasu, add new
features as well as integrate better with other frameworks, including:
* Apache Arrow
* Apache Hive
* Apache Drill
* Apache Beam
* Apache YARN
* Farther and more complete integration with Apache Spark
Other frameworks will be evaluated after those initial goals are reached.
== Current Status ==
Amaterasu is preview state but provide a large set of features. We plan to
stabilize and head to a first production ready release during the incubation
process. The current license is already Apache 2.0.
=== Meritocracy ===
We intend to radically expand the initial developer and user community by
running the project in accordance with the "Apache Way". Users and new
contributors will be treated with respect and welcomed. By participating in the
community and providing quality patches/support that move the project forward,
they will earn merit. They also will be encouraged to provide non-code
contributions (documentation, events, community management, etc.) and will gain
merit for doing so. Those with a proven support and quality track record will be
encouraged to become committers.
=== Community ===
As a relatively new project, Amaterasu has a small, but growing community.
Amaterasu is an open project, not just with it’s source code but also with our
discussions which are held openly in our slack https://shintoio.slack.com which
contains channels for design, tech and future directions discussions.
If Amaterasu is accepted for incubation, the primary initial goal is to build a
large and strong community. We are confident that Amaterasu can become a key
project for big data operations, which hopefully will create a large community
of users and developers.
=== Known Risks ===
Development has been sponsored mostly by a one company. For the project to fully
transition to the Apache Way governance model, development must shift towards
the meritocracy-centric model of growing a community of contributors balanced
with the needs for extreme stability and core implementation coherency.
=== Orphaned products ===
We are fully committed on Amaterasu. A few organizations have expressed their
interest in using Amaterasu.
=== Inexperience with Open Source ===
We have been developing and using open source software for a long time.
Additionally, several ASF veterans have agreed to mentor the project and they
are listed in this proposal. The project will rely on their guidance and
collective wisdom to quickly transition the entire team of initial committers
towards practicing the Apache Way.
=== Reliance on Salaried Developers ===
Most of the current contributors are employed in the Big Data space. While they
might wander from their current employers, they are unlikely to venture far from
their core expertises and thus will continue to be engaged with the project
regardless of their current employers.
=== An Excessive Fascination with the Apache Brand ===
While we intend to leverage the Apache ‘branding’ when talking to other projects
as testament of our project’s ‘neutrality’, we have no plans for making use of
Apache brand in press releases nor posting billboards advertising acceptance of
Amaterasu into Apache Incubator.
The main purpose in applying for Apache incubation is due to the fact that
Amaterasu is built with integration already in mind for many tools which are
Apache projects, and we see Amaterasu as an extension of these projects. We hope
that by being an Apache project, we can integrate better, and collaborate more
effectively with the relevant projects. As Amaterasu matures, we see mutual
benefits for all involved.
=== Initial Source ===
https://github.com/shintoio/amaterasu
=== External Dependencies ===
All external dependencies are licensed under an Apache 2.0 license or
Apache-compatible license. As we grow the Amaterasu community we will configure
our build process to require and validate all contributions and dependencies are
licensed under the Apache 2.0 license or are under an Apache-compatible license.
* Apache Spark
* Apache Hadoop
* Apache Maven (maven-core)
* Apache Commons
* Apache Log4j
* Apache Mesos
* Apache Zookeeper
* Apache Curator
* Scala
* Junit
* Py4j
Future versions are planned to integrate with:
* Apache YARN
* Apache Hive
* Apache Drill
=== Required Resources ===
==== Mailing lists ====
* priv...@amaterasu.incubator.apache.org (moderated subscriptions)
* comm...@amaterasu.incubator.apache.org
* d...@amaterasu.incubator.apache.org
* iss...@amaterasu.incubator.apache.org
==== Git Repository ====
* https://git-wip-us.apache.org/repos/asf/incubator-amaterasu.git
==== Issue Tracking ====
* JIRA Project Amaterasu
==== Initial Committers ====
* Yaniv Rodenski
* Jean-Baptiste Onofré
* Eyal Ben Ivri
* Karel Alfonso
* Kirupagaran (Kirupa) Devarajan
* Nadav Har Tzvi
==== Affiliations ====
* Yaniv Rodenski - Shinto
* Jean-Baptiste Onofré - Talend
* Olivier Lamy - Webtide
==== Sponsors ====
==== Champion ====
* Jean-Baptiste Onofré
==== Mentors ====
* Jean-Baptiste Onofré
* Olivier Lamy
==== Sponsoring Entity ====
The Apache Incubator
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org