Luciano,

There is a copy/paste error: the proposal link points to http://wiki.apache.org/incubator/Nuvem.

> On Oct 27, 2015, at 10:03 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>
> On Tue, Oct 27, 2015 at 9:52 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
>>
>> After initial discussion, please vote on the acceptance of SystemML
>> Project for incubation at the Apache Incubator. The full proposal is
>> available at the end of this message and on the wiki at :
>>
>> https://wiki.apache.org/incubator/SystemML
>> <http://wiki.apache.org/incubator/Nuvem>
>>
>> Please cast your votes:
>>
>> [ ] +1, bring SystemML into Incubator
>> [ ] +0, I don't care either way
>> [ ] -1, do not bring SystemML into Incubator, because...
>>
>> The vote is open for the next 72 hours and only votes from the
>> Incubator PMC are binding.
>>
>>
>> = SystemML =
>>
>> == Abstract ==
>>
>> SystemML provides declarative large-scale machine learning (ML) that aims
>> at flexible specification of ML algorithms and automatic generation of
>> hybrid runtime plans ranging from single node, in-memory computations, to
>> distributed computations on Apache Hadoop MapReduce and Apache Spark. ML
>> algorithms are expressed in an R-like syntax, that includes linear algebra
>> primitives, statistical functions, and ML-specific constructs. This
>> high-level language significantly increases the productivity of data
>> scientists as it provides (1) full flexibility in expressing custom
>> analytics, and (2) data independence from the underlying input formats and
>> physical data representations. Automatic optimization according to data
>> characteristics such as distribution on the disk file system, and sparsity
>> as well as processing characteristics in the distributed environment like
>> number of nodes, CPU, memory per node, ensures both efficiency and
>> scalability.
>>
>> == Proposal ==
>>
>> The goal of SystemML is to create a commercial friendly, scalable and
>> extensible machine learning framework for data scientists to create or
>> extend machine learning algorithms using a declarative syntax. The machine
>> learning framework enables data scientists to develop algorithms locally
>> without the need of a distributed cluster, and scale up and scale out the
>> execution of these algorithms to distributed Apache Hadoop MapReduce or
>> Apache Spark clusters.
>>
>> == Background ==
>>
>> SystemML started as a research project in the IBM Almaden Research Center
>> around 2007 aiming to enable data scientists to develop machine learning
>> algorithms independent of data and cluster characteristics.
>>
>> == Rationale ==
>>
>> SystemML enables the specification of machine learning algorithms using a
>> declarative machine learning (DML) language. DML includes linear algebra
>> primitives, statistical functions, and additional constructs. This
>> high-level language significantly increases the productivity of data
>> scientists as it provides (1) full flexibility in expressing custom
>> analytics and (2) data independence from the underlying input formats and
>> physical data representations.
>>
>> SystemML computations can be executed in a variety of different modes. It
>> supports single node in-memory computations and large-scale distributed
>> cluster computations. This allows the user to quickly prototype new
>> algorithms in local environments but automatically scale to large data
>> sizes as well without changing the algorithm implementation.
>>
>> Algorithms specified in DML are dynamically compiled and optimized based
>> on data and cluster characteristics using rule-based and cost-based
>> optimization techniques. The optimizer automatically generates hybrid
>> runtime execution plans ranging from in-memory single-node execution to
>> distributed computations on Apache Spark or Apache Hadoop MapReduce.
>> This ensures both efficiency and scalability. Automatic optimization
>> reduces or eliminates the need to hand-tune distributed runtime execution
>> plans and system configurations.
>>
>> == Initial Goals ==
>>
>> The initial goal of moving SystemML to the Apache Incubator is to broaden
>> the community and foster contributions from data scientists to develop new
>> machine learning algorithms and enhance the existing ones. Ultimately, this
>> may lead to the creation of an industry standard in specifying machine
>> learning algorithms.
>>
>> == Current Status ==
>>
>> The initial code has been developed at the IBM Almaden Research Center in
>> California and has recently been made available on GitHub under the Apache
>> License 2.0. The project currently supports a single node (in
>> memory computation) as well as distributed computations utilizing Apache
>> Hadoop MapReduce or Apache Spark clusters.
>>
>> === Meritocracy ===
>>
>> We plan to invest in supporting a meritocracy. We will discuss the
>> requirements in an open forum. Several companies have already expressed
>> interest in this project, and we intend to invite additional developers to
>> participate. We will encourage and monitor community participation so that
>> privileges can be extended to those who contribute, in keeping with the
>> standard of meritocracy that Apache emphasizes.
>>
>> === Community ===
>>
>> The need for a generic scalable and declarative machine learning approach
>> in open source is tremendous, so there is a potential for a very large
>> community. We believe that SystemML’s extensible architecture, declarative
>> syntax, cost-based optimizer and its alignment with Spark will further
>> encourage community participation not only in enhancing the infrastructure
>> but also in speeding up the creation of algorithms for a wide range of use
>> cases. We expect that over time SystemML will attract a large community.
>>
>> === Alignment ===
>>
>> The initial committers strongly believe that a generic, scalable and
>> declarative approach to machine learning will gain
>> broader adoption as an open source, community driven project, where the
>> community can contribute not only to the core components, but also to a
>> growing collection of algorithms which will leverage the optimizations and
>> ease of scaling in SystemML. Our hope is that the Apache Spark, Apache
>> Hadoop and other communities will find tremendous value in SystemML and
>> this will foster further collaboration between these projects furthering
>> the already existing integration points.
>>
>> == Known Risks ==
>>
>> To date, development has been sponsored by IBM and coordinated mostly by
>> the core team of researchers at the IBM Almaden Research Center.
>>
>> For SystemML to fully transition to an "Apache Way" governance model, it
>> needs to start embracing the meritocracy-centric way of growing the
>> community of contributors.
>>
>> === Orphaned Products ===
>>
>> The SystemML developers and previous sponsor have a long-term interest in
>> use and maintenance of the code and there is also hope that growing a
>> diverse community around the project will become a guarantee against the
>> project becoming orphaned. We feel that it is also important to put formal
>> governance in place both for the project and the contributors as the
>> project expands. We feel ASF is the best location for this.
>>
>> === Inexperience with Open Source ===
>>
>> The current set of SystemML contributors is very diverse regarding
>> participation in Open Source. While some initial members are experiencing
>> an open source project for the first time, others have been contributing
>> and mentoring various Apache and non-Apache open source projects.
>>
>> === Reliance on Salaried Developers ===
>>
>> SystemML currently receives substantial support from salaried developers.
>> However, they are all passionate about the project, and we are confident
>> that the project will continue even if no salaried developers contribute to
>> the project. We are committed to recruiting additional committers including
>> non-salaried developers.
>>
>>
>> === Relationships with Other Apache Products ===
>>
>> Currently, SystemML integrates with Apache Hadoop MapReduce and Apache
>> Spark as underlying computational distributed runtimes.
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> SystemML solves a real need for a generic, scalable and declarative
>> approach to machine learning in the Apache Hadoop and Spark
>> ecosystems, something that has been addressed in a very ad hoc manner so
>> far by multiple Apache projects. Our rationale for developing SystemML as
>> an Apache project is detailed in the Rationale section. We believe that the
>> Apache brand and community process will help us attract more contributors
>> to this project, and help establish ubiquitous APIs.
>>
>>
>> == Documentation ==
>>
>> Documentation regarding SystemML is available in the current GitHub
>> repository https://github.com/SparkTC/systemml/tree/master/system-ml/docs.
>>
>>
>> == Initial Source ==
>>
>> Initial source is available on GitHub under the Apache License 2.0
>>
>> https://github.com/SparkTC/systemml
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> We know of no legal encumbrances in the transfer of source code and rights
>> to Apache. In fact, given the internal IBM due diligence performed on the
>> source code during open sourcing, we expect the code base to be free from
>> any IP issues.
>>
>> == External Dependencies ==
>>
>> SystemML is written in Java and currently supports Apache Hadoop MapReduce
>> and Apache Spark runtimes.
>>
>> To the best of our knowledge, all dependencies of SystemML are distributed
>> under Apache compatible licenses.
>> Upon acceptance to the incubator, we
>> would begin a thorough analysis of all transitive dependencies to verify
>> this fact and introduce license checking into the build and release process
>> (for instance integrating Apache Rat).
>>
>> Cryptography
>> N/A
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>> * priv...@sysml.incubator.apache.org (moderated subscriptions)
>> * comm...@sysml.incubator.apache.org
>> * d...@sysml.incubator.apache.org
>>
>> === Git Repository ===
>> * https://git-wip-us.apache.org/repos/asf/incubator-sysml.git
>>
>> === Issue Tracking ===
>> * JIRA (SYSML)
>>
>> == Initial Committers ==
>>
>> * Luciano Resende (lresende AT apache DOT org)
>> * Berthold Reinwald (reinwald AT us DOT ibm DOT com)
>> * Matthias Boehm (mboehm AT us DOT ibm DOT com)
>> * Shirish Tatikonda (statiko AT us DOT ibm DOT com)
>> * Niketan Pansare (npansar AT us DOT ibm DOT com)
>> * Prithviraj Sen (senp AT us DOT ibm DOT com)
>> * Alexandre V Evfimievski (evfimi AT us DOT ibm DOT com)
>> * Fred Reiss (frreiss AT us DOT ibm DOT com)
>> * Deron Eriksson (deron AT us DOT ibm DOT com)
>> * Arvind Surve (asurve AT us DOT ibm DOT com)
>> * Mike Dusenberry (mwdusenb AT us DOT ibm DOT com)
>> * Reynold Xin (rxin AT apache DOT org)
>> * Xiangrui Meng (meng AT apache DOT org)
>> * Joseph Bradley (jkbradley AT apache DOT org)
>> * Patrick Wendell (pwendell AT apache DOT org)
>> * Holden Karau (holden AT apache DOT org)
>> * DB Tsai (dbtsai AT apache DOT org)
>>
>> == Affiliations ==
>>
>> * DataBricks: Reynold Xin, Xiangrui Meng, Joseph Bradley, Patrick Wendell
>> * Netflix: DB Tsai
>> * IBM: Luciano Resende, Berthold Reinwald, Matthias Boehm, Shirish
>> Tatikonda, Niketan Pansare, Prithviraj Sen, Alexandre V Evfimievski, Fred
>> Reiss, Deron Eriksson, Arvind Surve, Mike Dusenberry and Holden Karau.
>>
>> == Sponsors ==
>>
>> === Champion ===
>> * Luciano Resende
>>
>> === Nominated Mentors ===
>> * Luciano Resende
>> * Reynold Xin
>> * Patrick Wendell
>> * Rich Bowen
>>
>> === Sponsoring Entity ===
>> We would like to propose the Apache Incubator to sponsor this project.
> Of course, my +1
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/