Hi Henri,

My previous comment was just a review comment on the proposal, but I forgot to mention an important thing.

> Currently the list of committers is based on the current active coders, so
> we're also very interested in hearing from anyone else who is interested in
> working on the project, be they current or future contributors!

I'm also interested in working on MXNet :-)

Thanks,
- Tsuyoshi

On Thu, Jan 12, 2017 at 3:43 PM, 项亮 <xlvec...@gmail.com> wrote:
> I would like to volunteer as a committer for MXNet
>
> github id: xlvector
> email: xlvec...@gmail.com
>
> Liang Xiang from Toutiao Lab
>
> On 2017-01-06 13:12 (+0800), Henri Yandell <bay...@apache.org> wrote:
>> Hello Incubator,
>>
>> I'd like to propose a new incubator Apache MXNet podling.
>>
>> The existing MXNet project (http://mxnet.io - 1.5 years old, 15 committers,
>> 200 contributors) is very interested in joining Apache. MXNet is an
>> open-source deep learning framework that allows you to define, train, and
>> deploy deep neural networks on a wide array of devices, from cloud
>> infrastructure to mobile devices.
>>
>> The wiki proposal page is located here:
>>
>> https://wiki.apache.org/incubator/MXNetProposal
>>
>> I've included the text below in case anyone wants to focus on parts of it
>> in a reply.
>>
>> Looking forward to your thoughts, and for lots of interested Apache members
>> to volunteer to mentor the project in addition to Sebastian and myself.
>>
>> Currently the list of committers is based on the current active coders, so
>> we're also very interested in hearing from anyone else who is interested in
>> working on the project, be they current or future contributors!
>>
>> Thanks,
>>
>> Hen
>> On behalf of the MXNet project
>>
>> ---------
>>
>> = MXNet: Apache Incubator Proposal =
>>
>> == Abstract ==
>>
>> MXNet is a Flexible and Efficient Library for Deep Learning
>>
>> == Proposal ==
>>
>> MXNet is an open-source deep learning framework that allows you to define,
>> train, and deploy deep neural networks on a wide array of devices, from
>> cloud infrastructure to mobile devices.
>> It is highly scalable, allowing for
>> fast model training, and supports a flexible programming model and multiple
>> languages. MXNet allows you to mix symbolic and imperative programming
>> flavors to maximize both efficiency and productivity. MXNet is built on a
>> dynamic dependency scheduler that automatically parallelizes both symbolic
>> and imperative operations on the fly. A graph optimization layer on top of
>> that makes symbolic execution fast and memory efficient. The MXNet library
>> is portable and lightweight, and it scales to multiple GPUs and multiple
>> machines.
>>
>> == Background ==
>>
>> Deep learning is a subset of machine learning and refers to a class of
>> algorithms that use a hierarchical approach with non-linearities to
>> discover and learn representations within data. Deep learning has recently
>> become very popular due to its applicability to, and advancement of, domains
>> such as Computer Vision, Speech Recognition, Natural Language Understanding
>> and Recommender Systems. With pervasive and cost effective cloud computing,
>> large labeled datasets and continued algorithmic innovation, deep learning
>> has become one of the most popular classes of algorithms for machine
>> learning practitioners in recent years.
>>
>> == Rationale ==
>>
>> The adoption of deep learning is quickly expanding from initial deep domain
>> experts rooted in academia to data scientists and developers working to
>> deploy intelligent services and products. Deep learning, however, has many
>> challenges. These include model training time (which can take days to
>> weeks), programmability (not everyone writes Python or C++ or likes
>> symbolic programming) and balancing production readiness (support for
>> things like failover) with development flexibility (ability to program
>> different ways, support for new operators and model types) and speed of
>> execution (fast and scalable model training).
>> Other frameworks excel on
>> some but not all of these aspects.
>>
>> == Initial Goals ==
>>
>> MXNet is a fairly established project on GitHub with its first code
>> contribution in April 2015 and roughly 200 contributors. It is used by
>> several large companies and some of the top research institutions on the
>> planet. Initial goals would be the following:
>>
>> 1. Move the existing codebase(s) to Apache
>> 1. Integrate with the Apache development process/sign CLAs
>> 1. Ensure all dependencies are compliant with Apache License version 2.0
>> 1. Incremental development and releases per Apache guidelines
>> 1. Establish engineering discipline and a predictable release cadence of
>> high quality releases
>> 1. Expand the community beyond the current base of expert level users
>> 1. Improve usability and the overall developer/user experience
>> 1. Add additional functionality to address newer problem types and
>> algorithms
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>>
>> The MXNet project already operates on meritocratic principles. Today, MXNet
>> has developers worldwide and has accepted multiple major patches from a
>> diverse set of contributors within both industry and academia. We would
>> like to follow ASF meritocratic principles to encourage more developers to
>> contribute to this project. We know that only active and committed
>> developers from a diverse set of backgrounds can make MXNet a successful
>> project. We are also improving the documentation and code to help new
>> developers get started quickly.
>>
>> === Community ===
>>
>> Acceptance into the Apache foundation would bolster the growing user and
>> developer community around MXNet. That community includes around 200
>> contributors from academia and industry.
>> The core developers of our project
>> are listed in our contributors below and are also represented by logos on
>> the mxnet.io site including Amazon, Baidu, Carnegie Mellon University,
>> Turi, Intel, NYU, Nvidia, MIT, Microsoft, TuSimple, University of Alberta,
>> University of Washington and Wolfram.
>>
>> === Core Developers ===
>>
>> (with GitHub logins)
>>
>> * Tianqi Chen (@tqchen)
>> * Mu Li (@mli)
>> * Junyuan Xie (@piiswrong)
>> * Bing Xu (@antinucleon)
>> * Chiyuan Zhang (@pluskid)
>> * Minjie Wang (@jermainewang)
>> * Naiyan Wang (@winstywang)
>> * Yizhi Liu (@javelinjs)
>> * Tong He (@hetong007)
>> * Qiang Kou (@thirdwing)
>> * Xingjian Shi (@sxjscience)
>>
>> === Alignment ===
>>
>> ASF is already the home of many distributed platforms, e.g., Hadoop, Spark
>> and Mahout, each of which targets a different application domain. MXNet,
>> being a distributed platform for large-scale deep learning, focuses on
>> another important domain for which a scalable, programmable, flexible and
>> super fast open-source platform is still lacking. The recent
>> success of deep learning models, especially for vision and speech
>> recognition tasks, has generated interest both in applying existing deep
>> learning models and in developing new ones. Thus, an open-source platform
>> for deep learning backed by some of the top industry and academic players
>> will be able to attract a large community of users and developers. MXNet is
>> a complex system needing many iterations of design, implementation and
>> testing. Apache's collaboration framework, which encourages active
>> contribution from developers, will inevitably help improve the quality of
>> the system, as shown in the success of Hadoop, Spark, etc. Equally
>> important is the community of users, which helps identify real-life
>> applications of deep learning, and helps to evaluate the system's
>> performance and ease-of-use.
>> We hope to leverage ASF for coordinating and
>> promoting both communities, and in return benefit the communities with
>> another useful tool.
>>
>> == Known Risks ==
>>
>> === Orphaned products ===
>>
>> Given the current level of investment in MXNet and the stakeholders using
>> it, the risk of the project being abandoned is minimal. Amazon, for
>> example, is actively working to use MXNet in many of its services, and
>> many large corporations use it in their production applications.
>>
>> === Inexperience with Open Source ===
>>
>> MXNet has existed as a healthy open source project for more than a year.
>> During that time, the project has attracted 200+ contributors.
>>
>> === Homogeneous Developers ===
>>
>> The initial list of committers and contributors includes developers from
>> several institutions and industry participants (see above).
>>
>> === Reliance on Salaried Developers ===
>>
>> Like most open source projects, MXNet receives substantial support from
>> salaried developers. A large fraction of MXNet development is supported by
>> graduate students at various universities in the course of research degrees
>> - this is more a “volunteer” relationship, since in most cases students
>> contribute vastly more than is necessary to immediately support research.
>> In addition, those working from within corporations are devoting
>> significant time and effort to the project - and these come from several
>> organizations.
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> We choose Apache not for publicity. We have two purposes. First, we hope
>> that Apache's known best-practices for managing a mature open source
>> project can help guide us. For example, we are feeling the growing pains
>> of a successful open source project as we attempt a major refactor of the
>> internals while customers are using the system in production. We seek
>> guidance in communicating breaking API changes and version revisions.
>> Also, as our involvement from major corporations increases, we want to
>> assure our users that MXNet will stay open and not favor any particular
>> platform or environment. These are some examples of the know-how and
>> discipline we're hoping Apache can bring to our project.
>>
>> Second, we want to leverage Apache's reputation to recruit more developers
>> to create a diverse community.
>>
>> === Relationship with Other Apache Products ===
>>
>> Apache Mahout and Apache Spark's MLlib are general machine learning
>> systems. Deep learning algorithms can thus be implemented on these two
>> platforms as well. However, in practice, the overlap will be minimal. Deep
>> learning is so computationally intensive that it often requires specialized
>> GPU hardware to accomplish tasks of meaningful size. Making efficient use
>> of GPU hardware is complex because the hardware is so fast that the
>> supporting systems around it must be carefully optimized to keep the GPU
>> cores busy. Extending this capability to distributed multi-GPU and
>> multi-host environments requires great care. This is a critical
>> differentiator between MXNet and existing Apache machine learning systems.
>>
>> Mahout and Spark MLlib follow models where their nodes run synchronously.
>> This is the fundamental difference from MXNet, which follows the parameter
>> server framework. MXNet can run synchronously or asynchronously. In
>> addition, MXNet has optimizations for training a wide range of deep
>> learning models using a variety of approaches (e.g., model parallelism and
>> data parallelism), which makes MXNet much more efficient (near-linear
>> speedup on state of the art models). MXNet also supports both imperative
>> and symbolic approaches, providing ease of programming for deep learning
>> algorithms.
>>
>> Other Apache projects that are potentially complementary:
>>
>> Apache Arrow - reading data in Apache Arrow's internal format from MXNet
>> would allow users to run ETL/preprocessing in Spark, save the results in
>> Arrow's format and then run DL algorithms on it.
>>
>> Apache Singa - MXNet and Singa are both deep learning projects, and can
>> benefit from a larger deep learning community at Apache.
>>
>> == Documentation ==
>>
>> Documentation has recently migrated to http://mxnet.io. We continue to
>> refine and improve the documentation.
>>
>> == Initial Source ==
>>
>> We currently use GitHub to maintain our source code,
>> https://github.com/MXNet
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> MXNet Code is available under Apache License, Version 2.0. We will work
>> with the committers to get CLAs signed and review previous contributions.
>>
>> == External Dependencies ==
>>
>> * required by the core code base: GCC or CLOM, Clang, any BLAS library
>> (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which requires
>> lib-zeromq), TBB
>> * required for GPU usage: cudnn, cuda
>> * required for python usage: Python 2/3
>> * required for R module: R, Rcpp (GPLv2 licensing)
>> * optional for image preparation and preprocessing: opencv
>> * optional dependencies for additional features: torch7, numba, cython (in
>> NNVM branch)
>>
>> Rcpp and lib-zeromq are expected to require licensing discussions.
>>
>> == Cryptography ==
>>
>> Not Applicable
>>
>> == Required Resources ==
>>
>> === Mailing Lists ===
>>
>> There is currently no mailing list.
>>
>> === Issue Tracking ===
>>
>> Currently uses GitHub to track issues. Would like to continue to do so.
>>
>> == Committers and Affiliations ==
>>
>> * Tianqi Chen (UW)
>> * Mu Li (AWS)
>> * Junyuan Xie (AWS)
>> * Bing Xu (Apple)
>> * Chiyuan Zhang (MIT)
>> * Minjie Wang (NYU)
>> * Naiyan Wang (TuSimple)
>> * Yizhi Liu (Mediav)
>> * Tong He (Simon Fraser University)
>> * Qiang Kou (Indiana U)
>> * Xingjian Shi (HKUST)
>>
>> == Sponsors ==
>>
>> === Champion ===
>>
>> Henri Yandell (bayard at apache.org)
>>
>> === Nominated Mentors ===
>>
>> Sebastian Schelter (s...@apache.org)
>>
>> === Sponsoring Entity ===
>>
>> We are requesting the Incubator to sponsor this project.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org