+1

I would also like to participate :)

On Wed, Aug 5, 2015 at 5:52 AM, Edward J. Yoon <[email protected]> wrote:
> Guys,
>
> I plan to submit a 'DNN platform on top of Apache Hama' proposal, as
> below. I know the Hama community is somewhat small, but the main reason
> for a separate proposal is that this domain-specific project is not a
> good fit for the Apache Hama community. Recruiting volunteers is also a
> hard problem. I expect this will become a very nice use case for Apache
> Hama.
>
> If you have any suggestions or other opinions, please let me know.
> Also, if you want to participate in this project, please feel free to
> add your name here.
>
> Thanks!
>
> --
> == Abstract ==
>
> Horn [hɔ:n] (tentative name; "Horn" means "Spirit" in Korean) is a
> neuron-centric programming API and execution framework for large-scale
> deep learning, built on top of Apache Hama.
>
> == Proposal ==
>
> The goal of Horn is to provide a neuron-centric programming API that
> lets users easily define the characteristics and structure of an
> artificial neural network model, together with an execution framework
> that leverages the heterogeneous resources of Hama and Hadoop YARN
> clusters.
>
> == Background ==
>
> The initial ANN code was developed in the Apache Hama project by a
> committer, Yexi Jiang (Facebook), in 2013. The motivation behind this
> work is to build a framework that provides more intuitive programming
> APIs, like Google's MapReduce or Pregel, and supports applications
> that need large models with huge memory consumption in a distributed
> way.
>
> == Rationale ==
>
> While many open source deep learning packages are still data-parallel
> or model-parallel only, we aim to support both data and model
> parallelism, as well as a fault-tolerant system design. The basic idea
> of data and model parallelism is to use a remote parameter server to
> parallelize model creation and distribute training across machines,
> and the BSP framework of Apache Hama to perform asynchronous
> mini-batches.
> Within a single BSP job, each task group works asynchronously using
> region barrier synchronization instead of global barrier
> synchronization, and trains a large-scale neural network model on its
> assigned data sets in the BSP paradigm. This architecture is inspired
> by Google's DistBelief (Jeff Dean et al., 2012).
>
> == Initial Goals ==
>
> Some current goals include:
>
> * build a new community
> * provide more intuitive programming APIs
> * support both data and model parallelism
> * run natively on both Hama and Hadoop2
> * support GPUs and InfiniBand
>
> == Current Status ==
>
> === Meritocracy ===
>
> The core developers understand what it means to have a process based
> on meritocracy. We will make a continuous effort to build an
> environment that supports this, encouraging community members to
> contribute.
>
> === Community ===
>
> A small community has formed within the Apache Hama project and at some
> companies, such as an instant messenger service company and a mobile
> manufacturing company. Many people are also interested in the
> large-scale deep learning platform itself. By bringing Horn into
> Apache, we believe that the community will grow even bigger.
>
> === Core Developers ===
>
> Edward J. Yoon, Thomas Jungblut, and Dongjin Lee
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> Apache Hama is already a core open source component at Samsung
> Electronics, and Horn will also be used by Samsung Electronics, so
> there is no direct risk of this project being orphaned.
>
> === Inexperience with Open Source ===
>
> Some developers are very new, and the others have experience using
> and/or working on Apache open source projects.
>
> === Homogeneous Developers ===
>
> The initial committers are from different organizations, such as
> Microsoft, Samsung Electronics, and LINE Plus.
>
> === Reliance on Salaried Developers ===
>
> Other developers will also start working on the project in their spare
> time.
>
> === Relationships with Other Apache Products ===
>
> * Horn is based on Apache Hama
> * Apache ZooKeeper is used as a distributed locking service
> * Horn runs natively on Apache Hadoop and Mesos
> * Horn may overlap somewhat with the Singa podling
>
> === An Excessive Fascination with the Apache Brand ===
>
> Horn itself will hopefully benefit from Apache, in terms of attracting
> a community and establishing a solid group of developers, and also
> from its relationship with Apache Hama, a general-purpose BSP computing
> engine. These are the main reasons for us to send this proposal.
>
> == Documentation ==
>
> The initial plan for Horn can be found at
> http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html
>
> == Initial Source ==
>
> The initial source code has been released as part of the Apache Hama
> project, developed under the Apache Software Foundation. The source
> code is currently hosted at
>
> https://svn.apache.org/repos/asf/hama/trunk/ml/src/main/java/org/apache/hama/ml/ann/
>
> == Cryptography ==
>
> Not applicable.
>
> == Required Resources ==
>
> Mailing Lists
>
> * horn-private
> * horn-dev
>
> Subversion Directory
>
> * Git is the preferred source control system: git://git.apache.org/horn
>
> Issue Tracking
>
> * a JIRA issue tracker, HORN
>
> == Initial Committers and Affiliations ==
>
> * Thomas Jungblut (tjungblut at apache dot org)
> * Edward J. Yoon (edwardyoon at apache dot org)
> * Dongjin Lee (dongjin.lee.kr at gmail dot com)
> * Minho Kim (minwise.kim at samsung dot com)
> * TODO
>
> == Affiliations ==
>
> * Thomas Jungblut (Microsoft)
> * Edward J. Yoon (Samsung Electronics)
> * Dongjin Lee (LINE Plus)
> * Minho Kim (Samsung Electronics)
> * TODO
>
> == Sponsors ==
>
> Champion
>
> * Edward J. Yoon <edwardyoon at apache dot org>
>
> Nominated Mentors
>
> * TODO
>
> Sponsoring Entity
>
> The Apache Incubator
>
> --
> Best Regards, Edward J. Yoon
>
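For readers unfamiliar with the design in the Rationale section, here is a rough single-process sketch of the idea: task groups pull weights from a parameter server, average their mini-batch gradients behind a group-local ("region") barrier, and push the result without waiting for other groups. This is only an illustration under stated assumptions, not Horn or Hama code. Every name in it (ParameterServer, group_step, train, make_data) is hypothetical, the model is a plain linear regression rather than a neural network, and the asynchrony is only mimicked by visiting groups in random order.

```python
import random


class ParameterServer:
    """Hypothetical stand-in for a remote parameter server."""

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Hand out a copy so callers work on a snapshot of the weights.
        return list(self.w)

    def push(self, grad):
        # Apply a gradient as soon as it arrives; there is no global barrier.
        for i, g in enumerate(grad):
            self.w[i] -= self.lr * g


def minibatch_gradient(w, batch):
    """Mean squared-error gradient for a linear model y = w . x."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def group_step(server, group, batch_size, rng):
    """One mini-batch step for a single task group."""
    w = server.pull()
    grads = [minibatch_gradient(w, rng.sample(task_rows, batch_size))
             for task_rows in group]
    # Region barrier: only the tasks inside this group wait for each other
    # before their gradients are averaged and pushed.
    avg = [sum(col) / len(grads) for col in zip(*grads)]
    server.push(avg)


def train(groups, dim, lr=0.05, steps=300, batch_size=4, seed=0):
    rng = random.Random(seed)
    server = ParameterServer(dim, lr)
    for _ in range(steps):
        # Random visiting order mimics groups finishing at different times.
        for group in rng.sample(groups, len(groups)):
            group_step(server, group, batch_size, rng)
    return server.w


def make_data(true_w, groups=2, tasks_per_group=2, rows_per_task=20, seed=1):
    """Noise-free synthetic rows, split per group and per task."""
    rng = random.Random(seed)

    def row():
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        return x, sum(wi * xi for wi, xi in zip(true_w, x))

    return [[[row() for _ in range(rows_per_task)]
             for _ in range(tasks_per_group)]
            for _ in range(groups)]
```

Calling train(make_data([2.0, -3.0]), dim=2) should return weights close to the true ones. In a real deployment the pull and push calls would be network requests and the groups would run concurrently, so each group would sometimes compute on stale weights; the sketch leaves that out.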
