On Mon, Feb 27, 2017 at 7:13 PM, Edward J. Yoon <edward.y...@samsung.com> wrote:
> Thanks for your proposal.
>
> I of course think Apache Hama can be used for scheduling sync and async communication/computation networks with various topologies and resource allocation. However, I'm not sure whether this approach is also a fit for modern microservice architecture. In my opinion, this can be discussed and cooked in the Hama community as a sub-project until it's mature enough (CC'ing general@i.a.o. I'll be happy to read more feedback from the ASF incubator community).
>
> P.S., It seems you referred to the incubation proposal template. There's no need to add me as an initial committer (I don't have much time to actively contribute to your project). And, I recently quit Samsung Electronics and joined a $200 billion O2O e-commerce company as CTO.
>
> -----Original Message-----
> From: Sachin Ghai [mailto:sachin.g...@impetus.co.in]
> Sent: Monday, February 27, 2017 5:16 PM
> To: d...@hama.apache.org
> Subject: Proposal for an Apache Hama sub-project
>
> Hama Community,
>
> I would like to propose a sub-project for Apache Hama and initiate discussion around the proposal. The proposed sub-project, named 'Scalar', is a scalable orchestration, training and serving system for machine learning and deep learning. Scalar would leverage Apache Hama to automate distributed training, model deployment and prediction serving.
>
> More details about the proposal are listed below as per the Apache project proposal template:
>
> Abstract
> Scalar is a general purpose framework for simplifying massive scale big data analytics and deep learning modelling, deployment and serving with high performance.
>
> Proposal
> It is a goal of Scalar to provide an abstraction framework which allows users to easily scale the functions of training a model, deploying a model and serving predictions from the underlying machine learning or deep learning framework.
> It is also the characteristic of its execution framework to orchestrate heterogeneous workload graphs utilizing Apache Hama, Apache Hadoop, Apache Spark and TensorFlow resources.
>
> Background
> The initial Scalar code was developed in 2016 and has been successfully beta tested for one of the largest insurance organizations in a client specific PoC. The motivation behind this work is to build a framework that provides abstraction on heterogeneous data science frameworks and helps users leverage them in the most performant way.
>
> Rationale
> There is a sudden deluge of machine learning and deep learning frameworks in the industry. As an application developer, it becomes a hard choice to switch from one framework to another without rewriting the application. Also, there is additional plumbing to be done to retrieve the prediction results for each model in different frameworks. We aim to provide an abstraction framework which can be used to seamlessly train and deploy the model at scale on multiple frameworks like TensorFlow, Apache Horn or Caffe. The abstraction further provides a unified layer for serving the prediction in the most performant, scalable and efficient way for a multi-tenant deployment. The key performance metrics will be reduction in training time, lower error rate and lower latency time for serving models.
> Scalar consists of a core engine which can be used to create flows described in terms of state, sequences and algorithms. The engine invokes execution context of Apache Hama to train and deploy models on target framework. Apache Hama is used for a variety of functions including parameter tuning and scheduling computations on a distributed cluster. A data object layer provides access to data from heterogeneous sources like HDFS, local, S3 etc. A REST API layer is utilized for serving the prediction functions to client applications.
> A caching layer in the middle acts as a latency improver for various functions.
>
> Initial Goals
> Some current goals include:
>
> * Build community.
> * Provide a general purpose API for machine learning and deep learning training, deployment and serving.
> * Serve the predictions with low latency.
> * Run massive workloads via Apache Hama on TensorFlow, Apache Spark and Caffe.
> * Provide CPU and GPU support on-premise or on cloud to run the algorithms.
>
> Current Status
>
> Meritocracy
> The core developers understand what it means to have a process based on meritocracy. We will make a continuous effort to build an environment that supports this, encouraging community members to contribute.
>
> Community
> A small community has formed within the Apache Hama project community and at companies such as an enterprise services and product company and an artificial intelligence startup. There is a lot of interest in data science serving systems and artificial intelligence simplification systems. By bringing Scalar into Apache, we believe that the community will grow even bigger.
>
> Core Developers
> Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni, Nikunj Limbaseeya, Mayur Choubey
>
> Known Risks
>
> Orphaned Products
> Apache Hama is already a core open source component being utilized at Samsung Electronics, and Scalar is already getting adopted by major enterprise organizations. There is no direct risk of the Scalar project being orphaned.
>
> Inexperience with Open Source
> All contributors have experience using and/or working on Apache open source projects.
>
> Homogeneous Developers
> The initial committers are from different organizations such as Impetus, Chalk Digital, and Samsung Electronics.
>
> Reliance on Salaried Developers
> Few will be working as full-time open source developers. Other developers will also start working on the project in their spare time.
>
> Relationships with Other Apache Products
>
> * Scalar is being built on top of Apache Hama.
> * Apache Spark is being used for machine learning.
> * Apache Horn is being used for deep learning.
> * The framework will run natively on Apache Hadoop and Apache Mesos.
>
> An Excessive Fascination with the Apache Brand
> Scalar itself will hopefully benefit from Apache, in terms of attracting a community and establishing a solid group of developers, and also from its relationship with Apache Hadoop, Spark and Hama. These are the main reasons for us to send this proposal.
>
> Documentation
> Initial design of Scalar can be found at this link <https://drive.google.com/file/d/0B7mbLUemi6LFVHlFSzhONmZ4aU0/view?usp=sharing>.
>
> Initial Source
> Impetus Technologies (Impetus) will contribute the initial orchestration code base to create this project. Impetus plans to contribute the Scalar code base, test cases, build files, and documentation to the ASF under the terms specified in the ASF Corporate Contributor License and further develop it with the wider community. Once at Apache, the project will be licensed under the ASF license.
>
> Cryptography
> Not applicable.
>
> Required Resources
>
> Mailing Lists
>
> * scalar-dev
> * scalar-pmc
>
> Subversion Directory
>
> * Git is the preferred source control system: git://git.apache.org/scalar
>
> Issue Tracking
>
> * a JIRA issue tracker, SCALAR
>
> Initial Committers
>
> * Sachin Ghai (sachin.ghai AT impetus DOT co DOT in)
> * Edward J. Yoon (edwardyoon AT apache DOT org)
> * Abhishek Soni (abhishek.soni AT impetus DOT co DOT in)
> * Ishwardeep Singh (ishwardeep AT chalkdigital DOT com)
> * Nikunj Limbaseeya (nikunj.limbaseeya AT impetus DOT co DOT in)
> * Rachna Gogia (rachna AT hadoopsphere DOT org)
> * Mayur Choubey (mayur.choubey AT impetus DOT co DOT in)
>
> Affiliations
>
> * Sachin Ghai (Impetus)
> * Edward J. Yoon (Samsung Electronics)
> * Abhishek Soni (Impetus)
> * Ishwardeep Singh (Chalk Digital)
> * Nikunj Limbaseeya (Impetus)
> * Rachna Gogia (HadoopSphere)
> * Mayur Choubey (Impetus)
>
> Sponsors
> <proposed>
>
> Champion
>
> * Edward J. Yoon <ASF member, Samsung Electronics>
>
> Nominated Mentors
>
> * Edward J. Yoon <ASF member, Samsung Electronics>
>
> Sponsoring Entity
> The Apache Hama project
>
> -- End of proposal --
>
> Thanks,
> Sachin Ghai

I do not believe the Hama project has had any activity for a long time (more than a year). For example, I attempted to broach this discussion and got no official reply:
https://issues.apache.org/jira/browse/HAMA-998

I am interested in Scalar and would like to take the time to familiarize myself with it. I do not believe I am the right champion, but I can possibly be a mentor/contributor.

Thanks,
Edward