Re: Proposal for an Apache Hama sub-project

Edward Capriolo Thu, 02 Mar 2017 07:35:29 -0800

On Mon, Feb 27, 2017 at 7:13 PM, Edward J. Yoon <edward.y...@samsung.com>
wrote:


> Thanks for your proposal.
>
> I of course think Apache Hama can be used for scheduling sync and async
> communication/computation networks with various topologies and resource
> allocation. However, I'm not sure whether this approach is also a fit for
> modern microservice architectures. In my opinion, this can be discussed and
> developed in the Hama community as a sub-project until it's mature enough
> (CC'ing general@i.a.o. I'll be happy to read more feedback from the ASF
> incubator community).
>
> P.S. It seems you used the incubation proposal template. There's no need
> to add me as an initial committer (I don't have much time to actively
> contribute to your project). Also, I recently left Samsung Electronics and
> joined a $200 billion O2O e-commerce company as CTO.
>
> -----Original Message-----
> From: Sachin Ghai [mailto:sachin.g...@impetus.co.in]
> Sent: Monday, February 27, 2017 5:16 PM
> To: d...@hama.apache.org
> Subject: Proposal for an Apache Hama sub-project
>
> Hama Community,
>
> I would like to propose a sub-project for Apache Hama and initiate
> discussion around the proposal. The proposed sub-project named 'Scalar' is
> a scalable orchestration, training and serving system for machine learning
> and deep learning. Scalar would leverage Apache Hama to automate the
> distributed training, model deployment and prediction serving.
>
> More details about the proposal are listed below as per Apache project
> proposal template:
> Abstract
> Scalar is a general purpose framework for simplifying massive scale big
> data analytics and deep learning modelling, deployment and serving with
> high performance.
> Proposal
> It is a goal of Scalar to provide an abstraction framework which allows
> users to easily scale the functions of training a model, deploying a model
> and serving the prediction from the underlying machine learning or deep
> learning framework. It is also a characteristic of its execution framework
> to orchestrate heterogeneous workload graphs utilizing Apache Hama, Apache
> Hadoop, Apache Spark and TensorFlow resources.
> Background
> The initial Scalar code was developed in 2016 and has been successfully
> beta tested for one of the largest insurance organizations in a client
> specific PoC. The motivation behind this work is to build a framework that
> provides abstraction on heterogeneous data science frameworks and helps
> users leverage them in the most performant way.
> Rationale
> There is a sudden deluge of machine learning and deep learning frameworks
> in the industry. For an application developer, it is hard to switch from
> one framework to another without rewriting the application. There is also
> additional plumbing to be done to retrieve the prediction results for each
> model in different frameworks. We aim to provide an abstraction framework
> which can be used to seamlessly train and deploy the model at scale on
> multiple frameworks like TensorFlow, Apache Horn or Caffe. The abstraction
> further provides a unified layer for serving the prediction in the most
> performant, scalable and efficient way for a multi-tenant deployment. The
> key performance metrics will be reduction in training time, lower error
> rate and lower latency for serving models.
> Scalar consists of a core engine which can be used to create flows
> described in terms of state, sequences and algorithms. The engine invokes
> the execution context of Apache Hama to train and deploy models on the
> target framework. Apache Hama is used for a variety of functions including
> parameter tuning and scheduling computations on a distributed cluster. A
> data object layer provides access to data from heterogeneous sources like
> HDFS, local storage, S3, etc. A REST API layer is utilized for serving the
> prediction functions to client applications. A caching layer in the middle
> reduces latency for various functions.
> Initial Goals
> Some current goals include:
>
>   *   Build community.
>   *   Provide general purpose API for machine learning and deep learning
>       training, deployment and serving.
>   *   Serve the predictions with low latency.
>   *   Run massive workloads via Apache Hama on TensorFlow, Apache Spark and
>       Caffe.
>   *   Provide CPU and GPU support on-premise or on cloud to run the
>       algorithms.
> Current Status
> Meritocracy
> The core developers understand what it means to have a process based on
> meritocracy. We will make continuous efforts to build an environment that
> supports this, encouraging community members to contribute.
> Community
> A small community has formed within the Apache Hama project community and
> at companies such as an enterprise services and product company and an
> artificial intelligence startup. There is a lot of interest in data science
> serving systems and artificial intelligence simplification systems. By
> bringing Scalar into Apache, we believe that the community will grow even
> bigger.
> Core Developers
> Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni,
> Nikunj Limbaseeya, Mayur Choubey
> Known Risks
> Orphaned Products
> Apache Hama is already a core open source component being utilized at
> Samsung Electronics, and Scalar is already getting adopted by major
> enterprise organizations. There is no direct risk for Scalar project to be
> orphaned.
> Inexperience with Open Source
> All contributors have experience using and/or working on Apache open source
> projects.
> Homogeneous Developers
> The initial committers are from different organizations such as Impetus,
> Chalk Digital, and Samsung Electronics.
> Reliance on Salaried Developers
> A few will be working as full-time open source developers. Other developers
> will also start working on the project in their spare time.
> Relationships with Other Apache Products
>
>   *   Scalar is being built on top of Apache Hama
>   *   Apache Spark is being used for machine learning.
>   *   Apache Horn is being used for deep learning.
>   *   The framework will run natively on Apache Hadoop and Apache Mesos.
> An Excessive Fascination with the Apache Brand
> Scalar will hopefully benefit from Apache in terms of attracting a
> community and establishing a solid group of developers, and also from its
> relationship with Apache Hadoop, Spark and Hama. These are the main reasons
> for us to send this proposal.
> Documentation
> Initial design of Scalar can be found at this link:
> <https://drive.google.com/file/d/0B7mbLUemi6LFVHlFSzhONmZ4aU0/view?usp=sharing>.
> Initial Source
> Impetus Technologies (Impetus) will contribute the initial orchestration
> code base to create this project. Impetus plans to contribute the Scalar
> code base, test cases, build files, and documentation to the ASF under the
> terms specified in the ASF Corporate Contributor License and further
> develop it with the wider community. Once at Apache, the project will be
> licensed under the ASF license.
> Cryptography
> Not applicable.
> Required Resources
> Mailing Lists
>
>   *   scalar-dev
>   *   scalar-pmc
> Subversion Directory
>
>   *   Git is the preferred source control system:
>       git://git.apache.org/scalar
> Issue Tracking
>
>   *   a JIRA issue tracker, SCALAR
> Initial Committers
>
>   *   Sachin Ghai (sachin.ghai AT impetus DOT co DOT in)
>   *   Edward J. Yoon (edwardyoon AT apache DOT org)
>   *   Abhishek Soni (abhishek.soni AT impetus DOT co DOT in)
>   *   Ishwardeep Singh (ishwardeep AT chalkdigital DOT com)
>   *   Nikunj Limbaseeya (nikunj.limbaseeya AT impetus DOT co DOT in)
>   *   Rachna Gogia (rachna AT hadoopsphere DOT org)
>   *   Mayur Choubey (mayur.choubey AT impetus DOT co DOT in)
> Affiliations
>
>   *   Sachin Ghai (Impetus)
>   *   Edward J. Yoon (Samsung Electronics)
>   *   Abhishek Soni (Impetus)
>   *   Ishwardeep Singh (Chalk Digital)
>   *   Nikunj Limbaseeya (Impetus)
>   *   Rachna Gogia (HadoopSphere)
>   *   Mayur Choubey (Impetus)
> Sponsors
> <proposed>
> Champion
>
>   *   Edward J. Yoon <ASF member, Samsung Electronics >
> Nominated Mentors
>
>   *   Edward J. Yoon <ASF member, Samsung Electronics >
> Sponsoring Entity
> The Apache Hama project
>
> -- End of proposal --
>
> Thanks,
> Sachin Ghai
>
I do not believe the Hama project has had any activity for a long time
(more than a year). For example, I attempted to broach this discussion and
got no official reply: https://issues.apache.org/jira/browse/HAMA-998.

I am interested in Scalar and would like to take the time to familiarize
myself with it. I do not believe I am the right champion, but I can
possibly be a mentor/contributor.

Thanks,
Edward
