+1. Thanks for bringing this up, Wangda.

Makes sense to have Submarine follow its own release cadence given the good
momentum/adoption so far. Also, making it run with older versions of Hadoop
would drive higher adoption.

Suma

On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:

> Submarine is an application built for the YARN framework, but it does not have
> a strong dependency on YARN development.  For this kind of project, it would
> be best to enter the Apache Incubator to create a new community.  Apache
> Commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project on distributed
> computing frameworks, not an incubation sandbox.  To stay aligned with Hadoop
> goals, we want to prevent the Hadoop project from being overloaded while
> allowing good ideas to be carried forward in the Apache Incubator.  Putting on my
> Apache Member hat, my vote is -1 to allowing more independent subproject
> release cycles in the Hadoop project that do not align with Hadoop project
> goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This allows Submarine to
> develop for older versions of Hadoop, just as Spark works with multiple versions
> of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <abvclo...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
> community is growing fast and is quite collaborative. I can see the reasons to get
> it released faster in its own cycle. And at the same time, the Ozone way
> works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <neliu...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful to have received the Submarine machine learning
> engine from the community.
>     > We are heavy users of Submarine.
>     > Because Submarine fits the direction of our big data team's
> Hadoop technology stack,
>     > it avoids the need to invest more manpower in learning
> other container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient and
> preserves the human and material resources we invested in previous years.
>     >
>     > Our team has finished testing and deploying Submarine and
> will provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN
> cluster in the next six months,
>     > because many of our product departments need machine
> learning services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be a great
> help to our team and users.
>     >
>     > Thanks, Hadoop community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wheele...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started the Submarine-related effort last year, we have received a
> lot of
>     > > feedback. Several companies (such as Netease, China Mobile, etc.)
> are
>     > > trying to deploy Submarine to their Hadoop clusters along with big
> data
>     > > workloads. LinkedIn is also very interested in contributing a
> Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use
> the same
>     > > interface.
>     > >
>     > > From what I can see, there are several issues with putting Submarine
> under
>     > > the yarn-applications directory and keeping the same release cycle as
> Hadoop:
>     > >
>     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
> in Jan
>     > > 2019. Because of unpredictable blockers and security issues, it
> got
>     > > delayed a lot. We need to iterate on Submarine fast at this point.
>     > >
>     > > 2) We also see a lot of requirements to use Submarine on older
> Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
> in a
>     > > short time, but the requirement to run deep learning is urgent for
> them. We
>     > > should decouple Submarine from the Hadoop version.
>     > >
>     > > And why do we want to keep it within Hadoop? First, Submarine
> includes some
>     > > innovative parts, such as user-experience enhancements for YARN
>     > > services/containerization support, which we can add back to
> Hadoop later
>     > > to address common requirements. In addition, we have a big
> overlap
>     > > in the community developing and using it.
>     > >
>     > > There are several proposals we went through during the Ozone merge
> to trunk
>     > > discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3ccahfhakh6_m3yldf5a2kq8+w-5fbvx5ahfgs-x1vajw8gmnz...@mail.gmail.com%3E
>     > >
>     > > I propose to adopt the Ozone model: the same master branch, a
> different
>     > > release cycle, and a different release branch. It is a great example
> of the
>     > > agile releases we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to set up CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
>     > > - User doc
>     > > <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
>     > > (3.2.0
>     > > release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on
> Apache Hadoop
>     > > <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     >
>
>
>
