Thanks everyone for sharing thoughts!

Eric, I appreciate your suggestions. But there are many precedents for
separate releases, such as Hive's storage API, Ozone, etc. For loosely coupled
sub-projects, it is going to be great (at least for most users) to have
separate releases so that new features can be consumed and iterated on
faster. From the feedback above from developers and users, I think this is
also what people want.

Another concern you mentioned is whether Submarine is aligned with Hadoop
project goals. From the feedback we can see that it attracts companies to
continue using Hadoop to solve their ML/DL requirements. It has also created
a good feedback loop: many of the issues encountered and some of the new
functionality added by Submarine went back to Hadoop, such as localization of
files and directories, GPU topology related enhancements, etc.

We will definitely use this sub-project opportunity to grow both Submarine
and Hadoop quickly, and try to get fast release cycles for both projects. As
for your suggestion about the Apache Incubator, we can reconsider it once
Submarine becomes a more independent project. For now it is still too small,
and going through the process is too much overhead. I don't want to stall the
fast-growing community for months to go through the incubator process.

I really hope my comments can help you reconsider the veto. :)

Thanks,
Wangda

On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:

> Submarine is an application built for the YARN framework, but it does not
> have a strong dependency on YARN development.  For this kind of project, it
> would be best to enter the Apache Incubator to create a new community.  Apache
> Commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project, a distributed
> computing framework and not an incubation sandbox.  To stay aligned with
> Hadoop goals, we want to prevent the Hadoop project from being overloaded while
> allowing good ideas to be carried forward in the Apache Incubator.  Putting on
> my Apache Member hat, my vote is -1 to allowing more independent subproject
> release cycles in the Hadoop project that do not align with Hadoop project
> goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This would allow Submarine
> to be developed for older versions of Hadoop, just as Spark works with
> multiple versions of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <abvclo...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
> community has grown fast and is quite collaborative. I can see the reasons to
> release it faster on its own cycle. And at the same time, the Ozone way
> works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <neliu...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful for getting Submarine machine learning
> engine from the community.
>     > We are heavy users of Submarine.
>     > Because Submarine fits into the direction of our big data team's
> Hadoop technology stack,
>     > it avoids the need to increase the manpower investment in learning
> other container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient, and
> reserves a lot of human and material resources in our previous years.
>     >
>     > Our team has finished testing and deploying Submarine and
> will provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN
> cluster in the next six months.
>     > Because we have a lot of product departments that need to use machine
> learning services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be very
> helpful to our team and users.
>     >
>     > Thanks, Hadoop community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wheele...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started the Submarine-related effort last year, we have received a
> lot of
>     > > feedback. Several companies (such as Netease, China Mobile, etc.)
> are
>     > > trying to deploy Submarine to their Hadoop clusters along with big
> data
>     > > workloads. LinkedIn is also very interested in contributing a
> Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use
> the same
>     > > interface.
>     > >
>     > > From what I can see, there are several issues with putting Submarine
> under
>     > > the yarn-applications directory and having the same release cycle as
> Hadoop:
>     > >
>     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
> in Jan
>     > > 2019. Because of unpredictable blockers and security issues, it
> got
>     > > delayed a lot. We need to iterate on Submarine fast at this point.
>     > >
>     > > 2) We also see a lot of requirements to use Submarine on older
> Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
> in a
>     > > short time, but the requirement to run deep learning is urgent to
> them. We
>     > > should decouple Submarine from Hadoop version.
>     > >
>     > > And why do we want to keep it within Hadoop? First, Submarine
> includes some
>     > > innovative parts, such as enhancements of user experiences for YARN
>     > > services/containerization support, which we can add back to
> Hadoop later
>     > > to address common requirements. In addition to that, we have a big
> overlap
>     > > in the community developing and using it.
>     > >
>     > > There are several proposals we went through during the Ozone merge
> to trunk
>     > > discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3ccahfhakh6_m3yldf5a2kq8+w-5fbvx5ahfgs-x1vajw8gmnz...@mail.gmail.com%3E
>     > >
>     > > I propose adopting the Ozone model, which is the same master branch,
> different
>     > > release cycle, and different release branch. It is a great example
> of the
>     > > agile releases we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to set up CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
>     > > - User doc
>     > > <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
>     > > (3.2.0
>     > > release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on
> Apache Hadoop
>     > > <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     >
>
>
>
