Hello, guys.
Theodore, last week I started the review of the PR:
https://github.com/apache/flink/pull/2735 related to *word2Vec for Flink*.

During this review I asked myself: why do we need to implement such a
very popular algorithm as *word2vec one more time*, when there is already
an implementation available in Java, provided by the deeplearning4j.org
<https://deeplearning4j.org/word2vec> library (DL4J, Apache 2 licence)?
This library promotes itself actively, there is hype around it in the ML
sphere, and it has been integrated with Apache Spark to provide scalable
deep learning calculations.
That's why I thought: could we also integrate this library with Flink?
1) Personally, I think that providing support for, and deployment of, deep
learning algorithms/models in Flink is a promising and attractive feature, because:
    a) during the last two years deep learning has proved its efficiency, and
these algorithms are used in many applications. For example, *Spotify* uses
DL-based algorithms for music content extraction for their music
recommendations: Recommending music on Spotify with deep learning, August 05, 2014
<http://benanne.github.io/2014/08/05/spotify-cnns.html>. Doing this natively
and scalably is very attractive.


I have investigated the implementation of the DL4J integration with Apache
Spark, and have several points:

1) It seems that building our own implementation of word2vec is not
such a bad solution, because the integration of DL4J with Spark is too
tightly coupled to the Spark API, and it would take time on the DL4J side
to adapt this integration to Flink. Also, I had expected that we would be
able to just call some API, but that is not the case.
2)

https://deeplearning4j.org/use_cases
https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/

Thu, 19 Jan 2017 at 13:29, Till Rohrmann <trohrm...@apache.org>:

Hi Katherin,

welcome to the Flink community. Always great to see new people joining the
community :-)

Cheers,
Till

On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko <katherinm...@gmail.com>
wrote:

> ok, I've got it.
> I will take a look at  https://github.com/apache/flink/pull/2735.
>
> Tue, 17 Jan 2017 at 14:36, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com>:
>
> > Hello Katherin,
> >
> > Welcome to the Flink community!
> >
> > You are correct, the ML component definitely needs a lot of work. We are
> > facing similar problems to CEP, which we'll hopefully resolve with the
> > restructuring Stephan has mentioned in that thread.
> >
> > If you'd like to help out with PRs, we have many open. One that I started
> > reviewing but got side-tracked from is the Word2Vec one [1].
> >
> > Best,
> > Theodore
> >
> > [1] https://github.com/apache/flink/pull/2735
> >
> > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske <fhue...@gmail.com>
> > wrote:
> >
> > > Hi Katherin,
> > >
> > > welcome to the Flink community!
> > > Help with reviewing PRs is always very welcome and a great way to
> > > contribute.
> > >
> > > Best, Fabian
> > >
> > >
> > >
> > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko <katherinm...@gmail.com>:
> > >
> > > > Thank you, Timo.
> > > > I have started the analysis of the topic.
> > > > And if it is necessary, I will try to perform the review of other
> > > > pulls)
> > > >
> > > >
> > > > Tue, 17 Jan 2017 at 13:09, Timo Walther <twal...@apache.org>:
> > > >
> > > > > Hi Katherin,
> > > > >
> > > > > great to hear that you would like to contribute! Welcome!
> > > > >
> > > > > I gave you contributor permissions. You can now assign issues to
> > > > > yourself. I assigned FLINK-1750 to you.
> > > > > > Right now there are many open ML pull requests, you are very
> > > > > > welcome to review the code of others, too.
> > > > >
> > > > > Timo
> > > > >
> > > > >
> > > > > > On 17/01/17 at 10:39, Katherin Sotenko wrote:
> > > > > > Hello, All!
> > > > > >
> > > > > >
> > > > > >
> > > > > > > I'm Kate Eri, I'm a Java developer with 6 years of enterprise
> > > > > > > experience; I also have some expertise with Scala (half a year).
> > > > > >
> > > > > > > For the last 2 years I have participated in several Big Data
> > > > > > > projects that were related to Machine Learning (time series
> > > > > > > analysis, recommender systems, social networking) and ETL.
> > > > > > > I have experience with Hadoop, Apache Spark and Hive.
> > > > > >
> > > > > >
> > > > > > > I’m fond of the ML topic, and I see that the Flink project
> > > > > > > requires some work in this area, so that’s why I would like to
> > > > > > > join Flink and ask you to assign the ticket
> > > > > > > https://issues.apache.org/jira/browse/FLINK-1750
> > > > > > > to me.
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
