I don't think there is a rule that APIs automatically go into opt/ and
"libraries" do not. The contents of opt/ have mainly grown over time without
following a strict rule.

I think the decisive factor for what goes into Flink's binary distribution
should be how core it is to Flink. Of course, another important
consideration is which use cases Flink should promote "out of the box" (I am
not sure whether this is actually true for content shipped in opt/, because
you still have to move it to lib).

For example, I would rather see Gelly offered as an optional component than
shipped with every Flink binary distribution.

Cheers,
Till

On Tue, Feb 4, 2020 at 11:24 AM Becket Qin <becket....@gmail.com> wrote:

> Thanks for the suggestion, Till.
>
> I am curious how we usually decide when to put jars into the opt folder.
>
> Technically speaking, it seems that `flink-ml-api` should be put into the
> opt directory because it is actually an API rather than a library, just like
> CEP and Table.
>
> `flink-ml-lib` seems to be a borderline case. On one hand, it is a library.
> On the other hand, unlike the SQL formats and Hadoop, whose code mostly lives
> outside of Flink, the algorithm code is in Flink. So `flink-ml-lib` is more
> like the built-in SQL UDFs. It therefore seems fine to either put it in the
> opt folder or offer it on the downloads page.
>
> From the user experience perspective, it might be better to have both
> `flink-ml-lib` and `flink-ml-api` in the opt folder so users needn't go to
> two places for the required dependencies.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng <he...@apache.org> wrote:
>
> > Hi Till,
> >
> > Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
> > libraries as optional dependencies on the download page, which can make
> > the dist smaller.
> >
> > But I also have some concerns about it, e.g., the download page now only
> > includes the latest 3 releases. We may need to find ways to support more
> > versions.
> > On the other hand, the size of the flink-ml libraries is currently very
> > small (about 246K), so they would not have much impact on the size of dist.
> >
> > What do you think?
> >
> > Best,
> > Hequn
> >
> > On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> >> An alternative solution would be to offer the flink-ml libraries as
> >> optional dependencies on the download page, similar to how we offer the
> >> different SQL formats and Hadoop releases [1].
> >>
> >> [1] https://flink.apache.org/downloads.html
> >>
> >> Cheers,
> >> Till
> >>
> >> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng <he...@apache.org> wrote:
> >>
> >> > Thank you all for your feedback and suggestions!
> >> >
> >> > Best, Hequn
> >> >
> >> > On Mon, Feb 3, 2020 at 5:07 PM Becket Qin <becket....@gmail.com> wrote:
> >> >
> >> > > Thanks for bringing up the discussion, Hequn.
> >> > >
> >> > > +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would
> >> > > make it much easier for the users to try out some simple ml tasks.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> > > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <sunjincheng...@gmail.com> wrote:
> >> > >
> >> > >> Thank you for pushing this forward, @Hequn Cheng <he...@apache.org>!
> >> > >>
> >> > >> Hi @Becket Qin <becket....@gmail.com>, do you have any concerns
> >> > >> about this?
> >> > >>
> >> > >> Best,
> >> > >> Jincheng
> >> > >>
> >> > >> On Mon, Feb 3, 2020 at 2:09 PM Hequn Cheng <he...@apache.org> wrote:
> >> > >>
> >> > >>> Hi everyone,
> >> > >>>
> >> > >>> Thanks for the feedback. As there are no objections, I've opened a
> >> > >>> JIRA issue (FLINK-15847 [1]) to address this.
> >> > >>> The implementation details can be discussed in the issue or in the
> >> > >>> following PR.
> >> > >>>
> >> > >>> Best,
> >> > >>> Hequn
> >> > >>>
> >> > >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
> >> > >>>
> >> > >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng <chenghe...@gmail.com> wrote:
> >> > >>>
> >> > >>> > Hi Jincheng,
> >> > >>> >
> >> > >>> > Thanks a lot for your feedback!
> >> > >>> > Yes, I agree with you. There are cases where multiple jars need to
> >> > >>> > be uploaded. I will prepare another discussion later, maybe with a
> >> > >>> > simple design doc.
> >> > >>> >
> >> > >>> > Best, Hequn
> >> > >>> >
> >> > >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <sunjincheng...@gmail.com> wrote:
> >> > >>> >
> >> > >>> >> Thanks for bringing up this discussion, Hequn!
> >> > >>> >>
> >> > >>> >> +1 for including `flink-ml-api` and `flink-ml-lib` in opt.
> >> > >>> >>
> >> > >>> >> BTW: I think it would be great to bring up a discussion about
> >> > >>> >> uploading multiple jars at the same time, as PyFlink jobs can also
> >> > >>> >> benefit from that improvement.
> >> > >>> >>
> >> > >>> >> Best,
> >> > >>> >> Jincheng
> >> > >>> >>
> >> > >>> >>
> >> > >>> >> On Wed, Jan 8, 2020 at 11:50 AM Hequn Cheng <chenghe...@gmail.com> wrote:
> >> > >>> >>
> >> > >>> >> > Hi everyone,
> >> > >>> >> >
> >> > >>> >> > FLIP-39 [1] rebuilds the Flink ML pipeline on top of the Table API,
> >> > >>> >> > which moves Flink ML a step further. Based on it, users can develop
> >> > >>> >> > their ML jobs, and more and more machine learning platforms are
> >> > >>> >> > providing ML services.
> >> > >>> >> >
> >> > >>> >> > However, the problem now is that the jars of flink-ml-api and
> >> > >>> >> > flink-ml-lib exist only in the Maven repository. Whenever users want
> >> > >>> >> > to submit ML jobs, they have to depend on the ml modules and package
> >> > >>> >> > a fat jar. This is inconvenient, especially for machine learning
> >> > >>> >> > platforms, on which nearly all jobs depend on the Flink ML modules
> >> > >>> >> > and have to package a fat jar.
> >> > >>> >> >
> >> > >>> >> > Given this, it would be better to include the jars of flink-ml-api
> >> > >>> >> > and flink-ml-lib in the `opt` folder, so that users can directly use
> >> > >>> >> > the jars with the binary release. For example, users can move the
> >> > >>> >> > jars into the `lib` folder or use -j to upload the jars. (Currently,
> >> > >>> >> > -j only supports uploading one jar. Supporting multiple jars for -j
> >> > >>> >> > can be covered in a separate discussion.)
> >> > >>> >> >
> >> > >>> >> > The jars go in the `opt` folder instead of the `lib` folder because
> >> > >>> >> > the ml jars are currently still optional for the Flink project by
> >> > >>> >> > default.
> >> > >>> >> >
> >> > >>> >> > What do you think? Any feedback is welcome!
> >> > >>> >> >
> >> > >>> >> > Best,
> >> > >>> >> >
> >> > >>> >> > Hequn
> >> > >>> >> >
> >> > >>> >> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >
> >> > >>>
> >> > >>
> >> >
> >>
> >
>
