What we could do is to add the implicit class to the package object of
org.apache.flink.api.scala. Since one always has to import this package in
order to have the proper TypeInformations, you wouldn’t have to import the
extension explicitly.

The thing with the API is that you don’t want to break it too often. Once
people are used to it and have code implemented it always entails rewriting
the code if you change the API in a breaking manner. This can be really
annoying for users.

Cheers,
Till
​

On Tue, Feb 9, 2016 at 2:51 PM, Stefano Baghino <
stefano.bagh...@radicalbit.io> wrote:

> I agree with you, but I acknowledge that there may be concerns regarding
> the stability of the API. Perhaps the rationale behind the proposal of
> Stephan and Till is to provide it as an extension to test how the
> developers feel about it. It would be ideal to have a larger feedback from
> the community. However I have to admit I like the approach.
>
> On Tue, Feb 9, 2016 at 2:43 PM, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com> wrote:
>
> > Thanks for bringing this up Stefano, it would a very welcome addition
> > indeed.
> >
> > I like the approach of having extensions through implicits as well. IMHO
> > though this should be the default
> > behavior, without the need to add another import.
> >
> > On Tue, Feb 9, 2016 at 1:29 PM, Stefano Baghino <
> > stefano.bagh...@radicalbit.io> wrote:
> >
> > > I see, thanks for the tip! I'll work on it; meanwhile, I've added some
> > > functions and Scaladoc:
> > >
> > >
> >
> https://github.com/radicalbit/flink/blob/1159-implicit/flink-scala/src/main/scala/org/apache/flink/api/scala/extensions/package.scala
> > >
> > > On Tue, Feb 9, 2016 at 12:01 PM, Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > Not the DataSet but the JoinDataSet and the CoGroupDataSet do in the
> > form
> > > > of an apply function.
> > > > ​
> > > >
> > > > On Tue, Feb 9, 2016 at 11:09 AM, Stefano Baghino <
> > > > stefano.bagh...@radicalbit.io> wrote:
> > > >
> > > > > Sure, it was just a draft. I agree that filter and mapPartition
> make
> > > > sense,
> > > > > but coGroup and join don't look like they take a function.
> > > > >
> > > > > On Tue, Feb 9, 2016 at 10:08 AM, Till Rohrmann <
> trohrm...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > This looks like a good design to me :-) The only thing is that it
> > is
> > > > not
> > > > > > complete. For example, the filter, mapPartition, coGroup and join
> > > > > functions
> > > > > > are missing.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > > ​
> > > > > >
> > > > > > On Tue, Feb 9, 2016 at 1:18 AM, Stefano Baghino <
> > > > > > stefano.bagh...@radicalbit.io> wrote:
> > > > > >
> > > > > > > What do you think of something like this?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/radicalbit/flink/commit/21a889a437875c88921c93e87d88a378c6b4299e
> > > > > > >
> > > > > > > In this way, several extensions can be collected in this
> package
> > > > object
> > > > > > and
> > > > > > > picked altogether or a-là-carte (e.g. import
> > > > > > > org.apache.flink.api.scala.extensions.AcceptPartialFunctions).
> > > > > > >
> > > > > > > On Mon, Feb 8, 2016 at 2:51 PM, Till Rohrmann <
> > > trohrm...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I like the idea to support partial functions with Flink’s
> Scala
> > > > API.
> > > > > > > > However, I think that breaking the API and making it
> > inconsistent
> > > > > with
> > > > > > > > respect to the Java API is not the best option. I would
> rather
> > be
> > > > in
> > > > > > > favour
> > > > > > > > of the first proposal where we add a new method xxxWith via
> > > > implicit
> > > > > > > > conversions.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > > ​
> > > > > > > >
> > > > > > > > On Sun, Feb 7, 2016 at 12:44 PM, Stefano Baghino <
> > > > > > > > stefano.bagh...@radicalbit.io> wrote:
> > > > > > > >
> > > > > > > > > It took me a little time but I was able to put together
> some
> > > > code.
> > > > > > > > >
> > > > > > > > > In this commit I just added a few methods renamed to
> prevent
> > > > > > > overloading,
> > > > > > > > > thus usable with PartialFunction instead of functions:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/radicalbit/flink/commit/aacd59e0ce98cccb66d48a30d07990ac8f345748
> > > > > > > > >
> > > > > > > > > In this other commit I coded the original proposal,
> renaming
> > > the
> > > > > > > methods
> > > > > > > > to
> > > > > > > > > obtain the same effect as before, but with lower friction
> for
> > > > Scala
> > > > > > > > > developers (and provided some usage examples):
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/radicalbit/flink/commit/33403878eebba70def42f73a1cb671d13b1521b5
> > > > > > > > >
> > > > > > > > > On Thu, Jan 28, 2016 at 6:16 PM, Stefano Baghino <
> > > > > > > > > stefano.bagh...@radicalbit.io> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Stephan,
> > > > > > > > > >
> > > > > > > > > > thank you for the quick reply and for your feedback; I
> > agree
> > > > with
> > > > > > you
> > > > > > > > > that
> > > > > > > > > > breaking changes have to taken very seriously.
> > > > > > > > > >
> > > > > > > > > > The rationale behind my proposal is that Scala users are
> > > > already
> > > > > > > > > > accustomed to higher-order functions that manipulate
> > > > collections
> > > > > > and
> > > > > > > it
> > > > > > > > > > would beneficial for them to have an API that tries to
> > adhere
> > > > as
> > > > > > much
> > > > > > > > as
> > > > > > > > > > possible to the interface provided by the Scala
> Collections
> > > > API.
> > > > > > IMHO
> > > > > > > > > being
> > > > > > > > > > able to manipulate a DataSet or DataStream like a Scala
> > > > > collection
> > > > > > > > > > idiomatically would appeal to developers and reduce the
> > > > friction
> > > > > > for
> > > > > > > > them
> > > > > > > > > > to learn Flink.
> > > > > > > > > >
> > > > > > > > > > If we want to pursue the renaming path, I think these
> > changes
> > > > > (and
> > > > > > > > > porting
> > > > > > > > > > the rest of the codebase, like `flink-ml` and
> > > `flink-contrib`,
> > > > to
> > > > > > the
> > > > > > > > new
> > > > > > > > > > method names) can be done in relatively little time.
> Since
> > > > Flink
> > > > > is
> > > > > > > > > > approaching a major release, I think it's a good time to
> > > > consider
> > > > > > > this
> > > > > > > > > > change, if the community deems it relevant.
> > > > > > > > > >
> > > > > > > > > > While we await for feedback on the proposal, I can start
> > > > working
> > > > > on
> > > > > > > > both
> > > > > > > > > > paths to see how it would affect the codebase, what do
> you
> > > > think?
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 28, 2016 at 2:14 PM, Stephan Ewen <
> > > > se...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi!
> > > > > > > > > >>
> > > > > > > > > >> Would be nice to support that, agreed.
> > > > > > > > > >>
> > > > > > > > > >> Such a fundamental break in the API worries me a bit,
> > though
> > > > - I
> > > > > > > would
> > > > > > > > > opt
> > > > > > > > > >> for a non-breaking addition.
> > > > > > > > > >> Wrapping the RichFunctions into Scala functions (which
> are
> > > > > > actually
> > > > > > > > > >> wrapped
> > > > > > > > > >> as rich functions) with implicits seems like a
> workaround
> > > for
> > > > > > > > something
> > > > > > > > > >> that should be very simple. Would probably also cost a
> bit
> > > of
> > > > > > > > > performance.
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> I like the idea of "mapWith(...)" - if that were a
> simple
> > > non
> > > > > > > > overloaded
> > > > > > > > > >> function accepting a Scala function, it should accept
> > > > case-style
> > > > > > > > > >> functions,
> > > > > > > > > >> right?
> > > > > > > > > >> Simply adding that would probably solve things, but add
> a
> > > > second
> > > > > > > > variant
> > > > > > > > > >> of
> > > > > > > > > >> each function to the DataSet. An implicit conversion
> from
> > > > > DataSet
> > > > > > to
> > > > > > > > > >> DataSetExtended (which implements the mapWith,
> reduceWith,
> > > > ...)
> > > > > > > > methods
> > > > > > > > > >> could help there...
> > > > > > > > > >>
> > > > > > > > > >> What do you think?
> > > > > > > > > >>
> > > > > > > > > >> Greetings,
> > > > > > > > > >> Stephan
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Jan 28, 2016 at 2:05 PM, Stefano Baghino <
> > > > > > > > > >> stefano.bagh...@radicalbit.io> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hello everybody,
> > > > > > > > > >> >
> > > > > > > > > >> > as I'm getting familiar with Flink I've found a
> possible
> > > > > > > improvement
> > > > > > > > > to
> > > > > > > > > >> the
> > > > > > > > > >> > Scala APIs: in Scala it's a common pattern to perform
> > > tuple
> > > > > > > > extraction
> > > > > > > > > >> > using pattern matching, making functions working on
> > tuples
> > > > > more
> > > > > > > > > >> readable,
> > > > > > > > > >> > like this:
> > > > > > > > > >> >
> > > > > > > > > >> > // referring to the mail count example in the training
> > > > > > > > > >> > // assuming `mails` is a DataSet[(String, String)]
> > > > > > > > > >> > // a pair of date and a string with username and email
> > > > > > > > > >> > val monthsAndEmails =
> > > > > > > > > >> >   mails.map {
> > > > > > > > > >> >     case (date, sender) =>
> > > > > > > > > >> >       (extractMonth(date), extractEmail(sender))
> > > > > > > > > >> >   }
> > > > > > > > > >> >
> > > > > > > > > >> > However, this is not possible when using the Scala
> APIs
> > > > > because
> > > > > > of
> > > > > > > > the
> > > > > > > > > >> > overloading of the `map` function in the `DataSet` and
> > > > > > > `DataStream`
> > > > > > > > > >> classes
> > > > > > > > > >> > (along with other higher-order function such as
> > `flatMap`
> > > > and
> > > > > > > > > >> `filter`). My
> > > > > > > > > >> > understanding is that the main reason to have two
> > > different
> > > > > > > > overloaded
> > > > > > > > > >> > functions is to provide support for `RichFunction`s.
> > > > > > > > > >> > I've found out there has been some interest around the
> > > issue
> > > > > in
> > > > > > > the
> > > > > > > > > >> past (
> > > > > > > > > >> > [FLINK-1159] <
> > > > > https://issues.apache.org/jira/browse/FLINK-1159
> > > > > > >).
> > > > > > > > > >> > In the past couple of days me and my colleague Andrea
> > have
> > > > > tried
> > > > > > > > > several
> > > > > > > > > >> > ways to address the problem, coming to two possible
> > > > solutions:
> > > > > > > > > >> >
> > > > > > > > > >> >    1. don't overload and use different names, e.g.
> `map`
> > > > > taking
> > > > > > a
> > > > > > > > > Scala
> > > > > > > > > >> >    function and `mapWith` taking a Flink MapFunction
> > > > > > > > > >> >    2. keep only the method taking a Scala function
> > (which
> > > > > would
> > > > > > be
> > > > > > > > > more
> > > > > > > > > >> >    idiomatic from a Scala perspective, IMHO) and
> > providing
> > > > an
> > > > > > > > implicit
> > > > > > > > > >> >    conversion from the Flink function to the Scala
> > > function
> > > > > > within
> > > > > > > > the
> > > > > > > > > >> >    `org.apache.flink.api.scala` package object
> > > > > > > > > >> >
> > > > > > > > > >> > We've also evaluated several other approaches using
> > union
> > > > > types
> > > > > > > and
> > > > > > > > > type
> > > > > > > > > >> > classes but we've found them to be too complex.
> > Regarding
> > > > the
> > > > > > two
> > > > > > > > > >> > approaches I've cited, the first would imply a
> breaking
> > > > change
> > > > > > to
> > > > > > > > the
> > > > > > > > > >> APIs,
> > > > > > > > > >> > while the second is giving me a hard time at figuring
> > out
> > > > some
> > > > > > > > > >> compilation
> > > > > > > > > >> > errors in `flink-libraries` and `flink-contrib` and as
> > we
> > > > > tested
> > > > > > > it
> > > > > > > > we
> > > > > > > > > >> > found out `RichMapFunction`s lose state (possibly
> > because
> > > of
> > > > > the
> > > > > > > > > double
> > > > > > > > > >> > conversion, first to a Scala function, then to a
> simple
> > > > > > > > > `MapFunction`).
> > > > > > > > > >> >
> > > > > > > > > >> > You can have a look at the code I've written so far
> here
> > > > > (last 2
> > > > > > > > > >> commits):
> > > > > > > > > >> > https://github.com/radicalbit/flink/commits/1159
> > > > > > > > > >> >
> > > > > > > > > >> > We had a little exchange of ideas and thought that the
> > > first
> > > > > > > > solution
> > > > > > > > > >> would
> > > > > > > > > >> > be easier and also interesting from the standpoint of
> > the
> > > > > > > ergonomics
> > > > > > > > > of
> > > > > > > > > >> the
> > > > > > > > > >> > API (e.g. `line mapWith new LineSplitter`) and would
> > like
> > > to
> > > > > > > gather
> > > > > > > > > some
> > > > > > > > > >> > feedback on the feasibility of this change.
> > > > > > > > > >> >
> > > > > > > > > >> > Would this still be regarded as a relevant
> improvement?
> > > What
> > > > > do
> > > > > > > you
> > > > > > > > > >> think
> > > > > > > > > >> > about it? Do you think there's time to work on them
> > before
> > > > the
> > > > > > 1.0
> > > > > > > > > >> release?
> > > > > > > > > >> > What do you think about introducing breaking changes
> to
> > > make
> > > > > > this
> > > > > > > > > >> pattern
> > > > > > > > > >> > available to Scala users?
> > > > > > > > > >> >
> > > > > > > > > >> > Thank you all in advance for your feedback.
> > > > > > > > > >> >
> > > > > > > > > >> > --
> > > > > > > > > >> > BR,
> > > > > > > > > >> > Stefano Baghino
> > > > > > > > > >> >
> > > > > > > > > >> > Software Engineer @ Radicalbit
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > BR,
> > > > > > > > > > Stefano Baghino
> > > > > > > > > >
> > > > > > > > > > Software Engineer @ Radicalbit
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > BR,
> > > > > > > > > Stefano Baghino
> > > > > > > > >
> > > > > > > > > Software Engineer @ Radicalbit
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > BR,
> > > > > > > Stefano Baghino
> > > > > > >
> > > > > > > Software Engineer @ Radicalbit
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Stefano Baghino
> > > > >
> > > > > Software Engineer @ Radicalbit
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Stefano Baghino
> > >
> > > Software Engineer @ Radicalbit
> > >
> >
>
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
>

Reply via email to