I agree with you, but I acknowledge that there may be concerns regarding
the stability of the API. Perhaps the rationale behind the proposal of
Stephan and Till is to provide it as an extension to test how the
developers feel about it. It would be ideal to have a larger feedback from
the community. However I have to admit I like the approach.

On Tue, Feb 9, 2016 at 2:43 PM, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:

> Thanks for bringing this up Stefano, it would a very welcome addition
> indeed.
>
> I like the approach of having extensions through implicits as well. IMHO
> though this should be the default
> behavior, without the need to add another import.
>
> On Tue, Feb 9, 2016 at 1:29 PM, Stefano Baghino <
> stefano.bagh...@radicalbit.io> wrote:
>
> > I see, thanks for the tip! I'll work on it; meanwhile, I've added some
> > functions and Scaladoc:
> >
> >
> https://github.com/radicalbit/flink/blob/1159-implicit/flink-scala/src/main/scala/org/apache/flink/api/scala/extensions/package.scala
> >
> > On Tue, Feb 9, 2016 at 12:01 PM, Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Not the DataSet but the JoinDataSet and the CoGroupDataSet do in the
> form
> > > of an apply function.
> > > ​
> > >
> > > On Tue, Feb 9, 2016 at 11:09 AM, Stefano Baghino <
> > > stefano.bagh...@radicalbit.io> wrote:
> > >
> > > > Sure, it was just a draft. I agree that filter and mapPartition make
> > > sense,
> > > > but coGroup and join don't look like they take a function.
> > > >
> > > > On Tue, Feb 9, 2016 at 10:08 AM, Till Rohrmann <trohrm...@apache.org
> >
> > > > wrote:
> > > >
> > > > > This looks like a good design to me :-) The only thing is that it
> is
> > > not
> > > > > complete. For example, the filter, mapPartition, coGroup and join
> > > > functions
> > > > > are missing.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > > ​
> > > > >
> > > > > On Tue, Feb 9, 2016 at 1:18 AM, Stefano Baghino <
> > > > > stefano.bagh...@radicalbit.io> wrote:
> > > > >
> > > > > > What do you think of something like this?
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/radicalbit/flink/commit/21a889a437875c88921c93e87d88a378c6b4299e
> > > > > >
> > > > > > In this way, several extensions can be collected in this package
> > > object
> > > > > and
> > > > > > picked altogether or a-là-carte (e.g. import
> > > > > > org.apache.flink.api.scala.extensions.AcceptPartialFunctions).
> > > > > >
> > > > > > On Mon, Feb 8, 2016 at 2:51 PM, Till Rohrmann <
> > trohrm...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I like the idea to support partial functions with Flink’s Scala
> > > API.
> > > > > > > However, I think that breaking the API and making it
> inconsistent
> > > > with
> > > > > > > respect to the Java API is not the best option. I would rather
> be
> > > in
> > > > > > favour
> > > > > > > of the first proposal where we add a new method xxxWith via
> > > implicit
> > > > > > > conversions.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > > ​
> > > > > > >
> > > > > > > On Sun, Feb 7, 2016 at 12:44 PM, Stefano Baghino <
> > > > > > > stefano.bagh...@radicalbit.io> wrote:
> > > > > > >
> > > > > > > > It took me a little time but I was able to put together some
> > > code.
> > > > > > > >
> > > > > > > > In this commit I just added a few methods renamed to prevent
> > > > > > overloading,
> > > > > > > > thus usable with PartialFunction instead of functions:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/radicalbit/flink/commit/aacd59e0ce98cccb66d48a30d07990ac8f345748
> > > > > > > >
> > > > > > > > In this other commit I coded the original proposal, renaming
> > the
> > > > > > methods
> > > > > > > to
> > > > > > > > obtain the same effect as before, but with lower friction for
> > > Scala
> > > > > > > > developers (and provided some usage examples):
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/radicalbit/flink/commit/33403878eebba70def42f73a1cb671d13b1521b5
> > > > > > > >
> > > > > > > > On Thu, Jan 28, 2016 at 6:16 PM, Stefano Baghino <
> > > > > > > > stefano.bagh...@radicalbit.io> wrote:
> > > > > > > >
> > > > > > > > > Hi Stephan,
> > > > > > > > >
> > > > > > > > > thank you for the quick reply and for your feedback; I
> agree
> > > with
> > > > > you
> > > > > > > > that
> > > > > > > > > breaking changes have to taken very seriously.
> > > > > > > > >
> > > > > > > > > The rationale behind my proposal is that Scala users are
> > > already
> > > > > > > > > accustomed to higher-order functions that manipulate
> > > collections
> > > > > and
> > > > > > it
> > > > > > > > > would beneficial for them to have an API that tries to
> adhere
> > > as
> > > > > much
> > > > > > > as
> > > > > > > > > possible to the interface provided by the Scala Collections
> > > API.
> > > > > IMHO
> > > > > > > > being
> > > > > > > > > able to manipulate a DataSet or DataStream like a Scala
> > > > collection
> > > > > > > > > idiomatically would appeal to developers and reduce the
> > > friction
> > > > > for
> > > > > > > them
> > > > > > > > > to learn Flink.
> > > > > > > > >
> > > > > > > > > If we want to pursue the renaming path, I think these
> changes
> > > > (and
> > > > > > > > porting
> > > > > > > > > the rest of the codebase, like `flink-ml` and
> > `flink-contrib`,
> > > to
> > > > > the
> > > > > > > new
> > > > > > > > > method names) can be done in relatively little time. Since
> > > Flink
> > > > is
> > > > > > > > > approaching a major release, I think it's a good time to
> > > consider
> > > > > > this
> > > > > > > > > change, if the community deems it relevant.
> > > > > > > > >
> > > > > > > > > While we await for feedback on the proposal, I can start
> > > working
> > > > on
> > > > > > > both
> > > > > > > > > paths to see how it would affect the codebase, what do you
> > > think?
> > > > > > > > >
> > > > > > > > > On Thu, Jan 28, 2016 at 2:14 PM, Stephan Ewen <
> > > se...@apache.org>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi!
> > > > > > > > >>
> > > > > > > > >> Would be nice to support that, agreed.
> > > > > > > > >>
> > > > > > > > >> Such a fundamental break in the API worries me a bit,
> though
> > > - I
> > > > > > would
> > > > > > > > opt
> > > > > > > > >> for a non-breaking addition.
> > > > > > > > >> Wrapping the RichFunctions into Scala functions (which are
> > > > > actually
> > > > > > > > >> wrapped
> > > > > > > > >> as rich functions) with implicits seems like a workaround
> > for
> > > > > > > something
> > > > > > > > >> that should be very simple. Would probably also cost a bit
> > of
> > > > > > > > performance.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> I like the idea of "mapWith(...)" - if that were a simple
> > non
> > > > > > > overloaded
> > > > > > > > >> function accepting a Scala function, it should accept
> > > case-style
> > > > > > > > >> functions,
> > > > > > > > >> right?
> > > > > > > > >> Simply adding that would probably solve things, but add a
> > > second
> > > > > > > variant
> > > > > > > > >> of
> > > > > > > > >> each function to the DataSet. An implicit conversion from
> > > > DataSet
> > > > > to
> > > > > > > > >> DataSetExtended (which implements the mapWith, reduceWith,
> > > ...)
> > > > > > > methods
> > > > > > > > >> could help there...
> > > > > > > > >>
> > > > > > > > >> What do you think?
> > > > > > > > >>
> > > > > > > > >> Greetings,
> > > > > > > > >> Stephan
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Thu, Jan 28, 2016 at 2:05 PM, Stefano Baghino <
> > > > > > > > >> stefano.bagh...@radicalbit.io> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hello everybody,
> > > > > > > > >> >
> > > > > > > > >> > as I'm getting familiar with Flink I've found a possible
> > > > > > improvement
> > > > > > > > to
> > > > > > > > >> the
> > > > > > > > >> > Scala APIs: in Scala it's a common pattern to perform
> > tuple
> > > > > > > extraction
> > > > > > > > >> > using pattern matching, making functions working on
> tuples
> > > > more
> > > > > > > > >> readable,
> > > > > > > > >> > like this:
> > > > > > > > >> >
> > > > > > > > >> > // referring to the mail count example in the training
> > > > > > > > >> > // assuming `mails` is a DataSet[(String, String)]
> > > > > > > > >> > // a pair of date and a string with username and email
> > > > > > > > >> > val monthsAndEmails =
> > > > > > > > >> >   mails.map {
> > > > > > > > >> >     case (date, sender) =>
> > > > > > > > >> >       (extractMonth(date), extractEmail(sender))
> > > > > > > > >> >   }
> > > > > > > > >> >
> > > > > > > > >> > However, this is not possible when using the Scala APIs
> > > > because
> > > > > of
> > > > > > > the
> > > > > > > > >> > overloading of the `map` function in the `DataSet` and
> > > > > > `DataStream`
> > > > > > > > >> classes
> > > > > > > > >> > (along with other higher-order function such as
> `flatMap`
> > > and
> > > > > > > > >> `filter`). My
> > > > > > > > >> > understanding is that the main reason to have two
> > different
> > > > > > > overloaded
> > > > > > > > >> > functions is to provide support for `RichFunction`s.
> > > > > > > > >> > I've found out there has been some interest around the
> > issue
> > > > in
> > > > > > the
> > > > > > > > >> past (
> > > > > > > > >> > [FLINK-1159] <
> > > > https://issues.apache.org/jira/browse/FLINK-1159
> > > > > >).
> > > > > > > > >> > In the past couple of days me and my colleague Andrea
> have
> > > > tried
> > > > > > > > several
> > > > > > > > >> > ways to address the problem, coming to two possible
> > > solutions:
> > > > > > > > >> >
> > > > > > > > >> >    1. don't overload and use different names, e.g. `map`
> > > > taking
> > > > > a
> > > > > > > > Scala
> > > > > > > > >> >    function and `mapWith` taking a Flink MapFunction
> > > > > > > > >> >    2. keep only the method taking a Scala function
> (which
> > > > would
> > > > > be
> > > > > > > > more
> > > > > > > > >> >    idiomatic from a Scala perspective, IMHO) and
> providing
> > > an
> > > > > > > implicit
> > > > > > > > >> >    conversion from the Flink function to the Scala
> > function
> > > > > within
> > > > > > > the
> > > > > > > > >> >    `org.apache.flink.api.scala` package object
> > > > > > > > >> >
> > > > > > > > >> > We've also evaluated several other approaches using
> union
> > > > types
> > > > > > and
> > > > > > > > type
> > > > > > > > >> > classes but we've found them to be too complex.
> Regarding
> > > the
> > > > > two
> > > > > > > > >> > approaches I've cited, the first would imply a breaking
> > > change
> > > > > to
> > > > > > > the
> > > > > > > > >> APIs,
> > > > > > > > >> > while the second is giving me a hard time at figuring
> out
> > > some
> > > > > > > > >> compilation
> > > > > > > > >> > errors in `flink-libraries` and `flink-contrib` and as
> we
> > > > tested
> > > > > > it
> > > > > > > we
> > > > > > > > >> > found out `RichMapFunction`s lose state (possibly
> because
> > of
> > > > the
> > > > > > > > double
> > > > > > > > >> > conversion, first to a Scala function, then to a simple
> > > > > > > > `MapFunction`).
> > > > > > > > >> >
> > > > > > > > >> > You can have a look at the code I've written so far here
> > > > (last 2
> > > > > > > > >> commits):
> > > > > > > > >> > https://github.com/radicalbit/flink/commits/1159
> > > > > > > > >> >
> > > > > > > > >> > We had a little exchange of ideas and thought that the
> > first
> > > > > > > solution
> > > > > > > > >> would
> > > > > > > > >> > be easier and also interesting from the standpoint of
> the
> > > > > > ergonomics
> > > > > > > > of
> > > > > > > > >> the
> > > > > > > > >> > API (e.g. `line mapWith new LineSplitter`) and would
> like
> > to
> > > > > > gather
> > > > > > > > some
> > > > > > > > >> > feedback on the feasibility of this change.
> > > > > > > > >> >
> > > > > > > > >> > Would this still be regarded as a relevant improvement?
> > What
> > > > do
> > > > > > you
> > > > > > > > >> think
> > > > > > > > >> > about it? Do you think there's time to work on them
> before
> > > the
> > > > > 1.0
> > > > > > > > >> release?
> > > > > > > > >> > What do you think about introducing breaking changes to
> > make
> > > > > this
> > > > > > > > >> pattern
> > > > > > > > >> > available to Scala users?
> > > > > > > > >> >
> > > > > > > > >> > Thank you all in advance for your feedback.
> > > > > > > > >> >
> > > > > > > > >> > --
> > > > > > > > >> > BR,
> > > > > > > > >> > Stefano Baghino
> > > > > > > > >> >
> > > > > > > > >> > Software Engineer @ Radicalbit
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > BR,
> > > > > > > > > Stefano Baghino
> > > > > > > > >
> > > > > > > > > Software Engineer @ Radicalbit
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > BR,
> > > > > > > > Stefano Baghino
> > > > > > > >
> > > > > > > > Software Engineer @ Radicalbit
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > BR,
> > > > > > Stefano Baghino
> > > > > >
> > > > > > Software Engineer @ Radicalbit
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Stefano Baghino
> > > >
> > > > Software Engineer @ Radicalbit
> > > >
> > >
> >
> >
> >
> > --
> > BR,
> > Stefano Baghino
> >
> > Software Engineer @ Radicalbit
> >
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit

Reply via email to