Hi Till, I do agree with your point, so much so so that at the time being I'd suggest to keeping these additions as optional, up to the end-user to opt-in. Adding them by default would effectively be an addition to the DataSet API (despite being separated at a source file level). I think your solution works best right now. We can proceed on this path unless we see more interest around support for case-style functions.
On Tue, Feb 9, 2016 at 7:31 PM, Till Rohrmann <trohrm...@apache.org> wrote: > What we could do is to add the implicit class to the package object of > org.apache.flink.api.scala. Since one always has to import this package in > order to have the proper TypeInformations, you wouldn’t have to import the > extension explicitly. > > The thing with the API is that you don’t want to break it too often. Once > people are used to it and have code implemented it always entails rewriting > the code if you change the API in a breaking manner. This can be really > annoying for users. > > Cheers, > Till > > > On Tue, Feb 9, 2016 at 2:51 PM, Stefano Baghino < > stefano.bagh...@radicalbit.io> wrote: > > > I agree with you, but I acknowledge that there may be concerns regarding > > the stability of the API. Perhaps the rationale behind the proposal of > > Stephan and Till is to provide it as an extension to test how the > > developers feel about it. It would be ideal to have a larger feedback > from > > the community. However I have to admit I like the approach. > > > > On Tue, Feb 9, 2016 at 2:43 PM, Theodore Vasiloudis < > > theodoros.vasilou...@gmail.com> wrote: > > > > > Thanks for bringing this up Stefano, it would a very welcome addition > > > indeed. > > > > > > I like the approach of having extensions through implicits as well. > IMHO > > > though this should be the default > > > behavior, without the need to add another import. > > > > > > On Tue, Feb 9, 2016 at 1:29 PM, Stefano Baghino < > > > stefano.bagh...@radicalbit.io> wrote: > > > > > > > I see, thanks for the tip! I'll work on it; meanwhile, I've added > some > > > > functions and Scaladoc: > > > > > > > > > > > > > > https://github.com/radicalbit/flink/blob/1159-implicit/flink-scala/src/main/scala/org/apache/flink/api/scala/extensions/package.scala > > > > > > > > On Tue, Feb 9, 2016 at 12:01 PM, Till Rohrmann <trohrm...@apache.org > > > > > > wrote: > > > > > > > > > Not the DataSet but the JoinDataSet and the CoGroupDataSet do in > the > > > form > > > > > of an apply function. > > > > > > > > > > > > > > > On Tue, Feb 9, 2016 at 11:09 AM, Stefano Baghino < > > > > > stefano.bagh...@radicalbit.io> wrote: > > > > > > > > > > > Sure, it was just a draft. I agree that filter and mapPartition > > make > > > > > sense, > > > > > > but coGroup and join don't look like they take a function. > > > > > > > > > > > > On Tue, Feb 9, 2016 at 10:08 AM, Till Rohrmann < > > trohrm...@apache.org > > > > > > > > > > wrote: > > > > > > > > > > > > > This looks like a good design to me :-) The only thing is that > it > > > is > > > > > not > > > > > > > complete. For example, the filter, mapPartition, coGroup and > join > > > > > > functions > > > > > > > are missing. > > > > > > > > > > > > > > Cheers, > > > > > > > Till > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 9, 2016 at 1:18 AM, Stefano Baghino < > > > > > > > stefano.bagh...@radicalbit.io> wrote: > > > > > > > > > > > > > > > What do you think of something like this? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/radicalbit/flink/commit/21a889a437875c88921c93e87d88a378c6b4299e > > > > > > > > > > > > > > > > In this way, several extensions can be collected in this > > package > > > > > object > > > > > > > and > > > > > > > > picked altogether or a-là-carte (e.g. import > > > > > > > > > org.apache.flink.api.scala.extensions.AcceptPartialFunctions). > > > > > > > > > > > > > > > > On Mon, Feb 8, 2016 at 2:51 PM, Till Rohrmann < > > > > trohrm...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > I like the idea to support partial functions with Flink’s > > Scala > > > > > API. > > > > > > > > > However, I think that breaking the API and making it > > > inconsistent > > > > > > with > > > > > > > > > respect to the Java API is not the best option. I would > > rather > > > be > > > > > in > > > > > > > > favour > > > > > > > > > of the first proposal where we add a new method xxxWith via > > > > > implicit > > > > > > > > > conversions. > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Till > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 7, 2016 at 12:44 PM, Stefano Baghino < > > > > > > > > > stefano.bagh...@radicalbit.io> wrote: > > > > > > > > > > > > > > > > > > > It took me a little time but I was able to put together > > some > > > > > code. > > > > > > > > > > > > > > > > > > > > In this commit I just added a few methods renamed to > > prevent > > > > > > > > overloading, > > > > > > > > > > thus usable with PartialFunction instead of functions: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/radicalbit/flink/commit/aacd59e0ce98cccb66d48a30d07990ac8f345748 > > > > > > > > > > > > > > > > > > > > In this other commit I coded the original proposal, > > renaming > > > > the > > > > > > > > methods > > > > > > > > > to > > > > > > > > > > obtain the same effect as before, but with lower friction > > for > > > > > Scala > > > > > > > > > > developers (and provided some usage examples): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/radicalbit/flink/commit/33403878eebba70def42f73a1cb671d13b1521b5 > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2016 at 6:16 PM, Stefano Baghino < > > > > > > > > > > stefano.bagh...@radicalbit.io> wrote: > > > > > > > > > > > > > > > > > > > > > Hi Stephan, > > > > > > > > > > > > > > > > > > > > > > thank you for the quick reply and for your feedback; I > > > agree > > > > > with > > > > > > > you > > > > > > > > > > that > > > > > > > > > > > breaking changes have to taken very seriously. > > > > > > > > > > > > > > > > > > > > > > The rationale behind my proposal is that Scala users > are > > > > > already > > > > > > > > > > > accustomed to higher-order functions that manipulate > > > > > collections > > > > > > > and > > > > > > > > it > > > > > > > > > > > would beneficial for them to have an API that tries to > > > adhere > > > > > as > > > > > > > much > > > > > > > > > as > > > > > > > > > > > possible to the interface provided by the Scala > > Collections > > > > > API. > > > > > > > IMHO > > > > > > > > > > being > > > > > > > > > > > able to manipulate a DataSet or DataStream like a Scala > > > > > > collection > > > > > > > > > > > idiomatically would appeal to developers and reduce the > > > > > friction > > > > > > > for > > > > > > > > > them > > > > > > > > > > > to learn Flink. > > > > > > > > > > > > > > > > > > > > > > If we want to pursue the renaming path, I think these > > > changes > > > > > > (and > > > > > > > > > > porting > > > > > > > > > > > the rest of the codebase, like `flink-ml` and > > > > `flink-contrib`, > > > > > to > > > > > > > the > > > > > > > > > new > > > > > > > > > > > method names) can be done in relatively little time. > > Since > > > > > Flink > > > > > > is > > > > > > > > > > > approaching a major release, I think it's a good time > to > > > > > consider > > > > > > > > this > > > > > > > > > > > change, if the community deems it relevant. > > > > > > > > > > > > > > > > > > > > > > While we await for feedback on the proposal, I can > start > > > > > working > > > > > > on > > > > > > > > > both > > > > > > > > > > > paths to see how it would affect the codebase, what do > > you > > > > > think? > > > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2016 at 2:14 PM, Stephan Ewen < > > > > > se...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > >> Hi! > > > > > > > > > > >> > > > > > > > > > > >> Would be nice to support that, agreed. > > > > > > > > > > >> > > > > > > > > > > >> Such a fundamental break in the API worries me a bit, > > > though > > > > > - I > > > > > > > > would > > > > > > > > > > opt > > > > > > > > > > >> for a non-breaking addition. > > > > > > > > > > >> Wrapping the RichFunctions into Scala functions (which > > are > > > > > > > actually > > > > > > > > > > >> wrapped > > > > > > > > > > >> as rich functions) with implicits seems like a > > workaround > > > > for > > > > > > > > > something > > > > > > > > > > >> that should be very simple. Would probably also cost a > > bit > > > > of > > > > > > > > > > performance. > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> I like the idea of "mapWith(...)" - if that were a > > simple > > > > non > > > > > > > > > overloaded > > > > > > > > > > >> function accepting a Scala function, it should accept > > > > > case-style > > > > > > > > > > >> functions, > > > > > > > > > > >> right? > > > > > > > > > > >> Simply adding that would probably solve things, but > add > > a > > > > > second > > > > > > > > > variant > > > > > > > > > > >> of > > > > > > > > > > >> each function to the DataSet. An implicit conversion > > from > > > > > > DataSet > > > > > > > to > > > > > > > > > > >> DataSetExtended (which implements the mapWith, > > reduceWith, > > > > > ...) > > > > > > > > > methods > > > > > > > > > > >> could help there... > > > > > > > > > > >> > > > > > > > > > > >> What do you think? > > > > > > > > > > >> > > > > > > > > > > >> Greetings, > > > > > > > > > > >> Stephan > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> On Thu, Jan 28, 2016 at 2:05 PM, Stefano Baghino < > > > > > > > > > > >> stefano.bagh...@radicalbit.io> wrote: > > > > > > > > > > >> > > > > > > > > > > >> > Hello everybody, > > > > > > > > > > >> > > > > > > > > > > > >> > as I'm getting familiar with Flink I've found a > > possible > > > > > > > > improvement > > > > > > > > > > to > > > > > > > > > > >> the > > > > > > > > > > >> > Scala APIs: in Scala it's a common pattern to > perform > > > > tuple > > > > > > > > > extraction > > > > > > > > > > >> > using pattern matching, making functions working on > > > tuples > > > > > > more > > > > > > > > > > >> readable, > > > > > > > > > > >> > like this: > > > > > > > > > > >> > > > > > > > > > > > >> > // referring to the mail count example in the > training > > > > > > > > > > >> > // assuming `mails` is a DataSet[(String, String)] > > > > > > > > > > >> > // a pair of date and a string with username and > email > > > > > > > > > > >> > val monthsAndEmails = > > > > > > > > > > >> > mails.map { > > > > > > > > > > >> > case (date, sender) => > > > > > > > > > > >> > (extractMonth(date), extractEmail(sender)) > > > > > > > > > > >> > } > > > > > > > > > > >> > > > > > > > > > > > >> > However, this is not possible when using the Scala > > APIs > > > > > > because > > > > > > > of > > > > > > > > > the > > > > > > > > > > >> > overloading of the `map` function in the `DataSet` > and > > > > > > > > `DataStream` > > > > > > > > > > >> classes > > > > > > > > > > >> > (along with other higher-order function such as > > > `flatMap` > > > > > and > > > > > > > > > > >> `filter`). My > > > > > > > > > > >> > understanding is that the main reason to have two > > > > different > > > > > > > > > overloaded > > > > > > > > > > >> > functions is to provide support for `RichFunction`s. > > > > > > > > > > >> > I've found out there has been some interest around > the > > > > issue > > > > > > in > > > > > > > > the > > > > > > > > > > >> past ( > > > > > > > > > > >> > [FLINK-1159] < > > > > > > https://issues.apache.org/jira/browse/FLINK-1159 > > > > > > > >). > > > > > > > > > > >> > In the past couple of days me and my colleague > Andrea > > > have > > > > > > tried > > > > > > > > > > several > > > > > > > > > > >> > ways to address the problem, coming to two possible > > > > > solutions: > > > > > > > > > > >> > > > > > > > > > > > >> > 1. don't overload and use different names, e.g. > > `map` > > > > > > taking > > > > > > > a > > > > > > > > > > Scala > > > > > > > > > > >> > function and `mapWith` taking a Flink MapFunction > > > > > > > > > > >> > 2. keep only the method taking a Scala function > > > (which > > > > > > would > > > > > > > be > > > > > > > > > > more > > > > > > > > > > >> > idiomatic from a Scala perspective, IMHO) and > > > providing > > > > > an > > > > > > > > > implicit > > > > > > > > > > >> > conversion from the Flink function to the Scala > > > > function > > > > > > > within > > > > > > > > > the > > > > > > > > > > >> > `org.apache.flink.api.scala` package object > > > > > > > > > > >> > > > > > > > > > > > >> > We've also evaluated several other approaches using > > > union > > > > > > types > > > > > > > > and > > > > > > > > > > type > > > > > > > > > > >> > classes but we've found them to be too complex. > > > Regarding > > > > > the > > > > > > > two > > > > > > > > > > >> > approaches I've cited, the first would imply a > > breaking > > > > > change > > > > > > > to > > > > > > > > > the > > > > > > > > > > >> APIs, > > > > > > > > > > >> > while the second is giving me a hard time at > figuring > > > out > > > > > some > > > > > > > > > > >> compilation > > > > > > > > > > >> > errors in `flink-libraries` and `flink-contrib` and > as > > > we > > > > > > tested > > > > > > > > it > > > > > > > > > we > > > > > > > > > > >> > found out `RichMapFunction`s lose state (possibly > > > because > > > > of > > > > > > the > > > > > > > > > > double > > > > > > > > > > >> > conversion, first to a Scala function, then to a > > simple > > > > > > > > > > `MapFunction`). > > > > > > > > > > >> > > > > > > > > > > > >> > You can have a look at the code I've written so far > > here > > > > > > (last 2 > > > > > > > > > > >> commits): > > > > > > > > > > >> > https://github.com/radicalbit/flink/commits/1159 > > > > > > > > > > >> > > > > > > > > > > > >> > We had a little exchange of ideas and thought that > the > > > > first > > > > > > > > > solution > > > > > > > > > > >> would > > > > > > > > > > >> > be easier and also interesting from the standpoint > of > > > the > > > > > > > > ergonomics > > > > > > > > > > of > > > > > > > > > > >> the > > > > > > > > > > >> > API (e.g. `line mapWith new LineSplitter`) and would > > > like > > > > to > > > > > > > > gather > > > > > > > > > > some > > > > > > > > > > >> > feedback on the feasibility of this change. > > > > > > > > > > >> > > > > > > > > > > > >> > Would this still be regarded as a relevant > > improvement? > > > > What > > > > > > do > > > > > > > > you > > > > > > > > > > >> think > > > > > > > > > > >> > about it? Do you think there's time to work on them > > > before > > > > > the > > > > > > > 1.0 > > > > > > > > > > >> release? > > > > > > > > > > >> > What do you think about introducing breaking changes > > to > > > > make > > > > > > > this > > > > > > > > > > >> pattern > > > > > > > > > > >> > available to Scala users? > > > > > > > > > > >> > > > > > > > > > > > >> > Thank you all in advance for your feedback. > > > > > > > > > > >> > > > > > > > > > > > >> > -- > > > > > > > > > > >> > BR, > > > > > > > > > > >> > Stefano Baghino > > > > > > > > > > >> > > > > > > > > > > > >> > Software Engineer @ Radicalbit > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > BR, > > > > > > > > > > > Stefano Baghino > > > > > > > > > > > > > > > > > > > > > > Software Engineer @ Radicalbit > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > BR, > > > > > > > > > > Stefano Baghino > > > > > > > > > > > > > > > > > > > > Software Engineer @ Radicalbit > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > BR, > > > > > > > > Stefano Baghino > > > > > > > > > > > > > > > > Software Engineer @ Radicalbit > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > BR, > > > > > > Stefano Baghino > > > > > > > > > > > > Software Engineer @ Radicalbit > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > BR, > > > > Stefano Baghino > > > > > > > > Software Engineer @ Radicalbit > > > > > > > > > > > > > > > -- > > BR, > > Stefano Baghino > > > > Software Engineer @ Radicalbit > > > -- BR, Stefano Baghino Software Engineer @ Radicalbit