It took me a little time, but I was able to put together some code.

In this commit I just added a few methods, renamed to prevent overloading
and thus usable with PartialFunctions as well as plain functions:
https://github.com/radicalbit/flink/commit/aacd59e0ce98cccb66d48a30d07990ac8f345748

In this other commit I implemented the original proposal, renaming the
methods to obtain the same effect as before but with lower friction for
Scala developers (and provided some usage examples):
https://github.com/radicalbit/flink/commit/33403878eebba70def42f73a1cb671d13b1521b5
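
To give an idea of how this reads in practice, here's a quick sketch
(illustrative only: it reuses `mails`, `extractMonth` and `extractEmail`
from the example quoted below, and assumes the extension methods from the
commits above are in scope):

// with a non-overloaded `mapWith`, a pattern-matching anonymous
// function can be passed directly
val monthsAndEmails =
  mails.mapWith {
    case (date, sender) =>
      (extractMonth(date), extractEmail(sender))
  }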

On Thu, Jan 28, 2016 at 6:16 PM, Stefano Baghino <
stefano.bagh...@radicalbit.io> wrote:

> Hi Stephan,
>
> thank you for the quick reply and for your feedback; I agree with you that
> breaking changes have to be taken very seriously.
>
> The rationale behind my proposal is that Scala users are already
> accustomed to higher-order functions that manipulate collections, and it
> would be beneficial for them to have an API that adheres as much as
> possible to the interface provided by the Scala Collections API. IMHO,
> being able to manipulate a DataSet or DataStream idiomatically, like a
> Scala collection, would appeal to developers and reduce the friction of
> learning Flink.
>
> If we want to pursue the renaming path, I think these changes (and porting
> the rest of the codebase, like `flink-ml` and `flink-contrib`, to the new
> method names) can be done in relatively little time. Since Flink is
> approaching a major release, I think it's a good time to consider this
> change, if the community deems it relevant.
>
> While we await feedback on the proposal, I can start working on both
> paths to see how they would affect the codebase; what do you think?
>
> On Thu, Jan 28, 2016 at 2:14 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> Would be nice to support that, agreed.
>>
>> Such a fundamental break in the API worries me a bit, though - I would opt
>> for a non-breaking addition.
>> Wrapping the RichFunctions into Scala functions (which are actually
>> wrapped as rich functions) with implicits seems like a workaround for
>> something that should be very simple. It would probably also cost a bit
>> of performance.
>>
>>
>> I like the idea of "mapWith(...)" - if that were a simple, non-overloaded
>> function accepting a Scala function, it should accept case-style
>> functions, right?
>> Simply adding that would probably solve things, but would add a second
>> variant of each function to the DataSet. An implicit conversion from
>> DataSet to DataSetExtended (which implements the mapWith, reduceWith, ...
>> methods) could help there...
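>>
>> Roughly something like this (just a sketch to illustrate the idea, with
>> made-up names):
>>
>> import scala.reflect.ClassTag
>> import org.apache.flink.api.common.typeinfo.TypeInformation
>> import org.apache.flink.api.scala.DataSet
>>
>> object DataSetExtensions {
>>   implicit class DataSetExtended[T](val ds: DataSet[T]) extends AnyVal {
>>     // a single, non-overloaded signature, so the compiler knows the
>>     // expected function type and case-style lambdas type-check
>>     def mapWith[R: TypeInformation: ClassTag](fun: T => R): DataSet[R] =
>>       ds.map(fun)
>>
>>     def filterWith(fun: T => Boolean): DataSet[T] =
>>       ds.filter(fun)
>>   }
>> }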
>>
>> What do you think?
>>
>> Greetings,
>> Stephan
>>
>>
>> On Thu, Jan 28, 2016 at 2:05 PM, Stefano Baghino <
>> stefano.bagh...@radicalbit.io> wrote:
>>
>> > Hello everybody,
>> >
>> > as I'm getting familiar with Flink I've found a possible improvement to
>> > the Scala APIs: in Scala it's a common pattern to perform tuple
>> > extraction using pattern matching, making functions working on tuples
>> > more readable, like this:
>> >
>> > // referring to the mail count example in the training
>> > // assuming `mails` is a DataSet[(String, String)]
>> > // a pair of date and a string with username and email
>> > val monthsAndEmails =
>> >   mails.map {
>> >     case (date, sender) =>
>> >       (extractMonth(date), extractEmail(sender))
>> >   }
>> >
>> > However, this is not possible when using the Scala APIs because of the
>> > overloading of the `map` function in the `DataSet` and `DataStream`
>> > classes (along with other higher-order functions such as `flatMap` and
>> > `filter`). My understanding is that the main reason to have two
>> > different overloaded functions is to provide support for
>> > `RichFunction`s. I've found out there has been some interest around
>> > the issue in the past ([FLINK-1159]
>> > <https://issues.apache.org/jira/browse/FLINK-1159>).
>> > In the past couple of days my colleague Andrea and I have tried several
>> > ways to address the problem, coming to two possible solutions:
>> >
>> >    1. don't overload and use different names, e.g. `map` taking a Scala
>> >    function and `mapWith` taking a Flink MapFunction
>> >    2. keep only the method taking a Scala function (which would be more
>> >    idiomatic from a Scala perspective, IMHO) and provide an implicit
>> >    conversion from the Flink function to the Scala function within the
>> >    `org.apache.flink.api.scala` package object (sketched below)
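>> >
>> > To illustrate the second option, here is a rough sketch of such a
>> > conversion (the actual code in the branch linked below may differ, and
>> > the name of the conversion is made up):
>> >
>> > import org.apache.flink.api.common.functions.MapFunction
>> >
>> > // with this in scope, the overloads taking Flink functions could be
>> > // dropped, keeping only the variants taking Scala functions
>> > implicit def mapFunctionAsScalaFunction[T, R](mf: MapFunction[T, R]): T => R =
>> >   (t: T) => mf.map(t)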
>> >
>> > We've also evaluated several other approaches using union types and
>> > type classes but we've found them to be too complex. Regarding the two
>> > approaches I've cited, the first would imply a breaking change to the
>> > APIs, while the second is giving me a hard time figuring out some
>> > compilation errors in `flink-libraries` and `flink-contrib`; moreover,
>> > as we tested it, we found that `RichMapFunction`s lose state (possibly
>> > because of the double conversion, first to a Scala function, then to a
>> > simple `MapFunction`).
>> >
>> > You can have a look at the code I've written so far here (last 2
>> > commits):
>> > https://github.com/radicalbit/flink/commits/1159
>> >
>> > We had a little exchange of ideas and thought that the first solution
>> > would be easier and also interesting from the standpoint of the
>> > ergonomics of the API (e.g. `line mapWith new LineSplitter`), and would
>> > like to gather some feedback on the feasibility of this change.
>> >
>> > Would this still be regarded as a relevant improvement? What do you
>> > think about it? Do you think there's time to work on these changes
>> > before the 1.0 release? What do you think about introducing breaking
>> > changes to make this pattern available to Scala users?
>> >
>> > Thank you all in advance for your feedback.
>> >
>> > --
>> > BR,
>> > Stefano Baghino
>> >
>> > Software Engineer @ Radicalbit
>> >
>>
>
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
