Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Aljoscha Krettek
Yes, but I think we would pretty much have to do that. I don't think we can stop doing 2.11 releases.

> On 8. Oct 2018, at 15:37, Chesnay Schepler wrote:
> 
> The infrastructure would only be required if we opt for releasing 2.11 and 2.12 builds simultaneously, correct?

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Chesnay Schepler
The infrastructure would only be required if we opt for releasing 2.11 and 2.12 builds simultaneously, correct?

On 08.10.2018 15:04, Aljoscha Krettek wrote:

Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Aljoscha Krettek
Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.

> On 8. Oct 2018, at 13:00, Chesnay Schepler wrote:
> 
> And the remaining parts would only be about breaking the API?

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Chesnay Schepler

And the remaining parts would only be about breaking the API?

On 08.10.2018 12:24, Aljoscha Krettek wrote:

I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Aljoscha Krettek
I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784

> On 8. Oct 2018, at 09:56, Chesnay Schepler wrote:
> 
> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm wondering about the benefit, as the API break still has to happen at some point.
> 
> @Aljoscha how much work for supporting Scala 2.12 can be merged without breaking the API? If this is the only blocker, I suggest making the breaking change in 1.8.

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Chesnay Schepler
I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm wondering about the benefit, as the API break still has to happen at some point.

@Aljoscha how much work for supporting Scala 2.12 can be merged without breaking the API?

If this is the only blocker, I suggest making the breaking change in 1.8.

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-05 Thread Till Rohrmann
Thanks Aljoscha for starting this discussion. The described problem indeed puts us in a bit of a pickle. Even with option 1) I think it is somewhat API breaking, because everyone who used lambdas without types needs to add them now. Consequently, I only see two real options out of the ones you've proposed:

1) Disambiguate the API (either by removing reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
2) Maintain a 2.11 and a 2.12 master branch until we phase 2.11 out completely

Removing reduceGroup(GroupReduceFunction) in option 1 is a bit problematic, because then all Scala API users who have implemented a GroupReduceFunction need to convert it into a Scala lambda. Moreover, I think it will be problematic with RichGroupReduceFunction, which you need in order to get access to the RuntimeContext.

Maintaining two master branches puts a lot of burden on the developers to always keep the two branches in sync. Ideally I would like to avoid this.

I also played around a little bit with implicit conversions to add the lambda methods in Scala 2.11 on demand, but I was not able to get it to work smoothly.

I'm cross-posting this thread to the user list as well to get some more user feedback.

Cheers,
Till

On Thu, Oct 4, 2018 at 7:36 PM Elias Levy wrote:

The second alternative, with the addition of methods that take functions with Scala types, seems the most sensible. I wonder if there is a need then to maintain the *J Java parameter methods, or whether users could just access the functionality by converting the Scala DataStreams to Java via .javaStream and whatever the equivalent is for DataSets.

On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek wrote:
Hi,

I'm currently working on https://issues.apache.org/jira/browse/FLINK-7811, with the goal of adding support for Scala 2.12. There is a bit of a hurdle and I have to explain some context first.
With Scala 2.12, lambdas are implemented using the lambda mechanism of Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This means that the following two method definitions can both take a lambda:

def map[R](mapper: MapFunction[T, R]): DataSet[R]
def map[R](fun: T => R): DataSet[R]

The Scala compiler gives precedence to the lambda version when you call map() with a lambda in simple cases, so it works here. You could still call map() with a lambda even if the lambda version of the method weren't there, because the two signatures are now considered the same. For Scala 2.11 we need both signatures, though, to allow calling with a lambda and with a MapFunction.
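
To make this concrete, here is a self-contained sketch with simplified stand-ins for the Flink classes (not the real API) that compiles under both 2.11 and 2.12:

object MapOverloadSketch {
  trait MapFunction[T, R] { def map(value: T): R } // Java-style SAM
  class DataSet[T] {
    def map[R](mapper: MapFunction[T, R]): DataSet[R] = new DataSet[R]
    def map[R](fun: T => R): DataSet[R] = new DataSet[R]
  }
  def main(args: Array[String]): Unit = {
    val ds = new DataSet[String]
    // On 2.11 only the T => R overload can accept a lambda; on 2.12 the
    // lambda would also satisfy the SAM overload, but the compiler prefers
    // the function overload in this simple case, so both versions accept it.
    val lengths = ds.map(s => s.length)
    // The SAM overload remains reachable with an explicit instance:
    val viaSam = ds.map(new MapFunction[String, Int] {
      def map(value: String): Int = value.length
    })
  }
}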
The problem is with more complicated method signatures, like:

def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): DataSet[R]

def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]

(for reference, GroupReduceFunction is a SAM with void reduce(java.lang.Iterable<T> values, Collector<O> out))

These two signatures are not the same, but they are similar enough for the Scala 2.12 compiler to "get confused". In Scala 2.11, I could call reduceGroup() with a lambda that doesn't have parameter type definitions and things would be fine. With Scala 2.12 I can't do that, because the compiler can't figure out which method to call and requires explicit type definitions on the lambda parameters.
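
Concretely, a 2.12 user hits the ambiguity like this (again a sketch with simplified stand-in types, not the real API):

object ReduceGroupAmbiguitySketch {
  trait Collector[R] { def collect(record: R): Unit }
  trait GroupReduceFunction[T, R] { // Java-style SAM
    def reduce(values: java.lang.Iterable[T], out: Collector[R]): Unit
  }
  class DataSet[T] {
    def reduceGroup[R](fun: (Iterator[T], Collector[R]) => Unit): DataSet[R] = new DataSet[R]
    def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] = new DataSet[R]
  }
  def main(args: Array[String]): Unit = {
    val ds = new DataSet[Int]
    // Compiles on 2.11; on 2.12 the compiler cannot pick an overload for an
    // untyped lambda and reports a missing parameter type:
    // val sums = ds.reduceGroup((values, out) => out.collect(values.sum))
    // The workaround is to spell out the parameter types, which matches only
    // the function overload (the SAM takes an Iterable, not an Iterator):
    val sums = ds.reduceGroup((values: Iterator[Int], out: Collector[Int]) =>
      out.collect(values.sum))
  }
}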
I see some solutions for this:

1. Keep the methods as they are. This would force people to always explicitly specify parameter types on their lambdas.

2. Rename the second method to reduceGroupJ() to signal that it takes a user function with Java-style interfaces (the first parameter is java.lang.Iterable, while the Scala lambda takes a scala.Iterator). This disambiguates the code and lets users write lambdas without explicit parameter types, but it breaks the API.
One effect of 2. would be that we could add a reduceGroup() method that takes an api.scala.GroupReduceFunction with proper Scala types, which would allow people to implement user functions without having to cast the various Iterator/Iterable parameters.
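
Sketched out, option 2 might look roughly like this (hypothetical names — the Scala-typed interface is only proposed here, it does not exist yet):

object RenamedApiSketch {
  trait Collector[R] { def collect(record: R): Unit }
  trait JavaGroupReduceFunction[T, R] { // stands in for the existing Java interface
    def reduce(values: java.lang.Iterable[T], out: Collector[R]): Unit
  }
  trait ScalaGroupReduceFunction[T, R] { // the proposed Scala-typed variant
    def reduce(values: Iterator[T], out: Collector[R]): Unit
  }
  class DataSet[T] {
    def reduceGroup[R](fun: (Iterator[T], Collector[R]) => Unit): DataSet[R] = new DataSet[R]
    def reduceGroup[R](reducer: ScalaGroupReduceFunction[T, R]): DataSet[R] = new DataSet[R]
    def reduceGroupJ[R](reducer: JavaGroupReduceFunction[T, R]): DataSet[R] = new DataSet[R]
  }
}

Since the Scala-typed SAM and the lambda overload would have identical parameter shapes, a plain lambda should resolve to the function overload (as in the map() case above), and implementors of the interface would work with scala.Iterator directly instead of adapting java.lang.Iterable.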
Either way, people would have to adapt their code in some way when moving to Scala 2.12, depending on which style of methods they use.

There is also solution 2.5:

2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the old method names for Scala 2.11. This would require some infrastructure, and I don't yet know how it can be done in a sane way.

What do you think? I personally would be in favour of 2., but it breaks the existing API.
Best,
Aljoscha


Re: Scala 2.12 Support

2018-08-16 Thread Timo Walther

Hi Aaron,

we just released Flink 1.6 and the discussion for the roadmap of 1.7 will begin soon. I guess the Jira issue will also be updated then. I would recommend watching it for now.


Regards,
Timo


Re: Scala 2.12 Support

2018-08-16 Thread Aaron Levin
Hi Piotr,

Thanks for the update. Glad to hear it's high on the priority list! I'm
looking forward to the 1.7 update!

It may be worth having someone more official from the Flink team give an
update on that ticket. It wasn't clear if the 1.7 comment from that user
was just a reference to the fact that 1.6 had come out (or where they got
that information). I know a few people have cited the ticket and concluded
"not clear what's going on with Scala 2.12 support." If you have the
bandwidth, a note from you or anyone else would be helpful!

Thanks again!

Best,

Aaron Levin



Re: Scala 2.12 Support

2018-08-16 Thread Piotr Nowojski
Hi,

Scala 2.12 support is high on our priority list and we hope to have it included 
for the 1.7 release (as you can see in the ticket itself), which should happen 
later this year.

Piotrek



Scala 2.12 Support

2018-08-15 Thread Aaron Levin
Hello!

I'm wondering if there is anywhere I can see Flink's roadmap for Scala 2.12 support. The last email I can find on the list for this was back in January, and FLINK-7811[0], the ticket asking for Scala 2.12 support, hasn't been updated in a few months.

Recently Spark fixed the ClosureCleaner code to support Scala 2.12[1], and from what I can gather this was one of the main barriers for Flink supporting Scala 2.12. Given this has been fixed, is there work in progress to support Scala 2.12? Any updates on FLINK-7811?

Thanks for your help!

[0] https://issues.apache.org/jira/browse/FLINK-7811
[1] https://issues.apache.org/jira/browse/SPARK-14540

Best,

Aaron Levin


Re: scala 2.12 support/cross-compile

2018-01-03 Thread Hao Sun
Thanks Stephan and Aljoscha for the info!



Re: scala 2.12 support/cross-compile

2018-01-03 Thread Aljoscha Krettek
Hi,

This is the umbrella issue for Scala 2.12 support: https://issues.apache.org/jira/browse/FLINK-7811. As Stephan pointed out, the ClosureCleaner and SAMs are currently the main problems. The former is also a problem for Spark, which tracks its progress here: https://issues.apache.org/jira/browse/SPARK-14540.

Best,
Aljoscha



Re: scala 2.12 support/cross-compile

2018-01-03 Thread Stephan Ewen
Hi Hao Sun!

This is work in progress, but Scala 2.12 is a bit tricky. I think the Scala
folks have messed this version up a bit, to be honest.

The main blocker is that Scala 2.12 breaks some classes through its addition of SAM interface lambdas (similar to Java). Many of the DataStream API classes have two method variants (one with a Scala Function, one with a Java SAM interface), which now become ambiguously overloaded methods in Scala 2.12.

In addition, Scala 2.12 also needs a different closure cleaner, because
Scala 2.12 compiles differently.
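
For context, a sketch of the classic capture problem a closure cleaner addresses (illustrative only, not Flink's actual ClosureCleaner):

object ClosureCaptureSketch {
  class Pipeline { // note: not Serializable
    val factor = 10
    // Referencing the field `factor` makes the lambda capture `this`, i.e.
    // the whole Pipeline instance, which a distributed runtime would then
    // have to serialize and ship to the cluster. A closure cleaner detects
    // and rewrites such captures; because Scala 2.12 emits lambdas via the
    // Java 8 lambda mechanism instead of anonymous inner classes, the
    // cleaner has to analyze different bytecode than for 2.11.
    def job(data: Seq[Int]): Seq[Int] = data.map(x => x * factor)
  }
}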

I am adding Aljoscha, who has started working on this...

Best,
Stephan


On Wed, Jan 3, 2018 at 4:13 AM, Hao Sun  wrote:

> Hi team, I am wondering if there is a schedule to support Scala 2.12?
> If I need Flink 1.3+ with Scala 2.12, do I just have to cross-compile
> myself? Is there anything blocking us from using Scala 2.12?
>
> Thanks