[ https://issues.apache.org/jira/browse/SPARK-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912842#comment-16912842 ]

Sean Owen commented on SPARK-14643:
-----------------------------------

I took another look at this. I tried implementing Josh's proposal, and while it
begins to work for map(), I quickly ran into some complications. First is the
visibility of the implicit conversion from Function1 to MapFunction: Scala
users would now have to import it from org.apache.spark.sql._, no? Second, to
pipe Scala users through the Java-specific overload, things like mapPartitions
and MapPartitionsFunction will require a round-trip conversion between Scala
and Java iterators, which adds a little overhead. I also had trouble getting
that to work in cases where mapPartitions returns an Iterator of a primitive
type, which Java iterators won't support. These issues may be solvable, but it
was getting messy, and not just in Dataset, which makes me uneasy.

Right now, Java users who compile against 2.12 already have to eat this
problem, and I haven't heard much about it. They have to cast their lambdas to
MapFunction et al. to disambiguate. That isn't great, but it's not the end of
the world.
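
For concreteness, that workaround looks roughly like this from Java. This is a
sketch I'm adding purely for illustration, not code from the ticket, and the
local session setup is assumed:

{code:java}
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class MapCastExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("MapCastExample").master("local[*]").getOrCreate();
    Dataset<Long> ds = spark.range(5);

    // In a 2.12 build, a bare lambda matches both the Scala map(Function1, Encoder)
    // and the Java map(MapFunction, Encoder) overloads; the cast picks the Java one.
    Dataset<Long> doubled =
        ds.map((MapFunction<Long, Long>) x -> x * 2, Encoders.LONG());
    doubled.show();

    spark.stop();
  }
}
{code}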

Why not just have Java callers call the Function1 overload that already
exists, and delete the Java-specific overload? I get that it means they depend
on a Scala class, and that's a complication, but lambdas will hide that. Now
that we require Java 8 and can accept a breaking change in Spark 3, that's
viable, if I'm reading [~joshrosen]'s doc correctly? Well, they'd then have to,
for example, convert to Scala Iterators in the case of mapPartitions, which is
the flip side of the problem above, and that's quite hard for Java users. That
is, map() works out just fine in Java; mapPartitions(), not so much.
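
To make that concrete, here is a rough sketch, mine and not from Josh's doc, of
what a Java caller has to write to go through the Scala mapPartitions overload
in a 2.12 build, bridging java.util.Iterator and scala.collection.Iterator by
hand (the Function1 cast is only there to dodge the overload ambiguity above):

{code:java}
import java.util.ArrayList;
import java.util.List;

import scala.Function1;
import scala.collection.Iterator;
import scala.collection.JavaConverters;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class MapPartitionsViaScalaOverload {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("MapPartitionsViaScalaOverload").master("local[*]").getOrCreate();
    Dataset<Long> ds = spark.range(5);

    Dataset<Long> doubled = ds.mapPartitions(
        (Function1<Iterator<Long>, Iterator<Long>>) scalaIter -> {
          List<Long> out = new ArrayList<>();
          // Unwrap the Scala iterator so plain Java code can consume it...
          JavaConverters.asJavaIteratorConverter(scalaIter).asJava()
              .forEachRemaining(x -> out.add(x * 2));
          // ...then wrap the Java result back up as a Scala iterator.
          return JavaConverters.asScalaIteratorConverter(out.iterator()).asScala();
        },
        Encoders.LONG());
    doubled.show();

    spark.stop();
  }
}
{code}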

I'm inclined to say... leave it? It's a minor inconvenience for Java users
right now, and there are already other minor inconveniences for Java users
calling this Scala-based system ({{$MODULE$}}, anyone?).

> Remove overloaded methods which become ambiguous in Scala 2.12
> --------------------------------------------------------------
>
>                 Key: SPARK-14643
>                 URL: https://issues.apache.org/jira/browse/SPARK-14643
>             Project: Spark
>          Issue Type: Task
>          Components: Build, Project Infra
>    Affects Versions: 2.4.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>
> Spark 1.x's Dataset API runs into subtle source incompatibility problems for 
> Java 8 and Scala 2.12 users when Spark is built against Scala 2.12. In a 
> nutshell, the current API has overloaded methods whose signatures are 
> ambiguous when resolving calls that use the Java 8 lambda syntax (only if 
> Spark is built against Scala 2.12).
> This issue is somewhat subtle, so there's a full writeup at 
> https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit?usp=sharing
>  which describes the exact circumstances under which the current APIs are 
> problematic. The writeup also proposes a solution which involves the removal 
> of certain overloads only in Scala 2.12 builds of Spark and the introduction 
> of implicit conversions for retaining source compatibility.
> We don't need to implement any of these changes until we add Scala 2.12 
> support since the changes must only be applied when building against Scala 
> 2.12 and will be done via traits + shims which are mixed in via 
> per-Scala-version source directories (like how we handle the 
> Scala-version-specific parts of the REPL). For now, this JIRA acts as a 
> placeholder so that the parent JIRA reflects the complete set of tasks which 
> need to be finished for 2.12 support.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
