Sorry, but just to be clear here, this is the 2.12 API issue: 
https://issues.apache.org/jira/browse/SPARK-14643, with more details in this 
doc: 
https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit.

Basically, if we are allowed to change Spark’s API a little to have only one
version of the methods that are currently overloaded between Java and Scala, we
can get away with a single source tree for all Scala versions and Java ABI
compatibility against any build of Spark (whether using Scala 2.11 or 2.12). On
the other hand, if we want to keep the API and ABI of the Spark 2.x branch,
we’ll need a separate source tree for Scala 2.12 with different copies of
pretty large classes such as RDD, DataFrame and DStream, and Java users may
have to change their code when linking against different versions of Spark.
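
To make the overload issue concrete, here is roughly what the current API looks
like (simplified from memory, not the exact signatures). Dataset.map, for
example, has a Scala-friendly and a Java-friendly overload, which a Java
compiler sees as:

    <U> Dataset<U> map(scala.Function1<T, U> func, Encoder<U> encoder)  // Scala callers
    <U> Dataset<U> map(MapFunction<T, U> func, Encoder<U> encoder)      // Java callers

Against a 2.11 build, scala.Function1 compiles to an interface with several
abstract methods, so a Java 8 lambda can only target the MapFunction overload
and this just works:

    Dataset<Integer> result = ds.map(x -> x + 1, Encoders.INT());

Against a 2.12 build, Function1 becomes a functional interface (its concrete
methods turn into default methods), so the same lambda matches both overloads
and javac rejects the call as ambiguous; Java users would need a cast such as
(MapFunction<Integer, Integer>) to disambiguate. Keeping only one method per
such pair avoids the ambiguity, and that is the API change I mean above.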

This is of course only one of the possible ABI changes, but keeping full
compatibility instead would be a considerable engineering effort, so we’d have
to sign up for maintaining all these different source files. It seems kind of
silly given that Scala 2.12 was released in 2016: we’d be doing all this work
to keep ABI compatibility for Scala 2.11, which isn’t even that widely used any
more for new projects. Also keep in mind that the next Spark release will
probably take at least 3-4 months, so we’re talking about what people will be
using in fall 2018.

Matei

> On Apr 5, 2018, at 10:13 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> 
> I remember seeing somewhere that Scala still has some issues with Java
> 9/10 so that might be hard...
> 
> But on that topic, it might be better to shoot for Java 11
> compatibility. 9 and 10, following the new release model, aren't
> really meant to be long-term releases.
> 
> In general, agree with Sean here. Doesn't look like 2.12 support
> requires unexpected API breakages. So unless there's a really good
> reason to break / remove a bunch of existing APIs...
> 
> On Thu, Apr 5, 2018 at 9:04 AM, Marco Gaido <marcogaid...@gmail.com> wrote:
>> Hi all,
>> 
>> I also agree with Mark that we should add Java 9/10 support in an eventual
>> Spark 3.0 release. Supporting Java 9 is not a trivial task, since we use some
>> internal APIs for memory management which have changed: either we find a
>> solution which works on both (but I am not sure that is feasible), or we have
>> to switch between two implementations according to the Java version. So I'd
>> rather avoid doing this in a non-major release.
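>> 
>> As a rough sketch of what the second option could look like (this is not
>> actual Spark code; the class here is just an illustration), a runtime version
>> check plus reflection lets a single jar free direct buffers on both Java 8
>> and Java 9+:
>> 
>>   import java.lang.reflect.Field;
>>   import java.lang.reflect.Method;
>>   import java.nio.ByteBuffer;
>> 
>>   final class DirectBufferCleaner {
>>     // Frees a direct buffer's off-heap memory, picking the implementation
>>     // at runtime based on the JVM version.
>>     static void clean(ByteBuffer buf) throws ReflectiveOperationException {
>>       if (!buf.isDirect()) {
>>         return;
>>       }
>>       String spec = System.getProperty("java.specification.version"); // "1.8", "9", "10"
>>       int major = spec.startsWith("1.")
>>           ? Integer.parseInt(spec.substring(2)) : Integer.parseInt(spec);
>>       if (major >= 9) {
>>         // Java 9+: sun.misc.Unsafe exposes invokeCleaner(ByteBuffer).
>>         Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
>>         Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
>>         theUnsafe.setAccessible(true);
>>         Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
>>         invokeCleaner.invoke(theUnsafe.get(null), buf);
>>       } else {
>>         // Java 8: DirectByteBuffer has a cleaner() method returning sun.misc.Cleaner.
>>         Method cleanerMethod = buf.getClass().getMethod("cleaner");
>>         cleanerMethod.setAccessible(true);
>>         Object cleaner = cleanerMethod.invoke(buf);
>>         cleaner.getClass().getMethod("clean").invoke(cleaner);
>>       }
>>     }
>>   }
>> 
>> It works, but we would still depend on JDK internals that can change again in
>> later releases.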
>> 
>> Thanks,
>> Marco
>> 
>> 
>> 2018-04-05 17:35 GMT+02:00 Mark Hamstra <m...@clearstorydata.com>:
>>> 
>>> As with Sean, I'm not sure that this will require a new major version, but
>>> we should also be looking at Java 9 & 10 support -- particularly with regard
>>> to their better functionality in a containerized environment (memory limits
>>> from cgroups, not sysconf; support for cpusets). In that regard, we should
>>> also be looking at using the latest Scala 2.11.x maintenance release in
>>> current Spark branches.
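>>> 
>>> (Concretely, I am thinking of things like JDK 9/10's experimental
>>> -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap flags, and
>>> the cgroup-aware resource detection that JDK 10 turns on by default.)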
>>> 
>>> On Thu, Apr 5, 2018 at 5:45 AM, Sean Owen <sro...@gmail.com> wrote:
>>>> 
>>>> On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin <r...@databricks.com> wrote:
>>>>> 
>>>>> The primary motivating factor IMO for a major version bump is to support
>>>>> Scala 2.12, which requires minor API breaking changes to Spark’s APIs.
>>>>> Similar to Spark 2.0, I think there are also opportunities for other changes
>>>>> that we know have been biting us for a long time but can’t be changed in
>>>>> feature releases (to be clear, I’m actually not sure they are all good
>>>>> ideas, but I’m writing them down as candidates for consideration):
>>>> 
>>>> 
>>>> IIRC from looking at this, it is possible to support 2.11 and 2.12
>>>> simultaneously. The cross-build already works now in 2.3.0. Barring some big
>>>> change needed to get 2.12 fully working -- and that may be the case -- it
>>>> nearly works that way now.
>>>> 
>>>> Compiling vs 2.11 and 2.12 does however result in some APIs that differ
>>>> in byte code. But Scala itself isn't binary compatible between 2.11 and
>>>> 2.12 anyway; that has never been promised.
>>>> 
>>>> (Interesting question about what *Java* users should expect; they would see
>>>> a difference in 2.11 vs 2.12 Spark APIs, but that has always been true.)
>>>> 
>>>> I don't disagree with shooting for Spark 3.0, just saying I don't know if
>>>> 2.12 support requires moving to 3.0. But, Spark 3.0 could consider dropping
>>>> 2.11 support if needed to make supporting 2.12 less painful.
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Marcelo
> 

