On 5 Apr 2018, at 18:04, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Java 9/10 support would be great to add as well.

Be aware that the work of moving Hadoop core to Java 9+ is still a big piece of work, being undertaken by Akira Ajisaka & colleagues at NTT: https://issues.apache.org/jira/browse/HADOOP-11123

Big dependency updates and handling Oracle hiding the sun.misc internals which low-level code depends on are the trouble spots (there's a sketch of the sun.misc pattern at the end of this mail). The move to Log4J 2 is going to be observably traumatic to every app which relies on a log4j.properties file to set itself up, since Log4j 2 won't read those files and its own configuration syntax is completely different (before/after example below). As usual, any testing which can be done early will be welcomed by all; the earlier the better.

That stuff is all about getting things working: supporting the Java 9 packaging model. Which is a really compelling reason to go for it.

> Regarding Scala 2.12, I thought that supporting it would become easier if we change the Spark API and ABI slightly. Basically, it is of course possible to create an alternate source tree today, but it might be possible to share the same source files if we tweak some small things in the methods that are overloaded across Scala and Java. I don’t remember the exact details, but the idea was to reduce the total maintenance work needed at the cost of requiring users to recompile their apps. I’m personally for moving to 3.0 because of the other things we can clean up as well, e.g. the default SQL dialect, Iterable stuff, and possibly dependency shading (a major pain point for lots of users).

Hadoop 3 does have a shaded client (the artifacts are listed at the end of this mail), though not yet enough of one for Spark; if the work of identifying & fixing the outstanding dependencies starts now, Hadoop 3.2 should be able to offer the full set of shaded libraries Spark needs. There's always a price to shading, though: redistributable size and its impact on start times, duplicate classes loaded (memory footprint, reduced chance of JIT recompilation, ...), and the whole transitive-shading problem. Java 9 should be the real target for a clean solution to all of this.
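
To make the sun.misc point concrete, here's a minimal sketch (not Spark's or Hadoop's actual code) of the pattern the low-level memory code relies on. The theUnsafe hack itself keeps working on Java 9 because the jdk.unsupported module still exports (and opens) sun.misc, but neighbours like sun.misc.Cleaner moved into JDK-internal packages, and that's where ports start breaking:

    object UnsafeSketch {
      // Reflectively grab the Unsafe singleton, the way low-level
      // memory-management code across the JVM ecosystem does today.
      private val unsafe: sun.misc.Unsafe = {
        val f = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
        f.setAccessible(true) // deep reflection; still permitted on Java 9+ via jdk.unsupported
        f.get(null).asInstanceOf[sun.misc.Unsafe]
      }

      def main(args: Array[String]): Unit = {
        // Off-heap allocation, invisible to the garbage collector.
        val address = unsafe.allocateMemory(1024L)
        unsafe.putLong(address, 42L)
        println(unsafe.getLong(address)) // prints 42
        unsafe.freeMemory(address)
      }
    }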
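
On the Log4j point: the pain isn't a file rename. Log4j 2 simply doesn't read 1.x property files, and its own properties support (which returned in 2.4, in a new shape) is structured completely differently. A classic console setup like

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

has to be rewritten for Log4j 2 as something roughly like this (syntax for recent 2.x releases):

    rootLogger.level = info
    rootLogger.appenderRef.stdout.ref = console
    appender.console.type = Console
    appender.console.name = console
    appender.console.target = SYSTEM_ERR
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Every app which ships or documents a log4j.properties hits this, which is why the migration is going to be so visible.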
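
And to make the overload point concrete: a minimal sketch, with hypothetical names rather than real Spark signatures, of the kind of Scala/Java method pair that gets harder under 2.12. In 2.11, only the Function1 overload could accept a Scala lambda; in 2.12, lambdas also SAM-convert to Java functional interfaces, so some of these calls stop resolving cleanly unless the signatures are tweaked:

    import java.util.function.{Function => JFunction}

    object OverloadSketch {
      // Scala-friendly overload
      def transform[T, U](f: T => U): String = "scala"
      // Java-friendly overload, in the style of Spark's Java API
      def transform[T, U](f: JFunction[T, U]): String = "java"

      def main(args: Array[String]): Unit = {
        // Under 2.12 a bare lambda is eligible for both overloads via
        // SAM conversion, so a call like this may be rejected as ambiguous:
        // transform((i: Int) => i + 1)

        // Giving the argument a concrete function type keeps resolution unambiguous:
        val inc: Int => Int = _ + 1
        println(transform(inc)) // prints "scala"
      }
    }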
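
Finally, for anyone who wants to look at the Hadoop 3 shaded client: it's the hadoop-client-api / hadoop-client-runtime pair shipped in 3.0.0, with third-party classes relocated under org.apache.hadoop.shaded. A downstream POM picks it up roughly like this:

    <!-- compile against the public Hadoop API (classes not relocated) -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-api</artifactId>
      <version>3.0.0</version>
    </dependency>
    <!-- shaded third-party dependencies, needed only at run time -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-runtime</artifactId>
      <version>3.0.0</version>
      <scope>runtime</scope>
    </dependency>

There's also a hadoop-client-minicluster artifact for tests. That relocation step is what buys the isolation, and also what costs the extra megabytes and duplicate classes mentioned above.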