Hey guys, thanks for the insights. Also, I realize Hadoop has gotten
way better about this with 2.2+ and I think it's great progress.

We have well-defined API levels in Spark and automated checking of API
violations for new pull requests. When doing code reviews we always
enforce the narrowest possible visibility:

1. private
2. private[spark]
3. @Experimental or @DeveloperApi
4. public

Our automated checks exclude 1-3. Anything that breaks 4 will trigger
a build failure.
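
To make the tiers concrete, here's a toy class (the names are made up;
the annotation import is where we keep these markers):

    package org.apache.spark.example

    import org.apache.spark.annotation.DeveloperApi

    class ExampleCounter {
      private var count = 0L                            // 1. this class only
      private[spark] def reset(): Unit = { count = 0 }  // 2. anywhere under org.apache.spark
      @DeveloperApi
      def rawCount: Long = count                        // 3. public in bytecode, but may change
      def add(delta: Long): Unit = { count += delta }   // 4. stable public API
    }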

The Scala compiler prevents anyone external from using 1 or 2. We do
have "bytecode public but annotated" (3) APIs that we might change.
We spent a lot of time looking into whether these annotations could
produce compiler warnings, but we haven't found a way to do that and
don't see a better alternative at this point.
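
(For context: these markers are plain annotations with no semantics the
compiler enforces. A Scala equivalent would be literally nothing more
than:

    // a bare marker the compiler records but never checks
    class DeveloperApi extends scala.annotation.StaticAnnotation

so the best we can do today is document the contract and lean on review
and tooling.)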

Regarding Scala compatibility, Scala 2.11+ is source compatible with
2.10, meaning we'll be able to cross-compile Spark for
different versions of Scala. We've already been in touch with Typesafe
about this and they've offered to integrate Spark into their
compatibility test suite. They've also committed to patching 2.11 with
a minor release if bugs are found.
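
On the build side, cross-compiling mostly boils down to sbt's
cross-building support, roughly like this (version numbers are just
placeholders):

    // build.sbt (sketch)
    scalaVersion := "2.10.4"
    crossScalaVersions := Seq("2.10.4", "2.11.1")
    // `sbt +compile` / `sbt +publish` runs each task once per listed version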

Anyways, my point is we've actually thought a lot about this already.

The CLASSPATH question is different from API stability, but it is
indeed also a form of compatibility. This is an area where I'd also
like to see Spark do a better job of isolating user classes from
Spark's own execution...
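
The kind of thing I have in mind is a "child-first" class loader for
user jars - just a sketch of the idea, not anything in Spark today:

    import java.net.{URL, URLClassLoader}

    // Look in the user's jars before delegating to the parent
    // (Spark/Hadoop) classpath, so e.g. the user's Guava version wins
    // over whatever is in the assembly.
    class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
        extends URLClassLoader(urls, parent) {
      override def loadClass(name: String, resolve: Boolean): Class[_] = {
        var c = findLoadedClass(name)
        if (c == null) {
          c = try findClass(name) catch {
            case _: ClassNotFoundException => super.loadClass(name, resolve)
          }
        }
        if (resolve) resolveClass(c)
        c
      }
    }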

- Patrick



On Fri, May 30, 2014 at 12:30 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> On Fri, May 30, 2014 at 12:05 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
>> I don't know if Scala provides any mechanisms to do this beyond what Java 
>> provides.
>
> In fact it does. You can say something like "private[foo]" and the
> annotated element will be visible to all classes under "foo" (where
> "foo" is any package in the hierarchy leading up to the class). That's
> used a lot in Spark.
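>
> For example (made-up names, but real visibility rules):
>
>     package org.apache.spark.util
>
>     private[spark] object Helpers {             // anywhere under org.apache.spark
>       private[util] def internal(): Unit = ()   // only under org.apache.spark.util
>     }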
>
> I haven't fully looked at how @DeveloperApi is used, but I agree
> with you - annotations are not a good way to do this. The Scala
> feature above would be much better, but it might still leak things at
> the Java bytecode level (I don't know how Scala implements it under
> the covers, but I assume it's not by declaring the element as a Java
> "private").
>
> Another thing is that in Scala the default visibility is public, which
> makes it very easy to inadvertently add things to the API. I'd like to
> see more care in making things have the proper visibility - I
> generally declare things private first, and relax that as needed.
> Using @VisibleForTesting would be great too, when the Scala
> private[foo] approach doesn't work.
>
>> Does Spark also expose its CLASSPATH in
>> this way to executors?  I was under the impression that it did.
>
> If you're using the Spark assemblies, yes, there are a lot of things
> that your app gets exposed to. For example, you can see Guava and
> Jetty (and many other things) there. This is something that has always
> bugged me, but I don't really have a good suggestion for how to fix it;
> shading goes a certain way, but it also breaks code that uses
> reflection (e.g. Class.forName()-style class loading).
>
> What is worse is that Spark doesn't even agree with the Hadoop code it
> depends on; e.g., Spark uses Guava 14.x while Hadoop is still on Guava
> 11.x. So when you run your Scala app, what gets loaded?
>
>> At some point we will also have to confront the Scala version issue.  Will
>> there be flag days where Spark jobs need to be upgraded to a new,
>> incompatible version of Scala to run on the latest Spark?
>
> Yes, this could be an issue - I'm not sure Scala has a policy towards
> this, but updates (at least minor, e.g. 2.9 -> 2.10) tend to break
> binary compatibility.
>
> Scala also makes some API updates tricky - e.g., adding a new named
> argument to a Scala method is not a binary-compatible change (while,
> e.g., adding a new keyword argument to a Python method is just fine).
> The use of implicits and other Scala features makes this even more
> opaque...
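>
> For example:
>
>     object V1 { def save(path: String): Unit = () }
>     // the Python-style "just add a keyword arg" change:
>     object V2 { def save(path: String, overwrite: Boolean = false): Unit = () }
>     // Source compatible: V2.save("out") still compiles. Not binary
>     // compatible: code already compiled against V1-style save(String)
>     // finds no such method in V2 and dies with NoSuchMethodError.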
>
> Anyway, not really any solutions in this message, just a few comments
> I wanted to throw out there. :-)
>
> --
> Marcelo
