As an end user (developer) and cluster admin, I would have to agree with
Koert.

To me the real question is timing: the current version is 1.6.1, so how
many more releases will there be until 2.0, and on what time frame?

If you give people six to twelve months to plan and make sure they know
(paste it all over the web site), most can plan ahead.


Just my two pennies





On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:

> (PS CDH5 runs fine with Java 8, but I understand your more general point.)
>
> This is a familiar context indeed, but in that context, would a group
> not wanting to update to Java 8 want to manually put Spark 2.0 into
> the mix? That is, if this is a context where the cluster is
> purposefully some stable mix of components, would you be updating just
> one?
>
> You make a good point about Scala being more a library than an
> infrastructure component, so it can be updated on a per-app basis.
> While that makes different Scala versions harder to handle on the
> framework side, it is less hard on the deployment side.
>
> On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > i think the arguments are convincing, but it also makes me wonder if i
> > live in some kind of alternate universe... we deploy on customers'
> > clusters, where the OS, python version, java version and hadoop distro
> > are not chosen by us. so think centos 6, cdh5 or hdp 2.3, java 7 and
> > python 2.6. we simply have access to a single proxy machine and launch
> > through yarn. asking them to upgrade java is pretty much out of the
> > question, or a 6+ month ordeal. of the 10 client clusters i can think
> > of off the top of my head, all of them are on java 7 and none are on
> > java 8. so by doing this you would make spark 2 basically unusable for
> > us (unless most of them have plans of upgrading to java 8 in the near
> > term; i will ask around and report back...).
> >
> > on a side note, it's particularly interesting to me that spark 2 chose
> > to continue support for scala 2.10, because even in our very
> > constricted client environments the scala version is something we can
> > easily upgrade (we just deploy a custom build of spark for the relevant
> > scala version and hadoop distro). and because scala is not a dependency
> > of any hadoop distro (so not on the classpath, which i am very happy
> > about), we can use whatever scala version we like. also, i found the
> > upgrade path from scala 2.10 to 2.11 to be very easy, so i have a hard
> > time understanding why anyone would stay on scala 2.10. and finally,
> > with scala 2.12 around the corner, you really don't want to be
> > supporting 3 versions. so clearly i am missing something here.
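> >
> > to make that per-app pinning concrete: it is little more than a couple
> > of build.sbt lines (sbt build files are scala; the versions below are
> > illustrative, not a recommendation):
> >
> >     // pin the application's scala version; spark itself is marked
> >     // "provided" and supplied at runtime by the custom spark build
> >     // we deploy for that scala version and hadoop distro
> >     scalaVersion := "2.11.8"
> >
> >     libraryDependencies +=
> >       "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"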
> >
> >
> >
> > On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >>
> >> +1 to support Java 8 (and future) *only* in Spark 2.0, and end support
> >> of Java 7. It makes sense.
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 03/24/2016 08:27 AM, Reynold Xin wrote:
> >>>
> >>> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
> >>> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
> >>> Spark 2.0 would require Java 8 to run).
> >>>
> >>> Oracle ended public updates for JDK 7 a year ago (Apr 2015), and
> >>> removed public downloads for JDK 7 in July 2015. In the past I've
> >>> actually been against dropping Java 7, but today I ran into an issue
> >>> with the new Dataset API not working well with Java 8 lambdas, and
> >>> that changed my opinion on this.
> >>>
> >>> I've been thinking more about this issue today and also talked with a
> >>> lot of people offline to gather feedback, and I actually think the
> >>> pros outweigh the cons, for the following reasons (in some rough
> >>> order of importance):
> >>>
> >>> 1. It is complicated to test how well Spark APIs work for Java
> >>> lambdas if we support Java 7. Jenkins machines need to have both Java
> >>> 7 and Java 8 installed and we must run through a set of test suites
> >>> in 7, and then the lambda tests in Java 8. This complicates build
> >>> environments/scripts, and makes them less robust. Without good
> >>> testing infrastructure, I have no confidence in building good APIs
> >>> for Java 8.
> >>>
> >>> 2. Dataset/DataFrame performance will be between 1x and 10x slower in
> >>> Java 7. The primary APIs we want users to use in Spark 2.x are
> >>> Dataset/DataFrame, and this impacts pretty much everything from
> >>> machine learning to structured streaming. We have made great progress
> >>> in their performance through extensive use of code generation. (In
> >>> many dimensions Spark 2.0 with DataFrames/Datasets looks more like a
> >>> compiler than a MapReduce or query engine.) These optimizations don't
> >>> work well in Java 7 due to broken code cache flushing. This problem
> >>> has been fixed by Oracle in Java 8. In addition, Java 8 comes with
> >>> better support for Unsafe and SIMD.
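> >>>
> >>> (As an aside, for those stuck on Java 7 in the meantime: a common
> >>> mitigation, though not a fix, is to enlarge the JIT code cache so
> >>> that flushing kicks in later, using the standard HotSpot flag via
> >>> Spark's extraJavaOptions. A rough sketch in Scala, with a purely
> >>> illustrative size:
> >>>
> >>>     // set before the SparkContext is created, or put the same
> >>>     // option in spark-defaults.conf; 512m is just an example
> >>>     val conf = new org.apache.spark.SparkConf()
> >>>       .set("spark.executor.extraJavaOptions",
> >>>            "-XX:ReservedCodeCacheSize=512m")
> >>>
> >>> Enlarging the cache only postpones the problem on Java 7; it does not
> >>> remove it.)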
> >>>
> >>> 3. Scala 2.12 will come out soon, and we will want to add support for
> >>> that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd
> >>> have a fairly complicated compatibility matrix and testing
> >>> infrastructure.
> >>>
> >>> 4. There are libraries that I've looked into in the past that support
> >>> only Java 8. This is more common in high performance libraries such
> >>> as Aeron (a messaging library). Having to support Java 7 means we are
> >>> not able to use these. It is not that big of a deal right now, but it
> >>> will become increasingly difficult as we optimize performance.
> >>>
> >>>
> >>> The downside of not supporting Java 7 is also obvious. Some
> >>> organizations are stuck with Java 7, and they wouldn't be able to use
> >>> Spark 2.0 without upgrading Java.
> >>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >>
> >
>
>


-- 
Those who say it can't be done, are usually interrupted by those doing it.
