i think marcelo also pointed this out before. it's very interesting to hear; i was not aware of that until today. if i understand it correctly, it would mean we would only have to convince a group/client with a cluster to install jdk8 on the nodes, without actually transitioning to it. that would definitely lower the hurdle by a lot.
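One common way to do what Mridul describes is to install JDK 8 alongside the cluster's default JDK 7 and override `JAVA_HOME` only inside the Spark containers. Spark exposes per-container environment variables through its `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` settings. The sketch below only assembles the extra `spark-submit` arguments. It assumes cluster deploy mode (so the driver runs inside the ApplicationMaster container) and a JDK 8 path that exists on every node. The path `/usr/java/jdk1.8.0`, the class `com.example.App` and `app.jar` are hypothetical placeholders. In client mode the driver JVM runs on the launching machine, so that machine would need JDK 8 itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Jdk8OnYarn {
    /** Extra spark-submit arguments that point the YARN containers at a side-by-side JDK. */
    static List<String> jdkOverrideArgs(String javaHome) {
        List<String> args = new ArrayList<>();
        // JAVA_HOME inside the ApplicationMaster container (which hosts the driver in cluster mode)
        args.add("--conf");
        args.add("spark.yarn.appMasterEnv.JAVA_HOME=" + javaHome);
        // JAVA_HOME inside every executor container
        args.add("--conf");
        args.add("spark.executorEnv.JAVA_HOME=" + javaHome);
        return args;
    }

    public static void main(String[] argv) {
        // Hypothetical path: JDK 8 unpacked next to the cluster's default JDK 7 on every node.
        String jdk8Home = "/usr/java/jdk1.8.0";
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "spark-submit", "--master", "yarn", "--deploy-mode", "cluster"));
        cmd.addAll(jdkOverrideArgs(jdk8Home));
        // Hypothetical application class and jar, for illustration only.
        cmd.addAll(Arrays.asList("--class", "com.example.App", "app.jar"));
        System.out.println(String.join(" ", cmd));
    }
}
```

The NodeManagers themselves keep running on JDK 7. Only the JVMs launched for this one application pick up the JDK 8 `JAVA_HOME`.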
On Thu, Mar 24, 2016 at 9:36 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>
> Container Java version can be different from yarn Java version: we run
> jobs with jdk8 on jdk7 cluster without issues.
>
> Regards
> Mridul
>
>
> On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i guess what i am saying is that in a yarn world the only hard
>> restrictions left are the containers you run in, which means the hadoop
>> version, java version and python version (if you use python).
>>
>>
>> On Thu, Mar 24, 2016 at 12:39 PM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>>> The group will not upgrade to spark 2.0 themselves, but they are mostly
>>> fine with vendors like us deploying our application via yarn with whatever
>>> spark version we choose (and bundle, so they do not install it separately;
>>> they might not even be aware of what spark version we use). This all works
>>> because spark does not need to be on the cluster nodes, just on the one
>>> machine where our application gets launched. Having yarn is pretty awesome
>>> in this respect.
>>>
>>> On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> (PS CDH5 runs fine with Java 8, but I understand your more general
>>>> point.)
>>>>
>>>> This is a familiar context indeed, but in that context, would a group
>>>> not wanting to update to Java 8 want to manually put Spark 2.0 into
>>>> the mix? That is, if this is a context where the cluster is
>>>> purposefully some stable mix of components, would you be updating just
>>>> one?
>>>>
>>>> You make a good point about Scala being more library than
>>>> infrastructure component, so it can be updated on a per-app basis. On
>>>> the one hand it's harder to handle different Scala versions from the
>>>> framework side; on the other, it's less hard on the deployment side.
>>>>
>>>> On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>> > i think the arguments are convincing, but it also makes me wonder if i live
>>>> > in some kind of alternate universe... we deploy on customers' clusters, where
>>>> > the OS, python version, java version and hadoop distro are not chosen by us.
>>>> > so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply have
>>>> > access to a single proxy machine and launch through yarn. asking them to
>>>> > upgrade java is pretty much out of the question or a 6+ month ordeal. of the
>>>> > 10 client clusters i can think of off the top of my head, all of them are on
>>>> > java 7 and none are on java 8. so by doing this you would make spark 2
>>>> > basically unusable for us (unless most of them have plans of upgrading to
>>>> > java 8 in the near term; i will ask around and report back...).
>>>> >
>>>> > on a side note, it's particularly interesting to me that spark 2 chose to
>>>> > continue support for scala 2.10, because even for us in our very constricted
>>>> > client environments the scala version is something we can easily upgrade (we
>>>> > just deploy a custom build of spark for the relevant scala version and
>>>> > hadoop distro). and because scala is not a dependency of any hadoop distro
>>>> > (so not on the classpath, which i am very happy about) we can use whatever
>>>> > scala version we like. also i found the upgrade path from scala 2.10 to 2.11
>>>> > to be very easy, so i have a hard time understanding why anyone would stay
>>>> > on scala 2.10. and finally with scala 2.12 around the corner you really
>>>> > don't want to be supporting 3 versions. so clearly i am missing something
>>>> > here.
>>>> >
>>>> > On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> > wrote:
>>>> >>
>>>> >> +1 to support Java 8 (and future versions) *only* in Spark 2.0, and end
>>>> >> support of Java 7. It makes sense.
>>>> >>
>>>> >> Regards
>>>> >> JB
>>>> >>
>>>> >>
>>>> >> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>>>> >>>
>>>> >>> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
>>>> >>> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
>>>> >>> Spark 2.0 would require Java 8 to run).
>>>> >>>
>>>> >>> Oracle ended public updates for JDK 7 one year ago (Apr 2015), and
>>>> >>> removed public downloads for JDK 7 in July 2015. In the past I've
>>>> >>> actually been against dropping Java 7, but today I ran into an issue
>>>> >>> with the new Dataset API not working well with Java 8 lambdas, and that
>>>> >>> changed my opinion on this.
>>>> >>>
>>>> >>> I've been thinking more about this issue today and also talked with a
>>>> >>> lot of people offline to gather feedback, and I actually think the pros
>>>> >>> outweigh the cons, for the following reasons (in some rough order of
>>>> >>> importance):
>>>> >>>
>>>> >>> 1. It is complicated to test how well Spark APIs work for Java lambdas
>>>> >>> if we support Java 7. Jenkins machines need to have both Java 7 and Java
>>>> >>> 8 installed, and we must run through a set of test suites in 7 and then
>>>> >>> the lambda tests in Java 8. This complicates build environments/scripts
>>>> >>> and makes them less robust. Without good testing infrastructure, I have
>>>> >>> no confidence in building good APIs for Java 8.
>>>> >>>
>>>> >>> 2. Dataset/DataFrame performance will be between 1x and 10x slower in
>>>> >>> Java 7. The primary APIs we want users to use in Spark 2.x are
>>>> >>> Dataset/DataFrame, and this impacts pretty much everything from machine
>>>> >>> learning to structured streaming. We have made great progress in their
>>>> >>> performance through extensive use of code generation. (In many
>>>> >>> dimensions Spark 2.0 with DataFrames/Datasets looks more like a compiler
>>>> >>> than a MapReduce or query engine.) These optimizations don't work well
>>>> >>> in Java 7 due to broken code cache flushing. This problem has been fixed
>>>> >>> by Oracle in Java 8. In addition, Java 8 comes with better support for
>>>> >>> Unsafe and SIMD.
>>>> >>>
>>>> >>> 3. Scala 2.12 will come out soon, and we will want to add support for
>>>> >>> it. Scala 2.12 only works on Java 8. If we do support Java 7, we'd
>>>> >>> have a fairly complicated compatibility matrix and testing
>>>> >>> infrastructure.
>>>> >>>
>>>> >>> 4. There are libraries that I've looked into in the past that support
>>>> >>> only Java 8. This is more common in high-performance libraries such as
>>>> >>> Aeron (a messaging library). Having to support Java 7 means we are not
>>>> >>> able to use these. It is not that big of a deal right now, but it will
>>>> >>> become increasingly difficult as we optimize performance.
>>>> >>>
>>>> >>>
>>>> >>> The downside of not supporting Java 7 is also obvious. Some
>>>> >>> organizations are stuck with Java 7, and they wouldn't be able to use
>>>> >>> Spark 2.0 without upgrading Java.
>>>> >>
>>>> >> --
>>>> >> Jean-Baptiste Onofré
>>>> >> jbono...@apache.org
>>>> >> http://blog.nanthrax.net
>>>> >> Talend - http://www.talend.com
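Reynold's third point, along with the dual-JDK testing burden in his first, can be made concrete by counting the (Java, Scala) combinations a CI setup would have to cover. This is an illustrative sketch, not Spark's actual build matrix. It assumes the Scala lines mentioned in the thread (2.10 kept in Spark 2, 2.11, and the upcoming 2.12) and encodes only the constraint stated above: Scala 2.12 requires Java 8.

```java
import java.util.ArrayList;
import java.util.List;

class BuildMatrix {
    // Illustrative version lists taken from the thread; not an official Spark build matrix.
    static final int[] JAVA_VERSIONS = {7, 8};
    static final String[] SCALA_VERSIONS = {"2.10", "2.11", "2.12"};

    /** Scala 2.12 only works on Java 8, so (Java 7, Scala 2.12) is excluded. */
    static boolean compatible(int jdk, String scala) {
        return !(scala.equals("2.12") && jdk < 8);
    }

    /** Lists every (Java, Scala) pair that would need its own test run, given a minimum Java version. */
    static List<String> matrix(int minJava) {
        List<String> cells = new ArrayList<>();
        for (int jdk : JAVA_VERSIONS) {
            if (jdk < minJava) {
                continue; // this Java version is no longer supported
            }
            for (String scala : SCALA_VERSIONS) {
                if (compatible(jdk, scala)) {
                    cells.add("java" + jdk + "/scala" + scala);
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println("Java 7 and 8: " + matrix(7));
        System.out.println("Java 8 only:  " + matrix(8));
    }
}
```

Under these assumptions, keeping Java 7 yields five cells (2 × 3 minus the excluded Java 7/Scala 2.12 pair), while requiring Java 8 leaves three. Each cell is a separate environment to provision and test.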