Re: [discuss] ending support for Java 7 in Spark 2.0
Since my original email, I've talked to a lot more users and looked at what various environments support. It is true that a lot of enterprises, and even some technology companies, are still using Java 7. One thing is that, even at this date, users still can't install OpenJDK 8 on Ubuntu by default. I see that as an indication that it is too early to drop Java 7.

Looking at the timeline, the JDK releases a major new version roughly every 3 years. We dropped Java 6 support one year ago, so from a timeline point of view we would be very aggressive here if we were to drop Java 7 support in Spark 2.0.

Note that not dropping Java 7 support now doesn't mean we have to support Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even though Spark 1.0 started with Java 6.

In terms of testing, Josh has actually improved our test infra so now we would run the Java 8 tests: https://github.com/apache/spark/pull/12073

On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin wrote:
> The arguments are really convincing; the new Dataset API as well as the
> performance improvements are exciting, so I'm personally +1 on moving onto
> Java 8.
>
> However, I'm afraid Tencent is one of "the organizations stuck with Java 7"
> -- our IT Infra division wouldn't upgrade to Java 7 until Java 8 is out, and
> wouldn't upgrade to Java 8 until Java 9 is out.
>
> So:
> (non-binding) +1 on dropping Scala 2.10 support
> (non-binding) -1 on dropping Java 7 support
>   * as long as we figure out a practical way to run Spark with JDK 8 on
>     JDK 7 clusters, this -1 would then definitely be +1
>
> Thanks!
>
> On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers wrote:
>> I think that logic is reasonable, but then the same should also apply to
>> Scala 2.10, which is also unmaintained/unsupported at this point (basically
>> has been since March 2015, except for one hotfix due to a license
>> incompatibility).
>>
>> Who wants to support Scala 2.10 three years after they did the last
>> maintenance release?
>>
>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan wrote:
>>> Removing compatibility (with the JDK, etc.) can be done with a major
>>> release. Given that 7 was EOLed a while back and is now unsupported, we
>>> have to decide whether we drop support for it in 2.0 or in 3.0 (2+ years
>>> from now).
>>>
>>> Given the functionality & performance benefits of going to JDK 8, future
>>> enhancements relevant in the 2.x timeframe (Scala, dependencies) which
>>> require it, and simplicity wrt code, test & support, it looks like a good
>>> checkpoint to drop JDK 7 support.
>>>
>>> As already mentioned in the thread, existing YARN clusters are unaffected
>>> if they want to continue running JDK 7 and yet use Spark 2 (install JDK 8
>>> on all nodes and use it via JAVA_HOME, or worst case distribute JDK 8 as
>>> an archive - suboptimal). I am unsure about Mesos (standalone might be an
>>> easier upgrade, I guess?).
>>>
>>> The proposal is for the 1.6.x line to continue to be supported with
>>> critical fixes; newer features will require 2.x and so JDK 8.
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Thursday, March 24, 2016, Marcelo Vanzin wrote:
>>>
>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin wrote:
>>> > If you want to go down that route, you should also ask somebody who
>>> > has had experience managing a large organization's applications and
>>> > trying to update the Scala version.
>>>
>>> I understand both sides. But if you look at what I've been asking since
>>> the beginning, it's all about the costs and benefits of dropping support
>>> for Java 1.7.
>>>
>>> The biggest argument in your original e-mail is about testing, and the
>>> testing cost is much bigger for supporting Scala 2.10 than it is for
>>> supporting Java 1.7. If you read one of my earlier replies, it should
>>> even be possible to do everything in a single job - compile for Java 7
>>> and still be able to test things in 1.8, including lambdas, which seems
>>> to be the main thing you were worried about.
>>>
>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin wrote:
>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin wrote:
>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11 than
>>> >> > to upgrade the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>> >> > are not binary compatible, whereas JVM 7 and 8 are binary compatible
>>> >> > except in certain esoteric cases.
>>> >>
>>> >> True, but ask anyone who manages a large cluster how long it would
>>> >> take them to upgrade the JDK across their cluster and validate all
>>> >> their applications and everything... binary compatibility is a tiny
>>> >> drop in that bucket.
>>> >>
>>> >> --
>>> >> Marcelo
>>>
>>> --
>>> Marcelo
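As a concrete illustration of the JAVA_HOME route Mridul mentions, here is a minimal sketch of how a Spark-on-YARN application could be pointed at a JDK 8 install while the cluster's default Java stays on 7. The JDK path and application name are assumptions for illustration only; the two properties are Spark's per-application environment settings for the YARN application master and the executors.

import org.apache.spark.{SparkConf, SparkContext}

// Point both the YARN application master and the executors at a JDK 8
// install that is already present on every node; the cluster's default
// java can stay on 7. The path below is an assumed install location.
val jdk8Home = "/usr/lib/jvm/java-8-openjdk-amd64"

val conf = new SparkConf()
  .setAppName("spark2-on-jdk7-cluster")                 // hypothetical app name
  .set("spark.yarn.appMasterEnv.JAVA_HOME", jdk8Home)   // JVM for the YARN AM
  .set("spark.executorEnv.JAVA_HOME", jdk8Home)         // JVM for the executors

val sc = new SparkContext(conf)

The same two settings can be passed via --conf on spark-submit instead; in client mode the driver process itself also needs to be launched under a JDK 8 java.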
Re: explain codegen
Works for me on latest master.

scala> sql("explain codegen select 'a' as a group by 1").head
res3: org.apache.spark.sql.Row = [Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 ==
WholeStageCodegen
:  +- TungstenAggregate(key=[], functions=[], output=[a#10])
:     +- INPUT
+- Exchange SinglePartition, None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[], functions=[], output=[])
      :     +- INPUT
      +- Scan OneRowRelation[]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ /** Codegened pipeline for:
/* 006 */  * TungstenAggregate(key=[], functions=[], output=[a#10])
/* 007 */  +- INPUT
/* 008 */  */
/* 009 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 010 */   private Object[] references;
/* 011 */ ...

On Sun, Apr 3, 2016 at 9:38 PM, Jacek Laskowski wrote:
> Hi,
>
> Looks related to the recent commit...
>
> Repository: spark
> Updated Branches:
>   refs/heads/master 2262a9335 -> 1f0c5dceb
>
> [SPARK-14350][SQL] EXPLAIN output should be in a single cell
>
> Jacek
>
> On 03.04.2016 at 7:00 PM, "Ted Yu" wrote:
>
>> Hi,
>> Based on master branch refreshed today, I issued 'git clean -fdx' first.
>>
>> Then this command:
>> build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6
>> -Dhadoop.version=2.7.0 package -DskipTests
>>
>> I got the following error:
>>
>> scala> sql("explain codegen select 'a' as a group by 1").head
>> org.apache.spark.sql.catalyst.parser.ParseException:
>> extraneous input 'codegen' expecting {'(', 'SELECT', 'FROM', 'ADD',
>> 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE',
>> 'DESCRIBE', 'EXPLAIN', 'LOGICAL', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP',
>> 'SET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'EXTENDED', 'REFRESH',
>> 'CLEAR', 'CACHE', 'UNCACHE', 'FORMATTED', 'DFS', 'TRUNCATE', 'ANALYZE',
>> 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT',
>> 'LOAD'}(line 1, pos 8)
>>
>> == SQL ==
>> explain codegen select 'a' as a group by 1
>> ^^^
>>
>> Can someone shed light?
>>
>> Thanks
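A small follow-up on reading that result: since [SPARK-14350] now puts the whole EXPLAIN output into a single cell, res3 above is one long Row; printing the cell itself renders the plan and the generated code with their line breaks, e.g.:

scala> println(sql("explain codegen select 'a' as a group by 1").head.getString(0))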
Re: explain codegen
Hi,

Looks related to the recent commit...

Repository: spark
Updated Branches:
  refs/heads/master 2262a9335 -> 1f0c5dceb

[SPARK-14350][SQL] EXPLAIN output should be in a single cell

Jacek

On 03.04.2016 at 7:00 PM, "Ted Yu" wrote:
> Hi,
> Based on master branch refreshed today, I issued 'git clean -fdx' first.
>
> Then this command:
> build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6
> -Dhadoop.version=2.7.0 package -DskipTests
>
> I got the following error:
>
> scala> sql("explain codegen select 'a' as a group by 1").head
> org.apache.spark.sql.catalyst.parser.ParseException:
> extraneous input 'codegen' expecting {'(', 'SELECT', 'FROM', 'ADD',
> 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE',
> 'DESCRIBE', 'EXPLAIN', 'LOGICAL', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP',
> 'SET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'EXTENDED', 'REFRESH',
> 'CLEAR', 'CACHE', 'UNCACHE', 'FORMATTED', 'DFS', 'TRUNCATE', 'ANALYZE',
> 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT',
> 'LOAD'}(line 1, pos 8)
>
> == SQL ==
> explain codegen select 'a' as a group by 1
> ^^^
>
> Can someone shed light?
>
> Thanks
explain codegen
Hi,
Based on master branch refreshed today, I issued 'git clean -fdx' first.

Then this command:
build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.0 package -DskipTests

I got the following error:

scala> sql("explain codegen select 'a' as a group by 1").head
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'codegen' expecting {'(', 'SELECT', 'FROM', 'ADD',
'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE',
'DESCRIBE', 'EXPLAIN', 'LOGICAL', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP',
'SET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'EXTENDED', 'REFRESH',
'CLEAR', 'CACHE', 'UNCACHE', 'FORMATTED', 'DFS', 'TRUNCATE', 'ANALYZE',
'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT',
'LOAD'}(line 1, pos 8)

== SQL ==
explain codegen select 'a' as a group by 1
^^^

Can someone shed light?

Thanks
[SQL] Dataset.map gives error: missing parameter type for expanded function?
Hi,

(since this is 2.0.0-SNAPSHOT, it's more for dev than user)

With today's master I'm getting the following:

scala> ds
res14: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

// WHY?!
scala> ds.groupBy(_._1)
<console>:26: error: missing parameter type for expanded function ((x$1) => x$1._1)
       ds.groupBy(_._1)
                  ^

scala> ds.filter(_._1.size > 10)
res23: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

It's even on Michael's slide in https://youtu.be/i7l3JQRx7Qw?t=7m38s from Spark Summit East?!

Am I doing something wrong? Please guide.

Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
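A likely explanation, offered as a sketch rather than a definitive answer: recent 2.0.0-SNAPSHOT builds appear to have renamed the typed grouping from groupBy(func) to groupByKey, leaving groupBy with only the untyped Column / column-name overloads; the compiler then cannot infer the lambda's parameter type against those, while filter still infers fine because it has a single function overload. Assuming such a build, something like the following should compile:

// Sketch assuming a 2.0.0-SNAPSHOT build where the typed grouping is
// exposed as groupByKey; in spark-shell the needed tuple encoders are
// already in scope (in an application, import the session's implicits).
val ds = Seq(("hello", 1), ("world", 2), ("hello", 3)).toDS()

val grouped = ds.groupByKey(_._1)   // single function overload, so the type is inferred
val sums = grouped.mapGroups((k, vs) => (k, vs.map(_._2).sum))
sums.show()

// The untyped groupBy still works with a column reference instead of a lambda:
ds.groupBy($"_1").count().show()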