Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Nick Pentreath
+1 for this proposal - as you mention, I think it's the de facto current situation anyway. Note that from a developer's view it's just the user-facing API that will be "ml"-only - the majority of the actual algorithms still operate on RDDs under the hood currently. On Wed, 6 Apr 2016 at 05:03, Chris

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Chris Fregly
perhaps renaming to Spark ML would actually clear up code and documentation confusion? +1 for rename > On Apr 5, 2016, at 7:00 PM, Reynold Xin wrote: > > +1 > > This is a no-brainer IMO. > > >> On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Reynold Xin
+1 This is a no-brainer IMO. On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley wrote: > +1 By the way, the JIRA for tracking (Scala) API parity is: > https://issues.apache.org/jira/browse/SPARK-4591 > > On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-05 Thread Kostas Sakellis
From both this and the JDK thread, I've noticed that people (myself included) have different notions of the compatibility guarantees between major and minor versions. A simple question I have is: what compatibility can we break between minor vs. major releases? It might be worth getting on the same

Re: BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Ted Yu
Probably related to Java 8. I used: $ java -version java version "1.7.0_67" Java(TM) SE Runtime Environment (build 1.7.0_67-b01) Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) On Tue, Apr 5, 2016 at 6:32 PM, Jacek Laskowski wrote: > Hi Ted, > > This is a

Re: BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Jacek Laskowski
Hi Ted, This is a similar issue https://issues.apache.org/jira/browse/SPARK-12530. I've fixed today's one and am sending a pull req. My build command is as follows: ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.2 -Phive -Phive-thriftserver -DskipTests clean install I'm on Java 8 / Mac

Re: Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Josh Rosen
I finally figured out the problem: it seems that my *export JAVA_HOME=/path/to/java8/home* was somehow not affecting the javac executable that Zinc's SBT incremental compiler uses when it forks out to javac to handle Java source files. As a result, we were passing a -source 1.8 flag to the

Re: BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Ted Yu
Looking at recent https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7 builds, there was no such error. I don't see anything wrong with the code: usage = "_FUNC_(str) - " + "Returns str, with the first letter of each word in uppercase, all other letters in " + Mind

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Holden Karau
I'm very much in favor of this; the less porting work there is, the better :) On Tue, Apr 5, 2016 at 5:32 PM, Joseph Bradley wrote: > +1 By the way, the JIRA for tracking (Scala) API parity is: > https://issues.apache.org/jira/browse/SPARK-4591 > > On Tue, Apr 5, 2016 at

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Joseph Bradley
+1 By the way, the JIRA for tracking (Scala) API parity is: https://issues.apache.org/jira/browse/SPARK-4591 On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia wrote: > This sounds good to me as well. The one thing we should pay attention to > is how we update the docs so

Re: Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Josh Rosen
I've reverted the bulk of the conf changes while I investigate. I think that Zinc might be handling JAVA_HOME in a weird way and am SSH'ing to Jenkins to try to reproduce the problem in isolation. On Tue, Apr 5, 2016 at 4:14 PM Ted Yu wrote: > Josh: > You may have noticed

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Matei Zaharia
This sounds good to me as well. The one thing we should pay attention to is how we update the docs so that people know to start with the spark.ml classes. Right now the docs list spark.mllib first and also seem more comprehensive in that area than in spark.ml, so maybe people naturally move

BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Jacek Laskowski
Hi, Just checked out the latest sources and got this... /Users/jacek/dev/oss/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:626: error: annotation argument needs to be a constant; found: "_FUNC_(str) - ".+("Returns str, with the first letter
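
For context, a schematic sketch of the failing pattern and the usual fix, assuming the cause behind the similar SPARK-12530: @ExpressionDescription is a Java annotation, Scala requires Java-annotation arguments to be compile-time constants, and whether concatenated string literals count as a constant has varied across compiler setups. InitCapLike below is a hypothetical stand-in for the real InitCap expression.

    import org.apache.spark.sql.catalyst.expressions.ExpressionDescription

    // Failing shape (schematic): a usage string built with `+`, which the
    // affected compiler rejects as a non-constant annotation argument:
    //
    //   @ExpressionDescription(
    //     usage = "_FUNC_(str) - " +
    //       "Returns str, with the first letter of each word in uppercase, " +
    //       "all other letters in lowercase.")
    //
    // Safe shape: one string literal, unambiguously a compile-time constant.
    @ExpressionDescription(
      usage = "_FUNC_(str) - Returns str, with the first letter of each word in uppercase, all other letters in lowercase.")
    class InitCapLike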

Re: [STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?!

2016-04-05 Thread Jacek Laskowski
Hi Ted, Yeah, I saw the line, but forgot it's a test - one that may well be checking that closures must not contain return. Clearer now. Thanks! Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-05 Thread Holden Karau
One minor downside to having both 2.10 and 2.11 (and eventually 2.12) is deprecation warnings in our builds that we can't fix without introducing a wrapper / Scala-version-specific code. This isn't a big deal, and if we drop 2.10 in the 3-6 month time frame talked about we can clean up those
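
One common shape for such version-specific code, as a hedged sbt sketch (Spark's own build is Maven-first, so this is illustrative only): keep a small per-version shim in its own source directory so each Scala compiler only sees code that is warning-free for it.

    // In build.sbt: pull in src/main/scala-2.10 or src/main/scala-2.11 in
    // addition to src/main/scala, depending on the cross-build target.
    unmanagedSourceDirectories in Compile +=
      (sourceDirectory in Compile).value / s"scala-${scalaBinaryVersion.value}"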

Re: Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Ted Yu
Josh: You may have noticed the following error ( https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/566/console ): [error] javac: invalid source release: 1.8 [error] Usage: javac [error] use -help for a list of possible options On Tue, Apr 5, 2016 at 2:14 PM, Josh

Re: [STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?!

2016-04-05 Thread Ted Yu
The next line should give some clue: expectCorrectException { ssc.transform(Seq(ds), transformF) } Closures shouldn't include return. On Tue, Apr 5, 2016 at 3:40 PM, Jacek Laskowski wrote: > Hi, > > In >

[STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?!

2016-04-05 Thread Jacek Laskowski
Hi, In https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/DStreamClosureSuite.scala#L190: { return; ssc.sparkContext.emptyRDD[Int] } What is this return for? I don't understand the line and am about to propose a change to remove it.
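
For readers following the thread, a minimal standalone sketch (no Spark required) of why return inside a closure is worth testing for: Scala compiles it to a thrown scala.runtime.NonLocalReturnControl aimed at the enclosing method, which cannot work once the closure is serialized and run on an executor - hence the suite checks that such closures are rejected.

    object NonLocalReturnDemo {
      // The `return` belongs to firstEven, not to the foreach lambda: the
      // compiler implements it by throwing NonLocalReturnControl, which
      // unwinds the stack up to firstEven's frame.
      def firstEven(xs: Seq[Int]): Option[Int] = {
        xs.foreach { x =>
          if (x % 2 == 0) return Some(x)
        }
        None
      }

      def main(args: Array[String]): Unit = {
        // Prints Some(4). This only works because firstEven is still on the
        // stack; a Spark closure runs on an executor long after the enclosing
        // method has returned, so the control-flow exception has no frame to
        // unwind to - which is why a return-containing closure must fail fast.
        println(firstEven(Seq(1, 3, 4)))
      }
    }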

Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Josh Rosen
In order to be able to run Java 8 API compatibility tests, I'm going to push a new set of Jenkins configurations for Spark's test and PR builders so that those jobs use a Java 8 JDK. I tried this once in the past and it seemed to introduce some rare, transient flakiness in certain tests, so if

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Yes, DB (cc'ed) is working on porting the local linear algebra library over (SPARK-13944). There are also frequent pattern mining algorithms we need to port over in order to reach feature parity. -Xiangrui On Tue, Apr 5, 2016 at 12:08 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote:

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Shivaram Venkataraman
Overall this sounds good to me. One question I have is that in addition to the ML algorithms we have a number of linear algebra (various distributed matrices) and statistical methods in the spark.mllib package. Is the plan to port or move these to the spark.ml namespace in the 2.x series? Thanks

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Sean Owen
FWIW, all of that sounds like a good plan to me. Developing one API is certainly better than two. On Tue, Apr 5, 2016 at 7:01 PM, Xiangrui Meng wrote: > Hi all, > > More than a year ago, in Spark 1.2 we introduced the ML pipeline API built > on top of Spark SQL’s DataFrames.

Spark Streaming UI reporting a different task duration

2016-04-05 Thread Renyi Xiong
Hi TD, We noticed that the Spark Streaming UI reports a different task duration from time to time. E.g. here's the standard output of the application, which reports the duration of the longest task as about 3.3 minutes: 16/04/01 16:07:19 INFO TaskSetManager: Finished task 1077.0 in stage 0.0

Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Hi all, More than a year ago, in Spark 1.2 we introduced the ML pipeline API built on top of Spark SQL’s DataFrames. Since then the new DataFrame-based API has been developed under the spark.ml package, while the old RDD-based API has been developed in parallel under the spark.mllib package.
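
To make the two packages concrete, a minimal sketch (using Spark 1.6-era signatures) of the same model trained through each API - the RDD-based spark.mllib entry point versus the DataFrame-based spark.ml estimator that this proposal would make primary:

    // RDD-based API (spark.mllib), proposed for maintenance mode:
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def trainRddApi(data: RDD[LabeledPoint]) =
      new LogisticRegressionWithLBFGS().run(data)

    // DataFrame-based pipeline API (spark.ml); assumes a DataFrame with the
    // conventional "label" and "features" columns:
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.sql.DataFrame

    def trainDataFrameApi(data: DataFrame) =
      new LogisticRegression().setMaxIter(100).fit(data)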

Re: Build with Thrift Server & Scala 2.11

2016-04-05 Thread Raymond Honderdors
I did a check and could not find that in any of the config files. I also used config files that work with 1.6.1. Sent from Outlook Mobile On Tue, Apr 5, 2016 at 9:22 AM -0700, "Ted Yu" wrote: Raymond: Did

Re: Build with Thrift Server & Scala 2.11

2016-04-05 Thread Ted Yu
Raymond: Did "namenode" appear in any of the Spark config files? BTW Scala 2.11 is used by the default build. On Tue, Apr 5, 2016 at 6:22 AM, Raymond Honderdors < raymond.honderd...@sizmek.com> wrote: > I can see that the build is successful > > (-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0

Re: RDD Partitions not distributed evenly to executors

2016-04-05 Thread Khaled Ammar
I have a similar experience. Using 32 machines, I can see that the number of tasks (partitions) assigned to executors (machines) is not even. Moreover, the distribution changes every stage (iteration). I wonder why Spark needs to move partitions around anyway - shouldn't the scheduler reduce network
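
A quick way to separate the two possible causes - skewed partitions versus skewed scheduling - is to count records per partition and compare against the executor column in the UI. A hedged sketch, assuming a live SparkContext and any RDD named rdd:

    // Counts records in each partition without shuffling the data.
    val partitionSizes = rdd
      .mapPartitionsWithIndex((i, it) => Iterator((i, it.size)))
      .collect()
      .sortBy(_._1)
    partitionSizes.foreach { case (i, n) => println(s"partition $i: $n records") }
    // If counts are even but the UI shows tasks piling onto a few executors,
    // the issue is locality-driven scheduling rather than the partitioner;
    // lowering spark.locality.wait makes the scheduler give up on locality
    // sooner and spread tasks more evenly.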

Re: What influences the space complexity of Spark operations?

2016-04-05 Thread Steve Johnston
Submitted: SPARK-14389 - OOM during BroadcastNestedLoopJoin.

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-04-05 Thread Reynold Xin
Hi Sean, See http://www.oracle.com/technetwork/java/eol-135779.html Java 7 hasn't EOLed yet. If you look at the support you can get from Oracle, it actually goes to 2019. And you can even get more support after that. Spark has always maintained great backward compatibility with other systems, way

RE: Build with Thrift Server & Scala 2.11

2016-04-05 Thread Raymond Honderdors
Here is the error after building with Scala 2.10: “Spark Command: /usr/lib/jvm/java-1.8.0/bin/java -cp /home/raymond.honderdors/Documents/IdeaProjects/spark/conf/:/home/raymond.honderdors/Documents/IdeaProjects/spark/assembly/target/scala-2.10/jars/* -Xms5g -Xmx5g

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-04-05 Thread Sean Owen
Following https://github.com/apache/spark/pull/12165#issuecomment-205791222 I'd like to make a point about process and then answer points below. We have this funny system where anyone can propose a change, and any of a few people can veto a change unilaterally. The latter rarely comes up. 9

RE: Build with Thrift Server & Scala 2.11

2016-04-05 Thread Raymond Honderdors
I can see that the build is successful (-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -Dscala-2.11 -DskipTests clean package), but the documentation page still says: “Building With Hive and JDBC Support - To enable Hive integration for Spark SQL along with its JDBC server

Re: Build with Thrift Server & Scala 2.11

2016-04-05 Thread Reynold Xin
What do you mean? The Jenkins build for Spark uses 2.11 and also builds the thrift server. On Tuesday, April 5, 2016, Raymond Honderdors wrote: > Is anyone looking into this one, Build with Thrift Server & Scala 2.11? > > If so, when can we expect it

Build with Thrift Server & Scala 2.11

2016-04-05 Thread Raymond Honderdors
Is anyone looking into this one, Build with Thrift Server & Scala 2.11? If so, when can we expect it? Raymond Honderdors Team Lead Analytics BI Business Intelligence Developer raymond.honderd...@sizmek.com T +972.7325.3569 Herzliya [Read