spark git commit: [SPARK-6731] Bump version of apache commons-math3

2015-04-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 77eeb10fd -> 628a72f70


[SPARK-6731] Bump version of apache commons-math3

Version 3.1.1 is two years old and the newer version includes
approximate percentile statistics (among other things).
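
For context, a minimal sketch (not part of this commit) of the approximate percentile
support the newer commons-math3 provides via its PSquarePercentile estimator:

```
import org.apache.commons.math3.stat.descriptive.rank.PSquarePercentile

// Streaming (P^2) estimate of the 95th percentile; uses constant memory
// instead of buffering all values like the exact Percentile class.
val p95 = new PSquarePercentile(95.0)
(1 to 10000).foreach(i => p95.increment(i.toDouble))
println(p95.getResult)  // roughly 9500
```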

Author: Punyashloka Biswal 

Closes #5380 from punya/patch-1 and squashes the following commits:

226622b [Punyashloka Biswal] Bump version of apache commons-math3


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/628a72f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/628a72f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/628a72f7

Branch: refs/heads/master
Commit: 628a72f70ed06b8d7aee81cfb16070eb2c87b9cd
Parents: 77eeb10
Author: Punyashloka Biswal 
Authored: Tue Apr 14 11:43:06 2015 +0100
Committer: Sean Owen 
Committed: Tue Apr 14 11:43:06 2015 +0100

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/628a72f7/pom.xml
--
diff --git a/pom.xml b/pom.xml
index d8881c2..0b8d664 100644
--- a/pom.xml
+++ b/pom.xml
@@ -147,7 +147,7 @@
     <aws.java.sdk.version>1.8.3</aws.java.sdk.version>
     <aws.kinesis.client.version>1.1.0</aws.kinesis.client.version>
     <commons.httpclient.version>4.2.6</commons.httpclient.version>
-    <commons.math3.version>3.1.1</commons.math3.version>
+    <commons.math3.version>3.4.1</commons.math3.version>
     <test_classpath_file>${project.build.directory}/spark-test-classpath.txt</test_classpath_file>
     <scala.version>2.10.4</scala.version>
     <scala.binary.version>2.10</scala.binary.version>





spark git commit: SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception

2015-04-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 628a72f70 -> 51b306b93


SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception

Author: Erik van Oosten 

Closes #5489 from erikvanoosten/master and squashes the following commits:

1c91954 [Erik van Oosten] Rewrote double range matcher to an exact equality 
assert (SPARK-6878)
f1708c9 [Erik van Oosten] Fix for sum on empty RDD fails with exception 
(SPARK-6878)
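
The one-line fix below replaces reduce with fold; a minimal sketch of the difference,
assuming a running SparkContext `sc`:

```
// reduce has no identity element, so it throws on an empty RDD:
// sc.parallelize(Seq.empty[Double]).reduce(_ + _)  // UnsupportedOperationException: empty collection

// fold starts from a zero element, so an empty RDD simply yields it:
sc.parallelize(Seq.empty[Double]).fold(0.0)(_ + _)  // 0.0
sc.parallelize(Seq(1.0, 2.0)).fold(0.0)(_ + _)      // 3.0
```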


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/51b306b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/51b306b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/51b306b9

Branch: refs/heads/master
Commit: 51b306b930cfe03ad21af72a3a6ef31e6e626235
Parents: 628a72f
Author: Erik van Oosten 
Authored: Tue Apr 14 12:39:56 2015 +0100
Committer: Sean Owen 
Committed: Tue Apr 14 12:39:56 2015 +0100

--
 .../main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala   | 2 +-
 core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala  | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/51b306b9/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
index 29ca3e9..843a893 100644
--- a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
@@ -31,7 +31,7 @@ import org.apache.spark.util.StatCounter
 class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable {
   /** Add up the elements in this RDD. */
   def sum(): Double = {
-self.reduce(_ + _)
+self.fold(0.0)(_ + _)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/51b306b9/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala 
b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
index 9707938..01039b9 100644
--- a/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
@@ -22,6 +22,12 @@ import org.scalatest.FunSuite
 import org.apache.spark._
 
 class DoubleRDDSuite extends FunSuite with SharedSparkContext {
+  test("sum") {
+assert(sc.parallelize(Seq.empty[Double]).sum() === 0.0)
+assert(sc.parallelize(Seq(1.0)).sum() === 1.0)
+assert(sc.parallelize(Seq(1.0, 2.0)).sum() === 3.0)
+  }
+
   // Verify tests on the histogram functionality. We test with both evenly
   // and non-evenly spaced buckets as the bucket lookup function changes.
   test("WorksOnEmpty") {





spark git commit: SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception

2015-04-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 8e5caa227 -> 2954468b0


SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception

Author: Erik van Oosten 

Closes #5489 from erikvanoosten/master and squashes the following commits:

1c91954 [Erik van Oosten] Rewrote double range matcher to an exact equality 
assert (SPARK-6878)
f1708c9 [Erik van Oosten] Fix for sum on empty RDD fails with exception 
(SPARK-6878)

(cherry picked from commit 51b306b930cfe03ad21af72a3a6ef31e6e626235)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2954468b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2954468b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2954468b

Branch: refs/heads/branch-1.3
Commit: 2954468b028c89406301dfaedf4c4c0b427f048b
Parents: 8e5caa2
Author: Erik van Oosten 
Authored: Tue Apr 14 12:39:56 2015 +0100
Committer: Sean Owen 
Committed: Tue Apr 14 12:40:11 2015 +0100

--
 .../main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala   | 2 +-
 core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala  | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2954468b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
index a3a03ef..b1d3e39 100644
--- a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
@@ -31,7 +31,7 @@ import org.apache.spark.util.StatCounter
 class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable {
   /** Add up the elements in this RDD. */
   def sum(): Double = {
-self.reduce(_ + _)
+self.fold(0.0)(_ + _)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/2954468b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala 
b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
index 787b06e..b1605da 100644
--- a/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
@@ -22,6 +22,12 @@ import org.scalatest.FunSuite
 import org.apache.spark._
 
 class DoubleRDDSuite extends FunSuite with SharedSparkContext {
+  test("sum") {
+assert(sc.parallelize(Seq.empty[Double]).sum() === 0.0)
+assert(sc.parallelize(Seq(1.0)).sum() === 1.0)
+assert(sc.parallelize(Seq(1.0, 2.0)).sum() === 3.0)
+  }
+
   // Verify tests on the histogram functionality. We test with both evenly
   // and non-evenly spaced buckets as the bucket lookup function changes.
   test("WorksOnEmpty") {





spark git commit: SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception

2015-04-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 daec1c635 -> 899ffdcc0


SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception

Author: Erik van Oosten 

Closes #5489 from erikvanoosten/master and squashes the following commits:

1c91954 [Erik van Oosten] Rewrote double range matcher to an exact equality 
assert (SPARK-6878)
f1708c9 [Erik van Oosten] Fix for sum on empty RDD fails with exception 
(SPARK-6878)

(cherry picked from commit 51b306b930cfe03ad21af72a3a6ef31e6e626235)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/899ffdcc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/899ffdcc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/899ffdcc

Branch: refs/heads/branch-1.2
Commit: 899ffdcc06bf0fcd40387d73e8a3fddfc72cb33a
Parents: daec1c6
Author: Erik van Oosten 
Authored: Tue Apr 14 12:39:56 2015 +0100
Committer: Sean Owen 
Committed: Tue Apr 14 12:40:23 2015 +0100

--
 .../main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala   | 2 +-
 core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala  | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/899ffdcc/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
index e66c06e..04b52d9 100644
--- a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
@@ -32,7 +32,7 @@ import org.apache.spark.util.StatCounter
 class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable {
   /** Add up the elements in this RDD. */
   def sum(): Double = {
-self.reduce(_ + _)
+self.fold(0.0)(_ + _)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/899ffdcc/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala 
b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
index e29ac0c..7d1ed06 100644
--- a/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala
@@ -23,6 +23,12 @@ import org.apache.spark._
 import org.apache.spark.SparkContext._
 
 class DoubleRDDSuite extends FunSuite with SharedSparkContext {
+  test("sum") {
+assert(sc.parallelize(Seq.empty[Double]).sum() === 0.0)
+assert(sc.parallelize(Seq(1.0)).sum() === 1.0)
+assert(sc.parallelize(Seq(1.0, 2.0)).sum() === 3.0)
+  }
+
   // Verify tests on the histogram functionality. We test with both evenly
   // and non-evenly spaced buckets as the bucket lookup function changes.
   test("WorksOnEmpty") {





spark git commit: [SPARK-6081] Support fetching http/https uris in driver runner.

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 51b306b93 -> 320bca450


[SPARK-6081] Support fetching http/https uris in driver runner.

Currently, if the driver runner is passed URIs such as http/https, it is not able
to fetch them because it only calls the Hadoop FileSystem get.
This fix uses the existing util method to fetch remote URIs as well.
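
A minimal sketch of the general idea, not Spark's actual Utils.fetchFile (which the
diff below switches to): dispatch on the URI scheme and stream http/https URIs
directly instead of going through the Hadoop FileSystem API. The helper name is
hypothetical.

```
import java.io.File
import java.net.URI
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical helper: copy a remote jar into a local directory.
def fetchJar(uri: String, destDir: File): File = {
  val parsed = new URI(uri)
  val dest = new File(destDir, Paths.get(parsed.getPath).getFileName.toString)
  parsed.getScheme match {
    case "http" | "https" =>
      // Hadoop's FileSystem.get does not handle these schemes, so read the URL directly.
      val in = parsed.toURL.openStream()
      try Files.copy(in, dest.toPath, StandardCopyOption.REPLACE_EXISTING)
      finally in.close()
    case _ =>
      // hdfs://, file://, etc. would go through the Hadoop FileSystem API here.
      ???
  }
  dest
}
```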

Author: Timothy Chen 

Closes #4832 from tnachen/driver_remote and squashes the following commits:

aa52cd6 [Timothy Chen] Support fetching remote uris in driver runner.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/320bca45
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/320bca45
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/320bca45

Branch: refs/heads/master
Commit: 320bca4508e890b874c2eb7abb76a30ef14c932f
Parents: 51b306b
Author: Timothy Chen 
Authored: Tue Apr 14 11:48:12 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 11:49:04 2015 -0700

--
 .../spark/deploy/worker/DriverRunner.scala  | 21 
 .../org/apache/spark/deploy/worker/Worker.scala |  3 ++-
 .../apache/spark/deploy/JsonProtocolSuite.scala |  7 ---
 .../spark/deploy/worker/DriverRunnerTest.scala  |  7 ---
 4 files changed, 23 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/320bca45/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
index e0948e1..ef7a703 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
@@ -24,14 +24,14 @@ import scala.collection.JavaConversions._
 import akka.actor.ActorRef
 import com.google.common.base.Charsets.UTF_8
 import com.google.common.io.Files
-import org.apache.hadoop.fs.{FileUtil, Path}
+import org.apache.hadoop.fs.Path
 
-import org.apache.spark.{Logging, SparkConf}
+import org.apache.spark.{Logging, SparkConf, SecurityManager}
 import org.apache.spark.deploy.{DriverDescription, SparkHadoopUtil}
 import org.apache.spark.deploy.DeployMessages.DriverStateChanged
 import org.apache.spark.deploy.master.DriverState
 import org.apache.spark.deploy.master.DriverState.DriverState
-import org.apache.spark.util.{Clock, SystemClock}
+import org.apache.spark.util.{Utils, Clock, SystemClock}
 
 /**
  * Manages the execution of one driver, including automatically restarting the 
driver on failure.
@@ -44,7 +44,8 @@ private[deploy] class DriverRunner(
 val sparkHome: File,
 val driverDesc: DriverDescription,
 val worker: ActorRef,
-val workerUrl: String)
+val workerUrl: String,
+val securityManager: SecurityManager)
   extends Logging {
 
   @volatile private var process: Option[Process] = None
@@ -136,12 +137,9 @@ private[deploy] class DriverRunner(
* Will throw an exception if there are errors downloading the jar.
*/
   private def downloadUserJar(driverDir: File): String = {
-
 val jarPath = new Path(driverDesc.jarUrl)
 
 val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
-val jarFileSystem = jarPath.getFileSystem(hadoopConf)
-
 val destPath = new File(driverDir.getAbsolutePath, jarPath.getName)
 val jarFileName = jarPath.getName
 val localJarFile = new File(driverDir, jarFileName)
@@ -149,7 +147,14 @@ private[deploy] class DriverRunner(
 
 if (!localJarFile.exists()) { // May already exist if running multiple 
workers on one node
   logInfo(s"Copying user jar $jarPath to $destPath")
-  FileUtil.copy(jarFileSystem, jarPath, destPath, false, hadoopConf)
+  Utils.fetchFile(
+driverDesc.jarUrl,
+driverDir,
+conf,
+securityManager,
+hadoopConf,
+System.currentTimeMillis(),
+useCache = false)
 }
 
 if (!localJarFile.exists()) { // Verify copy succeeded

http://git-wip-us.apache.org/repos/asf/spark/blob/320bca45/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
index c4c24a7..3ee2eb6 100755
--- a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
@@ -436,7 +436,8 @@ private[worker] class Worker(
 sparkHome,
 driverDesc.copy(command = 
Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
 self,
-akkaUrl)
+akkaUrl,
+securityMgr)
   drivers(driverId) = driver
  

spark git commit: [SPARK-6894]spark.executor.extraLibraryOptions => spark.executor.extraLibraryPath

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 320bca450 -> f63b44a5c


[SPARK-6894]spark.executor.extraLibraryOptions => 
spark.executor.extraLibraryPath

https://issues.apache.org/jira/browse/SPARK-6894

cc vanzin

Author: WangTaoTheTonic 

Closes #5506 from WangTaoTheTonic/SPARK-6894 and squashes the following commits:

4b7ced7 [WangTaoTheTonic] spark.executor.extraLibraryOptions => 
spark.executor.extraLibraryPath
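
A hedged usage sketch of the corrected constant; the calls below are the public
SparkLauncher builder methods, while the resource, class, and path values are
placeholders:

```
import org.apache.spark.launcher.SparkLauncher

// EXECUTOR_EXTRA_LIBRARY_PATH now maps to the real key "spark.executor.extraLibraryPath".
val process = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // placeholder
  .setMainClass("com.example.Main")     // placeholder
  .setMaster("yarn-cluster")
  .setConf(SparkLauncher.EXECUTOR_EXTRA_LIBRARY_PATH, "/opt/native/lib")
  .launch()
```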


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f63b44a5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f63b44a5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f63b44a5

Branch: refs/heads/master
Commit: f63b44a5c201d9678738a906462be9a6d7e3e8f8
Parents: 320bca4
Author: WangTaoTheTonic 
Authored: Tue Apr 14 12:02:11 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 12:02:11 2015 -0700

--
 .../src/main/java/org/apache/spark/launcher/SparkLauncher.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f63b44a5/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
--
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java 
b/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
index b566507..d4cfeac 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
@@ -52,7 +52,7 @@ public class SparkLauncher {
   /** Configuration key for the executor VM options. */
   public static final String EXECUTOR_EXTRA_JAVA_OPTIONS = 
"spark.executor.extraJavaOptions";
   /** Configuration key for the executor native library path. */
-  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = 
"spark.executor.extraLibraryOptions";
+  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = 
"spark.executor.extraLibraryPath";
   /** Configuration key for the number of executor CPU cores. */
   public static final String EXECUTOR_CORES = "spark.executor.cores";
 





spark git commit: [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master f63b44a5c -> dcf8a9f33


[CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled 
due to previous stage failure

Fixed the null check when all the dependent stages are cancelled due to a previous
stage failure. This happens when one of the executor nodes goes down and all the
dependent stages are cancelled.
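
The fix replaces the explicit contains/apply lookup with an Option-based expression;
a standalone sketch of the same pattern, with a hypothetical map and key:

```
// Hypothetical stand-in for jobIdToActiveJob: look up a value and fall back to null
// when the key is absent, without a separate contains() check.
case class ActiveJob(properties: java.util.Properties)
val jobIdToActiveJob = Map(1 -> ActiveJob(new java.util.Properties()))

val props = jobIdToActiveJob.get(42).map(_.properties).orNull  // null, no exception
```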

Author: pankaj arora 

Closes #5494 from pankajarora12/NEWBRANCH and squashes the following commits:

55ba5e3 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the 
dependent stages are cancelled due to previous stage failure
4575720 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the 
dependent stages are cancelled due to previous stage failure


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dcf8a9f3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dcf8a9f3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dcf8a9f3

Branch: refs/heads/master
Commit: dcf8a9f331c6193a62bbc9282bdc99663e23ca19
Parents: f63b44a
Author: pankaj arora 
Authored: Tue Apr 14 12:06:46 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 12:07:08 2015 -0700

--
 .../main/scala/org/apache/spark/scheduler/DAGScheduler.scala  | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dcf8a9f3/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
--
diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index 508fe7b..4a32f89 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -818,12 +818,7 @@ class DAGScheduler(
   }
 }
 
-val properties = if (jobIdToActiveJob.contains(jobId)) {
-  jobIdToActiveJob(stage.jobId).properties
-} else {
-  // this stage will be assigned to "default" pool
-  null
-}
+val properties = jobIdToActiveJob.get(stage.jobId).map(_.properties).orNull
 
 runningStages += stage
 // SparkListenerStageSubmitted should be posted before testing whether 
tasks are





spark git commit: [SPARK-2033] Automatically cleanup checkpoint

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master dcf8a9f33 -> 25998e4d7


[SPARK-2033] Automatically cleanup checkpoint

Author: GuoQiang Li 

Closes #855 from witgo/cleanup_checkpoint_date and squashes the following 
commits:

1649850 [GuoQiang Li] review commit
c0087e0 [GuoQiang Li] Automatically cleanup checkpoint
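
A hedged sketch of how the new behaviour might be enabled; the configuration key
below is my reading of this change (automatic checkpoint cleanup is off by default),
so treat it as an assumption rather than something confirmed by this log:

```
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: cleanup of checkpoint files is gated behind
// "spark.cleaner.referenceTracking.cleanCheckpoints".
val conf = new SparkConf()
  .setAppName("checkpoint-cleanup-example")
  .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/spark-checkpoints")

val rdd = sc.parallelize(1 to 1000).map(_ * 2)
rdd.checkpoint()
rdd.count()
// Once `rdd` is garbage collected, the ContextCleaner can also remove its
// checkpoint files from the checkpoint directory.
```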


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/25998e4d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/25998e4d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/25998e4d

Branch: refs/heads/master
Commit: 25998e4d73bcc95ac85d9af71adfdc726ec89568
Parents: dcf8a9f
Author: GuoQiang Li 
Authored: Tue Apr 14 12:56:47 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 12:56:47 2015 -0700

--
 .../scala/org/apache/spark/ContextCleaner.scala | 44 +-
 .../apache/spark/rdd/RDDCheckpointData.scala| 27 +--
 .../org/apache/spark/ContextCleanerSuite.scala  | 49 +++-
 3 files changed, 102 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/25998e4d/core/src/main/scala/org/apache/spark/ContextCleaner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala 
b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index 9b05c96..715b259 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -22,7 +22,7 @@ import java.lang.ref.{ReferenceQueue, WeakReference}
 import scala.collection.mutable.{ArrayBuffer, SynchronizedBuffer}
 
 import org.apache.spark.broadcast.Broadcast
-import org.apache.spark.rdd.RDD
+import org.apache.spark.rdd.{RDDCheckpointData, RDD}
 import org.apache.spark.util.Utils
 
 /**
@@ -33,6 +33,7 @@ private case class CleanRDD(rddId: Int) extends CleanupTask
 private case class CleanShuffle(shuffleId: Int) extends CleanupTask
 private case class CleanBroadcast(broadcastId: Long) extends CleanupTask
 private case class CleanAccum(accId: Long) extends CleanupTask
+private case class CleanCheckpoint(rddId: Int) extends CleanupTask
 
 /**
  * A WeakReference associated with a CleanupTask.
@@ -94,12 +95,12 @@ private[spark] class ContextCleaner(sc: SparkContext) 
extends Logging {
   @volatile private var stopped = false
 
   /** Attach a listener object to get information of when objects are cleaned. 
*/
-  def attachListener(listener: CleanerListener) {
+  def attachListener(listener: CleanerListener): Unit = {
 listeners += listener
   }
 
   /** Start the cleaner. */
-  def start() {
+  def start(): Unit = {
 cleaningThread.setDaemon(true)
 cleaningThread.setName("Spark Context Cleaner")
 cleaningThread.start()
@@ -108,7 +109,7 @@ private[spark] class ContextCleaner(sc: SparkContext) 
extends Logging {
   /**
* Stop the cleaning thread and wait until the thread has finished running 
its current task.
*/
-  def stop() {
+  def stop(): Unit = {
 stopped = true
 // Interrupt the cleaning thread, but wait until the current task has 
finished before
 // doing so. This guards against the race condition where a cleaning 
thread may
@@ -121,7 +122,7 @@ private[spark] class ContextCleaner(sc: SparkContext) 
extends Logging {
   }
 
   /** Register a RDD for cleanup when it is garbage collected. */
-  def registerRDDForCleanup(rdd: RDD[_]) {
+  def registerRDDForCleanup(rdd: RDD[_]): Unit = {
 registerForCleanup(rdd, CleanRDD(rdd.id))
   }
 
@@ -130,17 +131,22 @@ private[spark] class ContextCleaner(sc: SparkContext) 
extends Logging {
   }
 
   /** Register a ShuffleDependency for cleanup when it is garbage collected. */
-  def registerShuffleForCleanup(shuffleDependency: ShuffleDependency[_, _, _]) 
{
+  def registerShuffleForCleanup(shuffleDependency: ShuffleDependency[_, _, 
_]): Unit = {
 registerForCleanup(shuffleDependency, 
CleanShuffle(shuffleDependency.shuffleId))
   }
 
   /** Register a Broadcast for cleanup when it is garbage collected. */
-  def registerBroadcastForCleanup[T](broadcast: Broadcast[T]) {
+  def registerBroadcastForCleanup[T](broadcast: Broadcast[T]): Unit = {
 registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
   }
 
+  /** Register a RDDCheckpointData for cleanup when it is garbage collected. */
+  def registerRDDCheckpointDataForCleanup[T](rdd: RDD[_], parentId: Int): Unit 
= {
+registerForCleanup(rdd, CleanCheckpoint(parentId))
+  }
+
   /** Register an object for cleanup. */
-  private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask) {
+  private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask): 
Unit = {
 referenceBuffer += new CleanupTaskWeakReference(task, objectForCleanup, 
referenceQueu

spark git commit: SPARK-1706: Allow multiple executors per worker in Standalone mode

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 25998e4d7 -> 8f8dc45f6


SPARK-1706: Allow multiple executors per worker in Standalone mode

Resubmit of https://github.com/apache/spark/pull/636 with a totally different
algorithm.

https://issues.apache.org/jira/browse/SPARK-1706

In the current implementation, the user has to start multiple workers on a server
in order to start multiple executors on that server, which introduces additional
overhead from the extra JVM processes...

In this patch, I changed the scheduling logic in the master so that the user can
start multiple executor processes from a single worker.

1. The user configures spark.executor.maxCoreNumPerExecutor to suggest the maximum
number of cores he/she would like to allocate to each executor.

2. The master assigns executors to workers based mainly on memoryPerExecutor and
worker.freeMemory, and tries to allocate as many cores as possible to each executor:
```min(min(memoryPerExecutor, worker.freeCore), maxLeftCoreToAssign)``` where
```maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor```
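
A hedged sketch of configuring an application to take advantage of this (an exact
cores-per-executor value in standalone mode); the config keys are standard Spark
settings and the values are placeholders:

```
import org.apache.spark.{SparkConf, SparkContext}

// With this change, a standalone master can place several executors of one
// application on a single worker when an exact cores-per-executor value is set.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder
  .setAppName("multi-executor-per-worker")
  .set("spark.cores.max", "8")        // total cores for the application
  .set("spark.executor.cores", "2")   // exact cores per executor
  .set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)
// A worker with 8 free cores and enough free memory can now host 4 executors.
```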

---

Other small changes include

renaming memoryPerSlave in ApplicationDescription to memoryPerExecutor, as
"Slave" is overloaded to mean both worker and executor in the documentation
(we had some discussion on this before?)

Author: CodingCat 

Closes #731 from CodingCat/SPARK-1706-2 and squashes the following commits:

6dee808 [CodingCat] change filter predicate
fbeb7e5 [CodingCat] address the comments
940cb42 [CodingCat] avoid unnecessary allocation
b8ca561 [CodingCat] revert a change
45967b4 [CodingCat] remove unused method
2eeff77 [CodingCat] stylistic fixes
12a1b32 [CodingCat] change the semantic of coresPerExecutor to exact core number
f035423 [CodingCat] stylistic fix
d9c1685 [CodingCat] remove unused var
f595bd6 [CodingCat] recover some unintentional changes
63b3df9 [CodingCat] change the description of the parameter in the submit script
4cf61f1 [CodingCat] improve the code and docs
ff011e2 [CodingCat] start multiple executors on the worker by rewriting 
startExeuctor logic
2c2bcc5 [CodingCat] fix wrong usage info
497ec2c [CodingCat] address andrew's comments
878402c [CodingCat] change the launching executor code
f64a28d [CodingCat] typo fix
387f4ec [CodingCat] bug fix
35c462c [CodingCat] address Andrew's comments
0b64fea [CodingCat] fix compilation issue
19d3da7 [CodingCat] address the comments
5b81466 [CodingCat] remove outdated comments
ec7d421 [CodingCat] test commit
e5efabb [CodingCat] more java docs and consolidate canUse function
a26096d [CodingCat] stylistic fix
a5d629a [CodingCat] java doc
b34ec0c [CodingCat] make master support multiple executors per worker


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8f8dc45f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8f8dc45f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8f8dc45f

Branch: refs/heads/master
Commit: 8f8dc45f6d4c8d7b740eaa3d2ea09d0b531af9dd
Parents: 25998e4
Author: CodingCat 
Authored: Tue Apr 14 13:32:06 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 13:32:06 2015 -0700

--
 .../spark/deploy/ApplicationDescription.scala   |   9 +-
 .../org/apache/spark/deploy/JsonProtocol.scala  |   4 +-
 .../org/apache/spark/deploy/SparkSubmit.scala   |   2 +
 .../spark/deploy/SparkSubmitArguments.scala |   5 +-
 .../spark/deploy/master/ApplicationInfo.scala   |   8 +-
 .../org/apache/spark/deploy/master/Master.scala | 117 ++-
 .../deploy/master/ui/ApplicationPage.scala  |   2 +-
 .../spark/deploy/master/ui/MasterPage.scala |   4 +-
 .../cluster/SparkDeploySchedulerBackend.scala   |   7 +-
 docs/configuration.md   |  11 ++
 10 files changed, 96 insertions(+), 73 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8f8dc45f/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala 
b/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
index b7ae9c1..ae99432 100644
--- a/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
@@ -22,12 +22,13 @@ import java.net.URI
 private[spark] class ApplicationDescription(
 val name: String,
 val maxCores: Option[Int],
-val memoryPerSlave: Int,
+val memoryPerExecutorMB: Int,
 val command: Command,
 var appUiUrl: String,
 val eventLogDir: Option[URI] = None,
 // short name of compression codec used when writing event logs, if any 
(e.g. lzf)
-val eventLogCodec: Option[

spark git commit: [SPARK-6700] [yarn] Re-enable flaky test.

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 8f8dc45f6 -> b075e4b72


[SPARK-6700] [yarn] Re-enable flaky test.

Test runs have been successful on Jenkins, so let's re-enable the test, watch for
any failures, and fix things appropriately.

Author: Marcelo Vanzin 

Closes #5459 from vanzin/SPARK-6700 and squashes the following commits:

2ead85b [Marcelo Vanzin] WIP: re-enable flaky test to catch failure in jenkins.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b075e4b7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b075e4b7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b075e4b7

Branch: refs/heads/master
Commit: b075e4b720221a8204cae93468065a6708348830
Parents: 8f8dc45
Author: Marcelo Vanzin 
Authored: Tue Apr 14 13:34:44 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 13:34:44 2015 -0700

--
 .../test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b075e4b7/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
--
diff --git 
a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala 
b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
index c06c010..76952e3 100644
--- a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
+++ b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
@@ -144,7 +144,7 @@ class YarnClusterSuite extends FunSuite with 
BeforeAndAfterAll with Matchers wit
   }
 
   // Enable this once fix SPARK-6700
-  ignore("run Python application in yarn-cluster mode") {
+  test("run Python application in yarn-cluster mode") {
 val primaryPyFile = new File(tempDir, "test.py")
 Files.write(TEST_PYFILE, primaryPyFile, UTF_8)
 val pyFile = new File(tempDir, "test2.py")





spark git commit: [SPARK-6905] Upgrade to snappy-java 1.1.1.7

2015-04-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master b075e4b72 -> 6adb8bcbf


[SPARK-6905] Upgrade to snappy-java 1.1.1.7

We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a 
fix for a bug that results in worse compression in SnappyOutputStream (see 
https://github.com/xerial/snappy-java/issues/100).
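
For reference, a minimal sketch of the affected snappy-java streaming API (nothing
Spark-specific; 1.1.1.7 fixes a regression that made SnappyOutputStream compress
worse than expected):

```
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import org.xerial.snappy.{SnappyInputStream, SnappyOutputStream}

// Round-trip a block of bytes through snappy-java's streaming compressor.
val raw = Array.fill[Byte](64 * 1024)(42)
val buffer = new ByteArrayOutputStream()
val out = new SnappyOutputStream(buffer)
out.write(raw)
out.close()

val in = new SnappyInputStream(new ByteArrayInputStream(buffer.toByteArray))
val roundTripped = Stream.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
in.close()
assert(roundTripped.sameElements(raw))
```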

Author: Josh Rosen 

Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits:

f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6adb8bcb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6adb8bcb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6adb8bcb

Branch: refs/heads/master
Commit: 6adb8bcbf0a1a7bfe2990de18c59c66cd7a0aeb8
Parents: b075e4b
Author: Josh Rosen 
Authored: Tue Apr 14 13:40:07 2015 -0700
Committer: Josh Rosen 
Committed: Tue Apr 14 13:40:07 2015 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6adb8bcb/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 0b8d664..261292d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -156,7 +156,7 @@
     <jodd.version>3.6.3</jodd.version>
     <codehaus.jackson.version>1.8.8</codehaus.jackson.version>
     <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
-    <snappy.version>1.1.1.6</snappy.version>
+    <snappy.version>1.1.1.7</snappy.version>
     <netlib.java.version>1.1.2</netlib.java.version>
 
     <test.java.home>${java.home}</test.java.home>





spark git commit: [SPARK-5808] [build] Package pyspark files in sbt assembly.

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 6adb8bcbf -> 65774370a


[SPARK-5808] [build] Package pyspark files in sbt assembly.

This turned out to be more complicated than I wanted because the
layout of python/ doesn't really follow the usual maven conventions.
So some extra code is needed to copy just the right things.

Author: Marcelo Vanzin 

Closes #5461 from vanzin/SPARK-5808 and squashes the following commits:

7153dac [Marcelo Vanzin] Only try to create resource dir if it doesn't already 
exist.
ee90e84 [Marcelo Vanzin] [SPARK-5808] [build] Package pyspark files in sbt 
assembly.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/65774370
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/65774370
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/65774370

Branch: refs/heads/master
Commit: 65774370a1275e25cd8a3357e397d116767793a9
Parents: 6adb8bc
Author: Marcelo Vanzin 
Authored: Tue Apr 14 13:41:38 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 13:41:38 2015 -0700

--
 project/SparkBuild.scala | 60 ++-
 1 file changed, 59 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/65774370/project/SparkBuild.scala
--
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 5f51f4b..09b4976 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-import java.io.File
+import java.io._
 
 import scala.util.Properties
 import scala.collection.JavaConversions._
@@ -166,6 +166,9 @@ object SparkBuild extends PomBuild {
   /* Enable Assembly for all assembly projects */
   assemblyProjects.foreach(enable(Assembly.settings))
 
+  /* Package pyspark artifacts in the main assembly. */
+  enable(PySparkAssembly.settings)(assembly)
+
   /* Enable unidoc only for the root spark project */
   enable(Unidoc.settings)(spark)
 
@@ -316,6 +319,7 @@ object Hive {
 }
 
 object Assembly {
+  import sbtassembly.AssemblyUtils._
   import sbtassembly.Plugin._
   import AssemblyKeys._
 
@@ -347,6 +351,60 @@ object Assembly {
   )
 }
 
+object PySparkAssembly {
+  import sbtassembly.Plugin._
+  import AssemblyKeys._
+
+  lazy val settings = Seq(
+unmanagedJars in Compile += { BuildCommons.sparkHome / 
"python/lib/py4j-0.8.2.1-src.zip" },
+// Use a resource generator to copy all .py files from python/pyspark into 
a managed directory
+// to be included in the assembly. We can't just add "python/" to the 
assembly's resource dir
+// list since that will copy unneeded / unwanted files.
+resourceGenerators in Compile <+= resourceManaged in Compile map { outDir: 
File =>
+  val dst = new File(outDir, "pyspark")
+  if (!dst.isDirectory()) {
+require(dst.mkdirs())
+  }
+
+  val src = new File(BuildCommons.sparkHome, "python/pyspark")
+  copy(src, dst)
+}
+  )
+
+  private def copy(src: File, dst: File): Seq[File] = {
+src.listFiles().flatMap { f =>
+  val child = new File(dst, f.getName())
+  if (f.isDirectory()) {
+child.mkdir()
+copy(f, child)
+  } else if (f.getName().endsWith(".py")) {
+var in: Option[FileInputStream] = None
+var out: Option[FileOutputStream] = None
+try {
+  in = Some(new FileInputStream(f))
+  out = Some(new FileOutputStream(child))
+
+  val bytes = new Array[Byte](1024)
+  var read = 0
+  while (read >= 0) {
+read = in.get.read(bytes)
+if (read > 0) {
+  out.get.write(bytes, 0, read)
+}
+  }
+
+  Some(child)
+} finally {
+  in.foreach(_.close())
+  out.foreach(_.close())
+}
+  } else {
+None
+  }
+}
+  }
+}
+
 object Unidoc {
 
   import BuildCommons._





spark git commit: [SPARK-6905] Upgrade to snappy-java 1.1.1.7

2015-04-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 2954468b0 -> db2154d7d


[SPARK-6905] Upgrade to snappy-java 1.1.1.7

We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a 
fix for a bug that results in worse compression in SnappyOutputStream (see 
https://github.com/xerial/snappy-java/issues/100).

Author: Josh Rosen 

Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits:

f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.

(cherry picked from commit 6adb8bcbf0a1a7bfe2990de18c59c66cd7a0aeb8)
Signed-off-by: Josh Rosen 

Conflicts:
pom.xml


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/db2154d7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/db2154d7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/db2154d7

Branch: refs/heads/branch-1.3
Commit: db2154d7d982c15b63cf6911405ec23a9506a96a
Parents: 2954468
Author: Josh Rosen 
Authored: Tue Apr 14 13:40:07 2015 -0700
Committer: Josh Rosen 
Committed: Tue Apr 14 13:41:50 2015 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/db2154d7/pom.xml
--
diff --git a/pom.xml b/pom.xml
index a599ad4..5cd21f1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -155,7 +155,7 @@
     <jodd.version>3.6.3</jodd.version>
     <codehaus.jackson.version>1.8.8</codehaus.jackson.version>
     <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
-    <snappy.version>1.1.1.6</snappy.version>
+    <snappy.version>1.1.1.7</snappy.version>
 
 

spark git commit: [SPARK-6905] Upgrade to snappy-java 1.1.1.7

2015-04-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 899ffdcc0 -> 3c13936aa


[SPARK-6905] Upgrade to snappy-java 1.1.1.7

We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a 
fix for a bug that results in worse compression in SnappyOutputStream (see 
https://github.com/xerial/snappy-java/issues/100).

Author: Josh Rosen 

Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits:

f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.

(cherry picked from commit 6adb8bcbf0a1a7bfe2990de18c59c66cd7a0aeb8)
Signed-off-by: Josh Rosen 

Conflicts:
pom.xml


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3c13936a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3c13936a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3c13936a

Branch: refs/heads/branch-1.2
Commit: 3c13936aa031662f7f372842979ebba56e107a0e
Parents: 899ffdc
Author: Josh Rosen 
Authored: Tue Apr 14 13:40:07 2015 -0700
Committer: Josh Rosen 
Committed: Tue Apr 14 13:43:53 2015 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3c13936a/pom.xml
--
diff --git a/pom.xml b/pom.xml
index eef8540..8e49561 100644
--- a/pom.xml
+++ b/pom.xml
@@ -413,7 +413,7 @@
       <dependency>
         <groupId>org.xerial.snappy</groupId>
         <artifactId>snappy-java</artifactId>
-        <version>1.1.1.6</version>
+        <version>1.1.1.7</version>
       </dependency>
       <dependency>
         <groupId>net.jpountz.lz4</groupId>





spark git commit: [SPARK-6769][YARN][TEST] Usage of the ListenerBus in YarnClusterSuite is wrong

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 65774370a -> 4d4b24927


[SPARK-6769][YARN][TEST] Usage of the ListenerBus in YarnClusterSuite is wrong

In YarnClusterSuite, a test case uses `SaveExecutorInfo`  to handle 
ExecutorAddedEvent as follows.

```
private class SaveExecutorInfo extends SparkListener {
  val addedExecutorInfos = mutable.Map[String, ExecutorInfo]()

  override def onExecutorAdded(executor: SparkListenerExecutorAdded) {
addedExecutorInfos(executor.executorId) = executor.executorInfo
  }
}

...

listener = new SaveExecutorInfo
val sc = new SparkContext(new SparkConf()
  .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and 
$dollarSigns"))
sc.addSparkListener(listener)
val status = new File(args(0))
var result = "failure"
try {
  val data = sc.parallelize(1 to 4, 4).collect().toSet
  assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
  data should be (Set(1, 2, 3, 4))
  result = "success"
} finally {
  sc.stop()
  Files.write(result, status, UTF_8)
}
```

But the usage is wrong: executors spawn while the SparkContext is being initialized,
whereas SparkContext#addSparkListener can only be invoked after that initialization,
i.e. after the executors have already spawned, so SaveExecutorInfo never receives
the ExecutorAddedEvent.

The following code checks the result of handling ExecutorAddedEvent. Because of the
reason above, the assertion can never be reached.

```
// verify log urls are present
listener.addedExecutorInfos.values.foreach { info =>
  assert(info.logUrlMap.nonEmpty)
}
```
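
A hedged sketch of how the new findListenersByClass helper (added to ListenerBus in
this patch) can be used from the test instead; the surrounding setup, in particular
registering SaveExecutorInfo through spark.extraListeners, is my assumption:

```
// Look up the listener instance that was registered while the SparkContext was
// initializing (e.g. via spark.extraListeners), rather than adding it afterwards.
val listeners = sc.listenerBus.findListenersByClass[SaveExecutorInfo]()
assert(listeners.size == 1)
listeners.head.addedExecutorInfos.values.foreach { info =>
  assert(info.logUrlMap.nonEmpty)
}
```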

Author: Kousuke Saruta 

Closes #5417 from sarutak/SPARK-6769 and squashes the following commits:

8adc8ba [Kousuke Saruta] Fixed compile error
e258530 [Kousuke Saruta] Fixed style
591cf3e [Kousuke Saruta] Fixed style
48ec89a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark 
into SPARK-6769
860c965 [Kousuke Saruta] Simplified code
207d325 [Kousuke Saruta] Added findListenersByClass method to ListenerBus
2408c84 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark 
into SPARK-6769
2d7e409 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark 
into SPARK-6769
3874adf [Kousuke Saruta] Fixed the usage of listener bus in 
LogUrlsStandaloneSuite
153a91b [Kousuke Saruta] Fixed the usage of listener bus in YarnClusterSuite


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4d4b2492
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4d4b2492
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4d4b2492

Branch: refs/heads/master
Commit: 4d4b24927417b2c17810e94d6d46c37491c68869
Parents: 6577437
Author: Kousuke Saruta 
Authored: Tue Apr 14 14:00:49 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 14:01:55 2015 -0700

--
 .../org/apache/spark/util/ListenerBus.scala |  8 
 .../spark/deploy/LogUrlsStandaloneSuite.scala   | 20 +++-
 .../spark/deploy/yarn/YarnClusterSuite.scala| 17 ++---
 3 files changed, 29 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4d4b2492/core/src/main/scala/org/apache/spark/util/ListenerBus.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/ListenerBus.scala 
b/core/src/main/scala/org/apache/spark/util/ListenerBus.scala
index d60b8b9..a725767 100644
--- a/core/src/main/scala/org/apache/spark/util/ListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/util/ListenerBus.scala
@@ -19,9 +19,12 @@ package org.apache.spark.util
 
 import java.util.concurrent.CopyOnWriteArrayList
 
+import scala.collection.JavaConversions._
+import scala.reflect.ClassTag
 import scala.util.control.NonFatal
 
 import org.apache.spark.Logging
+import org.apache.spark.scheduler.SparkListener
 
 /**
  * An event bus which posts events to its listeners.
@@ -64,4 +67,9 @@ private[spark] trait ListenerBus[L <: AnyRef, E] extends 
Logging {
*/
   def onPostEvent(listener: L, event: E): Unit
 
+  private[spark] def findListenersByClass[T <: L : ClassTag](): Seq[T] = {
+val c = implicitly[ClassTag[T]].runtimeClass
+listeners.filter(_.getClass == c).map(_.asInstanceOf[T]).toSeq
+  }
+
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/4d4b2492/core/src/test/scala/org/apache/spark/deploy/LogUrlsStandaloneSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/deploy/LogUrlsStandaloneSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/LogUrlsStandaloneSuite.scala
index 9cdb428..c93d16f 100644
--- a/core/src/test/scala/org/apache/spark/deploy/LogUrlsStandaloneSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/L

spark git commit: Revert "[SPARK-6352] [SQL] Add DirectParquetOutputCommitter"

2015-04-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master 4d4b24927 -> a76b921a9


Revert "[SPARK-6352] [SQL] Add DirectParquetOutputCommitter"

This reverts commit b29663eeea440b1d1a288d41b5ddf67e77c5bd54.

I'm reverting this because it broke test compilation for the Hadoop 1.x
profiles.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a76b921a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a76b921a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a76b921a

Branch: refs/heads/master
Commit: a76b921a923ac37d3c73ee18d24df4bb611daba3
Parents: 4d4b249
Author: Josh Rosen 
Authored: Tue Apr 14 14:07:25 2015 -0700
Committer: Josh Rosen 
Committed: Tue Apr 14 14:10:15 2015 -0700

--
 .../parquet/DirectParquetOutputCommitter.scala  | 66 
 .../sql/parquet/ParquetTableOperations.scala| 22 ---
 .../spark/sql/parquet/ParquetIOSuite.scala  | 21 ---
 3 files changed, 109 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a76b921a/sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
deleted file mode 100644
index 25a66cb..000
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
+++ /dev/null
@@ -1,66 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.parquet
-
-import org.apache.hadoop.fs.Path
-import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
-import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
-
-import parquet.Log
-import parquet.hadoop.util.ContextUtil
-import parquet.hadoop.{ParquetFileReader, ParquetFileWriter, 
ParquetOutputCommitter}
-
-private[parquet] class DirectParquetOutputCommitter(outputPath: Path, context: 
TaskAttemptContext)
-  extends ParquetOutputCommitter(outputPath, context) {
-  val LOG = Log.getLog(classOf[ParquetOutputCommitter])
-
-  override def getWorkPath(): Path = outputPath
-  override def abortTask(taskContext: TaskAttemptContext): Unit = {}
-  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
-  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = true
-  override def setupJob(jobContext: JobContext): Unit = {}
-  override def setupTask(taskContext: TaskAttemptContext): Unit = {}
-
-  override def commitJob(jobContext: JobContext) {
-try {
-  val configuration = ContextUtil.getConfiguration(jobContext)
-  val fileSystem = outputPath.getFileSystem(configuration)
-  val outputStatus = fileSystem.getFileStatus(outputPath)
-  val footers = ParquetFileReader.readAllFootersInParallel(configuration, 
outputStatus)
-  try {
-ParquetFileWriter.writeMetadataFile(configuration, outputPath, footers)
-if 
(configuration.getBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", 
true)) {
-  val successPath = new Path(outputPath, 
FileOutputCommitter.SUCCEEDED_FILE_NAME)
-  fileSystem.create(successPath).close()
-}
-  } catch {
-case e: Exception => {
-  LOG.warn("could not write summary file for " + outputPath, e)
-  val metadataPath = new Path(outputPath, 
ParquetFileWriter.PARQUET_METADATA_FILE)
-  if (fileSystem.exists(metadataPath)) {
-fileSystem.delete(metadataPath, true)
-  }
-}
-  }
-} catch {
-  case e: Exception => LOG.warn("could not write summary file for " + 
outputPath, e)
-}
-  }
-
-}
-

http://git-wip-us.apache.org/repos/asf/spark/blob/a76b921a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableO

spark git commit: [SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitted and StreamingListenerBatchStarted (backport to branch 1.3)

2015-04-14 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 db2154d7d -> 1ab423f6e


[SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitted and 
StreamingListenerBatchStarted (backport to branch 1.3)

Backport SPARK-6766 #5414 to branch 1.3

Conflicts:


streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala

Author: zsxwing 

Closes #5452 from zsxwing/SPARK-6766-branch-1.3 and squashes the following 
commits:

cb87e44 [zsxwing] [SPARK-6766][Streaming] Fix issue about 
StreamingListenerBatchSubmitted and StreamingListenerBatchStarted (backport to 
branch 1.3)


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1ab423f6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1ab423f6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1ab423f6

Branch: refs/heads/branch-1.3
Commit: 1ab423f6e6a303bb84b426db4870d28ff53de254
Parents: db2154d
Author: zsxwing 
Authored: Tue Apr 14 16:00:30 2015 -0700
Committer: Tathagata Das 
Committed: Tue Apr 14 16:00:30 2015 -0700

--
 .../streaming/scheduler/JobScheduler.scala  |   8 +-
 .../ui/StreamingJobProgressListener.scala   |  16 +--
 .../streaming/StreamingListenerSuite.scala  |  55 +++--
 .../ui/StreamingJobProgressListenerSuite.scala  | 119 +++
 4 files changed, 180 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1ab423f6/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala
--
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala
 
b/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala
index b3ffc71..5b0eb78 100644
--- 
a/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala
+++ 
b/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala
@@ -105,6 +105,7 @@ class JobScheduler(val ssc: StreamingContext) extends 
Logging {
 if (jobSet.jobs.isEmpty) {
   logInfo("No jobs added for time " + jobSet.time)
 } else {
+  listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
   jobSets.put(jobSet.time, jobSet)
   jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
   logInfo("Added jobs for time " + jobSet.time)
@@ -134,10 +135,13 @@ class JobScheduler(val ssc: StreamingContext) extends 
Logging {
 
   private def handleJobStart(job: Job) {
 val jobSet = jobSets.get(job.time)
-if (!jobSet.hasStarted) {
+val isFirstJobOfJobSet = !jobSet.hasStarted
+jobSet.handleJobStart(job)
+if (isFirstJobOfJobSet) {
+  // "StreamingListenerBatchStarted" should be posted after calling 
"handleJobStart" to get the
+  // correct "jobSet.processingStartTime".
   listenerBus.post(StreamingListenerBatchStarted(jobSet.toBatchInfo))
 }
-jobSet.handleJobStart(job)
 logInfo("Starting job " + job.id + " from job set of time " + jobSet.time)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/1ab423f6/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala
--
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala
 
b/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala
index 5ee53a5..49afeda 100644
--- 
a/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala
+++ 
b/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala
@@ -32,7 +32,7 @@ private[streaming] class StreamingJobProgressListener(ssc: 
StreamingContext)
 
   private val waitingBatchInfos = new HashMap[Time, BatchInfo]
   private val runningBatchInfos = new HashMap[Time, BatchInfo]
-  private val completedaBatchInfos = new Queue[BatchInfo]
+  private val completedBatchInfos = new Queue[BatchInfo]
   private val batchInfoLimit = 
ssc.conf.getInt("spark.streaming.ui.retainedBatches", 100)
   private var totalCompletedBatches = 0L
   private var totalReceivedRecords = 0L
@@ -60,7 +60,7 @@ private[streaming] class StreamingJobProgressListener(ssc: 
StreamingContext)
   }
 
   override def onBatchSubmitted(batchSubmitted: 
StreamingListenerBatchSubmitted) = synchronized {
-runningBatchInfos(batchSubmitted.batchInfo.batchTime) = 
batchSubmitted.batchInfo
+waitingBatchInfos(batchSubmitted.batchInfo.batchTime) = 
batchSubmitted.batchInfo
   }
 
   override def onBatchStarted(batchStarted: StreamingListenerBatchStarted) = 
synchronized {
@@ -75,8 +75,8 @@ private[streaming] class StreamingJobProgressListener(ssc: 
StreamingContext)
   ov

spark git commit: [SPARK-6796][Streaming][WebUI] Add "Active Batches" and "Completed Batches" lists to StreamingPage

2015-04-14 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/master a76b921a9 -> 6de282e2d


[SPARK-6796][Streaming][WebUI] Add "Active Batches" and "Completed Batches" 
lists to StreamingPage

This PR adds two lists, `Active Batches` and `Completed Batches`. Here is the 
screenshot:

![batch_list](https://cloud.githubusercontent.com/assets/1000778/7060458/d8898572-deb3-11e4-938b-6f8602c71a9f.png)

Due to [SPARK-6766](https://issues.apache.org/jira/browse/SPARK-6766), I needed
to merge #5414 on my local machine to get the above screenshot.

Author: zsxwing 

Closes #5434 from zsxwing/SPARK-6796 and squashes the following commits:

be50fc6 [zsxwing] Fix the code style
51b792e [zsxwing] Fix the unit test
6f3078e [zsxwing] Make 'startTime' readable
f40e0a9 [zsxwing] Merge branch 'master' into SPARK-6796
2525336 [zsxwing] Rename 'Processed batches' and 'Waiting batches' and also add 
links
a69c091 [zsxwing] Show the number of total completed batches too
a12ad7b [zsxwing] Change 'records' to 'events' in the UI
86b5e7f [zsxwing] Make BatchTableBase abstract
b248787 [zsxwing] Add tests to verify the new tables
d18ab7d [zsxwing] Fix the code style
6ceffb3 [zsxwing] Add "Active Batches" and "Completed Batches" lists to 
StreamingPage


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6de282e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6de282e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6de282e2

Branch: refs/heads/master
Commit: 6de282e2de3cb69f9b746d03fde581429248824a
Parents: a76b921
Author: zsxwing 
Authored: Tue Apr 14 16:51:36 2015 -0700
Committer: Tathagata Das 
Committed: Tue Apr 14 16:51:36 2015 -0700

--
 .../spark/streaming/ui/AllBatchesTable.scala| 114 +++
 .../spark/streaming/ui/StreamingPage.scala  |  44 +--
 .../spark/streaming/UISeleniumSuite.scala   |  11 ++
 3 files changed, 159 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6de282e2/streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala
--
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala 
b/streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala
new file mode 100644
index 000..df1c0a1
--- /dev/null
+++ 
b/streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.ui
+
+import scala.xml.Node
+
+import org.apache.spark.streaming.scheduler.BatchInfo
+import org.apache.spark.ui.UIUtils
+
+private[ui] abstract class BatchTableBase(tableId: String) {
+
+  protected def columns: Seq[Node] = {
+<th>Batch Time</th>
+  <th>Input Size</th>
+  <th>Scheduling Delay</th>
+  <th>Processing Time</th>
+  }
+
+  protected def baseRow(batch: BatchInfo): Seq[Node] = {
+val batchTime = batch.batchTime.milliseconds
+val formattedBatchTime = UIUtils.formatDate(batch.batchTime.milliseconds)
+val eventCount = batch.receivedBlockInfo.values.map {
+  receivers => receivers.map(_.numRecords).sum
+}.sum
+val schedulingDelay = batch.schedulingDelay
+val formattedSchedulingDelay = 
schedulingDelay.map(UIUtils.formatDuration).getOrElse("-")
+val processingTime = batch.processingDelay
+val formattedProcessingTime = 
processingTime.map(UIUtils.formatDuration).getOrElse("-")
+
+<td sorttable_customkey={batchTime.toString}>{formattedBatchTime}</td>
+  <td sorttable_customkey={eventCount.toString}>{eventCount.toString} events</td>
+  <td sorttable_customkey={schedulingDelay.getOrElse(Long.MaxValue).toString}>
+    {formattedSchedulingDelay}
+  </td>
+  <td sorttable_customkey={processingTime.getOrElse(Long.MaxValue).toString}>
+    {formattedProcessingTime}
+  </td>
+  }
+
+  private def batchTable: Seq[Node] = {
+<table id={tableId} class="table table-bordered table-striped table-condensed sortable">
+  <thead>
+    {columns}
+  </thead>
+  <tbody>
+    {renderRows}
+  </tbody>
+</table>
+  }
+
+  def toNodeSeq: Seq[Node] = {
+batchTable
+  }
+
+  /**
+   * Return HTML for all rows of this table.
+   */
+  protected def renderRows: Seq[Node]
+}
+
+private[ui] class ActiveBatchTable(runningBatches: Seq[BatchInfo], 
waitingBatches: Seq[BatchInfo])
+  extends BatchTableB
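
The message is cut off here; the rest of the file defines the two concrete tables used by StreamingPage. A rough sketch of their likely shape follows (illustrative only: the table ids, the extra columns, and the row helpers are assumptions, not the verbatim commit; it relies on the same package and imports as the file above).

private[ui] class ActiveBatchTable(runningBatches: Seq[BatchInfo], waitingBatches: Seq[BatchInfo])
  extends BatchTableBase("active-batches-table") {

  override protected def columns: Seq[Node] = super.columns ++ <th>Status</th>

  override protected def renderRows: Seq[Node] = {
    // Waiting batches are newer than running ones, so list them first.
    waitingBatches.flatMap(batch => <tr>{baseRow(batch) ++ <td>queued</td>}</tr>) ++
      runningBatches.flatMap(batch => <tr>{baseRow(batch) ++ <td>processing</td>}</tr>)
  }
}

private[ui] class CompletedBatchTable(batches: Seq[BatchInfo])
  extends BatchTableBase("completed-batches-table") {

  // Completed batches additionally report the end-to-end total delay.
  override protected def columns: Seq[Node] = super.columns ++ <th>Total Delay</th>

  override protected def renderRows: Seq[Node] = batches.flatMap { batch =>
    val formattedTotalDelay = batch.totalDelay.map(UIUtils.formatDuration).getOrElse("-")
    <tr>{baseRow(batch) ++ <td>{formattedTotalDelay}</td>}</tr>
  }
}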

spark git commit: [SPARK-6890] [core] Fix launcher lib to work with SPARK_PREPEND_CLASSES.

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 6de282e2d -> 971738936


[SPARK-6890] [core] Fix launcher lib to work with SPARK_PREPEND_CLASSES.

The fix for SPARK-6406 broke the case where sub-processes are launched
when SPARK_PREPEND_CLASSES is set, because the code would then add only
the launcher's build directory to the sub-process's classpath instead
of the complete assembly.

This patch fixes the problem by having the launch scripts stash the
assembly's location in an environment variable. This is not the prettiest
solution, but it avoids having to plumb that location all the way through
the Worker code that launches executors. The env variable is always
set by the launch scripts, so users cannot override it.
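
The mechanism itself lives in the launch scripts and the Java launcher library (see the diff below). Purely to illustrate the idea, here is a hedged Scala sketch of how a sub-process could rebuild the same classpath from those environment variables; the object and method names are made up, and only SPARK_PREPEND_CLASSES, _SPARK_ASSEMBLY, SPARK_HOME and SPARK_SCALA_VERSION come from the patch.

object LaunchClasspathSketch {
  // Start from the assembly jar stashed by bin/spark-class / spark-class2.cmd, and
  // prepend the launcher build directory only when SPARK_PREPEND_CLASSES is set.
  def childClasspath(): String = {
    val assembly = sys.env.getOrElse("_SPARK_ASSEMBLY",
      sys.error("_SPARK_ASSEMBLY is not set; launch through the spark-class scripts"))
    if (sys.env.contains("SPARK_PREPEND_CLASSES")) {
      val sparkHome = sys.env.getOrElse("SPARK_HOME", ".")
      val scalaVersion = sys.env.getOrElse("SPARK_SCALA_VERSION", "2.10")
      s"$sparkHome/launcher/target/scala-$scalaVersion/classes" +
        java.io.File.pathSeparator + assembly
    } else {
      assembly
    }
  }
}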

Author: Marcelo Vanzin 

Closes #5504 from vanzin/SPARK-6890 and squashes the following commits:

7aec921 [Marcelo Vanzin] Fix tests.
ff87a60 [Marcelo Vanzin] Merge branch 'master' into SPARK-6890
31d3ce8 [Marcelo Vanzin] [SPARK-6890] [core] Fix launcher lib work with 
SPARK_PREPEND_CLASSES.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/97173893
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/97173893
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/97173893

Branch: refs/heads/master
Commit: 9717389365772d218cd7c67f9a13c3440f3c6791
Parents: 6de282e
Author: Marcelo Vanzin 
Authored: Tue Apr 14 18:51:39 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 18:51:39 2015 -0700

--
 bin/spark-class | 11 -
 bin/spark-class2.cmd| 11 -
 .../spark/launcher/AbstractCommandBuilder.java  | 44 ++--
 .../spark/launcher/CommandBuilderUtils.java |  1 +
 .../SparkSubmitCommandBuilderSuite.java | 15 ---
 5 files changed, 71 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/97173893/bin/spark-class
--
diff --git a/bin/spark-class b/bin/spark-class
index c03946d..c49d97c 100755
--- a/bin/spark-class
+++ b/bin/spark-class
@@ -82,13 +82,22 @@ if [ $(command -v "$JAR_CMD") ] ; then
   fi
 fi
 
+LAUNCH_CLASSPATH="$SPARK_ASSEMBLY_JAR"
+
+# Add the launcher build dir to the classpath if requested.
+if [ -n "$SPARK_PREPEND_CLASSES" ]; then
+  
LAUNCH_CLASSPATH="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
+fi
+
+export _SPARK_ASSEMBLY="$SPARK_ASSEMBLY_JAR"
+
 # The launcher library will print arguments separated by a NULL character, to 
allow arguments with
 # characters that would be otherwise interpreted by the shell. Read that in a 
while loop, populating
 # an array that will be used to exec the final command.
 CMD=()
 while IFS= read -d '' -r ARG; do
   CMD+=("$ARG")
-done < <("$RUNNER" -cp "$SPARK_ASSEMBLY_JAR" org.apache.spark.launcher.Main 
"$@")
+done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
 
 if [ "${CMD[0]}" = "usage" ]; then
   "${CMD[@]}"

http://git-wip-us.apache.org/repos/asf/spark/blob/97173893/bin/spark-class2.cmd
--
diff --git a/bin/spark-class2.cmd b/bin/spark-class2.cmd
index 4b3401d..3d068dd 100644
--- a/bin/spark-class2.cmd
+++ b/bin/spark-class2.cmd
@@ -46,13 +46,22 @@ if "%SPARK_ASSEMBLY_JAR%"=="0" (
   exit /b 1
 )
 
+set LAUNCH_CLASSPATH=%SPARK_ASSEMBLY_JAR%
+
+rem Add the launcher build dir to the classpath if requested.
+if not "x%SPARK_PREPEND_CLASSES%"=="x" (
+  set 
LAUNCH_CLASSPATH=%SPARK_HOME%\launcher\target\scala-%SPARK_SCALA_VERSION%\classes;%LAUNCH_CLASSPATH%
+)
+
+set _SPARK_ASSEMBLY=%SPARK_ASSEMBLY_JAR%
+
 rem Figure out where java is.
 set RUNNER=java
 if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java
 
 rem The launcher library prints the command to be executed in a single line 
suitable for being
 rem executed by the batch interpreter. So read all the output of the launcher 
into a variable.
-for /f "tokens=*" %%i in ('cmd /C ""%RUNNER%" -cp %SPARK_ASSEMBLY_JAR% 
org.apache.spark.launcher.Main %*"') do (
+for /f "tokens=*" %%i in ('cmd /C ""%RUNNER%" -cp %LAUNCH_CLASSPATH% 
org.apache.spark.launcher.Main %*"') do (
   set SPARK_CMD=%%i
 )
 %SPARK_CMD%

http://git-wip-us.apache.org/repos/asf/spark/blob/97173893/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
--
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java 
b/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
index d827914..b8f02b9 100644
--- 
a/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
+++ 
b/launcher/src/main/java/org/apache/spark/launcher/AbstractComma

spark git commit: [SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 971738936 -> 30a6e0dcc


[SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

...ound.

Author: Marcelo Vanzin 

Closes #5515 from vanzin/SPARK-5634 and squashes the following commits:

f74ecf1 [Marcelo Vanzin] [SPARK-5634] [core] Show correct message in HS when no 
incomplete apps found.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30a6e0dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30a6e0dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30a6e0dc

Branch: refs/heads/master
Commit: 30a6e0dcc0bd298731c1387546779cddcc16bc72
Parents: 9717389
Author: Marcelo Vanzin 
Authored: Tue Apr 14 18:52:48 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 18:52:48 2015 -0700

--
 .../main/scala/org/apache/spark/deploy/history/HistoryPage.scala   | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/30a6e0dc/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
index 6e432d6..3781b4e 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
@@ -90,6 +90,8 @@ private[history] class HistoryPage(parent: HistoryServer) 
extends WebUIPage("")
 
++
   appTable
+} else if (requestedIncomplete) {
+  <h4>No incomplete applications found!</h4>
 } else {
   <h4>No completed applications found!</h4> ++
   Did you specify the correct logging directory?
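
For context, the change only affects which "empty" message is rendered; a condensed, hypothetical helper (not the verbatim page code) capturing the new branch:

import scala.xml.Node

// requestedIncomplete: whether the page was asked to list incomplete applications.
def emptyListingMessage(requestedIncomplete: Boolean): Seq[Node] = {
  if (requestedIncomplete) {
    <h4>No incomplete applications found!</h4>
  } else {
    <h4>No completed applications found!</h4> ++
      <p>Did you specify the correct logging directory?</p>
  }
}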





spark git commit: [SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 3c13936aa -> 5845a6236


[SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

...ound.

Author: Marcelo Vanzin 

Closes #5515 from vanzin/SPARK-5634 and squashes the following commits:

f74ecf1 [Marcelo Vanzin] [SPARK-5634] [core] Show correct message in HS when no 
incomplete apps found.

(cherry picked from commit 30a6e0dcc0bd298731c1387546779cddcc16bc72)
Signed-off-by: Andrew Or 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5845a623
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5845a623
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5845a623

Branch: refs/heads/branch-1.2
Commit: 5845a62361c39eb97df5de01c982821c8858de76
Parents: 3c13936
Author: Marcelo Vanzin 
Authored: Tue Apr 14 18:52:48 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 18:53:10 2015 -0700

--
 .../main/scala/org/apache/spark/deploy/history/HistoryPage.scala   | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5845a623/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
index 5fdc350..3e6baa0 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
@@ -57,6 +57,8 @@ private[spark] class HistoryPage(parent: HistoryServer) 
extends WebUIPage("") {
 
++
   appTable
+} else if (requestedIncomplete) {
+  <h4>No incomplete applications found!</h4>
 } else {
   <h4>No completed applications found!</h4> ++
   Did you specify the correct logging directory?





spark git commit: [SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

2015-04-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 1ab423f6e -> 0e5ca9e09


[SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

...ound.

Author: Marcelo Vanzin 

Closes #5515 from vanzin/SPARK-5634 and squashes the following commits:

f74ecf1 [Marcelo Vanzin] [SPARK-5634] [core] Show correct message in HS when no 
incomplete apps found.

(cherry picked from commit 30a6e0dcc0bd298731c1387546779cddcc16bc72)
Signed-off-by: Andrew Or 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0e5ca9e0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0e5ca9e0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0e5ca9e0

Branch: refs/heads/branch-1.3
Commit: 0e5ca9e093e362e10491979aa8e0cf74cf2b05e3
Parents: 1ab423f
Author: Marcelo Vanzin 
Authored: Tue Apr 14 18:52:48 2015 -0700
Committer: Andrew Or 
Committed: Tue Apr 14 18:53:00 2015 -0700

--
 .../main/scala/org/apache/spark/deploy/history/HistoryPage.scala   | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0e5ca9e0/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
index 26ebc75..cce81f4 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala
@@ -90,6 +90,8 @@ private[spark] class HistoryPage(parent: HistoryServer) 
extends WebUIPage("") {
 
++
   appTable
+} else if (requestedIncomplete) {
+  <h4>No incomplete applications found!</h4>
 } else {
   <h4>No completed applications found!</h4> ++
   Did you specify the correct logging directory?





spark git commit: [SPARK-6871][SQL] WITH clause in CTE cannot follow another WITH clause

2015-04-14 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 30a6e0dcc -> 6be918942


[SPARK-6871][SQL] WITH clause in CTE cannot follow another WITH clause

JIRA https://issues.apache.org/jira/browse/SPARK-6871

Author: Liang-Chi Hsieh 

Closes #5480 from viirya/no_cte_after_cte and squashes the following commits:

4da3712 [Liang-Chi Hsieh] Create new test.
40b38ed [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into 
no_cte_after_cte
0edf568 [Liang-Chi Hsieh] for comments.
6591b79 [Liang-Chi Hsieh] WITH clause in CTE can not following another WITH 
clause.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6be91894
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6be91894
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6be91894

Branch: refs/heads/master
Commit: 6be918942c4078692d169d72fa9c358f6e98e85e
Parents: 30a6e0d
Author: Liang-Chi Hsieh 
Authored: Tue Apr 14 23:47:16 2015 -0700
Committer: Michael Armbrust 
Committed: Tue Apr 14 23:47:16 2015 -0700

--
 .../org/apache/spark/sql/catalyst/SqlParser.scala | 18 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala  |  6 ++
 2 files changed, 15 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6be91894/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
index bc8d375..9a3531c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
@@ -121,14 +121,14 @@ class SqlParser extends AbstractSparkSQLParser with 
DataTypeParser {
   }
 
   protected lazy val start: Parser[LogicalPlan] =
-( (select | ("(" ~> select <~ ")")) *
-  ( UNION ~ ALL^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Union(q1, q2) }
-  | INTERSECT  ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Intersect(q1, q2) }
-  | EXCEPT ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Except(q1, q2)}
-  | UNION ~ DISTINCT.? ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Distinct(Union(q1, q2)) }
-  )
-| insert
-| cte
+start1 | insert | cte
+
+  protected lazy val start1: Parser[LogicalPlan] =
+(select | ("(" ~> select <~ ")")) *
+( UNION ~ ALL^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, 
q2) }
+| INTERSECT  ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Intersect(q1, q2) }
+| EXCEPT ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Except(q1, q2)}
+| UNION ~ DISTINCT.? ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => 
Distinct(Union(q1, q2)) }
 )
 
   protected lazy val select: Parser[LogicalPlan] =
@@ -159,7 +159,7 @@ class SqlParser extends AbstractSparkSQLParser with 
DataTypeParser {
 }
 
   protected lazy val cte: Parser[LogicalPlan] =
-WITH ~> rep1sep(ident ~ ( AS ~ "(" ~> start <~ ")"), ",") ~ start ^^ {
+WITH ~> rep1sep(ident ~ ( AS ~ "(" ~> start1 <~ ")"), ",") ~ (start1 | 
insert) ^^ {
   case r ~ s => With(s, r.map({case n ~ s => (n, Subquery(n, s))}).toMap)
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6be91894/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 73fb791..0174aae 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -431,6 +431,12 @@ class SQLQuerySuite extends QueryTest with 
BeforeAndAfterAll {
 
   }
 
+  test("Allow only a single WITH clause per query") {
+intercept[RuntimeException] {
+  sql("with q1 as (select * from testData) with q2 as (select * from q1) 
select * from q2")
+}
+  }
+
   test("date row") {
 checkAnswer(sql(
   """select cast("2015-01-28" as date) from testData limit 1"""),

