spark git commit: [MINOR][DOCS] fixed list display in ml-ensembles
Repository: spark
Updated Branches: refs/heads/branch-1.6 32911de77 -> 0978ec11c

[MINOR][DOCS] fixed list display in ml-ensembles

The list in ml-ensembles.md wasn't properly formatted and, as a result, looked like this: ![old](http://i.imgur.com/2ZhELLR.png)
This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png)

Author: BenFradet

Closes #10025 from BenFradet/ml-ensembles-doc.

(cherry picked from commit f2fbfa444f6e8d27953ec2d1c0b3abd603c963f9)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0978ec11
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0978ec11
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0978ec11

Branch: refs/heads/branch-1.6
Commit: 0978ec11c9a080bd493da2e9d11c81c08e8e6962
Parents: 32911de
Author: BenFradet
Authored: Mon Nov 30 13:02:08 2015 -0800
Committer: Xiangrui Meng
Committed: Mon Nov 30 13:02:19 2015 -0800

--
 docs/ml-ensembles.md | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0978ec11/docs/ml-ensembles.md

diff --git a/docs/ml-ensembles.md b/docs/ml-ensembles.md
index f6c3c30..14fef76 100644
--- a/docs/ml-ensembles.md
+++ b/docs/ml-ensembles.md
@@ -20,6 +20,7 @@ Both use [MLlib decision trees](ml-decision-tree.html) as their base models.
 Users can find more information about ensemble algorithms in the [MLlib Ensemble guide](mllib-ensembles.html).
 In this section, we demonstrate the Pipelines API for ensembles. The main differences between this API and the [original MLlib ensembles API](mllib-ensembles.html) are:
+
 * support for ML Pipelines
 * separation of classification vs. regression
 * use of DataFrame metadata to distinguish continuous and categorical features
spark git commit: [MINOR][DOCS] fixed list display in ml-ensembles
Repository: spark
Updated Branches: refs/heads/master 8df584b02 -> f2fbfa444

[MINOR][DOCS] fixed list display in ml-ensembles

The list in ml-ensembles.md wasn't properly formatted and, as a result, looked like this: ![old](http://i.imgur.com/2ZhELLR.png)
This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png)

Author: BenFradet

Closes #10025 from BenFradet/ml-ensembles-doc.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f2fbfa44
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f2fbfa44
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f2fbfa44

Branch: refs/heads/master
Commit: f2fbfa444f6e8d27953ec2d1c0b3abd603c963f9
Parents: 8df584b
Author: BenFradet
Authored: Mon Nov 30 13:02:08 2015 -0800
Committer: Xiangrui Meng
Committed: Mon Nov 30 13:02:08 2015 -0800

--
 docs/ml-ensembles.md | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/f2fbfa44/docs/ml-ensembles.md

diff --git a/docs/ml-ensembles.md b/docs/ml-ensembles.md
index f6c3c30..14fef76 100644
--- a/docs/ml-ensembles.md
+++ b/docs/ml-ensembles.md
@@ -20,6 +20,7 @@ Both use [MLlib decision trees](ml-decision-tree.html) as their base models.
 Users can find more information about ensemble algorithms in the [MLlib Ensemble guide](mllib-ensembles.html).
 In this section, we demonstrate the Pipelines API for ensembles. The main differences between this API and the [original MLlib ensembles API](mllib-ensembles.html) are:
+
 * support for ML Pipelines
 * separation of classification vs. regression
 * use of DataFrame metadata to distinguish continuous and categorical features
spark git commit: Revert "[SPARK-11206] Support SQL UI on the history server"
Repository: spark Updated Branches: refs/heads/master f2fbfa444 -> 2c5dee0fb Revert "[SPARK-11206] Support SQL UI on the history server" This reverts commit cc243a079b1c039d6e7f0b410d1654d94a090e14 / PR #9297 I'm reverting this because it broke SQLListenerMemoryLeakSuite in the master Maven builds. See #9991 for a discussion of why this broke the tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c5dee0f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2c5dee0f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2c5dee0f Branch: refs/heads/master Commit: 2c5dee0fb8e4d1734ea3a0f22e0b5bfd2f6dba46 Parents: f2fbfa4 Author: Josh RosenAuthored: Mon Nov 30 13:41:52 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 13:42:35 2015 -0800 -- .rat-excludes | 1 - .../org/apache/spark/JavaSparkListener.java | 3 - .../org/apache/spark/SparkFirehoseListener.java | 4 - .../spark/scheduler/EventLoggingListener.scala | 4 - .../apache/spark/scheduler/SparkListener.scala | 24 +--- .../spark/scheduler/SparkListenerBus.scala | 1 - .../scala/org/apache/spark/ui/SparkUI.scala | 16 +-- .../org/apache/spark/util/JsonProtocol.scala| 11 +- spark.scheduler.SparkHistoryListenerFactory | 1 - .../scala/org/apache/spark/sql/SQLContext.scala | 18 +-- .../spark/sql/execution/SQLExecution.scala | 24 +++- .../spark/sql/execution/SparkPlanInfo.scala | 46 -- .../sql/execution/metric/SQLMetricInfo.scala| 30 .../spark/sql/execution/metric/SQLMetrics.scala | 56 +++- .../spark/sql/execution/ui/ExecutionPage.scala | 4 +- .../spark/sql/execution/ui/SQLListener.scala| 139 ++- .../apache/spark/sql/execution/ui/SQLTab.scala | 12 +- .../spark/sql/execution/ui/SparkPlanGraph.scala | 20 +-- .../sql/execution/metric/SQLMetricsSuite.scala | 4 +- .../sql/execution/ui/SQLListenerSuite.scala | 43 +++--- .../spark/sql/test/SharedSQLContext.scala | 1 - 21 files changed, 135 insertions(+), 327 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/.rat-excludes -- diff --git a/.rat-excludes b/.rat-excludes index 7262c96..08fba6d 100644 --- a/.rat-excludes +++ b/.rat-excludes @@ -82,5 +82,4 @@ INDEX gen-java.* .*avpr org.apache.spark.sql.sources.DataSourceRegister -org.apache.spark.scheduler.SparkHistoryListenerFactory .*parquet http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/core/src/main/java/org/apache/spark/JavaSparkListener.java -- diff --git a/core/src/main/java/org/apache/spark/JavaSparkListener.java b/core/src/main/java/org/apache/spark/JavaSparkListener.java index 23bc9a2..fa9acf0 100644 --- a/core/src/main/java/org/apache/spark/JavaSparkListener.java +++ b/core/src/main/java/org/apache/spark/JavaSparkListener.java @@ -82,7 +82,4 @@ public class JavaSparkListener implements SparkListener { @Override public void onBlockUpdated(SparkListenerBlockUpdated blockUpdated) { } - @Override - public void onOtherEvent(SparkListenerEvent event) { } - } http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/core/src/main/java/org/apache/spark/SparkFirehoseListener.java -- diff --git a/core/src/main/java/org/apache/spark/SparkFirehoseListener.java b/core/src/main/java/org/apache/spark/SparkFirehoseListener.java index e6b24af..1214d05 100644 --- a/core/src/main/java/org/apache/spark/SparkFirehoseListener.java +++ b/core/src/main/java/org/apache/spark/SparkFirehoseListener.java @@ -118,8 +118,4 @@ public class SparkFirehoseListener implements SparkListener { onEvent(blockUpdated); } -@Override -public void 
onOtherEvent(SparkListenerEvent event) { -onEvent(event); -} } http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala index eaa07ac..000a021 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala @@ -207,10 +207,6 @@ private[spark] class EventLoggingListener( // No-op because logging every update would be overkill override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit =
spark git commit: [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper
Repository: spark Updated Branches: refs/heads/master 0ddfe7868 -> e07494420 [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwanCloses #9886 from toddwan/S11859. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0749442 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0749442 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0749442 Branch: refs/heads/master Commit: e0749442051d6e29dae4f4cdcb2937c0b015f98f Parents: 0ddfe78 Author: toddwan Authored: Mon Nov 30 09:26:29 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:26:29 2015 + -- .../main/scala/org/apache/spark/SparkContext.scala | 16 ++-- .../spark/SparkContextSchedulerCreationSuite.scala | 5 + 2 files changed, 15 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e0749442/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index b030d3c..8a62b71 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2708,15 +2708,14 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) - case mesosUrl @ MESOS_REGEX(_) => + case MESOS_REGEX(mesosUrl) => MesosNativeLibrary.load() val scheduler = new TaskSchedulerImpl(sc) val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true) -val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs val backend = if (coarseGrained) { - new CoarseMesosSchedulerBackend(scheduler, sc, url, sc.env.securityManager) + new CoarseMesosSchedulerBackend(scheduler, sc, mesosUrl, sc.env.securityManager) } else { - new MesosSchedulerBackend(scheduler, sc, url) + new MesosSchedulerBackend(scheduler, sc, mesosUrl) } scheduler.initialize(backend) (backend, scheduler) @@ -2727,6 +2726,11 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) + case zkUrl if zkUrl.startsWith("zk://") => +logWarning("Master URL for a multi-master Mesos cluster managed by ZooKeeper should be " + + "in the form mesos://zk://host:port. 
Current Master URL will stop working in Spark 2.0.") +createTaskScheduler(sc, "mesos://" + zkUrl) + case _ => throw new SparkException("Could not parse Master URL: '" + master + "'") } @@ -2745,8 +2749,8 @@ private object SparkMasterRegex { val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r // Regular expression for connecting to Spark deploy clusters val SPARK_REGEX = """spark://(.*)""".r - // Regular expression for connection to Mesos cluster by mesos:// or zk:// url - val MESOS_REGEX = """(mesos|zk)://.*""".r + // Regular expression for connection to Mesos cluster by mesos:// or mesos://zk:// url + val MESOS_REGEX = """mesos://(.*)""".r // Regular expression for connection to Simr cluster val SIMR_REGEX = """simr://(.*)""".r } http://git-wip-us.apache.org/repos/asf/spark/blob/e0749442/core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala
spark git commit: [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper
Repository: spark Updated Branches: refs/heads/branch-1.6 a4a2a7deb -> 12d97b0c5 [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwanCloses #9886 from toddwan/S11859. (cherry picked from commit e0749442051d6e29dae4f4cdcb2937c0b015f98f) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12d97b0c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12d97b0c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12d97b0c Branch: refs/heads/branch-1.6 Commit: 12d97b0c5213e04453f81156f04ed95d877f199c Parents: a4a2a7d Author: toddwan Authored: Mon Nov 30 09:26:29 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:26:37 2015 + -- .../main/scala/org/apache/spark/SparkContext.scala | 16 ++-- .../spark/SparkContextSchedulerCreationSuite.scala | 5 + 2 files changed, 15 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/12d97b0c/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index b030d3c..8a62b71 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2708,15 +2708,14 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) - case mesosUrl @ MESOS_REGEX(_) => + case MESOS_REGEX(mesosUrl) => MesosNativeLibrary.load() val scheduler = new TaskSchedulerImpl(sc) val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true) -val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs val backend = if (coarseGrained) { - new CoarseMesosSchedulerBackend(scheduler, sc, url, sc.env.securityManager) + new CoarseMesosSchedulerBackend(scheduler, sc, mesosUrl, sc.env.securityManager) } else { - new MesosSchedulerBackend(scheduler, sc, url) + new MesosSchedulerBackend(scheduler, sc, mesosUrl) } scheduler.initialize(backend) (backend, scheduler) @@ -2727,6 +2726,11 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) + case zkUrl if zkUrl.startsWith("zk://") => +logWarning("Master URL for a multi-master Mesos cluster managed by ZooKeeper should be " + + "in the form mesos://zk://host:port. 
Current Master URL will stop working in Spark 2.0.") +createTaskScheduler(sc, "mesos://" + zkUrl) + case _ => throw new SparkException("Could not parse Master URL: '" + master + "'") } @@ -2745,8 +2749,8 @@ private object SparkMasterRegex { val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r // Regular expression for connecting to Spark deploy clusters val SPARK_REGEX = """spark://(.*)""".r - // Regular expression for connection to Mesos cluster by mesos:// or zk:// url - val MESOS_REGEX = """(mesos|zk)://.*""".r + // Regular expression for connection to Mesos cluster by mesos:// or mesos://zk:// url + val MESOS_REGEX = """mesos://(.*)""".r // Regular expression for connection to Simr cluster val SIMR_REGEX = """simr://(.*)""".r } http://git-wip-us.apache.org/repos/asf/spark/blob/12d97b0c/core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala
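For readers of the two SPARK-11859 commits above, the user-facing effect can be sketched as follows; the ZooKeeper host names below are placeholders, not values taken from the patch. The mesos:// scheme must now be spelled out explicitly, while a bare zk:// URL is only accepted with a deprecation warning and is rewritten to the mesos://zk:// form.

    // Sketch (assumed host names) of how an application should now point at a
    // multi-master Mesos cluster managed by ZooKeeper.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("mesos-zk-example")
      .setMaster("mesos://zk://zk1:2181,zk2:2181/mesos") // valid form after this change
      // .setMaster("zk://zk1:2181,zk2:2181/mesos")      // deprecated: logs a warning, will stop working in Spark 2.0
    val sc = new SparkContext(conf)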
spark git commit: [SPARK-12023][BUILD] Fix warnings while packaging spark with maven.
Repository: spark Updated Branches: refs/heads/branch-1.6 aaf835f1d -> 33cd171b2 [SPARK-12023][BUILD] Fix warnings while packaging spark with maven. this is a trivial fix, discussed [here](http://stackoverflow.com/questions/28500401/maven-assembly-plugin-warning-the-assembly-descriptor-contains-a-filesystem-roo/). Author: Prashant SharmaCloses #10014 from ScrapCodes/assembly-warning. (cherry picked from commit bf0e85a70a54a2d7fd6804b6bd00c63c20e2bb00) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33cd171b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33cd171b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33cd171b Branch: refs/heads/branch-1.6 Commit: 33cd171b24d45081c8dd1b22a8bd0152a392d115 Parents: aaf835f Author: Prashant Sharma Authored: Mon Nov 30 10:11:27 2015 + Committer: Sean Owen Committed: Mon Nov 30 10:11:39 2015 + -- assembly/src/main/assembly/assembly.xml | 8 external/mqtt/src/main/assembly/assembly.xml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/33cd171b/assembly/src/main/assembly/assembly.xml -- diff --git a/assembly/src/main/assembly/assembly.xml b/assembly/src/main/assembly/assembly.xml index 7111563..009d4b9 100644 --- a/assembly/src/main/assembly/assembly.xml +++ b/assembly/src/main/assembly/assembly.xml @@ -32,7 +32,7 @@ ${project.parent.basedir}/core/src/main/resources/org/apache/spark/ui/static/ - /ui-resources/org/apache/spark/ui/static + ui-resources/org/apache/spark/ui/static **/* @@ -41,7 +41,7 @@ ${project.parent.basedir}/sbin/ - /sbin + sbin **/* @@ -50,7 +50,7 @@ ${project.parent.basedir}/bin/ - /bin + bin **/* @@ -59,7 +59,7 @@ ${project.parent.basedir}/assembly/target/${spark.jar.dir} - / + ${spark.jar.basename} http://git-wip-us.apache.org/repos/asf/spark/blob/33cd171b/external/mqtt/src/main/assembly/assembly.xml -- diff --git a/external/mqtt/src/main/assembly/assembly.xml b/external/mqtt/src/main/assembly/assembly.xml index ecab5b3..c110b01 100644 --- a/external/mqtt/src/main/assembly/assembly.xml +++ b/external/mqtt/src/main/assembly/assembly.xml @@ -24,7 +24,7 @@ ${project.build.directory}/scala-${scala.binary.version}/test-classes - / + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12023][BUILD] Fix warnings while packaging spark with maven.
Repository: spark Updated Branches: refs/heads/master 26c3581f1 -> bf0e85a70 [SPARK-12023][BUILD] Fix warnings while packaging spark with maven. this is a trivial fix, discussed [here](http://stackoverflow.com/questions/28500401/maven-assembly-plugin-warning-the-assembly-descriptor-contains-a-filesystem-roo/). Author: Prashant SharmaCloses #10014 from ScrapCodes/assembly-warning. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf0e85a7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf0e85a7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf0e85a7 Branch: refs/heads/master Commit: bf0e85a70a54a2d7fd6804b6bd00c63c20e2bb00 Parents: 26c3581 Author: Prashant Sharma Authored: Mon Nov 30 10:11:27 2015 + Committer: Sean Owen Committed: Mon Nov 30 10:11:27 2015 + -- assembly/src/main/assembly/assembly.xml | 8 external/mqtt/src/main/assembly/assembly.xml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bf0e85a7/assembly/src/main/assembly/assembly.xml -- diff --git a/assembly/src/main/assembly/assembly.xml b/assembly/src/main/assembly/assembly.xml index 7111563..009d4b9 100644 --- a/assembly/src/main/assembly/assembly.xml +++ b/assembly/src/main/assembly/assembly.xml @@ -32,7 +32,7 @@ ${project.parent.basedir}/core/src/main/resources/org/apache/spark/ui/static/ - /ui-resources/org/apache/spark/ui/static + ui-resources/org/apache/spark/ui/static **/* @@ -41,7 +41,7 @@ ${project.parent.basedir}/sbin/ - /sbin + sbin **/* @@ -50,7 +50,7 @@ ${project.parent.basedir}/bin/ - /bin + bin **/* @@ -59,7 +59,7 @@ ${project.parent.basedir}/assembly/target/${spark.jar.dir} - / + ${spark.jar.basename} http://git-wip-us.apache.org/repos/asf/spark/blob/bf0e85a7/external/mqtt/src/main/assembly/assembly.xml -- diff --git a/external/mqtt/src/main/assembly/assembly.xml b/external/mqtt/src/main/assembly/assembly.xml index ecab5b3..c110b01 100644 --- a/external/mqtt/src/main/assembly/assembly.xml +++ b/external/mqtt/src/main/assembly/assembly.xml @@ -24,7 +24,7 @@ ${project.build.directory}/scala-${scala.binary.version}/test-classes - / + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan
Repository: spark Updated Branches: refs/heads/branch-1.6 a589736a1 -> 32911de77 [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan In 1.6, we introduce a public API to have a SQLContext for current thread, SparkPlan should use that. Author: Davies LiuCloses #9990 from davies/leak_context. (cherry picked from commit 17275fa99c670537c52843df405279a52b5c9594) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32911de7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32911de7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32911de7 Branch: refs/heads/branch-1.6 Commit: 32911de77d0ff346f821c7cd1dc607c611780164 Parents: a589736 Author: Davies Liu Authored: Mon Nov 30 10:32:13 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 10:32:30 2015 -0800 -- .../main/scala/org/apache/spark/sql/SQLContext.scala | 10 +- .../apache/spark/sql/execution/QueryExecution.scala | 3 +-- .../org/apache/spark/sql/execution/SparkPlan.scala| 14 -- .../org/apache/spark/sql/MultiSQLContextsSuite.scala | 2 +- .../sql/execution/ExchangeCoordinatorSuite.scala | 2 +- .../sql/execution/RowFormatConvertersSuite.scala | 4 ++-- 6 files changed, 14 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/32911de7/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala index 46bf544..9cc65de 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala @@ -26,7 +26,6 @@ import scala.collection.immutable import scala.reflect.runtime.universe.TypeTag import scala.util.control.NonFatal -import org.apache.spark.{SparkException, SparkContext} import org.apache.spark.annotation.{DeveloperApi, Experimental} import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} import org.apache.spark.rdd.RDD @@ -45,9 +44,10 @@ import org.apache.spark.sql.execution.datasources._ import org.apache.spark.sql.execution.ui.{SQLListener, SQLTab} import org.apache.spark.sql.sources.BaseRelation import org.apache.spark.sql.types._ -import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.sql.util.ExecutionListenerManager +import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.util.Utils +import org.apache.spark.{SparkContext, SparkException} /** * The entry point for working with structured data (rows and columns) in Spark. 
Allows the @@ -401,7 +401,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes val rowRDD = RDDConversions.productToRowRdd(rdd, schema.map(_.dataType)) @@ -417,7 +417,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes DataFrame(self, LocalRelation.fromProduct(attributeSeq, data)) @@ -1328,7 +1328,7 @@ object SQLContext { activeContext.remove() } - private[sql] def getActiveContextOption(): Option[SQLContext] = { + private[sql] def getActive(): Option[SQLContext] = { Option(activeContext.get()) } http://git-wip-us.apache.org/repos/asf/spark/blob/32911de7/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 5da5aea..107570f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -42,9 +42,8 @@ class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan) { lazy val optimizedPlan: LogicalPlan = sqlContext.optimizer.execute(withCachedData) - // TODO: Don't just pick the first one... lazy val sparkPlan: SparkPlan = { -SparkPlan.currentContext.set(sqlContext) +
spark git commit: [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan
Repository: spark Updated Branches: refs/heads/master 2db4662fe -> 17275fa99 [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan In 1.6, we introduce a public API to have a SQLContext for current thread, SparkPlan should use that. Author: Davies LiuCloses #9990 from davies/leak_context. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17275fa9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17275fa9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17275fa9 Branch: refs/heads/master Commit: 17275fa99c670537c52843df405279a52b5c9594 Parents: 2db4662 Author: Davies Liu Authored: Mon Nov 30 10:32:13 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 10:32:13 2015 -0800 -- .../main/scala/org/apache/spark/sql/SQLContext.scala | 10 +- .../apache/spark/sql/execution/QueryExecution.scala | 3 +-- .../org/apache/spark/sql/execution/SparkPlan.scala| 14 -- .../org/apache/spark/sql/MultiSQLContextsSuite.scala | 2 +- .../sql/execution/ExchangeCoordinatorSuite.scala | 2 +- .../sql/execution/RowFormatConvertersSuite.scala | 4 ++-- 6 files changed, 14 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/17275fa9/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala index 1c2ac5f..8d27839 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala @@ -26,7 +26,6 @@ import scala.collection.immutable import scala.reflect.runtime.universe.TypeTag import scala.util.control.NonFatal -import org.apache.spark.{SparkException, SparkContext} import org.apache.spark.annotation.{DeveloperApi, Experimental} import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} import org.apache.spark.rdd.RDD @@ -45,9 +44,10 @@ import org.apache.spark.sql.execution.datasources._ import org.apache.spark.sql.execution.ui.{SQLListener, SQLTab} import org.apache.spark.sql.sources.BaseRelation import org.apache.spark.sql.types._ -import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.sql.util.ExecutionListenerManager +import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.util.Utils +import org.apache.spark.{SparkContext, SparkException} /** * The entry point for working with structured data (rows and columns) in Spark. 
Allows the @@ -401,7 +401,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes val rowRDD = RDDConversions.productToRowRdd(rdd, schema.map(_.dataType)) @@ -417,7 +417,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes DataFrame(self, LocalRelation.fromProduct(attributeSeq, data)) @@ -1334,7 +1334,7 @@ object SQLContext { activeContext.remove() } - private[sql] def getActiveContextOption(): Option[SQLContext] = { + private[sql] def getActive(): Option[SQLContext] = { Option(activeContext.get()) } http://git-wip-us.apache.org/repos/asf/spark/blob/17275fa9/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 5da5aea..107570f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -42,9 +42,8 @@ class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan) { lazy val optimizedPlan: LogicalPlan = sqlContext.optimizer.execute(withCachedData) - // TODO: Don't just pick the first one... lazy val sparkPlan: SparkPlan = { -SparkPlan.currentContext.set(sqlContext) +SQLContext.setActive(sqlContext) sqlContext.planner.plan(optimizedPlan).next() }
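The two SPARK-11700 commits above replace SparkPlan's private thread local with the SQLContext.setActive/getActive API introduced in 1.6. The underlying per-thread "active context" pattern amounts to the following standalone sketch; it is a simplification for illustration, not Spark's implementation.

    // Standalone sketch of a per-thread "active context" holder (illustration only).
    object ActiveContext {
      private val active = new ThreadLocal[Option[String]] {
        override def initialValue(): Option[String] = None
      }
      def setActive(ctx: String): Unit = active.set(Some(ctx))
      def getActive: Option[String] = active.get()
      def clearActive(): Unit = active.remove()
    }

    ActiveContext.setActive("context-for-this-thread")
    assert(ActiveContext.getActive.contains("context-for-this-thread"))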
spark git commit: [SPARK-11982] [SQL] improve performance of cartesian product
Repository: spark Updated Branches: refs/heads/master 17275fa99 -> 8df584b02 [SPARK-11982] [SQL] improve performance of cartesian product This PR improve the performance of CartesianProduct by caching the result of right plan. After this patch, the query time of TPC-DS Q65 go down to 4 seconds from 28 minutes (420X faster). cc nongli Author: Davies LiuCloses #9969 from davies/improve_cartesian. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8df584b0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8df584b0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8df584b0 Branch: refs/heads/master Commit: 8df584b0200402d8b2ce0a8de24f7a760ced8655 Parents: 17275fa Author: Davies Liu Authored: Mon Nov 30 11:54:18 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 11:54:18 2015 -0800 -- .../unsafe/sort/UnsafeExternalSorter.java | 63 .../unsafe/sort/UnsafeInMemorySorter.java | 7 ++ .../sql/execution/joins/CartesianProduct.scala | 76 +--- .../sql/execution/metric/SQLMetricsSuite.scala | 2 +- 4 files changed, 139 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8df584b0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index 2e40312..5a97f4f 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -21,6 +21,7 @@ import javax.annotation.Nullable; import java.io.File; import java.io.IOException; import java.util.LinkedList; +import java.util.Queue; import com.google.common.annotations.VisibleForTesting; import org.slf4j.Logger; @@ -521,4 +522,66 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return upstream.getKeyPrefix(); } } + + /** + * Returns a iterator, which will return the rows in the order as inserted. + * + * It is the caller's responsibility to call `cleanupResources()` + * after consuming this iterator. + */ + public UnsafeSorterIterator getIterator() throws IOException { +if (spillWriters.isEmpty()) { + assert(inMemSorter != null); + return inMemSorter.getIterator(); +} else { + LinkedList queue = new LinkedList<>(); + for (UnsafeSorterSpillWriter spillWriter : spillWriters) { +queue.add(spillWriter.getReader(blockManager)); + } + if (inMemSorter != null) { +queue.add(inMemSorter.getIterator()); + } + return new ChainedIterator(queue); +} + } + + /** + * Chain multiple UnsafeSorterIterator together as single one. 
+ */ + class ChainedIterator extends UnsafeSorterIterator { + +private final Queue iterators; +private UnsafeSorterIterator current; + +public ChainedIterator(Queue iterators) { + assert iterators.size() > 0; + this.iterators = iterators; + this.current = iterators.remove(); +} + +@Override +public boolean hasNext() { + while (!current.hasNext() && !iterators.isEmpty()) { +current = iterators.remove(); + } + return current.hasNext(); +} + +@Override +public void loadNext() throws IOException { + current.loadNext(); +} + +@Override +public Object getBaseObject() { return current.getBaseObject(); } + +@Override +public long getBaseOffset() { return current.getBaseOffset(); } + +@Override +public int getRecordLength() { return current.getRecordLength(); } + +@Override +public long getKeyPrefix() { return current.getKeyPrefix(); } + } } http://git-wip-us.apache.org/repos/asf/spark/blob/8df584b0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java index dce1f15..c91e88f 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java @@ -226,4 +226,11 @@ public final class UnsafeInMemorySorter { sorter.sort(array, 0, pos / 2, sortComparator); return new SortedIterator(pos / 2);
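The speedup reported for SPARK-11982 comes from materializing the right-hand plan once and reusing it for every left-hand row instead of recomputing it per row. A simplified, in-memory sketch of that idea follows; the real patch spills the cached rows through UnsafeExternalSorter, as the diff above shows.

    // Simplified sketch: pay for the right side once, reuse it for every left row.
    def cartesian[A, B](left: Iterator[A], right: => Iterator[B]): Iterator[(A, B)] = {
      val cachedRight = right.toArray                        // single pass over the right plan
      left.flatMap(l => cachedRight.iterator.map(r => (l, r)))
    }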
spark git commit: [SPARK-12037][CORE] initialize heartbeatReceiverRef before calling startDriverHeartbeat
Repository: spark
Updated Branches: refs/heads/branch-1.6 43ffa0373 -> a4e134827

[SPARK-12037][CORE] initialize heartbeatReceiverRef before calling startDriverHeartbeat

https://issues.apache.org/jira/browse/SPARK-12037

A simple fix by changing the order of the statements.

Author: CodingCat

Closes #10032 from CodingCat/SPARK-12037.

(cherry picked from commit 0a46e4377216a1f7de478f220c3b3042a77789e2)
Signed-off-by: Andrew Or

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a4e13482
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a4e13482
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a4e13482

Branch: refs/heads/branch-1.6
Commit: a4e134827bdfee866a5af22751452c7e03801645
Parents: 43ffa03
Author: CodingCat
Authored: Mon Nov 30 17:19:26 2015 -0800
Committer: Andrew Or
Committed: Mon Nov 30 17:19:58 2015 -0800

--
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/a4e13482/core/src/main/scala/org/apache/spark/executor/Executor.scala

diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 6154f06..7b68dfe 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -109,6 +109,10 @@ private[spark] class Executor(
   // Executor for the heartbeat task.
   private val heartbeater = ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")

+  // must be initialized before running startDriverHeartbeat()
+  private val heartbeatReceiverRef =
+    RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
+
   startDriverHeartbeater()

   def launchTask(
@@ -411,9 +415,6 @@ private[spark] class Executor(
     }
   }

-  private val heartbeatReceiverRef =
-    RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
-
   /** Reports heartbeat and metrics for active tasks to the driver. */
   private def reportHeartBeat(): Unit = {
     // list of (task id, metrics) to send back to the driver
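The bug class fixed by SPARK-12037 is plain Scala constructor-initialization order: statements in a class body run top to bottom, so code that executes before a val's declaration observes its default (null) value. A minimal reproduction, unrelated to Spark's actual classes:

    // Minimal reproduction of the initialization-order pitfall (not Spark code).
    class InitOrder {
      report()                                  // runs before `ref` is initialized: prints "ref = null"
      private val ref: String = "ready"
      report()                                  // prints "ref = ready"
      private def report(): Unit = println(s"ref = $ref")
    }
    new InitOrder()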
[1/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
Repository: spark Updated Branches: refs/heads/master 0a46e4377 -> 9bf212067 http://git-wip-us.apache.org/repos/asf/spark/blob/9bf21206/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java -- diff --git a/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java b/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java index 88fa225..9e9be98 100644 --- a/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java +++ b/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java @@ -17,6 +17,7 @@ package org.apache.spark.network; +import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Collections; import java.util.HashSet; @@ -26,7 +27,6 @@ import java.util.Set; import java.util.concurrent.Semaphore; import java.util.concurrent.TimeUnit; -import com.google.common.base.Charsets; import com.google.common.collect.Sets; import org.junit.AfterClass; import org.junit.BeforeClass; @@ -41,6 +41,7 @@ import org.apache.spark.network.server.OneForOneStreamManager; import org.apache.spark.network.server.RpcHandler; import org.apache.spark.network.server.StreamManager; import org.apache.spark.network.server.TransportServer; +import org.apache.spark.network.util.JavaUtils; import org.apache.spark.network.util.SystemPropertyConfigProvider; import org.apache.spark.network.util.TransportConf; @@ -55,11 +56,14 @@ public class RpcIntegrationSuite { TransportConf conf = new TransportConf("shuffle", new SystemPropertyConfigProvider()); rpcHandler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { -String msg = new String(message, Charsets.UTF_8); + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { +String msg = JavaUtils.bytesToString(message); String[] parts = msg.split("/"); if (parts[0].equals("hello")) { - callback.onSuccess(("Hello, " + parts[1] + "!").getBytes(Charsets.UTF_8)); + callback.onSuccess(JavaUtils.stringToBytes("Hello, " + parts[1] + "!")); } else if (parts[0].equals("return error")) { callback.onFailure(new RuntimeException("Returned: " + parts[1])); } else if (parts[0].equals("throw error")) { @@ -68,9 +72,8 @@ public class RpcIntegrationSuite { } @Override - public void receive(TransportClient client, byte[] message) { -String msg = new String(message, Charsets.UTF_8); -oneWayMsgs.add(msg); + public void receive(TransportClient client, ByteBuffer message) { +oneWayMsgs.add(JavaUtils.bytesToString(message)); } @Override @@ -103,8 +106,9 @@ public class RpcIntegrationSuite { RpcResponseCallback callback = new RpcResponseCallback() { @Override - public void onSuccess(byte[] message) { -res.successMessages.add(new String(message, Charsets.UTF_8)); + public void onSuccess(ByteBuffer message) { +String response = JavaUtils.bytesToString(message); +res.successMessages.add(response); sem.release(); } @@ -116,7 +120,7 @@ public class RpcIntegrationSuite { }; for (String command : commands) { - client.sendRpc(command.getBytes(Charsets.UTF_8), callback); + client.sendRpc(JavaUtils.stringToBytes(command), callback); } if (!sem.tryAcquire(commands.length, 5, TimeUnit.SECONDS)) { @@ -173,7 +177,7 @@ public class RpcIntegrationSuite { final String message = "no reply"; TransportClient client = clientFactory.createClient(TestUtils.getLocalHost(), server.getPort()); try { - client.send(message.getBytes(Charsets.UTF_8)); + 
client.send(JavaUtils.stringToBytes(message)); assertEquals(0, client.getHandler().numOutstandingRequests()); // Make sure the message arrives. http://git-wip-us.apache.org/repos/asf/spark/blob/9bf21206/network/common/src/test/java/org/apache/spark/network/StreamSuite.java -- diff --git a/network/common/src/test/java/org/apache/spark/network/StreamSuite.java b/network/common/src/test/java/org/apache/spark/network/StreamSuite.java index 538f3ef..9c49556 100644 --- a/network/common/src/test/java/org/apache/spark/network/StreamSuite.java +++ b/network/common/src/test/java/org/apache/spark/network/StreamSuite.java @@ -116,7 +116,10 @@ public class StreamSuite { }; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback
[1/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
Repository: spark Updated Branches: refs/heads/branch-1.6 a4e134827 -> ef6f8c262 http://git-wip-us.apache.org/repos/asf/spark/blob/ef6f8c26/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java -- diff --git a/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java b/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java index 42955ef..f9b5bf9 100644 --- a/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java +++ b/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java @@ -31,6 +31,7 @@ import org.apache.spark.network.server.TransportServer; import org.apache.spark.network.util.MapConfigProvider; import org.apache.spark.network.util.TransportConf; import org.junit.*; +import static org.junit.Assert.*; import java.io.IOException; import java.nio.ByteBuffer; @@ -84,13 +85,16 @@ public class RequestTimeoutIntegrationSuite { @Test public void timeoutInactiveRequests() throws Exception { final Semaphore semaphore = new Semaphore(1); -final byte[] response = new byte[16]; +final int responseSize = 16; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { try { semaphore.tryAcquire(FOREVER, TimeUnit.MILLISECONDS); - callback.onSuccess(response); + callback.onSuccess(ByteBuffer.allocate(responseSize)); } catch (InterruptedException e) { // do nothing } @@ -110,15 +114,15 @@ public class RequestTimeoutIntegrationSuite { // First completes quickly (semaphore starts at 1). TestCallback callback0 = new TestCallback(); synchronized (callback0) { - client.sendRpc(new byte[0], callback0); + client.sendRpc(ByteBuffer.allocate(0), callback0); callback0.wait(FOREVER); - assert (callback0.success.length == response.length); + assertEquals(responseSize, callback0.successLength); } // Second times out after 2 seconds, with slack. Must be IOException. 
TestCallback callback1 = new TestCallback(); synchronized (callback1) { - client.sendRpc(new byte[0], callback1); + client.sendRpc(ByteBuffer.allocate(0), callback1); callback1.wait(4 * 1000); assert (callback1.failure != null); assert (callback1.failure instanceof IOException); @@ -131,13 +135,16 @@ public class RequestTimeoutIntegrationSuite { @Test public void timeoutCleanlyClosesClient() throws Exception { final Semaphore semaphore = new Semaphore(0); -final byte[] response = new byte[16]; +final int responseSize = 16; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { try { semaphore.tryAcquire(FOREVER, TimeUnit.MILLISECONDS); - callback.onSuccess(response); + callback.onSuccess(ByteBuffer.allocate(responseSize)); } catch (InterruptedException e) { // do nothing } @@ -158,7 +165,7 @@ public class RequestTimeoutIntegrationSuite { clientFactory.createClient(TestUtils.getLocalHost(), server.getPort()); TestCallback callback0 = new TestCallback(); synchronized (callback0) { - client0.sendRpc(new byte[0], callback0); + client0.sendRpc(ByteBuffer.allocate(0), callback0); callback0.wait(FOREVER); assert (callback0.failure instanceof IOException); assert (!client0.isActive()); @@ -170,10 +177,10 @@ public class RequestTimeoutIntegrationSuite { clientFactory.createClient(TestUtils.getLocalHost(), server.getPort()); TestCallback callback1 = new TestCallback(); synchronized (callback1) { - client1.sendRpc(new byte[0], callback1); + client1.sendRpc(ByteBuffer.allocate(0), callback1); callback1.wait(FOREVER); - assert (callback1.success.length == response.length); - assert (callback1.failure == null); + assertEquals(responseSize, callback1.successLength); + assertNull(callback1.failure); } } @@ -191,7 +198,10 @@ public class RequestTimeoutIntegrationSuite { }; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { throw new
spark git commit: [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown
Repository: spark Updated Branches: refs/heads/master 9bf212067 -> 96bf468c7 [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown Avoid potential deadlock with a user app's shutdown hook thread by more narrowly synchronizing access to 'hooks' Author: Sean OwenCloses #10042 from srowen/SPARK-12049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96bf468c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96bf468c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96bf468c Branch: refs/heads/master Commit: 96bf468c7860be317c20ccacf259910968d2dc83 Parents: 9bf2120 Author: Sean Owen Authored: Mon Nov 30 17:33:09 2015 -0800 Committer: Marcelo Vanzin Committed: Mon Nov 30 17:33:09 2015 -0800 -- .../apache/spark/util/ShutdownHookManager.scala | 33 ++-- 1 file changed, 16 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/96bf468c/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4012dca..620f226 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -206,7 +206,7 @@ private[spark] object ShutdownHookManager extends Logging { private [util] class SparkShutdownHookManager { private val hooks = new PriorityQueue[SparkShutdownHook]() - private var shuttingDown = false + @volatile private var shuttingDown = false /** * Install a hook to run at shutdown and run all registered hooks in order. Hadoop 1.x does not @@ -232,28 +232,27 @@ private [util] class SparkShutdownHookManager { } } - def runAll(): Unit = synchronized { + def runAll(): Unit = { shuttingDown = true -while (!hooks.isEmpty()) { - Try(Utils.logUncaughtExceptions(hooks.poll().run())) +var nextHook: SparkShutdownHook = null +while ({ nextHook = hooks.synchronized { hooks.poll() }; nextHook != null }) { + Try(Utils.logUncaughtExceptions(nextHook.run())) } } - def add(priority: Int, hook: () => Unit): AnyRef = synchronized { -checkState() -val hookRef = new SparkShutdownHook(priority, hook) -hooks.add(hookRef) -hookRef - } - - def remove(ref: AnyRef): Boolean = synchronized { -hooks.remove(ref) + def add(priority: Int, hook: () => Unit): AnyRef = { +hooks.synchronized { + if (shuttingDown) { +throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") + } + val hookRef = new SparkShutdownHook(priority, hook) + hooks.add(hookRef) + hookRef +} } - private def checkState(): Unit = { -if (shuttingDown) { - throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") -} + def remove(ref: AnyRef): Boolean = { +hooks.synchronized { hooks.remove(ref) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown
Repository: spark Updated Branches: refs/heads/branch-1.6 ef6f8c262 -> 2fc3fce8a [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown Avoid potential deadlock with a user app's shutdown hook thread by more narrowly synchronizing access to 'hooks' Author: Sean OwenCloses #10042 from srowen/SPARK-12049. (cherry picked from commit 96bf468c7860be317c20ccacf259910968d2dc83) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2fc3fce8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2fc3fce8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2fc3fce8 Branch: refs/heads/branch-1.6 Commit: 2fc3fce8aee69056f3a6a9c68b450b3802fb3a55 Parents: ef6f8c2 Author: Sean Owen Authored: Mon Nov 30 17:33:09 2015 -0800 Committer: Marcelo Vanzin Committed: Mon Nov 30 17:33:20 2015 -0800 -- .../apache/spark/util/ShutdownHookManager.scala | 33 ++-- 1 file changed, 16 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2fc3fce8/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4012dca..620f226 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -206,7 +206,7 @@ private[spark] object ShutdownHookManager extends Logging { private [util] class SparkShutdownHookManager { private val hooks = new PriorityQueue[SparkShutdownHook]() - private var shuttingDown = false + @volatile private var shuttingDown = false /** * Install a hook to run at shutdown and run all registered hooks in order. Hadoop 1.x does not @@ -232,28 +232,27 @@ private [util] class SparkShutdownHookManager { } } - def runAll(): Unit = synchronized { + def runAll(): Unit = { shuttingDown = true -while (!hooks.isEmpty()) { - Try(Utils.logUncaughtExceptions(hooks.poll().run())) +var nextHook: SparkShutdownHook = null +while ({ nextHook = hooks.synchronized { hooks.poll() }; nextHook != null }) { + Try(Utils.logUncaughtExceptions(nextHook.run())) } } - def add(priority: Int, hook: () => Unit): AnyRef = synchronized { -checkState() -val hookRef = new SparkShutdownHook(priority, hook) -hooks.add(hookRef) -hookRef - } - - def remove(ref: AnyRef): Boolean = synchronized { -hooks.remove(ref) + def add(priority: Int, hook: () => Unit): AnyRef = { +hooks.synchronized { + if (shuttingDown) { +throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") + } + val hookRef = new SparkShutdownHook(priority, hook) + hooks.add(hookRef) + hookRef +} } - private def checkState(): Unit = { -if (shuttingDown) { - throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") -} + def remove(ref: AnyRef): Boolean = { +hooks.synchronized { hooks.remove(ref) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
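The deadlock avoidance in the two SPARK-12049 commits above boils down to one locking rule: hold the lock only while touching the hook queue, never while a hook is running. A standalone sketch of that discipline follows; it is a simplification, not the real ShutdownHookManager.

    // Sketch: the queue is the only thing guarded, so a hook or another thread can
    // call add()/remove() while runAll() is draining without deadlocking.
    import java.util.ArrayDeque

    class HookRunner {
      private val hooks = new ArrayDeque[Runnable]()
      @volatile private var shuttingDown = false

      def add(hook: Runnable): Unit = hooks.synchronized {
        if (shuttingDown) throw new IllegalStateException("cannot modify hooks during shutdown")
        hooks.add(hook)
      }

      def remove(hook: Runnable): Boolean = hooks.synchronized { hooks.remove(hook) }

      def runAll(): Unit = {
        shuttingDown = true
        var next: Runnable = null
        while ({ next = hooks.synchronized { hooks.poll() }; next != null }) {
          next.run()                            // executed outside the lock
        }
      }
    }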
spark git commit: [SPARK-12035] Add more debug information in include_example tag of Jekyll
Repository: spark
Updated Branches: refs/heads/master d3ca8cfac -> e6dc89a33

[SPARK-12035] Add more debug information in include_example tag of Jekyll

https://issues.apache.org/jira/browse/SPARK-12035

When we are debugging lots of example code files, like in https://github.com/apache/spark/pull/10002, it's hard to know which file causes errors due to limited information in `include_example.rb`. With their filenames, we can locate bugs easily.

Author: Xusen Yin

Closes #10026 from yinxusen/SPARK-12035.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e6dc89a3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e6dc89a3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e6dc89a3

Branch: refs/heads/master
Commit: e6dc89a33951e9197a77dbcacf022c27469ae41e
Parents: d3ca8cf
Author: Xusen Yin
Authored: Mon Nov 30 17:18:44 2015 -0800
Committer: Andrew Or
Committed: Mon Nov 30 17:18:44 2015 -0800

--
 docs/_plugins/include_example.rb | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e6dc89a3/docs/_plugins/include_example.rb

diff --git a/docs/_plugins/include_example.rb b/docs/_plugins/include_example.rb
index 564c866..f748582 100644
--- a/docs/_plugins/include_example.rb
+++ b/docs/_plugins/include_example.rb
@@ -75,10 +75,10 @@ module Jekyll
       .select { |l, i| l.include? "$example off$" }
       .map { |l, i| i }

-      raise "Start indices amount is not equal to end indices amount, please check the code." \
+      raise "Start indices amount is not equal to end indices amount, see #{@file}." \
         unless startIndices.size == endIndices.size

-      raise "No code is selected by include_example, please check the code." \
+      raise "No code is selected by include_example, see #{@file}." \
         if startIndices.size == 0

       # Select and join code blocks together, with a space line between each of two continuous
@@ -86,8 +86,10 @@
       lastIndex = -1
       result = ""
       startIndices.zip(endIndices).each do |start, endline|
-        raise "Overlapping between two example code blocks are not allowed." if start <= lastIndex
-        raise "$example on$ should not be in the same line with $example off$." if start == endline
+        raise "Overlapping between two example code blocks are not allowed, see #{@file}." \
+          if start <= lastIndex
+        raise "$example on$ should not be in the same line with $example off$, see #{@file}." \
+          if start == endline
         lastIndex = endline
         range = Range.new(start + 1, endline - 1)
         result += trim_codeblock(lines[range]).join
[2/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
[SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer. This change seems large, but most of it is just replacing `byte[]` with `ByteBuffer` and `new byte[]` with `ByteBuffer.allocate()`, since it changes the network library's API. The following are parts of the code that actually have meaningful changes: - The Message implementations were changed to inherit from a new AbstractMessage that can optionally hold a reference to a body (in the form of a ManagedBuffer); this is similar to how ResponseWithBody worked before, except now it's not restricted to just responses. - The TransportFrameDecoder was pretty much rewritten to avoid copies as much as possible; it doesn't rely on CompositeByteBuf to accumulate incoming data anymore, since CompositeByteBuf has issues when slices are retained. The code now is able to create frames without having to resort to copying bytes except for a few bytes (containing the frame length) in very rare cases. - Some minor changes in the SASL layer to convert things back to `byte[]` since the JDK SASL API operates on those. Author: Marcelo VanzinCloses #9987 from vanzin/SPARK-12007. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bf21206 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bf21206 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bf21206 Branch: refs/heads/master Commit: 9bf2120672ae0f620a217ccd96bef189ab75e0d6 Parents: 0a46e43 Author: Marcelo Vanzin Authored: Mon Nov 30 17:22:05 2015 -0800 Committer: Andrew Or Committed: Mon Nov 30 17:22:05 2015 -0800 -- .../mesos/MesosExternalShuffleService.scala | 3 +- .../network/netty/NettyBlockRpcServer.scala | 8 +- .../netty/NettyBlockTransferService.scala | 6 +- .../apache/spark/rpc/netty/NettyRpcEnv.scala| 16 +-- .../org/apache/spark/rpc/netty/Outbox.scala | 9 +- .../spark/rpc/netty/NettyRpcHandlerSuite.scala | 3 +- .../network/client/RpcResponseCallback.java | 4 +- .../spark/network/client/TransportClient.java | 16 ++- .../client/TransportResponseHandler.java| 16 ++- .../spark/network/protocol/AbstractMessage.java | 54 +++ .../protocol/AbstractResponseMessage.java | 32 + .../network/protocol/ChunkFetchFailure.java | 2 +- .../network/protocol/ChunkFetchRequest.java | 2 +- .../network/protocol/ChunkFetchSuccess.java | 8 +- .../apache/spark/network/protocol/Message.java | 11 +- .../spark/network/protocol/MessageEncoder.java | 29 ++-- .../spark/network/protocol/OneWayMessage.java | 33 +++-- .../network/protocol/ResponseWithBody.java | 40 -- .../spark/network/protocol/RpcFailure.java | 2 +- .../spark/network/protocol/RpcRequest.java | 34 +++-- .../spark/network/protocol/RpcResponse.java | 39 +++-- .../spark/network/protocol/StreamFailure.java | 2 +- .../spark/network/protocol/StreamRequest.java | 2 +- .../spark/network/protocol/StreamResponse.java | 19 +-- .../spark/network/sasl/SaslClientBootstrap.java | 12 +- .../apache/spark/network/sasl/SaslMessage.java | 31 ++-- .../spark/network/sasl/SaslRpcHandler.java | 26 +++- .../spark/network/server/MessageHandler.java| 2 +- .../spark/network/server/NoOpRpcHandler.java| 8 +- .../apache/spark/network/server/RpcHandler.java | 8 +- .../network/server/TransportChannelHandler.java | 2 +- .../network/server/TransportRequestHandler.java | 15 +- .../apache/spark/network/util/JavaUtils.java| 48 --- .../network/util/TransportFrameDecoder.java | 142 +-- .../network/ChunkFetchIntegrationSuite.java | 5 +- .../org/apache/spark/network/ProtocolSuite.java | 10 +- 
.../network/RequestTimeoutIntegrationSuite.java | 51 --- .../spark/network/RpcIntegrationSuite.java | 26 ++-- .../org/apache/spark/network/StreamSuite.java | 5 +- .../network/TransportResponseHandlerSuite.java | 24 ++-- .../spark/network/sasl/SparkSaslSuite.java | 43 -- .../util/TransportFrameDecoderSuite.java| 23 ++- .../shuffle/ExternalShuffleBlockHandler.java| 9 +- .../network/shuffle/ExternalShuffleClient.java | 3 +- .../network/shuffle/OneForOneBlockFetcher.java | 7 +- .../mesos/MesosExternalShuffleClient.java | 5 +- .../shuffle/protocol/BlockTransferMessage.java | 8 +- .../network/sasl/SaslIntegrationSuite.java | 18 +-- .../shuffle/BlockTransferMessagesSuite.java | 2 +- .../ExternalShuffleBlockHandlerSuite.java | 21 +-- .../shuffle/OneForOneBlockFetcherSuite.java | 8 +- 51 files changed, 617 insertions(+), 335 deletions(-) --
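The digest above includes only the file list for this commit, not the diff bodies. As a rough sketch of the `byte[]`/`ByteBuffer` boundary the commit message describes — hypothetical helper names, not code from the patch — the conversion the SASL layer still needs looks roughly like this:

```scala
import java.nio.ByteBuffer

object ByteBufferSketch {
  // byte[] -> ByteBuffer is free: wrap() shares the array instead of copying it.
  def toBuffer(bytes: Array[Byte]): ByteBuffer = ByteBuffer.wrap(bytes)

  // ByteBuffer -> byte[] (what the JDK SASL API requires) copies only when the
  // buffer is not a heap buffer covering exactly its backing array.
  def toBytes(buf: ByteBuffer): Array[Byte] = {
    if (buf.hasArray && buf.arrayOffset() == 0 && buf.remaining() == buf.array().length) {
      buf.array()
    } else {
      val copy = new Array[Byte](buf.remaining())
      buf.duplicate().get(copy) // duplicate() keeps the caller's read position intact
      copy
    }
  }
}
```

Wrapping is free in one direction; in the other, a copy is needed only when the buffer does not expose a backing array that lines up exactly with its contents — which is the spirit of the "avoid copies" change described above.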
[2/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
[SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer. This change seems large, but most of it is just replacing `byte[]` with `ByteBuffer` and `new byte[]` with `ByteBuffer.allocate()`, since it changes the network library's API. The following are parts of the code that actually have meaningful changes: - The Message implementations were changed to inherit from a new AbstractMessage that can optionally hold a reference to a body (in the form of a ManagedBuffer); this is similar to how ResponseWithBody worked before, except now it's not restricted to just responses. - The TransportFrameDecoder was pretty much rewritten to avoid copies as much as possible; it doesn't rely on CompositeByteBuf to accumulate incoming data anymore, since CompositeByteBuf has issues when slices are retained. The code now is able to create frames without having to resort to copying bytes except for a few bytes (containing the frame length) in very rare cases. - Some minor changes in the SASL layer to convert things back to `byte[]` since the JDK SASL API operates on those. Author: Marcelo VanzinCloses #9987 from vanzin/SPARK-12007. (cherry picked from commit 9bf2120672ae0f620a217ccd96bef189ab75e0d6) Signed-off-by: Andrew Or Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef6f8c26 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef6f8c26 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef6f8c26 Branch: refs/heads/branch-1.6 Commit: ef6f8c262648c42041e71ee1578308e205225ccf Parents: a4e13482 Author: Marcelo Vanzin Authored: Mon Nov 30 17:22:05 2015 -0800 Committer: Andrew Or Committed: Mon Nov 30 17:22:12 2015 -0800 -- .../mesos/MesosExternalShuffleService.scala | 3 +- .../network/netty/NettyBlockRpcServer.scala | 8 +- .../netty/NettyBlockTransferService.scala | 6 +- .../apache/spark/rpc/netty/NettyRpcEnv.scala| 16 +-- .../org/apache/spark/rpc/netty/Outbox.scala | 9 +- .../spark/rpc/netty/NettyRpcHandlerSuite.scala | 3 +- .../network/client/RpcResponseCallback.java | 4 +- .../spark/network/client/TransportClient.java | 16 ++- .../client/TransportResponseHandler.java| 16 ++- .../spark/network/protocol/AbstractMessage.java | 54 +++ .../protocol/AbstractResponseMessage.java | 32 + .../network/protocol/ChunkFetchFailure.java | 2 +- .../network/protocol/ChunkFetchRequest.java | 2 +- .../network/protocol/ChunkFetchSuccess.java | 8 +- .../apache/spark/network/protocol/Message.java | 11 +- .../spark/network/protocol/MessageEncoder.java | 29 ++-- .../spark/network/protocol/OneWayMessage.java | 33 +++-- .../network/protocol/ResponseWithBody.java | 40 -- .../spark/network/protocol/RpcFailure.java | 2 +- .../spark/network/protocol/RpcRequest.java | 34 +++-- .../spark/network/protocol/RpcResponse.java | 39 +++-- .../spark/network/protocol/StreamFailure.java | 2 +- .../spark/network/protocol/StreamRequest.java | 2 +- .../spark/network/protocol/StreamResponse.java | 19 +-- .../spark/network/sasl/SaslClientBootstrap.java | 12 +- .../apache/spark/network/sasl/SaslMessage.java | 31 ++-- .../spark/network/sasl/SaslRpcHandler.java | 26 +++- .../spark/network/server/MessageHandler.java| 2 +- .../spark/network/server/NoOpRpcHandler.java| 8 +- .../apache/spark/network/server/RpcHandler.java | 8 +- .../network/server/TransportChannelHandler.java | 2 +- .../network/server/TransportRequestHandler.java | 15 +- .../apache/spark/network/util/JavaUtils.java| 48 --- .../network/util/TransportFrameDecoder.java | 142 +-- 
.../network/ChunkFetchIntegrationSuite.java | 5 +- .../org/apache/spark/network/ProtocolSuite.java | 10 +- .../network/RequestTimeoutIntegrationSuite.java | 51 --- .../spark/network/RpcIntegrationSuite.java | 26 ++-- .../org/apache/spark/network/StreamSuite.java | 5 +- .../network/TransportResponseHandlerSuite.java | 24 ++-- .../spark/network/sasl/SparkSaslSuite.java | 43 -- .../util/TransportFrameDecoderSuite.java| 23 ++- .../shuffle/ExternalShuffleBlockHandler.java| 9 +- .../network/shuffle/ExternalShuffleClient.java | 3 +- .../network/shuffle/OneForOneBlockFetcher.java | 7 +- .../mesos/MesosExternalShuffleClient.java | 5 +- .../shuffle/protocol/BlockTransferMessage.java | 8 +- .../network/sasl/SaslIntegrationSuite.java | 18 +-- .../shuffle/BlockTransferMessagesSuite.java | 2 +- .../ExternalShuffleBlockHandlerSuite.java | 21 +-- .../shuffle/OneForOneBlockFetcherSuite.java | 8 +- 51 files changed, 617 insertions(+), 335 deletions(-)
spark git commit: [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown
Repository: spark Updated Branches: refs/heads/branch-1.5 9b6216171 -> d78f1bc45 [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown Avoid potential deadlock with a user app's shutdown hook thread by more narrowly synchronizing access to 'hooks' Author: Sean OwenCloses #10042 from srowen/SPARK-12049. (cherry picked from commit 96bf468c7860be317c20ccacf259910968d2dc83) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d78f1bc4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d78f1bc4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d78f1bc4 Branch: refs/heads/branch-1.5 Commit: d78f1bc45ff04637761bcfcc565ce40fb587a5a9 Parents: 9b62161 Author: Sean Owen Authored: Mon Nov 30 17:33:09 2015 -0800 Committer: Marcelo Vanzin Committed: Mon Nov 30 17:33:32 2015 -0800 -- .../apache/spark/util/ShutdownHookManager.scala | 33 ++-- 1 file changed, 16 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d78f1bc4/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 091dc03..fddeceb 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -206,7 +206,7 @@ private[spark] object ShutdownHookManager extends Logging { private [util] class SparkShutdownHookManager { private val hooks = new PriorityQueue[SparkShutdownHook]() - private var shuttingDown = false + @volatile private var shuttingDown = false /** * Install a hook to run at shutdown and run all registered hooks in order. Hadoop 1.x does not @@ -230,28 +230,27 @@ private [util] class SparkShutdownHookManager { } } - def runAll(): Unit = synchronized { + def runAll(): Unit = { shuttingDown = true -while (!hooks.isEmpty()) { - Try(Utils.logUncaughtExceptions(hooks.poll().run())) +var nextHook: SparkShutdownHook = null +while ({ nextHook = hooks.synchronized { hooks.poll() }; nextHook != null }) { + Try(Utils.logUncaughtExceptions(nextHook.run())) } } - def add(priority: Int, hook: () => Unit): AnyRef = synchronized { -checkState() -val hookRef = new SparkShutdownHook(priority, hook) -hooks.add(hookRef) -hookRef - } - - def remove(ref: AnyRef): Boolean = synchronized { -hooks.remove(ref) + def add(priority: Int, hook: () => Unit): AnyRef = { +hooks.synchronized { + if (shuttingDown) { +throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") + } + val hookRef = new SparkShutdownHook(priority, hook) + hooks.add(hookRef) + hookRef +} } - private def checkState(): Unit = { -if (shuttingDown) { - throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") -} + def remove(ref: AnyRef): Boolean = { +hooks.synchronized { hooks.remove(ref) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
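For context, the locking pattern the diff above adopts can be reduced to a small sketch (Hook and HookRunnerSketch are hypothetical stand-ins, not Spark classes): the queue is locked only long enough to poll the next hook, and the hook body runs with the lock released, so a hook that calls back into add() or remove() can no longer deadlock against runAll().

```scala
import java.util.PriorityQueue
import scala.util.Try

// Hypothetical stand-in for SparkShutdownHook: higher-priority hooks run first.
case class Hook(priority: Int, body: () => Unit) extends Comparable[Hook] {
  override def compareTo(other: Hook): Int = Integer.compare(other.priority, priority)
}

class HookRunnerSketch {
  private val hooks = new PriorityQueue[Hook]()
  @volatile private var shuttingDown = false

  def runAll(): Unit = {
    shuttingDown = true
    var next: Hook = null
    // Lock only while polling; run the hook body outside the lock.
    while ({ next = hooks.synchronized(hooks.poll()); next != null }) {
      Try(next.body())
    }
  }

  def add(hook: Hook): Unit = hooks.synchronized {
    if (shuttingDown) {
      throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.")
    }
    hooks.add(hook)
  }

  def remove(hook: Hook): Boolean = hooks.synchronized(hooks.remove(hook))
}
```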
spark git commit: [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin.
Repository: spark Updated Branches: refs/heads/branch-1.6 2fc3fce8a -> 86a46ce68 [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin. I accidentally omitted these as part of #10049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86a46ce6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86a46ce6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86a46ce6 Branch: refs/heads/branch-1.6 Commit: 86a46ce686cd140aa9854e2d24cfd3313cd8b894 Parents: 2fc3fce Author: Josh RosenAuthored: Mon Nov 30 17:15:47 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 18:26:48 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86a46ce6/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index f2f3e2e..174c202 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - system(make html) || raise("Python doc generation failed") + system("make html") || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin.
Repository: spark Updated Branches: refs/heads/master 96bf468c7 -> f73379be2 [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin. I accidentally omitted these as part of #10049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f73379be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f73379be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f73379be Branch: refs/heads/master Commit: f73379be2b0c286957b678a996cb56afc96015eb Parents: 96bf468 Author: Josh RosenAuthored: Mon Nov 30 17:15:47 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 18:25:59 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f73379be/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index f2f3e2e..174c202 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - system(make html) || raise("Python doc generation failed") + system("make html") || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters
Repository: spark Updated Branches: refs/heads/branch-1.6 0978ec11c -> a8c6d8acc [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters ```EventLoggingListener.getLogPath``` needs 4 input arguments: https://github.com/apache/spark/blob/v1.6.0-preview2/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L276-L280 the 3rd parameter should be appAttemptId, 4th parameter is codec... Author: Teng QiuCloses #10044 from chutium/SPARK-12053. (cherry picked from commit a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce) Signed-off-by: Kousuke Saruta Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8c6d8ac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8c6d8ac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8c6d8ac Branch: refs/heads/branch-1.6 Commit: a8c6d8accae7712568f583792ed2c0a3fce06776 Parents: 0978ec1 Author: Teng Qiu Authored: Tue Dec 1 07:27:32 2015 +0900 Committer: Kousuke Saruta Committed: Tue Dec 1 07:28:02 2015 +0900 -- core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a8c6d8ac/core/src/main/scala/org/apache/spark/deploy/master/Master.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala index 9952c97..1355e1a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala @@ -934,7 +934,7 @@ private[deploy] class Master( } val eventLogFilePrefix = EventLoggingListener.getLogPath( - eventLogDir, app.id, app.desc.eventLogCodec) + eventLogDir, app.id, appAttemptId = None, compressionCodecName = app.desc.eventLogCodec) val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) val inProgressExists = fs.exists(new Path(eventLogFilePrefix + EventLoggingListener.IN_PROGRESS)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters
Repository: spark Updated Branches: refs/heads/master 2c5dee0fb -> a8ceec5e8 [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters ```EventLoggingListener.getLogPath``` needs 4 input arguments: https://github.com/apache/spark/blob/v1.6.0-preview2/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L276-L280 the 3rd parameter should be appAttemptId, 4th parameter is codec... Author: Teng QiuCloses #10044 from chutium/SPARK-12053. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8ceec5e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8ceec5e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8ceec5e Branch: refs/heads/master Commit: a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce Parents: 2c5dee0 Author: Teng Qiu Authored: Tue Dec 1 07:27:32 2015 +0900 Committer: Kousuke Saruta Committed: Tue Dec 1 07:27:32 2015 +0900 -- core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a8ceec5e/core/src/main/scala/org/apache/spark/deploy/master/Master.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala index 9952c97..1355e1a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala @@ -934,7 +934,7 @@ private[deploy] class Master( } val eventLogFilePrefix = EventLoggingListener.getLogPath( - eventLogDir, app.id, app.desc.eventLogCodec) + eventLogDir, app.id, appAttemptId = None, compressionCodecName = app.desc.eventLogCodec) val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) val inProgressExists = fs.exists(new Path(eventLogFilePrefix + EventLoggingListener.IN_PROGRESS)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
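The one-line fix above is easiest to see with a stripped-down sketch; the signature below is illustrative only (the real method lives in EventLoggingListener and takes a log directory URI). Because both trailing parameters are Option[String], the old positional call type-checked but bound the codec to appAttemptId; naming the arguments, as the patch does, removes the ambiguity.

```scala
object GetLogPathSketch {
  // Hypothetical signature, for illustration only: the two trailing optional
  // parameters have the same type, so a positional argument binds to the first.
  def getLogPath(
      logBaseDir: String,
      appId: String,
      appAttemptId: Option[String] = None,
      compressionCodecName: Option[String] = None): String = {
    val attempt = appAttemptId.map("_" + _).getOrElse("")
    val codec = compressionCodecName.map("." + _).getOrElse("")
    s"$logBaseDir/$appId$attempt$codec"
  }

  def main(args: Array[String]): Unit = {
    val codec = Some("lz4")
    // Positional: the codec silently lands in appAttemptId -> "/logs/app-1_lz4"
    println(getLogPath("/logs", "app-1", codec))
    // Named, as in the fix: -> "/logs/app-1.lz4"
    println(getLogPath("/logs", "app-1", compressionCodecName = codec))
  }
}
```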
spark git commit: [SPARK-11960][MLLIB][DOC] User guide for streaming tests
Repository: spark Updated Branches: refs/heads/branch-1.6 a387cef3a -> ebf87ebc0 [SPARK-11960][MLLIB][DOC] User guide for streaming tests CC jkbradley mengxr josepablocam Author: Feynman LiangCloses #10005 from feynmanliang/streaming-test-user-guide. (cherry picked from commit 55358889309cf2d856b72e72e0f3081dfdf61cfa) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ebf87ebc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ebf87ebc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ebf87ebc Branch: refs/heads/branch-1.6 Commit: ebf87ebc02075497f4682e3ad0f8e63d33f3b86e Parents: a387cef Author: Feynman Liang Authored: Mon Nov 30 15:38:44 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 15:38:51 2015 -0800 -- docs/mllib-guide.md | 1 + docs/mllib-statistics.md| 25 .../examples/mllib/StreamingTestExample.scala | 2 ++ 3 files changed, 28 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 54e35fc..43772ad 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -34,6 +34,7 @@ We list major functionality from both below, with links to detailed guides. * [correlations](mllib-statistics.html#correlations) * [stratified sampling](mllib-statistics.html#stratified-sampling) * [hypothesis testing](mllib-statistics.html#hypothesis-testing) + * [streaming significance testing](mllib-statistics.html#streaming-significance-testing) * [random data generation](mllib-statistics.html#random-data-generation) * [Classification and regression](mllib-classification-regression.html) * [linear models (SVMs, logistic regression, linear regression)](mllib-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/docs/mllib-statistics.md -- diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md index ade5b07..de209f6 100644 --- a/docs/mllib-statistics.md +++ b/docs/mllib-statistics.md @@ -521,6 +521,31 @@ print(testResult) # summary of the test including the p-value, test statistic, +### Streaming Significance Testing +MLlib provides online implementations of some tests to support use cases +like A/B testing. These tests may be performed on a Spark Streaming +`DStream[(Boolean,Double)]` where the first element of each tuple +indicates control group (`false`) or treatment group (`true`) and the +second element is the value of an observation. + +Streaming significance testing supports the following parameters: + +* `peacePeriod` - The number of initial data points from the stream to +ignore, used to mitigate novelty effects. +* `windowSize` - The number of past batches to perform hypothesis +testing over. Setting to `0` will perform cumulative processing using +all prior batches. + + + + +[`StreamingTest`](api/scala/index.html#org.apache.spark.mllib.stat.test.StreamingTest) +provides streaming hypothesis testing. 
+ +{% include_example scala/org/apache/spark/examples/mllib/StreamingTestExample.scala %} + + + ## Random data generation http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala index ab29f90..b6677c6 100644 --- a/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala @@ -64,6 +64,7 @@ object StreamingTestExample { dir.toString }) +// $example on$ val data = ssc.textFileStream(dataDir).map(line => line.split(",") match { case Array(label, value) => (label.toBoolean, value.toDouble) }) @@ -75,6 +76,7 @@ object StreamingTestExample { val out = streamingTest.registerStream(data) out.print() +// $example off$ // Stop processing if test becomes significant or we time out var timeoutCounter = numBatchesTimeout - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
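A minimal end-to-end sketch of the streaming significance test described in the new guide text follows. The builder-style setters are assumed to match the documented parameter names (setPeacePeriod, setWindowSize, setTestMethod), and the paths and batch interval are placeholders; the authoritative version is the StreamingTestExample.scala referenced above.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.stat.test.StreamingTest
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingTestSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingTestSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-test-checkpoint") // stateful test needs a checkpoint dir

    // Each input line is "<isTreatmentGroup>,<observedValue>", e.g. "true,0.37"
    val data = ssc.textFileStream("/tmp/ab-test-input").map { line =>
      val Array(label, value) = line.split(",")
      (label.toBoolean, value.toDouble)
    }

    val test = new StreamingTest()
      .setPeacePeriod(0)     // initial batches to ignore (novelty effects)
      .setWindowSize(0)      // 0 = cumulative test over all batches seen so far
      .setTestMethod("welch")

    test.registerStream(data).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```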
spark git commit: [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters
Repository: spark Updated Branches: refs/heads/branch-1.5 7900d192e -> 9b6216171 [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters ```EventLoggingListener.getLogPath``` needs 4 input arguments: https://github.com/apache/spark/blob/v1.6.0-preview2/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L276-L280 the 3rd parameter should be appAttemptId, 4th parameter is codec... Author: Teng QiuCloses #10044 from chutium/SPARK-12053. (cherry picked from commit a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce) Signed-off-by: Kousuke Saruta Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9b621617 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9b621617 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9b621617 Branch: refs/heads/branch-1.5 Commit: 9b62161715950c1b76385cf2c2e2e46213fba437 Parents: 7900d19 Author: Teng Qiu Authored: Tue Dec 1 07:27:32 2015 +0900 Committer: Kousuke Saruta Committed: Tue Dec 1 07:28:22 2015 +0900 -- core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9b621617/core/src/main/scala/org/apache/spark/deploy/master/Master.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala index 26904d3..c9b1c04 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala @@ -925,7 +925,7 @@ private[deploy] class Master( } val eventLogFilePrefix = EventLoggingListener.getLogPath( - eventLogDir, app.id, app.desc.eventLogCodec) + eventLogDir, app.id, appAttemptId = None, compressionCodecName = app.desc.eventLogCodec) val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) val inProgressExists = fs.exists(new Path(eventLogFilePrefix + EventLoggingListener.IN_PROGRESS)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark Updated Branches: refs/heads/branch-1.6 a8c6d8acc -> 1562ef10f [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Original PR is reverted due to document build error. https://github.com/apache/spark/pull/9722 mengxr feynmanliang yinxusen Sorry for the troubling. Author: Yuhao YangCloses #9974 from hhbyyh/ldaMLExample. (cherry picked from commit e232720a65dfb9ae6135cbb7674e35eddd88d625) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1562ef10 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1562ef10 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1562ef10 Branch: refs/heads/branch-1.6 Commit: 1562ef10f5d1722a6c275726083684e6d0463a4f Parents: a8c6d8a Author: Yuhao Yang Authored: Mon Nov 30 14:56:51 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 14:56:58 2015 -0800 -- docs/ml-clustering.md | 31 +++ docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 97 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 208 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 000..cfefb5d --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,31 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. 
* [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts. It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new
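The example sources referenced by the new guide page are truncated in this digest. A self-contained sketch of the spark.ml LDA Estimator/Model flow it documents might look like the following; the toy in-memory corpus and parameter values are placeholders, and the authoritative versions are LDAExample.scala and JavaLDAExample.java added by this commit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SQLContext

object LDASketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LDASketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Tiny toy corpus: each row is a term-count vector over a 5-word vocabulary.
    val corpus = Seq(
      Vectors.dense(1.0, 2.0, 0.0, 0.0, 1.0),
      Vectors.dense(0.0, 0.0, 3.0, 1.0, 0.0),
      Vectors.dense(2.0, 1.0, 0.0, 1.0, 0.0)
    ).map(Tuple1.apply).toDF("features")

    // LDA is an Estimator: fit() produces an LDAModel (a Transformer).
    val lda = new LDA().setK(2).setMaxIter(10)
    val model = lda.fit(corpus)

    model.describeTopics(3).show(false)  // top terms per topic
    model.transform(corpus).show(false)  // per-document topic distributions
    sc.stop()
  }
}
```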
spark git commit: fix Maven build
Repository: spark Updated Branches: refs/heads/branch-1.6 ebf87ebc0 -> 436151780 fix Maven build Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43615178 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43615178 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43615178 Branch: refs/heads/branch-1.6 Commit: 436151780fa6ba2c318ad725c43fcd8a4dcbf00b Parents: ebf87eb Author: Davies LiuAuthored: Mon Nov 30 15:42:10 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 15:45:29 2015 -0800 -- .../src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/43615178/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala index 507641f..a781777 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala @@ -43,7 +43,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ * populated by the query planning infrastructure. */ @transient - protected[spark] final val sqlContext = SQLContext.getActive().get + protected[spark] final val sqlContext = SQLContext.getActive().getOrElse(null) protected def sparkContext = sqlContext.sparkContext - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
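The fix here is the single getActive().get to getActive().getOrElse(null) change above. A tiny sketch of the behavioural difference (generic Option, not SQLContext): .get throws as soon as it is evaluated on an empty Option, while getOrElse(null) defers any failure to the point where the value is actually used.

```scala
object OptionGetSketch {
  def main(args: Array[String]): Unit = {
    val active: Option[String] = None        // stands in for an absent "active" context

    val deferred = active.getOrElse(null)    // fine for now: deferred == null
    println(s"deferred = $deferred")

    try {
      val eager = active.get                 // throws NoSuchElementException immediately
      println(eager)
    } catch {
      case e: NoSuchElementException => println(s"eager access failed: $e")
    }
  }
}
```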
spark git commit: [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python)
Repository: spark Updated Branches: refs/heads/master e232720a6 -> de64b65f7 [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python) Remove duplicate mllib example (DT/RF/GBT in Java/Python). Since we have tutorial code for DT/RF/GBT classification/regression in Scala/Java/Python and example applications for DT/RF/GBT in Scala, so we mark these as duplicated and remove them. mengxr Author: Yanbo LiangCloses #9954 from yanboliang/SPARK-11975. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/de64b65f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/de64b65f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/de64b65f Branch: refs/heads/master Commit: de64b65f7cf2ac58c1abc310ba547637fdbb8557 Parents: e232720 Author: Yanbo Liang Authored: Mon Nov 30 15:01:08 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 15:01:08 2015 -0800 -- .../spark/examples/mllib/JavaDecisionTree.java | 116 --- .../mllib/JavaGradientBoostedTreesRunner.java | 126 .../examples/mllib/JavaRandomForestExample.java | 139 -- .../main/python/mllib/decision_tree_runner.py | 144 --- .../main/python/mllib/gradient_boosted_trees.py | 77 -- .../main/python/mllib/random_forest_example.py | 90 6 files changed, 692 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/de64b65f/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java b/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java deleted file mode 100644 index 1f82e3f..000 --- a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java +++ /dev/null @@ -1,116 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.examples.mllib; - -import java.util.HashMap; - -import scala.Tuple2; - -import org.apache.spark.api.java.function.Function2; -import org.apache.spark.api.java.JavaPairRDD; -import org.apache.spark.api.java.JavaRDD; -import org.apache.spark.api.java.JavaSparkContext; -import org.apache.spark.api.java.function.Function; -import org.apache.spark.api.java.function.PairFunction; -import org.apache.spark.mllib.regression.LabeledPoint; -import org.apache.spark.mllib.tree.DecisionTree; -import org.apache.spark.mllib.tree.model.DecisionTreeModel; -import org.apache.spark.mllib.util.MLUtils; -import org.apache.spark.SparkConf; - -/** - * Classification and regression using decision trees. 
- */ -public final class JavaDecisionTree { - - public static void main(String[] args) { -String datapath = "data/mllib/sample_libsvm_data.txt"; -if (args.length == 1) { - datapath = args[0]; -} else if (args.length > 1) { - System.err.println("Usage: JavaDecisionTree "); - System.exit(1); -} -SparkConf sparkConf = new SparkConf().setAppName("JavaDecisionTree"); -JavaSparkContext sc = new JavaSparkContext(sparkConf); - -JavaRDD data = MLUtils.loadLibSVMFile(sc.sc(), datapath).toJavaRDD().cache(); - -// Compute the number of classes from the data. -Integer numClasses = data.map(new Function () { - @Override public Double call(LabeledPoint p) { -return p.label(); - } -}).countByValue().size(); - -// Set parameters. -// Empty categoricalFeaturesInfo indicates all features are continuous. -HashMap categoricalFeaturesInfo = new HashMap (); -String impurity = "gini"; -Integer maxDepth = 5; -Integer maxBins = 32; - -// Train a DecisionTree model for classification. -final DecisionTreeModel model = DecisionTree.trainClassifier(data, numClasses, - categoricalFeaturesInfo, impurity, maxDepth, maxBins); - -//
spark git commit: [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python)
Repository: spark Updated Branches: refs/heads/branch-1.6 1562ef10f -> a387cef3a [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python) Remove duplicate mllib example (DT/RF/GBT in Java/Python). Since we have tutorial code for DT/RF/GBT classification/regression in Scala/Java/Python and example applications for DT/RF/GBT in Scala, so we mark these as duplicated and remove them. mengxr Author: Yanbo LiangCloses #9954 from yanboliang/SPARK-11975. (cherry picked from commit de64b65f7cf2ac58c1abc310ba547637fdbb8557) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a387cef3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a387cef3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a387cef3 Branch: refs/heads/branch-1.6 Commit: a387cef3a40d47a8ca7fa9c6aa2842318700df49 Parents: 1562ef1 Author: Yanbo Liang Authored: Mon Nov 30 15:01:08 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 15:01:16 2015 -0800 -- .../spark/examples/mllib/JavaDecisionTree.java | 116 --- .../mllib/JavaGradientBoostedTreesRunner.java | 126 .../examples/mllib/JavaRandomForestExample.java | 139 -- .../main/python/mllib/decision_tree_runner.py | 144 --- .../main/python/mllib/gradient_boosted_trees.py | 77 -- .../main/python/mllib/random_forest_example.py | 90 6 files changed, 692 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a387cef3/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java b/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java deleted file mode 100644 index 1f82e3f..000 --- a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java +++ /dev/null @@ -1,116 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.examples.mllib; - -import java.util.HashMap; - -import scala.Tuple2; - -import org.apache.spark.api.java.function.Function2; -import org.apache.spark.api.java.JavaPairRDD; -import org.apache.spark.api.java.JavaRDD; -import org.apache.spark.api.java.JavaSparkContext; -import org.apache.spark.api.java.function.Function; -import org.apache.spark.api.java.function.PairFunction; -import org.apache.spark.mllib.regression.LabeledPoint; -import org.apache.spark.mllib.tree.DecisionTree; -import org.apache.spark.mllib.tree.model.DecisionTreeModel; -import org.apache.spark.mllib.util.MLUtils; -import org.apache.spark.SparkConf; - -/** - * Classification and regression using decision trees. 
- */ -public final class JavaDecisionTree { - - public static void main(String[] args) { -String datapath = "data/mllib/sample_libsvm_data.txt"; -if (args.length == 1) { - datapath = args[0]; -} else if (args.length > 1) { - System.err.println("Usage: JavaDecisionTree "); - System.exit(1); -} -SparkConf sparkConf = new SparkConf().setAppName("JavaDecisionTree"); -JavaSparkContext sc = new JavaSparkContext(sparkConf); - -JavaRDD data = MLUtils.loadLibSVMFile(sc.sc(), datapath).toJavaRDD().cache(); - -// Compute the number of classes from the data. -Integer numClasses = data.map(new Function () { - @Override public Double call(LabeledPoint p) { -return p.label(); - } -}).countByValue().size(); - -// Set parameters. -// Empty categoricalFeaturesInfo indicates all features are continuous. -HashMap categoricalFeaturesInfo = new HashMap (); -String impurity = "gini"; -Integer maxDepth = 5; -Integer maxBins = 32; - -// Train a DecisionTree model for classification. -final DecisionTreeModel
spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark Updated Branches: refs/heads/master a8ceec5e8 -> e232720a6 [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Original PR is reverted due to document build error. https://github.com/apache/spark/pull/9722 mengxr feynmanliang yinxusen Sorry for the troubling. Author: Yuhao YangCloses #9974 from hhbyyh/ldaMLExample. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e232720a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e232720a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e232720a Branch: refs/heads/master Commit: e232720a65dfb9ae6135cbb7674e35eddd88d625 Parents: a8ceec5 Author: Yuhao Yang Authored: Mon Nov 30 14:56:51 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 14:56:51 2015 -0800 -- docs/ml-clustering.md | 31 +++ docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 97 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 208 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 000..cfefb5d --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,31 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts. 
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new file mode 100644 index 000..3a5d323 --- /dev/null +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java
spark git commit: fix Maven build
Repository: spark Updated Branches: refs/heads/master 553588893 -> ecc00ec3f fix Maven build Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ecc00ec3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ecc00ec3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ecc00ec3 Branch: refs/heads/master Commit: ecc00ec3faf09b0e21bc4b468cb45e45d05cf482 Parents: 5535888 Author: Davies LiuAuthored: Mon Nov 30 15:42:10 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 15:42:10 2015 -0800 -- .../src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ecc00ec3/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala index 507641f..a781777 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala @@ -43,7 +43,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ * populated by the query planning infrastructure. */ @transient - protected[spark] final val sqlContext = SQLContext.getActive().get + protected[spark] final val sqlContext = SQLContext.getActive().getOrElse(null) protected def sparkContext = sqlContext.sparkContext - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12000] Fix API doc generation issues
Repository: spark Updated Branches: refs/heads/master edb26e7f4 -> d3ca8cfac [SPARK-12000] Fix API doc generation issues This pull request fixes multiple issues with API doc generation. - Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures. - Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation. - Incorporate doc changes from thunterdb (in #10048). Closes #10048. Author: Josh RosenAuthor: Timothy Hunter Closes #10049 from JoshRosen/fix-doc-build. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3ca8cfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3ca8cfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3ca8cfa Branch: refs/heads/master Commit: d3ca8cfac286ae19f8bedc736877ea9d0a0a072c Parents: edb26e7 Author: Josh Rosen Authored: Mon Nov 30 16:37:27 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 16:37:27 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 6 +++--- .../org/apache/spark/network/client/StreamCallback.java | 4 ++-- .../java/org/apache/spark/network/server/RpcHandler.java | 2 +- project/SparkBuild.scala | 11 --- 4 files changed, 14 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3ca8cfa/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index 01718d9..f2f3e2e 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -27,7 +27,7 @@ if not (ENV['SKIP_API'] == '1') cd("..") puts "Running 'build/sbt -Pkinesis-asl clean compile unidoc' from " + pwd + "; this may take a few minutes..." -puts `build/sbt -Pkinesis-asl clean compile unidoc` +system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed") puts "Moving back into docs dir." cd("docs") @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - puts `make html` + system(make html) || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") @@ -131,7 +131,7 @@ if not (ENV['SKIP_API'] == '1') # Build SparkR API docs puts "Moving to R directory and building roxygen docs." cd("R") - puts `./create-docs.sh` + system("./create-docs.sh") || raise("R doc generation failed") puts "Moving back into home dir." cd("../") http://git-wip-us.apache.org/repos/asf/spark/blob/d3ca8cfa/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java index 093fada..51d34ca 100644 --- a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java +++ b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java @@ -21,8 +21,8 @@ import java.io.IOException; import java.nio.ByteBuffer; /** - * Callback for streaming data. Stream data will be offered to the {@link onData(ByteBuffer)} - * method as it arrives. Once all the stream data is received, {@link onComplete()} will be + * Callback for streaming data. Stream data will be offered to the {@link onData(String, ByteBuffer)} + * method as it arrives. Once all the stream data is received, {@link onComplete(String)} will be * called. 
* * The network library guarantees that a single thread will call these methods at a time, but http://git-wip-us.apache.org/repos/asf/spark/blob/d3ca8cfa/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java index 65109dd..1a11f7b 100644 --- a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java +++ b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java @@ -55,7 +55,7 @@ public abstract class RpcHandler { /** * Receives an RPC message that does not expect a reply. The default implementation will - * call "{@link receive(TransportClient, byte[],
spark git commit: [SPARK-12000] Fix API doc generation issues
Repository: spark Updated Branches: refs/heads/branch-1.6 436151780 -> 43ffa0373 [SPARK-12000] Fix API doc generation issues This pull request fixes multiple issues with API doc generation. - Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures. - Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation. - Incorporate doc changes from thunterdb (in #10048). Closes #10048. Author: Josh RosenAuthor: Timothy Hunter Closes #10049 from JoshRosen/fix-doc-build. (cherry picked from commit d3ca8cfac286ae19f8bedc736877ea9d0a0a072c) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43ffa037 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43ffa037 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43ffa037 Branch: refs/heads/branch-1.6 Commit: 43ffa03738d1ffdf99604ac18de137c60c930550 Parents: 4361517 Author: Josh Rosen Authored: Mon Nov 30 16:37:27 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 16:37:53 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 6 +++--- .../org/apache/spark/network/client/StreamCallback.java | 4 ++-- .../java/org/apache/spark/network/server/RpcHandler.java | 2 +- project/SparkBuild.scala | 11 --- 4 files changed, 14 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/43ffa037/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index 01718d9..f2f3e2e 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -27,7 +27,7 @@ if not (ENV['SKIP_API'] == '1') cd("..") puts "Running 'build/sbt -Pkinesis-asl clean compile unidoc' from " + pwd + "; this may take a few minutes..." -puts `build/sbt -Pkinesis-asl clean compile unidoc` +system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed") puts "Moving back into docs dir." cd("docs") @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - puts `make html` + system(make html) || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") @@ -131,7 +131,7 @@ if not (ENV['SKIP_API'] == '1') # Build SparkR API docs puts "Moving to R directory and building roxygen docs." cd("R") - puts `./create-docs.sh` + system("./create-docs.sh") || raise("R doc generation failed") puts "Moving back into home dir." cd("../") http://git-wip-us.apache.org/repos/asf/spark/blob/43ffa037/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java index 093fada..51d34ca 100644 --- a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java +++ b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java @@ -21,8 +21,8 @@ import java.io.IOException; import java.nio.ByteBuffer; /** - * Callback for streaming data. Stream data will be offered to the {@link onData(ByteBuffer)} - * method as it arrives. Once all the stream data is received, {@link onComplete()} will be + * Callback for streaming data. 
Stream data will be offered to the {@link onData(String, ByteBuffer)} + * method as it arrives. Once all the stream data is received, {@link onComplete(String)} will be * called. * * The network library guarantees that a single thread will call these methods at a time, but http://git-wip-us.apache.org/repos/asf/spark/blob/43ffa037/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java index 65109dd..1a11f7b 100644 --- a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java +++ b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java @@ -55,7 +55,7 @@ public abstract class RpcHandler { /** * Receives an RPC
spark git commit: [SPARK-12058][HOTFIX] Disable KinesisStreamTests
Repository: spark Updated Branches: refs/heads/master ecc00ec3f -> edb26e7f4 [SPARK-12058][HOTFIX] Disable KinesisStreamTests KinesisStreamTests in test.py is broken because of #9403. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46896/testReport/(root)/KinesisStreamTests/test_kinesis_stream/ Because Streaming Python didn't work when merging https://github.com/apache/spark/pull/9403, the PR build didn't actually report the Python test failure. This PR just disables the test to unblock #10039. Author: Shixiong Zhu. Closes #10047 from zsxwing/disable-python-kinesis-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/edb26e7f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/edb26e7f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/edb26e7f Branch: refs/heads/master Commit: edb26e7f4e1164645971c9a139eb29ddec8acc5d Parents: ecc00ec Author: Shixiong Zhu Authored: Mon Nov 30 16:31:59 2015 -0800 Committer: Shixiong Zhu Committed: Mon Nov 30 16:31:59 2015 -0800 -- python/pyspark/streaming/tests.py | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/edb26e7f/python/pyspark/streaming/tests.py -- diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py index d380d69..a647e6b 100644 --- a/python/pyspark/streaming/tests.py +++ b/python/pyspark/streaming/tests.py @@ -1409,6 +1409,7 @@ class KinesisStreamTests(PySparkStreamingTestCase): InitialPositionInStream.LATEST, 2, StorageLevel.MEMORY_AND_DISK_2, "awsAccessKey", "awsSecretKey") +@unittest.skip("Enable it when we fix SPAKR-12058") def test_kinesis_stream(self): if not are_kinesis_tests_enabled: sys.stderr.write( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-12018][SQL] Refactor common subexpression elimination code
Repository: spark Updated Branches: refs/heads/master f73379be2 -> 9693b0d5a [SPARK-12018][SQL] Refactor common subexpression elimination code JIRA: https://issues.apache.org/jira/browse/SPARK-12018 The code of common subexpression elimination can be factored and simplified. Some unnecessary variables can be removed. Author: Liang-Chi HsiehCloses #10009 from viirya/refactor-subexpr-eliminate. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9693b0d5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9693b0d5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9693b0d5 Branch: refs/heads/master Commit: 9693b0d5a55bc1d2da96f04fe2c6de59a8dfcc1b Parents: f73379b Author: Liang-Chi Hsieh Authored: Mon Nov 30 20:56:42 2015 -0800 Committer: Michael Armbrust Committed: Mon Nov 30 20:56:42 2015 -0800 -- .../sql/catalyst/expressions/Expression.scala | 10 ++ .../expressions/codegen/CodeGenerator.scala | 34 ++-- .../codegen/GenerateUnsafeProjection.scala | 4 +-- 3 files changed, 14 insertions(+), 34 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9693b0d5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala index 169435a..b55d365 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala @@ -94,13 +94,9 @@ abstract class Expression extends TreeNode[Expression] { def gen(ctx: CodeGenContext): GeneratedExpressionCode = { ctx.subExprEliminationExprs.get(this).map { subExprState => // This expression is repeated meaning the code to evaluated has already been added - // as a function, `subExprState.fnName`. Just call that. - val code = -s""" - |/* $this */ - |${subExprState.fnName}(${ctx.INPUT_ROW}); - """.stripMargin.trim - GeneratedExpressionCode(code, subExprState.code.isNull, subExprState.code.value) + // as a function and called in advance. Just use it. + val code = s"/* $this */" + GeneratedExpressionCode(code, subExprState.isNull, subExprState.value) }.getOrElse { val isNull = ctx.freshName("isNull") val primitive = ctx.freshName("primitive") http://git-wip-us.apache.org/repos/asf/spark/blob/9693b0d5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 2f3d6ae..440c7d2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -104,16 +104,13 @@ class CodeGenContext { val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions // State used for subexpression elimination. - case class SubExprEliminationState( - isLoaded: String, - code: GeneratedExpressionCode, - fnName: String) + case class SubExprEliminationState(isNull: String, value: String) // Foreach expression that is participating in subexpression elimination, the state to use. 
val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState] - // The collection of isLoaded variables that need to be reset on each row. - val subExprIsLoadedVariables = mutable.ArrayBuffer.empty[String] + // The collection of sub-exression result resetting methods that need to be called on each row. + val subExprResetVariables = mutable.ArrayBuffer.empty[String] final val JAVA_BOOLEAN = "boolean" final val JAVA_BYTE = "byte" @@ -408,7 +405,6 @@ class CodeGenContext { val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1) commonExprs.foreach(e => { val expr = e.head - val isLoaded = freshName("isLoaded") val isNull = freshName("isNull") val value = freshName("value") val fnName = freshName("evalExpr") @@ -417,18 +413,12 @@ class CodeGenContext { val code = expr.gen(this) val fn = s""" - |private void $fnName(InternalRow ${INPUT_ROW})
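For readers skimming the diff above, the gist of the refactor is that a common subexpression now carries only the names of its generated isNull/value variables, while the calls that evaluate it are collected and run once per input row up front; repeated occurrences of the expression then simply reuse those variables instead of calling a function again. The following is a minimal, self-contained sketch of that shape only — the names (SubExprState, registerCommonSubExpr, Ctx) are invented for illustration and are not Spark's actual CodeGenContext API:

import scala.collection.mutable

object SubExprEliminationSketch {
  // Per eliminated subexpression we keep only the generated variable names.
  final case class SubExprState(isNull: String, value: String)

  final class Ctx {
    private var counter = 0
    def freshName(prefix: String): String = { counter += 1; s"$prefix$counter" }

    // expression text -> state whose variables later occurrences reuse
    val states = mutable.HashMap.empty[String, SubExprState]
    // function declarations emitted once into the generated class ...
    val functions = mutable.ArrayBuffer.empty[String]
    // ... and the calls that must run once per input row, before any reuse
    val perRowCalls = mutable.ArrayBuffer.empty[String]
  }

  def registerCommonSubExpr(ctx: Ctx, exprText: String, evalJava: String): SubExprState = {
    val isNull = ctx.freshName("isNull")
    val value  = ctx.freshName("value")
    val fnName = ctx.freshName("evalExpr")
    // The generated Java below is purely illustrative (assumes a non-null result).
    ctx.functions +=
      s"private void $fnName(InternalRow i) { $evalJava this.$isNull = false; this.$value = result; }"
    ctx.perRowCalls += s"$fnName(i);"
    val state = SubExprState(isNull, value)
    ctx.states(exprText) = state
    state
  }

  def main(args: Array[String]): Unit = {
    val ctx = new Ctx
    val st = registerCommonSubExpr(ctx, "a + b", "int result = i.getInt(0) + i.getInt(1);")
    println(s"per-row calls : ${ctx.perRowCalls.mkString(" ")}")
    println(s"reused vars   : isNull=${st.isNull}, value=${st.value}")
  }
}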
spark git commit: [SPARK-12018][SQL] Refactor common subexpression elimination code
Repository: spark Updated Branches: refs/heads/branch-1.6 86a46ce68 -> 1aa39bdb1 [SPARK-12018][SQL] Refactor common subexpression elimination code JIRA: https://issues.apache.org/jira/browse/SPARK-12018 The code of common subexpression elimination can be factored and simplified. Some unnecessary variables can be removed. Author: Liang-Chi HsiehCloses #10009 from viirya/refactor-subexpr-eliminate. (cherry picked from commit 9693b0d5a55bc1d2da96f04fe2c6de59a8dfcc1b) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1aa39bdb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1aa39bdb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1aa39bdb Branch: refs/heads/branch-1.6 Commit: 1aa39bdb136ace5dcd4f7fdf62b8f324c14261f2 Parents: 86a46ce Author: Liang-Chi Hsieh Authored: Mon Nov 30 20:56:42 2015 -0800 Committer: Michael Armbrust Committed: Mon Nov 30 20:56:54 2015 -0800 -- .../sql/catalyst/expressions/Expression.scala | 10 ++ .../expressions/codegen/CodeGenerator.scala | 34 ++-- .../codegen/GenerateUnsafeProjection.scala | 4 +-- 3 files changed, 14 insertions(+), 34 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1aa39bdb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala index 169435a..b55d365 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala @@ -94,13 +94,9 @@ abstract class Expression extends TreeNode[Expression] { def gen(ctx: CodeGenContext): GeneratedExpressionCode = { ctx.subExprEliminationExprs.get(this).map { subExprState => // This expression is repeated meaning the code to evaluated has already been added - // as a function, `subExprState.fnName`. Just call that. - val code = -s""" - |/* $this */ - |${subExprState.fnName}(${ctx.INPUT_ROW}); - """.stripMargin.trim - GeneratedExpressionCode(code, subExprState.code.isNull, subExprState.code.value) + // as a function and called in advance. Just use it. + val code = s"/* $this */" + GeneratedExpressionCode(code, subExprState.isNull, subExprState.value) }.getOrElse { val isNull = ctx.freshName("isNull") val primitive = ctx.freshName("primitive") http://git-wip-us.apache.org/repos/asf/spark/blob/1aa39bdb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 2f3d6ae..440c7d2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -104,16 +104,13 @@ class CodeGenContext { val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions // State used for subexpression elimination. 
- case class SubExprEliminationState( - isLoaded: String, - code: GeneratedExpressionCode, - fnName: String) + case class SubExprEliminationState(isNull: String, value: String) // Foreach expression that is participating in subexpression elimination, the state to use. val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState] - // The collection of isLoaded variables that need to be reset on each row. - val subExprIsLoadedVariables = mutable.ArrayBuffer.empty[String] + // The collection of sub-exression result resetting methods that need to be called on each row. + val subExprResetVariables = mutable.ArrayBuffer.empty[String] final val JAVA_BOOLEAN = "boolean" final val JAVA_BYTE = "byte" @@ -408,7 +405,6 @@ class CodeGenContext { val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1) commonExprs.foreach(e => { val expr = e.head - val isLoaded = freshName("isLoaded") val isNull = freshName("isNull") val value = freshName("value") val fnName = freshName("evalExpr") @@ -417,18 +413,12 @@ class
spark git commit: [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions
Repository: spark Updated Branches: refs/heads/master bf0e85a70 -> 2db4662fe [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions Fixes [SPARK-11989](https://issues.apache.org/jira/browse/SPARK-11989) Author: CK50Author: Christian Kurz Closes #9973 from CK50/branch-1.6_non-transactional. (cherry picked from commit a589736a1b237ef2f3bd59fbaeefe143ddcc8f4e) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2db4662f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2db4662f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2db4662f Branch: refs/heads/master Commit: 2db4662fe2d72749c06ad5f11f641a388343f77c Parents: bf0e85a Author: CK50 Authored: Mon Nov 30 20:08:49 2015 +0800 Committer: Reynold Xin Committed: Mon Nov 30 20:09:05 2015 +0800 -- .../execution/datasources/jdbc/JdbcUtils.scala | 22 +--- 1 file changed, 19 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2db4662f/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 7375a5c..252f1cf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -21,6 +21,7 @@ import java.sql.{Connection, PreparedStatement} import java.util.Properties import scala.util.Try +import scala.util.control.NonFatal import org.apache.spark.Logging import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType, JdbcDialects} @@ -125,8 +126,19 @@ object JdbcUtils extends Logging { dialect: JdbcDialect): Iterator[Byte] = { val conn = getConnection() var committed = false +val supportsTransactions = try { + conn.getMetaData().supportsDataManipulationTransactionsOnly() || + conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions() +} catch { + case NonFatal(e) => +logWarning("Exception while detecting transaction support", e) +true +} + try { - conn.setAutoCommit(false) // Everything in the same db transaction. + if (supportsTransactions) { +conn.setAutoCommit(false) // Everything in the same db transaction. + } val stmt = insertStatement(conn, table, rddSchema) try { var rowCount = 0 @@ -175,14 +187,18 @@ object JdbcUtils extends Logging { } finally { stmt.close() } - conn.commit() + if (supportsTransactions) { +conn.commit() + } committed = true } finally { if (!committed) { // The stage must fail. We got here through an exception path, so // let the exception through unless rollback() or close() want to // tell the user about another problem. -conn.rollback() +if (supportsTransactions) { + conn.rollback() +} conn.close() } else { // The stage must succeed. We cannot propagate any exception close() might throw. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
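For context, the pattern this patch introduces can be exercised outside Spark with plain JDBC: probe DatabaseMetaData for transaction support once, fall back to assuming support if the probe itself fails, and guard setAutoCommit/commit/rollback on the result. The sketch below is a hedged illustration of that pattern, not Spark code; the JDBC URL, table name and column are placeholders.

import java.sql.{Connection, DriverManager}
import scala.util.control.NonFatal

object NonTransactionalWriteSketch {
  def writeBatch(url: String, rows: Seq[Int]): Unit = {
    val conn: Connection = DriverManager.getConnection(url)
    var committed = false
    // If metadata probing itself fails, assume transactions are supported,
    // mirroring the behaviour added to JdbcUtils.
    val supportsTransactions =
      try {
        conn.getMetaData.supportsDataManipulationTransactionsOnly() ||
        conn.getMetaData.supportsDataDefinitionAndDataManipulationTransactions()
      } catch {
        case NonFatal(e) =>
          System.err.println(s"Exception while detecting transaction support: $e")
          true
      }
    try {
      if (supportsTransactions) conn.setAutoCommit(false) // everything in one db transaction
      val stmt = conn.prepareStatement("INSERT INTO demo_table (x) VALUES (?)") // placeholder SQL
      try {
        rows.foreach { x => stmt.setInt(1, x); stmt.executeUpdate() }
      } finally {
        stmt.close()
      }
      if (supportsTransactions) conn.commit()
      committed = true
    } finally {
      if (!committed) {
        // Exception path: only roll back where the database supports it.
        if (supportsTransactions) conn.rollback()
        conn.close()
      } else {
        conn.close()
      }
    }
  }
}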
spark git commit: [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions
Repository: spark Updated Branches: refs/heads/branch-1.6 33cd171b2 -> a589736a1 [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions Fixes [SPARK-11989](https://issues.apache.org/jira/browse/SPARK-11989) Author: CK50Author: Christian Kurz Closes #9973 from CK50/branch-1.6_non-transactional. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a589736a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a589736a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a589736a Branch: refs/heads/branch-1.6 Commit: a589736a1b237ef2f3bd59fbaeefe143ddcc8f4e Parents: 33cd171 Author: CK50 Authored: Mon Nov 30 20:08:49 2015 +0800 Committer: Reynold Xin Committed: Mon Nov 30 20:08:49 2015 +0800 -- .../execution/datasources/jdbc/JdbcUtils.scala | 22 +--- 1 file changed, 19 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a589736a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 7375a5c..252f1cf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -21,6 +21,7 @@ import java.sql.{Connection, PreparedStatement} import java.util.Properties import scala.util.Try +import scala.util.control.NonFatal import org.apache.spark.Logging import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType, JdbcDialects} @@ -125,8 +126,19 @@ object JdbcUtils extends Logging { dialect: JdbcDialect): Iterator[Byte] = { val conn = getConnection() var committed = false +val supportsTransactions = try { + conn.getMetaData().supportsDataManipulationTransactionsOnly() || + conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions() +} catch { + case NonFatal(e) => +logWarning("Exception while detecting transaction support", e) +true +} + try { - conn.setAutoCommit(false) // Everything in the same db transaction. + if (supportsTransactions) { +conn.setAutoCommit(false) // Everything in the same db transaction. + } val stmt = insertStatement(conn, table, rddSchema) try { var rowCount = 0 @@ -175,14 +187,18 @@ object JdbcUtils extends Logging { } finally { stmt.close() } - conn.commit() + if (supportsTransactions) { +conn.commit() + } committed = true } finally { if (!committed) { // The stage must fail. We got here through an exception path, so // let the exception through unless rollback() or close() want to // tell the user about another problem. -conn.rollback() +if (supportsTransactions) { + conn.rollback() +} conn.close() } else { // The stage must succeed. We cannot propagate any exception close() might throw. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [DOC] Explicitly state that top maintains the order of elements
Repository: spark Updated Branches: refs/heads/branch-1.6 e9653921e -> aaf835f1d [DOC] Explicitly state that top maintains the order of elements Top is implemented in terms of takeOrdered, which already maintains the order, so top should, too. Author: Wieland HoffmannCloses #10013 from mineo/top-order. (cherry picked from commit 26c3581f17f475fab2f3b5301b8f253ff2fa6438) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aaf835f1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aaf835f1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aaf835f1 Branch: refs/heads/branch-1.6 Commit: aaf835f1d67470d859fae3c0cc3143e4ccaaf3d0 Parents: e965392 Author: Wieland Hoffmann Authored: Mon Nov 30 09:32:48 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:32:58 2015 + -- core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala | 4 ++-- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/aaf835f1/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala index 871be0b..1e9d4f1 100644 --- a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala +++ b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala @@ -556,7 +556,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD as defined by - * the specified Comparator[T]. + * the specified Comparator[T] and maintains the order. * @param num k, the number of top elements to return * @param comp the comparator that defines the order * @return an array of top elements @@ -567,7 +567,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD using the - * natural ordering for T. + * natural ordering for T and maintains the order. * @param num k, the number of top elements to return * @return an array of top elements */ http://git-wip-us.apache.org/repos/asf/spark/blob/aaf835f1/core/src/main/scala/org/apache/spark/rdd/RDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala index 2aeb5ee..8b3731d 100644 --- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala @@ -1327,7 +1327,8 @@ abstract class RDD[T: ClassTag]( /** * Returns the top k (largest) elements from this RDD as defined by the specified - * implicit Ordering[T]. This does the opposite of [[takeOrdered]]. For example: + * implicit Ordering[T] and maintains the ordering. This does the opposite of + * [[takeOrdered]]. For example: * {{{ * sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) * // returns Array(12) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
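Since top is built on takeOrdered, the ordering guarantee the doc now states can be checked directly in spark-shell (this snippet assumes an active SparkContext bound to sc, as in the scaladoc example):

val nums = sc.parallelize(Seq(10, 4, 2, 12, 3))

nums.top(2)          // Array(12, 10) -- largest first, in descending order
nums.takeOrdered(2)  // Array(2, 3)   -- smallest first, in ascending order

// With an explicit ordering, top still returns results ordered by that
// ordering, descending; under the reversed ordering "top" yields the minima:
nums.top(2)(Ordering[Int].reverse)   // Array(2, 3)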
spark git commit: [DOC] Explicitly state that top maintains the order of elements
Repository: spark Updated Branches: refs/heads/master 953e8e6dc -> 26c3581f1 [DOC] Explicitly state that top maintains the order of elements Top is implemented in terms of takeOrdered, which already maintains the order, so top should, too. Author: Wieland HoffmannCloses #10013 from mineo/top-order. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/26c3581f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/26c3581f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/26c3581f Branch: refs/heads/master Commit: 26c3581f17f475fab2f3b5301b8f253ff2fa6438 Parents: 953e8e6 Author: Wieland Hoffmann Authored: Mon Nov 30 09:32:48 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:32:48 2015 + -- core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala | 4 ++-- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/26c3581f/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala index 871be0b..1e9d4f1 100644 --- a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala +++ b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala @@ -556,7 +556,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD as defined by - * the specified Comparator[T]. + * the specified Comparator[T] and maintains the order. * @param num k, the number of top elements to return * @param comp the comparator that defines the order * @return an array of top elements @@ -567,7 +567,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD using the - * natural ordering for T. + * natural ordering for T and maintains the order. * @param num k, the number of top elements to return * @return an array of top elements */ http://git-wip-us.apache.org/repos/asf/spark/blob/26c3581f/core/src/main/scala/org/apache/spark/rdd/RDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala index 2aeb5ee..8b3731d 100644 --- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala @@ -1327,7 +1327,8 @@ abstract class RDD[T: ClassTag]( /** * Returns the top k (largest) elements from this RDD as defined by the specified - * implicit Ordering[T]. This does the opposite of [[takeOrdered]]. For example: + * implicit Ordering[T] and maintains the ordering. This does the opposite of + * [[takeOrdered]]. For example: * {{{ * sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) * // returns Array(12) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only.
Repository: spark Updated Branches: refs/heads/master e07494420 -> 953e8e6dc [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. Author: Prashant SharmaCloses #10012 from ScrapCodes/minor-build-comment. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/953e8e6d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/953e8e6d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/953e8e6d Branch: refs/heads/master Commit: 953e8e6dcb32cd88005834e9c3720740e201826c Parents: e074944 Author: Prashant Sharma Authored: Mon Nov 30 09:30:58 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:30:58 2015 + -- project/project/SparkPluginBuild.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/953e8e6d/project/project/SparkPluginBuild.scala -- diff --git a/project/project/SparkPluginBuild.scala b/project/project/SparkPluginBuild.scala index 471d00b..cbb88dc 100644 --- a/project/project/SparkPluginBuild.scala +++ b/project/project/SparkPluginBuild.scala @@ -19,9 +19,8 @@ import sbt._ import sbt.Keys._ /** - * This plugin project is there to define new scala style rules for spark. This is - * a plugin project so that this gets compiled first and is put on the classpath and - * becomes available for scalastyle sbt plugin. + * This plugin project is there because we use our custom fork of sbt-pom-reader plugin. This is + * a plugin project so that this gets compiled first and is available on the classpath for SBT build. */ object SparkPluginDef extends Build { lazy val root = Project("plugins", file(".")) dependsOn(sbtPomReader) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only.
Repository: spark Updated Branches: refs/heads/branch-1.6 12d97b0c5 -> e9653921e [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. Author: Prashant SharmaCloses #10012 from ScrapCodes/minor-build-comment. (cherry picked from commit 953e8e6dcb32cd88005834e9c3720740e201826c) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9653921 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9653921 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9653921 Branch: refs/heads/branch-1.6 Commit: e9653921e84092f36988e3d73618c58b0c10e938 Parents: 12d97b0 Author: Prashant Sharma Authored: Mon Nov 30 09:30:58 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:31:09 2015 + -- project/project/SparkPluginBuild.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e9653921/project/project/SparkPluginBuild.scala -- diff --git a/project/project/SparkPluginBuild.scala b/project/project/SparkPluginBuild.scala index 471d00b..cbb88dc 100644 --- a/project/project/SparkPluginBuild.scala +++ b/project/project/SparkPluginBuild.scala @@ -19,9 +19,8 @@ import sbt._ import sbt.Keys._ /** - * This plugin project is there to define new scala style rules for spark. This is - * a plugin project so that this gets compiled first and is put on the classpath and - * becomes available for scalastyle sbt plugin. + * This plugin project is there because we use our custom fork of sbt-pom-reader plugin. This is + * a plugin project so that this gets compiled first and is available on the classpath for SBT build. */ object SparkPluginDef extends Build { lazy val root = Project("plugins", file(".")) dependsOn(sbtPomReader) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org