spark git commit: [MINOR][DOCS] fixed list display in ml-ensembles
Repository: spark
Updated Branches: refs/heads/branch-1.6 32911de77 -> 0978ec11c

[MINOR][DOCS] fixed list display in ml-ensembles

The list in ml-ensembles.md wasn't properly formatted and, as a result, looked like this: ![old](http://i.imgur.com/2ZhELLR.png)
This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png)

Author: BenFradet

Closes #10025 from BenFradet/ml-ensembles-doc.

(cherry picked from commit f2fbfa444f6e8d27953ec2d1c0b3abd603c963f9)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0978ec11
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0978ec11
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0978ec11

Branch: refs/heads/branch-1.6
Commit: 0978ec11c9a080bd493da2e9d11c81c08e8e6962
Parents: 32911de
Author: BenFradet
Authored: Mon Nov 30 13:02:08 2015 -0800
Committer: Xiangrui Meng
Committed: Mon Nov 30 13:02:19 2015 -0800

--
 docs/ml-ensembles.md | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0978ec11/docs/ml-ensembles.md

diff --git a/docs/ml-ensembles.md b/docs/ml-ensembles.md
index f6c3c30..14fef76 100644
--- a/docs/ml-ensembles.md
+++ b/docs/ml-ensembles.md
@@ -20,6 +20,7 @@ Both use [MLlib decision trees](ml-decision-tree.html) as their base models.
 Users can find more information about ensemble algorithms in the [MLlib Ensemble guide](mllib-ensembles.html).
 In this section, we demonstrate the Pipelines API for ensembles. The main differences between this API and the [original MLlib ensembles API](mllib-ensembles.html) are:
+
 * support for ML Pipelines
 * separation of classification vs. regression
 * use of DataFrame metadata to distinguish continuous and categorical features
spark git commit: [MINOR][DOCS] fixed list display in ml-ensembles
Repository: spark
Updated Branches: refs/heads/master 8df584b02 -> f2fbfa444

[MINOR][DOCS] fixed list display in ml-ensembles

The list in ml-ensembles.md wasn't properly formatted and, as a result, looked like this: ![old](http://i.imgur.com/2ZhELLR.png)
This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png)

Author: BenFradet

Closes #10025 from BenFradet/ml-ensembles-doc.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f2fbfa44
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f2fbfa44
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f2fbfa44

Branch: refs/heads/master
Commit: f2fbfa444f6e8d27953ec2d1c0b3abd603c963f9
Parents: 8df584b
Author: BenFradet
Authored: Mon Nov 30 13:02:08 2015 -0800
Committer: Xiangrui Meng
Committed: Mon Nov 30 13:02:08 2015 -0800

--
 docs/ml-ensembles.md | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/f2fbfa44/docs/ml-ensembles.md

diff --git a/docs/ml-ensembles.md b/docs/ml-ensembles.md
index f6c3c30..14fef76 100644
--- a/docs/ml-ensembles.md
+++ b/docs/ml-ensembles.md
@@ -20,6 +20,7 @@ Both use [MLlib decision trees](ml-decision-tree.html) as their base models.
 Users can find more information about ensemble algorithms in the [MLlib Ensemble guide](mllib-ensembles.html).
 In this section, we demonstrate the Pipelines API for ensembles. The main differences between this API and the [original MLlib ensembles API](mllib-ensembles.html) are:
+
 * support for ML Pipelines
 * separation of classification vs. regression
 * use of DataFrame metadata to distinguish continuous and categorical features
spark git commit: Revert "[SPARK-11206] Support SQL UI on the history server"
Repository: spark Updated Branches: refs/heads/master f2fbfa444 -> 2c5dee0fb Revert "[SPARK-11206] Support SQL UI on the history server" This reverts commit cc243a079b1c039d6e7f0b410d1654d94a090e14 / PR #9297 I'm reverting this because it broke SQLListenerMemoryLeakSuite in the master Maven builds. See #9991 for a discussion of why this broke the tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c5dee0f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2c5dee0f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2c5dee0f Branch: refs/heads/master Commit: 2c5dee0fb8e4d1734ea3a0f22e0b5bfd2f6dba46 Parents: f2fbfa4 Author: Josh RosenAuthored: Mon Nov 30 13:41:52 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 13:42:35 2015 -0800 -- .rat-excludes | 1 - .../org/apache/spark/JavaSparkListener.java | 3 - .../org/apache/spark/SparkFirehoseListener.java | 4 - .../spark/scheduler/EventLoggingListener.scala | 4 - .../apache/spark/scheduler/SparkListener.scala | 24 +--- .../spark/scheduler/SparkListenerBus.scala | 1 - .../scala/org/apache/spark/ui/SparkUI.scala | 16 +-- .../org/apache/spark/util/JsonProtocol.scala| 11 +- spark.scheduler.SparkHistoryListenerFactory | 1 - .../scala/org/apache/spark/sql/SQLContext.scala | 18 +-- .../spark/sql/execution/SQLExecution.scala | 24 +++- .../spark/sql/execution/SparkPlanInfo.scala | 46 -- .../sql/execution/metric/SQLMetricInfo.scala| 30 .../spark/sql/execution/metric/SQLMetrics.scala | 56 +++- .../spark/sql/execution/ui/ExecutionPage.scala | 4 +- .../spark/sql/execution/ui/SQLListener.scala| 139 ++- .../apache/spark/sql/execution/ui/SQLTab.scala | 12 +- .../spark/sql/execution/ui/SparkPlanGraph.scala | 20 +-- .../sql/execution/metric/SQLMetricsSuite.scala | 4 +- .../sql/execution/ui/SQLListenerSuite.scala | 43 +++--- .../spark/sql/test/SharedSQLContext.scala | 1 - 21 files changed, 135 insertions(+), 327 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/.rat-excludes -- diff --git a/.rat-excludes b/.rat-excludes index 7262c96..08fba6d 100644 --- a/.rat-excludes +++ b/.rat-excludes @@ -82,5 +82,4 @@ INDEX gen-java.* .*avpr org.apache.spark.sql.sources.DataSourceRegister -org.apache.spark.scheduler.SparkHistoryListenerFactory .*parquet http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/core/src/main/java/org/apache/spark/JavaSparkListener.java -- diff --git a/core/src/main/java/org/apache/spark/JavaSparkListener.java b/core/src/main/java/org/apache/spark/JavaSparkListener.java index 23bc9a2..fa9acf0 100644 --- a/core/src/main/java/org/apache/spark/JavaSparkListener.java +++ b/core/src/main/java/org/apache/spark/JavaSparkListener.java @@ -82,7 +82,4 @@ public class JavaSparkListener implements SparkListener { @Override public void onBlockUpdated(SparkListenerBlockUpdated blockUpdated) { } - @Override - public void onOtherEvent(SparkListenerEvent event) { } - } http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/core/src/main/java/org/apache/spark/SparkFirehoseListener.java -- diff --git a/core/src/main/java/org/apache/spark/SparkFirehoseListener.java b/core/src/main/java/org/apache/spark/SparkFirehoseListener.java index e6b24af..1214d05 100644 --- a/core/src/main/java/org/apache/spark/SparkFirehoseListener.java +++ b/core/src/main/java/org/apache/spark/SparkFirehoseListener.java @@ -118,8 +118,4 @@ public class SparkFirehoseListener implements SparkListener { onEvent(blockUpdated); } -@Override -public void 
onOtherEvent(SparkListenerEvent event) { -onEvent(event); -} } http://git-wip-us.apache.org/repos/asf/spark/blob/2c5dee0f/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala index eaa07ac..000a021 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala @@ -207,10 +207,6 @@ private[spark] class EventLoggingListener( // No-op because logging every update would be overkill override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit =
spark git commit: [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper
Repository: spark Updated Branches: refs/heads/master 0ddfe7868 -> e07494420 [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwanCloses #9886 from toddwan/S11859. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0749442 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0749442 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0749442 Branch: refs/heads/master Commit: e0749442051d6e29dae4f4cdcb2937c0b015f98f Parents: 0ddfe78 Author: toddwan Authored: Mon Nov 30 09:26:29 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:26:29 2015 + -- .../main/scala/org/apache/spark/SparkContext.scala | 16 ++-- .../spark/SparkContextSchedulerCreationSuite.scala | 5 + 2 files changed, 15 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e0749442/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index b030d3c..8a62b71 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2708,15 +2708,14 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) - case mesosUrl @ MESOS_REGEX(_) => + case MESOS_REGEX(mesosUrl) => MesosNativeLibrary.load() val scheduler = new TaskSchedulerImpl(sc) val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true) -val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs val backend = if (coarseGrained) { - new CoarseMesosSchedulerBackend(scheduler, sc, url, sc.env.securityManager) + new CoarseMesosSchedulerBackend(scheduler, sc, mesosUrl, sc.env.securityManager) } else { - new MesosSchedulerBackend(scheduler, sc, url) + new MesosSchedulerBackend(scheduler, sc, mesosUrl) } scheduler.initialize(backend) (backend, scheduler) @@ -2727,6 +2726,11 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) + case zkUrl if zkUrl.startsWith("zk://") => +logWarning("Master URL for a multi-master Mesos cluster managed by ZooKeeper should be " + + "in the form mesos://zk://host:port. 
Current Master URL will stop working in Spark 2.0.") +createTaskScheduler(sc, "mesos://" + zkUrl) + case _ => throw new SparkException("Could not parse Master URL: '" + master + "'") } @@ -2745,8 +2749,8 @@ private object SparkMasterRegex { val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r // Regular expression for connecting to Spark deploy clusters val SPARK_REGEX = """spark://(.*)""".r - // Regular expression for connection to Mesos cluster by mesos:// or zk:// url - val MESOS_REGEX = """(mesos|zk)://.*""".r + // Regular expression for connection to Mesos cluster by mesos:// or mesos://zk:// url + val MESOS_REGEX = """mesos://(.*)""".r // Regular expression for connection to Simr cluster val SIMR_REGEX = """simr://(.*)""".r } http://git-wip-us.apache.org/repos/asf/spark/blob/e0749442/core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala
spark git commit: [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper
Repository: spark Updated Branches: refs/heads/branch-1.6 a4a2a7deb -> 12d97b0c5 [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwanCloses #9886 from toddwan/S11859. (cherry picked from commit e0749442051d6e29dae4f4cdcb2937c0b015f98f) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12d97b0c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12d97b0c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12d97b0c Branch: refs/heads/branch-1.6 Commit: 12d97b0c5213e04453f81156f04ed95d877f199c Parents: a4a2a7d Author: toddwan Authored: Mon Nov 30 09:26:29 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:26:37 2015 + -- .../main/scala/org/apache/spark/SparkContext.scala | 16 ++-- .../spark/SparkContextSchedulerCreationSuite.scala | 5 + 2 files changed, 15 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/12d97b0c/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index b030d3c..8a62b71 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2708,15 +2708,14 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) - case mesosUrl @ MESOS_REGEX(_) => + case MESOS_REGEX(mesosUrl) => MesosNativeLibrary.load() val scheduler = new TaskSchedulerImpl(sc) val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", defaultValue = true) -val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs val backend = if (coarseGrained) { - new CoarseMesosSchedulerBackend(scheduler, sc, url, sc.env.securityManager) + new CoarseMesosSchedulerBackend(scheduler, sc, mesosUrl, sc.env.securityManager) } else { - new MesosSchedulerBackend(scheduler, sc, url) + new MesosSchedulerBackend(scheduler, sc, mesosUrl) } scheduler.initialize(backend) (backend, scheduler) @@ -2727,6 +2726,11 @@ object SparkContext extends Logging { scheduler.initialize(backend) (backend, scheduler) + case zkUrl if zkUrl.startsWith("zk://") => +logWarning("Master URL for a multi-master Mesos cluster managed by ZooKeeper should be " + + "in the form mesos://zk://host:port. 
Current Master URL will stop working in Spark 2.0.") +createTaskScheduler(sc, "mesos://" + zkUrl) + case _ => throw new SparkException("Could not parse Master URL: '" + master + "'") } @@ -2745,8 +2749,8 @@ private object SparkMasterRegex { val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r // Regular expression for connecting to Spark deploy clusters val SPARK_REGEX = """spark://(.*)""".r - // Regular expression for connection to Mesos cluster by mesos:// or zk:// url - val MESOS_REGEX = """(mesos|zk)://.*""".r + // Regular expression for connection to Mesos cluster by mesos:// or mesos://zk:// url + val MESOS_REGEX = """mesos://(.*)""".r // Regular expression for connection to Simr cluster val SIMR_REGEX = """simr://(.*)""".r } http://git-wip-us.apache.org/repos/asf/spark/blob/12d97b0c/core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala
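For readers of the two SPARK-11859 commits above, the user-facing effect can be sketched as follows; the ZooKeeper host names below are placeholders, not values taken from the patch. The mesos:// scheme must now be spelled out explicitly, while a bare zk:// URL is only accepted with a deprecation warning and is rewritten to the mesos://zk:// form.

    // Sketch (assumed host names) of how an application should now point at a
    // multi-master Mesos cluster managed by ZooKeeper.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("mesos-zk-example")
      .setMaster("mesos://zk://zk1:2181,zk2:2181/mesos") // valid form after this change
      // .setMaster("zk://zk1:2181,zk2:2181/mesos")      // deprecated: logs a warning, will stop working in Spark 2.0
    val sc = new SparkContext(conf)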
spark git commit: [SPARK-12023][BUILD] Fix warnings while packaging spark with maven.
Repository: spark Updated Branches: refs/heads/branch-1.6 aaf835f1d -> 33cd171b2 [SPARK-12023][BUILD] Fix warnings while packaging spark with maven. this is a trivial fix, discussed [here](http://stackoverflow.com/questions/28500401/maven-assembly-plugin-warning-the-assembly-descriptor-contains-a-filesystem-roo/). Author: Prashant SharmaCloses #10014 from ScrapCodes/assembly-warning. (cherry picked from commit bf0e85a70a54a2d7fd6804b6bd00c63c20e2bb00) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33cd171b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33cd171b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33cd171b Branch: refs/heads/branch-1.6 Commit: 33cd171b24d45081c8dd1b22a8bd0152a392d115 Parents: aaf835f Author: Prashant Sharma Authored: Mon Nov 30 10:11:27 2015 + Committer: Sean Owen Committed: Mon Nov 30 10:11:39 2015 + -- assembly/src/main/assembly/assembly.xml | 8 external/mqtt/src/main/assembly/assembly.xml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/33cd171b/assembly/src/main/assembly/assembly.xml -- diff --git a/assembly/src/main/assembly/assembly.xml b/assembly/src/main/assembly/assembly.xml index 7111563..009d4b9 100644 --- a/assembly/src/main/assembly/assembly.xml +++ b/assembly/src/main/assembly/assembly.xml @@ -32,7 +32,7 @@ ${project.parent.basedir}/core/src/main/resources/org/apache/spark/ui/static/ - /ui-resources/org/apache/spark/ui/static + ui-resources/org/apache/spark/ui/static **/* @@ -41,7 +41,7 @@ ${project.parent.basedir}/sbin/ - /sbin + sbin **/* @@ -50,7 +50,7 @@ ${project.parent.basedir}/bin/ - /bin + bin **/* @@ -59,7 +59,7 @@ ${project.parent.basedir}/assembly/target/${spark.jar.dir} - / + ${spark.jar.basename} http://git-wip-us.apache.org/repos/asf/spark/blob/33cd171b/external/mqtt/src/main/assembly/assembly.xml -- diff --git a/external/mqtt/src/main/assembly/assembly.xml b/external/mqtt/src/main/assembly/assembly.xml index ecab5b3..c110b01 100644 --- a/external/mqtt/src/main/assembly/assembly.xml +++ b/external/mqtt/src/main/assembly/assembly.xml @@ -24,7 +24,7 @@ ${project.build.directory}/scala-${scala.binary.version}/test-classes - / + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12023][BUILD] Fix warnings while packaging spark with maven.
Repository: spark Updated Branches: refs/heads/master 26c3581f1 -> bf0e85a70 [SPARK-12023][BUILD] Fix warnings while packaging spark with maven. this is a trivial fix, discussed [here](http://stackoverflow.com/questions/28500401/maven-assembly-plugin-warning-the-assembly-descriptor-contains-a-filesystem-roo/). Author: Prashant SharmaCloses #10014 from ScrapCodes/assembly-warning. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf0e85a7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf0e85a7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf0e85a7 Branch: refs/heads/master Commit: bf0e85a70a54a2d7fd6804b6bd00c63c20e2bb00 Parents: 26c3581 Author: Prashant Sharma Authored: Mon Nov 30 10:11:27 2015 + Committer: Sean Owen Committed: Mon Nov 30 10:11:27 2015 + -- assembly/src/main/assembly/assembly.xml | 8 external/mqtt/src/main/assembly/assembly.xml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bf0e85a7/assembly/src/main/assembly/assembly.xml -- diff --git a/assembly/src/main/assembly/assembly.xml b/assembly/src/main/assembly/assembly.xml index 7111563..009d4b9 100644 --- a/assembly/src/main/assembly/assembly.xml +++ b/assembly/src/main/assembly/assembly.xml @@ -32,7 +32,7 @@ ${project.parent.basedir}/core/src/main/resources/org/apache/spark/ui/static/ - /ui-resources/org/apache/spark/ui/static + ui-resources/org/apache/spark/ui/static **/* @@ -41,7 +41,7 @@ ${project.parent.basedir}/sbin/ - /sbin + sbin **/* @@ -50,7 +50,7 @@ ${project.parent.basedir}/bin/ - /bin + bin **/* @@ -59,7 +59,7 @@ ${project.parent.basedir}/assembly/target/${spark.jar.dir} - / + ${spark.jar.basename} http://git-wip-us.apache.org/repos/asf/spark/blob/bf0e85a7/external/mqtt/src/main/assembly/assembly.xml -- diff --git a/external/mqtt/src/main/assembly/assembly.xml b/external/mqtt/src/main/assembly/assembly.xml index ecab5b3..c110b01 100644 --- a/external/mqtt/src/main/assembly/assembly.xml +++ b/external/mqtt/src/main/assembly/assembly.xml @@ -24,7 +24,7 @@ ${project.build.directory}/scala-${scala.binary.version}/test-classes - / + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan
Repository: spark Updated Branches: refs/heads/branch-1.6 a589736a1 -> 32911de77 [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan In 1.6, we introduce a public API to have a SQLContext for current thread, SparkPlan should use that. Author: Davies LiuCloses #9990 from davies/leak_context. (cherry picked from commit 17275fa99c670537c52843df405279a52b5c9594) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32911de7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32911de7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32911de7 Branch: refs/heads/branch-1.6 Commit: 32911de77d0ff346f821c7cd1dc607c611780164 Parents: a589736 Author: Davies Liu Authored: Mon Nov 30 10:32:13 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 10:32:30 2015 -0800 -- .../main/scala/org/apache/spark/sql/SQLContext.scala | 10 +- .../apache/spark/sql/execution/QueryExecution.scala | 3 +-- .../org/apache/spark/sql/execution/SparkPlan.scala| 14 -- .../org/apache/spark/sql/MultiSQLContextsSuite.scala | 2 +- .../sql/execution/ExchangeCoordinatorSuite.scala | 2 +- .../sql/execution/RowFormatConvertersSuite.scala | 4 ++-- 6 files changed, 14 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/32911de7/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala index 46bf544..9cc65de 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala @@ -26,7 +26,6 @@ import scala.collection.immutable import scala.reflect.runtime.universe.TypeTag import scala.util.control.NonFatal -import org.apache.spark.{SparkException, SparkContext} import org.apache.spark.annotation.{DeveloperApi, Experimental} import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} import org.apache.spark.rdd.RDD @@ -45,9 +44,10 @@ import org.apache.spark.sql.execution.datasources._ import org.apache.spark.sql.execution.ui.{SQLListener, SQLTab} import org.apache.spark.sql.sources.BaseRelation import org.apache.spark.sql.types._ -import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.sql.util.ExecutionListenerManager +import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.util.Utils +import org.apache.spark.{SparkContext, SparkException} /** * The entry point for working with structured data (rows and columns) in Spark. 
Allows the @@ -401,7 +401,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes val rowRDD = RDDConversions.productToRowRdd(rdd, schema.map(_.dataType)) @@ -417,7 +417,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes DataFrame(self, LocalRelation.fromProduct(attributeSeq, data)) @@ -1328,7 +1328,7 @@ object SQLContext { activeContext.remove() } - private[sql] def getActiveContextOption(): Option[SQLContext] = { + private[sql] def getActive(): Option[SQLContext] = { Option(activeContext.get()) } http://git-wip-us.apache.org/repos/asf/spark/blob/32911de7/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 5da5aea..107570f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -42,9 +42,8 @@ class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan) { lazy val optimizedPlan: LogicalPlan = sqlContext.optimizer.execute(withCachedData) - // TODO: Don't just pick the first one... lazy val sparkPlan: SparkPlan = { -SparkPlan.currentContext.set(sqlContext) +
spark git commit: [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan
Repository: spark Updated Branches: refs/heads/master 2db4662fe -> 17275fa99 [SPARK-11700] [SQL] Remove thread local SQLContext in SparkPlan In 1.6, we introduce a public API to have a SQLContext for current thread, SparkPlan should use that. Author: Davies LiuCloses #9990 from davies/leak_context. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17275fa9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17275fa9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17275fa9 Branch: refs/heads/master Commit: 17275fa99c670537c52843df405279a52b5c9594 Parents: 2db4662 Author: Davies Liu Authored: Mon Nov 30 10:32:13 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 10:32:13 2015 -0800 -- .../main/scala/org/apache/spark/sql/SQLContext.scala | 10 +- .../apache/spark/sql/execution/QueryExecution.scala | 3 +-- .../org/apache/spark/sql/execution/SparkPlan.scala| 14 -- .../org/apache/spark/sql/MultiSQLContextsSuite.scala | 2 +- .../sql/execution/ExchangeCoordinatorSuite.scala | 2 +- .../sql/execution/RowFormatConvertersSuite.scala | 4 ++-- 6 files changed, 14 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/17275fa9/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala index 1c2ac5f..8d27839 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala @@ -26,7 +26,6 @@ import scala.collection.immutable import scala.reflect.runtime.universe.TypeTag import scala.util.control.NonFatal -import org.apache.spark.{SparkException, SparkContext} import org.apache.spark.annotation.{DeveloperApi, Experimental} import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} import org.apache.spark.rdd.RDD @@ -45,9 +44,10 @@ import org.apache.spark.sql.execution.datasources._ import org.apache.spark.sql.execution.ui.{SQLListener, SQLTab} import org.apache.spark.sql.sources.BaseRelation import org.apache.spark.sql.types._ -import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.sql.util.ExecutionListenerManager +import org.apache.spark.sql.{execution => sparkexecution} import org.apache.spark.util.Utils +import org.apache.spark.{SparkContext, SparkException} /** * The entry point for working with structured data (rows and columns) in Spark. 
Allows the @@ -401,7 +401,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes val rowRDD = RDDConversions.productToRowRdd(rdd, schema.map(_.dataType)) @@ -417,7 +417,7 @@ class SQLContext private[sql]( */ @Experimental def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame = { -SparkPlan.currentContext.set(self) +SQLContext.setActive(self) val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType] val attributeSeq = schema.toAttributes DataFrame(self, LocalRelation.fromProduct(attributeSeq, data)) @@ -1334,7 +1334,7 @@ object SQLContext { activeContext.remove() } - private[sql] def getActiveContextOption(): Option[SQLContext] = { + private[sql] def getActive(): Option[SQLContext] = { Option(activeContext.get()) } http://git-wip-us.apache.org/repos/asf/spark/blob/17275fa9/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 5da5aea..107570f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -42,9 +42,8 @@ class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan) { lazy val optimizedPlan: LogicalPlan = sqlContext.optimizer.execute(withCachedData) - // TODO: Don't just pick the first one... lazy val sparkPlan: SparkPlan = { -SparkPlan.currentContext.set(sqlContext) +SQLContext.setActive(sqlContext) sqlContext.planner.plan(optimizedPlan).next() }
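The two SPARK-11700 commits above replace SparkPlan's private thread local with the SQLContext.setActive/getActive API introduced in 1.6. The underlying per-thread "active context" pattern amounts to the following standalone sketch; it is a simplification for illustration, not Spark's implementation.

    // Standalone sketch of a per-thread "active context" holder (illustration only).
    object ActiveContext {
      private val active = new ThreadLocal[Option[String]] {
        override def initialValue(): Option[String] = None
      }
      def setActive(ctx: String): Unit = active.set(Some(ctx))
      def getActive: Option[String] = active.get()
      def clearActive(): Unit = active.remove()
    }

    ActiveContext.setActive("context-for-this-thread")
    assert(ActiveContext.getActive.contains("context-for-this-thread"))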
spark git commit: [SPARK-11982] [SQL] improve performance of cartesian product
Repository: spark Updated Branches: refs/heads/master 17275fa99 -> 8df584b02 [SPARK-11982] [SQL] improve performance of cartesian product This PR improve the performance of CartesianProduct by caching the result of right plan. After this patch, the query time of TPC-DS Q65 go down to 4 seconds from 28 minutes (420X faster). cc nongli Author: Davies LiuCloses #9969 from davies/improve_cartesian. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8df584b0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8df584b0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8df584b0 Branch: refs/heads/master Commit: 8df584b0200402d8b2ce0a8de24f7a760ced8655 Parents: 17275fa Author: Davies Liu Authored: Mon Nov 30 11:54:18 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 11:54:18 2015 -0800 -- .../unsafe/sort/UnsafeExternalSorter.java | 63 .../unsafe/sort/UnsafeInMemorySorter.java | 7 ++ .../sql/execution/joins/CartesianProduct.scala | 76 +--- .../sql/execution/metric/SQLMetricsSuite.scala | 2 +- 4 files changed, 139 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8df584b0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index 2e40312..5a97f4f 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -21,6 +21,7 @@ import javax.annotation.Nullable; import java.io.File; import java.io.IOException; import java.util.LinkedList; +import java.util.Queue; import com.google.common.annotations.VisibleForTesting; import org.slf4j.Logger; @@ -521,4 +522,66 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return upstream.getKeyPrefix(); } } + + /** + * Returns a iterator, which will return the rows in the order as inserted. + * + * It is the caller's responsibility to call `cleanupResources()` + * after consuming this iterator. + */ + public UnsafeSorterIterator getIterator() throws IOException { +if (spillWriters.isEmpty()) { + assert(inMemSorter != null); + return inMemSorter.getIterator(); +} else { + LinkedList queue = new LinkedList<>(); + for (UnsafeSorterSpillWriter spillWriter : spillWriters) { +queue.add(spillWriter.getReader(blockManager)); + } + if (inMemSorter != null) { +queue.add(inMemSorter.getIterator()); + } + return new ChainedIterator(queue); +} + } + + /** + * Chain multiple UnsafeSorterIterator together as single one. 
+ */ + class ChainedIterator extends UnsafeSorterIterator { + +private final Queue iterators; +private UnsafeSorterIterator current; + +public ChainedIterator(Queue iterators) { + assert iterators.size() > 0; + this.iterators = iterators; + this.current = iterators.remove(); +} + +@Override +public boolean hasNext() { + while (!current.hasNext() && !iterators.isEmpty()) { +current = iterators.remove(); + } + return current.hasNext(); +} + +@Override +public void loadNext() throws IOException { + current.loadNext(); +} + +@Override +public Object getBaseObject() { return current.getBaseObject(); } + +@Override +public long getBaseOffset() { return current.getBaseOffset(); } + +@Override +public int getRecordLength() { return current.getRecordLength(); } + +@Override +public long getKeyPrefix() { return current.getKeyPrefix(); } + } } http://git-wip-us.apache.org/repos/asf/spark/blob/8df584b0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java index dce1f15..c91e88f 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java @@ -226,4 +226,11 @@ public final class UnsafeInMemorySorter { sorter.sort(array, 0, pos / 2, sortComparator); return new SortedIterator(pos / 2);
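The speedup reported for SPARK-11982 comes from materializing the right-hand plan once and reusing it for every left-hand row instead of recomputing it per row. A simplified, in-memory sketch of that idea follows; the real patch spills the cached rows through UnsafeExternalSorter, as the diff above shows.

    // Simplified sketch: pay for the right side once, reuse it for every left row.
    def cartesian[A, B](left: Iterator[A], right: => Iterator[B]): Iterator[(A, B)] = {
      val cachedRight = right.toArray                        // single pass over the right plan
      left.flatMap(l => cachedRight.iterator.map(r => (l, r)))
    }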
spark git commit: [SPARK-12037][CORE] initialize heartbeatReceiverRef before calling startDriverHeartbeat
Repository: spark
Updated Branches: refs/heads/branch-1.6 43ffa0373 -> a4e134827

[SPARK-12037][CORE] initialize heartbeatReceiverRef before calling startDriverHeartbeat

https://issues.apache.org/jira/browse/SPARK-12037

A simple fix by changing the order of the statements.

Author: CodingCat

Closes #10032 from CodingCat/SPARK-12037.

(cherry picked from commit 0a46e4377216a1f7de478f220c3b3042a77789e2)
Signed-off-by: Andrew Or

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a4e13482
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a4e13482
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a4e13482

Branch: refs/heads/branch-1.6
Commit: a4e134827bdfee866a5af22751452c7e03801645
Parents: 43ffa03
Author: CodingCat
Authored: Mon Nov 30 17:19:26 2015 -0800
Committer: Andrew Or
Committed: Mon Nov 30 17:19:58 2015 -0800

--
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/a4e13482/core/src/main/scala/org/apache/spark/executor/Executor.scala

diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 6154f06..7b68dfe 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -109,6 +109,10 @@ private[spark] class Executor(
   // Executor for the heartbeat task.
   private val heartbeater = ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")

+  // must be initialized before running startDriverHeartbeat()
+  private val heartbeatReceiverRef =
+    RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
+
   startDriverHeartbeater()

   def launchTask(
@@ -411,9 +415,6 @@ private[spark] class Executor(
     }
   }

-  private val heartbeatReceiverRef =
-    RpcUtils.makeDriverRef(HeartbeatReceiver.ENDPOINT_NAME, conf, env.rpcEnv)
-
   /** Reports heartbeat and metrics for active tasks to the driver. */
   private def reportHeartBeat(): Unit = {
     // list of (task id, metrics) to send back to the driver
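The bug class fixed by SPARK-12037 is plain Scala constructor-initialization order: statements in a class body run top to bottom, so code that executes before a val's declaration observes its default (null) value. A minimal reproduction, unrelated to Spark's actual classes:

    // Minimal reproduction of the initialization-order pitfall (not Spark code).
    class InitOrder {
      report()                                  // runs before `ref` is initialized: prints "ref = null"
      private val ref: String = "ready"
      report()                                  // prints "ref = ready"
      private def report(): Unit = println(s"ref = $ref")
    }
    new InitOrder()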
[1/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
Repository: spark Updated Branches: refs/heads/master 0a46e4377 -> 9bf212067 http://git-wip-us.apache.org/repos/asf/spark/blob/9bf21206/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java -- diff --git a/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java b/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java index 88fa225..9e9be98 100644 --- a/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java +++ b/network/common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java @@ -17,6 +17,7 @@ package org.apache.spark.network; +import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Collections; import java.util.HashSet; @@ -26,7 +27,6 @@ import java.util.Set; import java.util.concurrent.Semaphore; import java.util.concurrent.TimeUnit; -import com.google.common.base.Charsets; import com.google.common.collect.Sets; import org.junit.AfterClass; import org.junit.BeforeClass; @@ -41,6 +41,7 @@ import org.apache.spark.network.server.OneForOneStreamManager; import org.apache.spark.network.server.RpcHandler; import org.apache.spark.network.server.StreamManager; import org.apache.spark.network.server.TransportServer; +import org.apache.spark.network.util.JavaUtils; import org.apache.spark.network.util.SystemPropertyConfigProvider; import org.apache.spark.network.util.TransportConf; @@ -55,11 +56,14 @@ public class RpcIntegrationSuite { TransportConf conf = new TransportConf("shuffle", new SystemPropertyConfigProvider()); rpcHandler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { -String msg = new String(message, Charsets.UTF_8); + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { +String msg = JavaUtils.bytesToString(message); String[] parts = msg.split("/"); if (parts[0].equals("hello")) { - callback.onSuccess(("Hello, " + parts[1] + "!").getBytes(Charsets.UTF_8)); + callback.onSuccess(JavaUtils.stringToBytes("Hello, " + parts[1] + "!")); } else if (parts[0].equals("return error")) { callback.onFailure(new RuntimeException("Returned: " + parts[1])); } else if (parts[0].equals("throw error")) { @@ -68,9 +72,8 @@ public class RpcIntegrationSuite { } @Override - public void receive(TransportClient client, byte[] message) { -String msg = new String(message, Charsets.UTF_8); -oneWayMsgs.add(msg); + public void receive(TransportClient client, ByteBuffer message) { +oneWayMsgs.add(JavaUtils.bytesToString(message)); } @Override @@ -103,8 +106,9 @@ public class RpcIntegrationSuite { RpcResponseCallback callback = new RpcResponseCallback() { @Override - public void onSuccess(byte[] message) { -res.successMessages.add(new String(message, Charsets.UTF_8)); + public void onSuccess(ByteBuffer message) { +String response = JavaUtils.bytesToString(message); +res.successMessages.add(response); sem.release(); } @@ -116,7 +120,7 @@ public class RpcIntegrationSuite { }; for (String command : commands) { - client.sendRpc(command.getBytes(Charsets.UTF_8), callback); + client.sendRpc(JavaUtils.stringToBytes(command), callback); } if (!sem.tryAcquire(commands.length, 5, TimeUnit.SECONDS)) { @@ -173,7 +177,7 @@ public class RpcIntegrationSuite { final String message = "no reply"; TransportClient client = clientFactory.createClient(TestUtils.getLocalHost(), server.getPort()); try { - client.send(message.getBytes(Charsets.UTF_8)); + 
client.send(JavaUtils.stringToBytes(message)); assertEquals(0, client.getHandler().numOutstandingRequests()); // Make sure the message arrives. http://git-wip-us.apache.org/repos/asf/spark/blob/9bf21206/network/common/src/test/java/org/apache/spark/network/StreamSuite.java -- diff --git a/network/common/src/test/java/org/apache/spark/network/StreamSuite.java b/network/common/src/test/java/org/apache/spark/network/StreamSuite.java index 538f3ef..9c49556 100644 --- a/network/common/src/test/java/org/apache/spark/network/StreamSuite.java +++ b/network/common/src/test/java/org/apache/spark/network/StreamSuite.java @@ -116,7 +116,10 @@ public class StreamSuite { }; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback
[1/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
Repository: spark Updated Branches: refs/heads/branch-1.6 a4e134827 -> ef6f8c262 http://git-wip-us.apache.org/repos/asf/spark/blob/ef6f8c26/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java -- diff --git a/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java b/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java index 42955ef..f9b5bf9 100644 --- a/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java +++ b/network/common/src/test/java/org/apache/spark/network/RequestTimeoutIntegrationSuite.java @@ -31,6 +31,7 @@ import org.apache.spark.network.server.TransportServer; import org.apache.spark.network.util.MapConfigProvider; import org.apache.spark.network.util.TransportConf; import org.junit.*; +import static org.junit.Assert.*; import java.io.IOException; import java.nio.ByteBuffer; @@ -84,13 +85,16 @@ public class RequestTimeoutIntegrationSuite { @Test public void timeoutInactiveRequests() throws Exception { final Semaphore semaphore = new Semaphore(1); -final byte[] response = new byte[16]; +final int responseSize = 16; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { try { semaphore.tryAcquire(FOREVER, TimeUnit.MILLISECONDS); - callback.onSuccess(response); + callback.onSuccess(ByteBuffer.allocate(responseSize)); } catch (InterruptedException e) { // do nothing } @@ -110,15 +114,15 @@ public class RequestTimeoutIntegrationSuite { // First completes quickly (semaphore starts at 1). TestCallback callback0 = new TestCallback(); synchronized (callback0) { - client.sendRpc(new byte[0], callback0); + client.sendRpc(ByteBuffer.allocate(0), callback0); callback0.wait(FOREVER); - assert (callback0.success.length == response.length); + assertEquals(responseSize, callback0.successLength); } // Second times out after 2 seconds, with slack. Must be IOException. 
TestCallback callback1 = new TestCallback(); synchronized (callback1) { - client.sendRpc(new byte[0], callback1); + client.sendRpc(ByteBuffer.allocate(0), callback1); callback1.wait(4 * 1000); assert (callback1.failure != null); assert (callback1.failure instanceof IOException); @@ -131,13 +135,16 @@ public class RequestTimeoutIntegrationSuite { @Test public void timeoutCleanlyClosesClient() throws Exception { final Semaphore semaphore = new Semaphore(0); -final byte[] response = new byte[16]; +final int responseSize = 16; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { try { semaphore.tryAcquire(FOREVER, TimeUnit.MILLISECONDS); - callback.onSuccess(response); + callback.onSuccess(ByteBuffer.allocate(responseSize)); } catch (InterruptedException e) { // do nothing } @@ -158,7 +165,7 @@ public class RequestTimeoutIntegrationSuite { clientFactory.createClient(TestUtils.getLocalHost(), server.getPort()); TestCallback callback0 = new TestCallback(); synchronized (callback0) { - client0.sendRpc(new byte[0], callback0); + client0.sendRpc(ByteBuffer.allocate(0), callback0); callback0.wait(FOREVER); assert (callback0.failure instanceof IOException); assert (!client0.isActive()); @@ -170,10 +177,10 @@ public class RequestTimeoutIntegrationSuite { clientFactory.createClient(TestUtils.getLocalHost(), server.getPort()); TestCallback callback1 = new TestCallback(); synchronized (callback1) { - client1.sendRpc(new byte[0], callback1); + client1.sendRpc(ByteBuffer.allocate(0), callback1); callback1.wait(FOREVER); - assert (callback1.success.length == response.length); - assert (callback1.failure == null); + assertEquals(responseSize, callback1.successLength); + assertNull(callback1.failure); } } @@ -191,7 +198,10 @@ public class RequestTimeoutIntegrationSuite { }; RpcHandler handler = new RpcHandler() { @Override - public void receive(TransportClient client, byte[] message, RpcResponseCallback callback) { + public void receive( + TransportClient client, + ByteBuffer message, + RpcResponseCallback callback) { throw new
spark git commit: [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown
Repository: spark Updated Branches: refs/heads/master 9bf212067 -> 96bf468c7 [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown Avoid potential deadlock with a user app's shutdown hook thread by more narrowly synchronizing access to 'hooks' Author: Sean OwenCloses #10042 from srowen/SPARK-12049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96bf468c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96bf468c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96bf468c Branch: refs/heads/master Commit: 96bf468c7860be317c20ccacf259910968d2dc83 Parents: 9bf2120 Author: Sean Owen Authored: Mon Nov 30 17:33:09 2015 -0800 Committer: Marcelo Vanzin Committed: Mon Nov 30 17:33:09 2015 -0800 -- .../apache/spark/util/ShutdownHookManager.scala | 33 ++-- 1 file changed, 16 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/96bf468c/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4012dca..620f226 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -206,7 +206,7 @@ private[spark] object ShutdownHookManager extends Logging { private [util] class SparkShutdownHookManager { private val hooks = new PriorityQueue[SparkShutdownHook]() - private var shuttingDown = false + @volatile private var shuttingDown = false /** * Install a hook to run at shutdown and run all registered hooks in order. Hadoop 1.x does not @@ -232,28 +232,27 @@ private [util] class SparkShutdownHookManager { } } - def runAll(): Unit = synchronized { + def runAll(): Unit = { shuttingDown = true -while (!hooks.isEmpty()) { - Try(Utils.logUncaughtExceptions(hooks.poll().run())) +var nextHook: SparkShutdownHook = null +while ({ nextHook = hooks.synchronized { hooks.poll() }; nextHook != null }) { + Try(Utils.logUncaughtExceptions(nextHook.run())) } } - def add(priority: Int, hook: () => Unit): AnyRef = synchronized { -checkState() -val hookRef = new SparkShutdownHook(priority, hook) -hooks.add(hookRef) -hookRef - } - - def remove(ref: AnyRef): Boolean = synchronized { -hooks.remove(ref) + def add(priority: Int, hook: () => Unit): AnyRef = { +hooks.synchronized { + if (shuttingDown) { +throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") + } + val hookRef = new SparkShutdownHook(priority, hook) + hooks.add(hookRef) + hookRef +} } - private def checkState(): Unit = { -if (shuttingDown) { - throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") -} + def remove(ref: AnyRef): Boolean = { +hooks.synchronized { hooks.remove(ref) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown
Repository: spark Updated Branches: refs/heads/branch-1.6 ef6f8c262 -> 2fc3fce8a [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown Avoid potential deadlock with a user app's shutdown hook thread by more narrowly synchronizing access to 'hooks' Author: Sean OwenCloses #10042 from srowen/SPARK-12049. (cherry picked from commit 96bf468c7860be317c20ccacf259910968d2dc83) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2fc3fce8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2fc3fce8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2fc3fce8 Branch: refs/heads/branch-1.6 Commit: 2fc3fce8aee69056f3a6a9c68b450b3802fb3a55 Parents: ef6f8c2 Author: Sean Owen Authored: Mon Nov 30 17:33:09 2015 -0800 Committer: Marcelo Vanzin Committed: Mon Nov 30 17:33:20 2015 -0800 -- .../apache/spark/util/ShutdownHookManager.scala | 33 ++-- 1 file changed, 16 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2fc3fce8/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4012dca..620f226 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -206,7 +206,7 @@ private[spark] object ShutdownHookManager extends Logging { private [util] class SparkShutdownHookManager { private val hooks = new PriorityQueue[SparkShutdownHook]() - private var shuttingDown = false + @volatile private var shuttingDown = false /** * Install a hook to run at shutdown and run all registered hooks in order. Hadoop 1.x does not @@ -232,28 +232,27 @@ private [util] class SparkShutdownHookManager { } } - def runAll(): Unit = synchronized { + def runAll(): Unit = { shuttingDown = true -while (!hooks.isEmpty()) { - Try(Utils.logUncaughtExceptions(hooks.poll().run())) +var nextHook: SparkShutdownHook = null +while ({ nextHook = hooks.synchronized { hooks.poll() }; nextHook != null }) { + Try(Utils.logUncaughtExceptions(nextHook.run())) } } - def add(priority: Int, hook: () => Unit): AnyRef = synchronized { -checkState() -val hookRef = new SparkShutdownHook(priority, hook) -hooks.add(hookRef) -hookRef - } - - def remove(ref: AnyRef): Boolean = synchronized { -hooks.remove(ref) + def add(priority: Int, hook: () => Unit): AnyRef = { +hooks.synchronized { + if (shuttingDown) { +throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") + } + val hookRef = new SparkShutdownHook(priority, hook) + hooks.add(hookRef) + hookRef +} } - private def checkState(): Unit = { -if (shuttingDown) { - throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") -} + def remove(ref: AnyRef): Boolean = { +hooks.synchronized { hooks.remove(ref) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
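The deadlock avoidance in the two SPARK-12049 commits above boils down to one locking rule: hold the lock only while touching the hook queue, never while a hook is running. A standalone sketch of that discipline follows; it is a simplification, not the real ShutdownHookManager.

    // Sketch: the queue is the only thing guarded, so a hook or another thread can
    // call add()/remove() while runAll() is draining without deadlocking.
    import java.util.ArrayDeque

    class HookRunner {
      private val hooks = new ArrayDeque[Runnable]()
      @volatile private var shuttingDown = false

      def add(hook: Runnable): Unit = hooks.synchronized {
        if (shuttingDown) throw new IllegalStateException("cannot modify hooks during shutdown")
        hooks.add(hook)
      }

      def remove(hook: Runnable): Boolean = hooks.synchronized { hooks.remove(hook) }

      def runAll(): Unit = {
        shuttingDown = true
        var next: Runnable = null
        while ({ next = hooks.synchronized { hooks.poll() }; next != null }) {
          next.run()                            // executed outside the lock
        }
      }
    }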
spark git commit: [SPARK-12035] Add more debug information in include_example tag of Jekyll
Repository: spark
Updated Branches: refs/heads/master d3ca8cfac -> e6dc89a33

[SPARK-12035] Add more debug information in include_example tag of Jekyll

https://issues.apache.org/jira/browse/SPARK-12035

When we are debugging lots of example code files, like in https://github.com/apache/spark/pull/10002, it's hard to know which file causes errors due to limited information in `include_example.rb`. With their filenames, we can locate bugs easily.

Author: Xusen Yin

Closes #10026 from yinxusen/SPARK-12035.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e6dc89a3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e6dc89a3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e6dc89a3

Branch: refs/heads/master
Commit: e6dc89a33951e9197a77dbcacf022c27469ae41e
Parents: d3ca8cf
Author: Xusen Yin
Authored: Mon Nov 30 17:18:44 2015 -0800
Committer: Andrew Or
Committed: Mon Nov 30 17:18:44 2015 -0800

--
 docs/_plugins/include_example.rb | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e6dc89a3/docs/_plugins/include_example.rb

diff --git a/docs/_plugins/include_example.rb b/docs/_plugins/include_example.rb
index 564c866..f748582 100644
--- a/docs/_plugins/include_example.rb
+++ b/docs/_plugins/include_example.rb
@@ -75,10 +75,10 @@ module Jekyll
       .select { |l, i| l.include? "$example off$" }
       .map { |l, i| i }

-      raise "Start indices amount is not equal to end indices amount, please check the code." \
+      raise "Start indices amount is not equal to end indices amount, see #{@file}." \
         unless startIndices.size == endIndices.size

-      raise "No code is selected by include_example, please check the code." \
+      raise "No code is selected by include_example, see #{@file}." \
         if startIndices.size == 0

       # Select and join code blocks together, with a space line between each of two continuous
@@ -86,8 +86,10 @@
       lastIndex = -1
       result = ""
       startIndices.zip(endIndices).each do |start, endline|
-        raise "Overlapping between two example code blocks are not allowed." if start <= lastIndex
-        raise "$example on$ should not be in the same line with $example off$." if start == endline
+        raise "Overlapping between two example code blocks are not allowed, see #{@file}." \
+          if start <= lastIndex
+        raise "$example on$ should not be in the same line with $example off$, see #{@file}." \
+          if start == endline
         lastIndex = endline
         range = Range.new(start + 1, endline - 1)
         result += trim_codeblock(lines[range]).join
[2/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
[SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer. This change seems large, but most of it is just replacing `byte[]` with `ByteBuffer` and `new byte[]` with `ByteBuffer.allocate()`, since it changes the network library's API. The following are parts of the code that actually have meaningful changes: - The Message implementations were changed to inherit from a new AbstractMessage that can optionally hold a reference to a body (in the form of a ManagedBuffer); this is similar to how ResponseWithBody worked before, except now it's not restricted to just responses. - The TransportFrameDecoder was pretty much rewritten to avoid copies as much as possible; it doesn't rely on CompositeByteBuf to accumulate incoming data anymore, since CompositeByteBuf has issues when slices are retained. The code now is able to create frames without having to resort to copying bytes except for a few bytes (containing the frame length) in very rare cases. - Some minor changes in the SASL layer to convert things back to `byte[]` since the JDK SASL API operates on those. Author: Marcelo VanzinCloses #9987 from vanzin/SPARK-12007. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bf21206 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bf21206 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bf21206 Branch: refs/heads/master Commit: 9bf2120672ae0f620a217ccd96bef189ab75e0d6 Parents: 0a46e43 Author: Marcelo Vanzin Authored: Mon Nov 30 17:22:05 2015 -0800 Committer: Andrew Or Committed: Mon Nov 30 17:22:05 2015 -0800 -- .../mesos/MesosExternalShuffleService.scala | 3 +- .../network/netty/NettyBlockRpcServer.scala | 8 +- .../netty/NettyBlockTransferService.scala | 6 +- .../apache/spark/rpc/netty/NettyRpcEnv.scala| 16 +-- .../org/apache/spark/rpc/netty/Outbox.scala | 9 +- .../spark/rpc/netty/NettyRpcHandlerSuite.scala | 3 +- .../network/client/RpcResponseCallback.java | 4 +- .../spark/network/client/TransportClient.java | 16 ++- .../client/TransportResponseHandler.java| 16 ++- .../spark/network/protocol/AbstractMessage.java | 54 +++ .../protocol/AbstractResponseMessage.java | 32 + .../network/protocol/ChunkFetchFailure.java | 2 +- .../network/protocol/ChunkFetchRequest.java | 2 +- .../network/protocol/ChunkFetchSuccess.java | 8 +- .../apache/spark/network/protocol/Message.java | 11 +- .../spark/network/protocol/MessageEncoder.java | 29 ++-- .../spark/network/protocol/OneWayMessage.java | 33 +++-- .../network/protocol/ResponseWithBody.java | 40 -- .../spark/network/protocol/RpcFailure.java | 2 +- .../spark/network/protocol/RpcRequest.java | 34 +++-- .../spark/network/protocol/RpcResponse.java | 39 +++-- .../spark/network/protocol/StreamFailure.java | 2 +- .../spark/network/protocol/StreamRequest.java | 2 +- .../spark/network/protocol/StreamResponse.java | 19 +-- .../spark/network/sasl/SaslClientBootstrap.java | 12 +- .../apache/spark/network/sasl/SaslMessage.java | 31 ++-- .../spark/network/sasl/SaslRpcHandler.java | 26 +++- .../spark/network/server/MessageHandler.java| 2 +- .../spark/network/server/NoOpRpcHandler.java| 8 +- .../apache/spark/network/server/RpcHandler.java | 8 +- .../network/server/TransportChannelHandler.java | 2 +- .../network/server/TransportRequestHandler.java | 15 +- .../apache/spark/network/util/JavaUtils.java| 48 --- .../network/util/TransportFrameDecoder.java | 142 +-- .../network/ChunkFetchIntegrationSuite.java | 5 +- .../org/apache/spark/network/ProtocolSuite.java | 10 +- 
.../network/RequestTimeoutIntegrationSuite.java | 51 --- .../spark/network/RpcIntegrationSuite.java | 26 ++-- .../org/apache/spark/network/StreamSuite.java | 5 +- .../network/TransportResponseHandlerSuite.java | 24 ++-- .../spark/network/sasl/SparkSaslSuite.java | 43 -- .../util/TransportFrameDecoderSuite.java| 23 ++- .../shuffle/ExternalShuffleBlockHandler.java| 9 +- .../network/shuffle/ExternalShuffleClient.java | 3 +- .../network/shuffle/OneForOneBlockFetcher.java | 7 +- .../mesos/MesosExternalShuffleClient.java | 5 +- .../shuffle/protocol/BlockTransferMessage.java | 8 +- .../network/sasl/SaslIntegrationSuite.java | 18 +-- .../shuffle/BlockTransferMessagesSuite.java | 2 +- .../ExternalShuffleBlockHandlerSuite.java | 21 +-- .../shuffle/OneForOneBlockFetcherSuite.java | 8 +- 51 files changed, 617 insertions(+), 335 deletions(-) --
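The digest above includes only the file list for this commit, not the diff bodies. As a rough sketch of the `byte[]`/`ByteBuffer` boundary the commit message describes — hypothetical helper names, not code from the patch — the conversion the SASL layer still needs looks roughly like this:

```scala
import java.nio.ByteBuffer

object ByteBufferSketch {
  // byte[] -> ByteBuffer is free: wrap() shares the array instead of copying it.
  def toBuffer(bytes: Array[Byte]): ByteBuffer = ByteBuffer.wrap(bytes)

  // ByteBuffer -> byte[] (what the JDK SASL API requires) copies only when the
  // buffer is not a heap buffer covering exactly its backing array.
  def toBytes(buf: ByteBuffer): Array[Byte] = {
    if (buf.hasArray && buf.arrayOffset() == 0 && buf.remaining() == buf.array().length) {
      buf.array()
    } else {
      val copy = new Array[Byte](buf.remaining())
      buf.duplicate().get(copy) // duplicate() keeps the caller's read position intact
      copy
    }
  }
}
```

Wrapping is free in one direction; in the other, a copy is needed only when the buffer does not expose a backing array that lines up exactly with its contents — which is the spirit of the "avoid copies" change described above.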
[2/2] spark git commit: [SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer.
[SPARK-12007][NETWORK] Avoid copies in the network lib's RPC layer. This change seems large, but most of it is just replacing `byte[]` with `ByteBuffer` and `new byte[]` with `ByteBuffer.allocate()`, since it changes the network library's API. The following are parts of the code that actually have meaningful changes: - The Message implementations were changed to inherit from a new AbstractMessage that can optionally hold a reference to a body (in the form of a ManagedBuffer); this is similar to how ResponseWithBody worked before, except now it's not restricted to just responses. - The TransportFrameDecoder was pretty much rewritten to avoid copies as much as possible; it doesn't rely on CompositeByteBuf to accumulate incoming data anymore, since CompositeByteBuf has issues when slices are retained. The code now is able to create frames without having to resort to copying bytes except for a few bytes (containing the frame length) in very rare cases. - Some minor changes in the SASL layer to convert things back to `byte[]` since the JDK SASL API operates on those. Author: Marcelo VanzinCloses #9987 from vanzin/SPARK-12007. (cherry picked from commit 9bf2120672ae0f620a217ccd96bef189ab75e0d6) Signed-off-by: Andrew Or Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef6f8c26 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef6f8c26 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef6f8c26 Branch: refs/heads/branch-1.6 Commit: ef6f8c262648c42041e71ee1578308e205225ccf Parents: a4e13482 Author: Marcelo Vanzin Authored: Mon Nov 30 17:22:05 2015 -0800 Committer: Andrew Or Committed: Mon Nov 30 17:22:12 2015 -0800 -- .../mesos/MesosExternalShuffleService.scala | 3 +- .../network/netty/NettyBlockRpcServer.scala | 8 +- .../netty/NettyBlockTransferService.scala | 6 +- .../apache/spark/rpc/netty/NettyRpcEnv.scala| 16 +-- .../org/apache/spark/rpc/netty/Outbox.scala | 9 +- .../spark/rpc/netty/NettyRpcHandlerSuite.scala | 3 +- .../network/client/RpcResponseCallback.java | 4 +- .../spark/network/client/TransportClient.java | 16 ++- .../client/TransportResponseHandler.java| 16 ++- .../spark/network/protocol/AbstractMessage.java | 54 +++ .../protocol/AbstractResponseMessage.java | 32 + .../network/protocol/ChunkFetchFailure.java | 2 +- .../network/protocol/ChunkFetchRequest.java | 2 +- .../network/protocol/ChunkFetchSuccess.java | 8 +- .../apache/spark/network/protocol/Message.java | 11 +- .../spark/network/protocol/MessageEncoder.java | 29 ++-- .../spark/network/protocol/OneWayMessage.java | 33 +++-- .../network/protocol/ResponseWithBody.java | 40 -- .../spark/network/protocol/RpcFailure.java | 2 +- .../spark/network/protocol/RpcRequest.java | 34 +++-- .../spark/network/protocol/RpcResponse.java | 39 +++-- .../spark/network/protocol/StreamFailure.java | 2 +- .../spark/network/protocol/StreamRequest.java | 2 +- .../spark/network/protocol/StreamResponse.java | 19 +-- .../spark/network/sasl/SaslClientBootstrap.java | 12 +- .../apache/spark/network/sasl/SaslMessage.java | 31 ++-- .../spark/network/sasl/SaslRpcHandler.java | 26 +++- .../spark/network/server/MessageHandler.java| 2 +- .../spark/network/server/NoOpRpcHandler.java| 8 +- .../apache/spark/network/server/RpcHandler.java | 8 +- .../network/server/TransportChannelHandler.java | 2 +- .../network/server/TransportRequestHandler.java | 15 +- .../apache/spark/network/util/JavaUtils.java| 48 --- .../network/util/TransportFrameDecoder.java | 142 +-- 
.../network/ChunkFetchIntegrationSuite.java | 5 +- .../org/apache/spark/network/ProtocolSuite.java | 10 +- .../network/RequestTimeoutIntegrationSuite.java | 51 --- .../spark/network/RpcIntegrationSuite.java | 26 ++-- .../org/apache/spark/network/StreamSuite.java | 5 +- .../network/TransportResponseHandlerSuite.java | 24 ++-- .../spark/network/sasl/SparkSaslSuite.java | 43 -- .../util/TransportFrameDecoderSuite.java| 23 ++- .../shuffle/ExternalShuffleBlockHandler.java| 9 +- .../network/shuffle/ExternalShuffleClient.java | 3 +- .../network/shuffle/OneForOneBlockFetcher.java | 7 +- .../mesos/MesosExternalShuffleClient.java | 5 +- .../shuffle/protocol/BlockTransferMessage.java | 8 +- .../network/sasl/SaslIntegrationSuite.java | 18 +-- .../shuffle/BlockTransferMessagesSuite.java | 2 +- .../ExternalShuffleBlockHandlerSuite.java | 21 +-- .../shuffle/OneForOneBlockFetcherSuite.java | 8 +- 51 files changed, 617 insertions(+), 335 deletions(-)
spark git commit: [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown
Repository: spark Updated Branches: refs/heads/branch-1.5 9b6216171 -> d78f1bc45 [SPARK-12049][CORE] User JVM shutdown hook can cause deadlock at shutdown Avoid potential deadlock with a user app's shutdown hook thread by more narrowly synchronizing access to 'hooks' Author: Sean OwenCloses #10042 from srowen/SPARK-12049. (cherry picked from commit 96bf468c7860be317c20ccacf259910968d2dc83) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d78f1bc4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d78f1bc4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d78f1bc4 Branch: refs/heads/branch-1.5 Commit: d78f1bc45ff04637761bcfcc565ce40fb587a5a9 Parents: 9b62161 Author: Sean Owen Authored: Mon Nov 30 17:33:09 2015 -0800 Committer: Marcelo Vanzin Committed: Mon Nov 30 17:33:32 2015 -0800 -- .../apache/spark/util/ShutdownHookManager.scala | 33 ++-- 1 file changed, 16 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d78f1bc4/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 091dc03..fddeceb 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -206,7 +206,7 @@ private[spark] object ShutdownHookManager extends Logging { private [util] class SparkShutdownHookManager { private val hooks = new PriorityQueue[SparkShutdownHook]() - private var shuttingDown = false + @volatile private var shuttingDown = false /** * Install a hook to run at shutdown and run all registered hooks in order. Hadoop 1.x does not @@ -230,28 +230,27 @@ private [util] class SparkShutdownHookManager { } } - def runAll(): Unit = synchronized { + def runAll(): Unit = { shuttingDown = true -while (!hooks.isEmpty()) { - Try(Utils.logUncaughtExceptions(hooks.poll().run())) +var nextHook: SparkShutdownHook = null +while ({ nextHook = hooks.synchronized { hooks.poll() }; nextHook != null }) { + Try(Utils.logUncaughtExceptions(nextHook.run())) } } - def add(priority: Int, hook: () => Unit): AnyRef = synchronized { -checkState() -val hookRef = new SparkShutdownHook(priority, hook) -hooks.add(hookRef) -hookRef - } - - def remove(ref: AnyRef): Boolean = synchronized { -hooks.remove(ref) + def add(priority: Int, hook: () => Unit): AnyRef = { +hooks.synchronized { + if (shuttingDown) { +throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") + } + val hookRef = new SparkShutdownHook(priority, hook) + hooks.add(hookRef) + hookRef +} } - private def checkState(): Unit = { -if (shuttingDown) { - throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.") -} + def remove(ref: AnyRef): Boolean = { +hooks.synchronized { hooks.remove(ref) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
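For context, the locking pattern the diff above adopts can be reduced to a small sketch (Hook and HookRunnerSketch are hypothetical stand-ins, not Spark classes): the queue is locked only long enough to poll the next hook, and the hook body runs with the lock released, so a hook that calls back into add() or remove() can no longer deadlock against runAll().

```scala
import java.util.PriorityQueue
import scala.util.Try

// Hypothetical stand-in for SparkShutdownHook: higher-priority hooks run first.
case class Hook(priority: Int, body: () => Unit) extends Comparable[Hook] {
  override def compareTo(other: Hook): Int = Integer.compare(other.priority, priority)
}

class HookRunnerSketch {
  private val hooks = new PriorityQueue[Hook]()
  @volatile private var shuttingDown = false

  def runAll(): Unit = {
    shuttingDown = true
    var next: Hook = null
    // Lock only while polling; run the hook body outside the lock.
    while ({ next = hooks.synchronized(hooks.poll()); next != null }) {
      Try(next.body())
    }
  }

  def add(hook: Hook): Unit = hooks.synchronized {
    if (shuttingDown) {
      throw new IllegalStateException("Shutdown hooks cannot be modified during shutdown.")
    }
    hooks.add(hook)
  }

  def remove(hook: Hook): Boolean = hooks.synchronized(hooks.remove(hook))
}
```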
spark git commit: [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin.
Repository: spark Updated Branches: refs/heads/branch-1.6 2fc3fce8a -> 86a46ce68 [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin. I accidentally omitted these as part of #10049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86a46ce6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86a46ce6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86a46ce6 Branch: refs/heads/branch-1.6 Commit: 86a46ce686cd140aa9854e2d24cfd3313cd8b894 Parents: 2fc3fce Author: Josh RosenAuthored: Mon Nov 30 17:15:47 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 18:26:48 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86a46ce6/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index f2f3e2e..174c202 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - system(make html) || raise("Python doc generation failed") + system("make html") || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin.
Repository: spark Updated Branches: refs/heads/master 96bf468c7 -> f73379be2 [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin. I accidentally omitted these as part of #10049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f73379be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f73379be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f73379be Branch: refs/heads/master Commit: f73379be2b0c286957b678a996cb56afc96015eb Parents: 96bf468 Author: Josh RosenAuthored: Mon Nov 30 17:15:47 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 18:25:59 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f73379be/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index f2f3e2e..174c202 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - system(make html) || raise("Python doc generation failed") + system("make html") || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters
Repository: spark Updated Branches: refs/heads/branch-1.6 0978ec11c -> a8c6d8acc [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters ```EventLoggingListener.getLogPath``` needs 4 input arguments: https://github.com/apache/spark/blob/v1.6.0-preview2/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L276-L280 the 3rd parameter should be appAttemptId, 4th parameter is codec... Author: Teng QiuCloses #10044 from chutium/SPARK-12053. (cherry picked from commit a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce) Signed-off-by: Kousuke Saruta Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8c6d8ac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8c6d8ac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8c6d8ac Branch: refs/heads/branch-1.6 Commit: a8c6d8accae7712568f583792ed2c0a3fce06776 Parents: 0978ec1 Author: Teng Qiu Authored: Tue Dec 1 07:27:32 2015 +0900 Committer: Kousuke Saruta Committed: Tue Dec 1 07:28:02 2015 +0900 -- core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a8c6d8ac/core/src/main/scala/org/apache/spark/deploy/master/Master.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala index 9952c97..1355e1a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala @@ -934,7 +934,7 @@ private[deploy] class Master( } val eventLogFilePrefix = EventLoggingListener.getLogPath( - eventLogDir, app.id, app.desc.eventLogCodec) + eventLogDir, app.id, appAttemptId = None, compressionCodecName = app.desc.eventLogCodec) val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) val inProgressExists = fs.exists(new Path(eventLogFilePrefix + EventLoggingListener.IN_PROGRESS)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters
Repository: spark Updated Branches: refs/heads/master 2c5dee0fb -> a8ceec5e8 [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters ```EventLoggingListener.getLogPath``` needs 4 input arguments: https://github.com/apache/spark/blob/v1.6.0-preview2/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L276-L280 the 3rd parameter should be appAttemptId, 4th parameter is codec... Author: Teng QiuCloses #10044 from chutium/SPARK-12053. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8ceec5e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8ceec5e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8ceec5e Branch: refs/heads/master Commit: a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce Parents: 2c5dee0 Author: Teng Qiu Authored: Tue Dec 1 07:27:32 2015 +0900 Committer: Kousuke Saruta Committed: Tue Dec 1 07:27:32 2015 +0900 -- core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a8ceec5e/core/src/main/scala/org/apache/spark/deploy/master/Master.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala index 9952c97..1355e1a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala @@ -934,7 +934,7 @@ private[deploy] class Master( } val eventLogFilePrefix = EventLoggingListener.getLogPath( - eventLogDir, app.id, app.desc.eventLogCodec) + eventLogDir, app.id, appAttemptId = None, compressionCodecName = app.desc.eventLogCodec) val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) val inProgressExists = fs.exists(new Path(eventLogFilePrefix + EventLoggingListener.IN_PROGRESS)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
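The one-line fix above is easiest to see with a stripped-down sketch; the signature below is illustrative only (the real method lives in EventLoggingListener and takes a log directory URI). Because both trailing parameters are Option[String], the old positional call type-checked but bound the codec to appAttemptId; naming the arguments, as the patch does, removes the ambiguity.

```scala
object GetLogPathSketch {
  // Hypothetical signature, for illustration only: the two trailing optional
  // parameters have the same type, so a positional argument binds to the first.
  def getLogPath(
      logBaseDir: String,
      appId: String,
      appAttemptId: Option[String] = None,
      compressionCodecName: Option[String] = None): String = {
    val attempt = appAttemptId.map("_" + _).getOrElse("")
    val codec = compressionCodecName.map("." + _).getOrElse("")
    s"$logBaseDir/$appId$attempt$codec"
  }

  def main(args: Array[String]): Unit = {
    val codec = Some("lz4")
    // Positional: the codec silently lands in appAttemptId -> "/logs/app-1_lz4"
    println(getLogPath("/logs", "app-1", codec))
    // Named, as in the fix: -> "/logs/app-1.lz4"
    println(getLogPath("/logs", "app-1", compressionCodecName = codec))
  }
}
```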
spark git commit: [SPARK-11960][MLLIB][DOC] User guide for streaming tests
Repository: spark Updated Branches: refs/heads/branch-1.6 a387cef3a -> ebf87ebc0 [SPARK-11960][MLLIB][DOC] User guide for streaming tests CC jkbradley mengxr josepablocam Author: Feynman LiangCloses #10005 from feynmanliang/streaming-test-user-guide. (cherry picked from commit 55358889309cf2d856b72e72e0f3081dfdf61cfa) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ebf87ebc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ebf87ebc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ebf87ebc Branch: refs/heads/branch-1.6 Commit: ebf87ebc02075497f4682e3ad0f8e63d33f3b86e Parents: a387cef Author: Feynman Liang Authored: Mon Nov 30 15:38:44 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 15:38:51 2015 -0800 -- docs/mllib-guide.md | 1 + docs/mllib-statistics.md| 25 .../examples/mllib/StreamingTestExample.scala | 2 ++ 3 files changed, 28 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 54e35fc..43772ad 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -34,6 +34,7 @@ We list major functionality from both below, with links to detailed guides. * [correlations](mllib-statistics.html#correlations) * [stratified sampling](mllib-statistics.html#stratified-sampling) * [hypothesis testing](mllib-statistics.html#hypothesis-testing) + * [streaming significance testing](mllib-statistics.html#streaming-significance-testing) * [random data generation](mllib-statistics.html#random-data-generation) * [Classification and regression](mllib-classification-regression.html) * [linear models (SVMs, logistic regression, linear regression)](mllib-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/docs/mllib-statistics.md -- diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md index ade5b07..de209f6 100644 --- a/docs/mllib-statistics.md +++ b/docs/mllib-statistics.md @@ -521,6 +521,31 @@ print(testResult) # summary of the test including the p-value, test statistic, +### Streaming Significance Testing +MLlib provides online implementations of some tests to support use cases +like A/B testing. These tests may be performed on a Spark Streaming +`DStream[(Boolean,Double)]` where the first element of each tuple +indicates control group (`false`) or treatment group (`true`) and the +second element is the value of an observation. + +Streaming significance testing supports the following parameters: + +* `peacePeriod` - The number of initial data points from the stream to +ignore, used to mitigate novelty effects. +* `windowSize` - The number of past batches to perform hypothesis +testing over. Setting to `0` will perform cumulative processing using +all prior batches. + + + + +[`StreamingTest`](api/scala/index.html#org.apache.spark.mllib.stat.test.StreamingTest) +provides streaming hypothesis testing. 
+ +{% include_example scala/org/apache/spark/examples/mllib/StreamingTestExample.scala %} + + + ## Random data generation http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala index ab29f90..b6677c6 100644 --- a/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala @@ -64,6 +64,7 @@ object StreamingTestExample { dir.toString }) +// $example on$ val data = ssc.textFileStream(dataDir).map(line => line.split(",") match { case Array(label, value) => (label.toBoolean, value.toDouble) }) @@ -75,6 +76,7 @@ object StreamingTestExample { val out = streamingTest.registerStream(data) out.print() +// $example off$ // Stop processing if test becomes significant or we time out var timeoutCounter = numBatchesTimeout - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
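A minimal end-to-end sketch of the streaming significance test described in the new guide text follows. The builder-style setters are assumed to match the documented parameter names (setPeacePeriod, setWindowSize, setTestMethod), and the paths and batch interval are placeholders; the authoritative version is the StreamingTestExample.scala referenced above.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.stat.test.StreamingTest
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingTestSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingTestSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-test-checkpoint") // stateful test needs a checkpoint dir

    // Each input line is "<isTreatmentGroup>,<observedValue>", e.g. "true,0.37"
    val data = ssc.textFileStream("/tmp/ab-test-input").map { line =>
      val Array(label, value) = line.split(",")
      (label.toBoolean, value.toDouble)
    }

    val test = new StreamingTest()
      .setPeacePeriod(0)     // initial batches to ignore (novelty effects)
      .setWindowSize(0)      // 0 = cumulative test over all batches seen so far
      .setTestMethod("welch")

    test.registerStream(data).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```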
spark git commit: [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters
Repository: spark Updated Branches: refs/heads/branch-1.5 7900d192e -> 9b6216171 [SPARK-12053][CORE] EventLoggingListener.getLogPath needs 4 parameters ```EventLoggingListener.getLogPath``` needs 4 input arguments: https://github.com/apache/spark/blob/v1.6.0-preview2/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L276-L280 the 3rd parameter should be appAttemptId, 4th parameter is codec... Author: Teng QiuCloses #10044 from chutium/SPARK-12053. (cherry picked from commit a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce) Signed-off-by: Kousuke Saruta Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9b621617 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9b621617 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9b621617 Branch: refs/heads/branch-1.5 Commit: 9b62161715950c1b76385cf2c2e2e46213fba437 Parents: 7900d19 Author: Teng Qiu Authored: Tue Dec 1 07:27:32 2015 +0900 Committer: Kousuke Saruta Committed: Tue Dec 1 07:28:22 2015 +0900 -- core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9b621617/core/src/main/scala/org/apache/spark/deploy/master/Master.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala index 26904d3..c9b1c04 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala @@ -925,7 +925,7 @@ private[deploy] class Master( } val eventLogFilePrefix = EventLoggingListener.getLogPath( - eventLogDir, app.id, app.desc.eventLogCodec) + eventLogDir, app.id, appAttemptId = None, compressionCodecName = app.desc.eventLogCodec) val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) val inProgressExists = fs.exists(new Path(eventLogFilePrefix + EventLoggingListener.IN_PROGRESS)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark Updated Branches: refs/heads/branch-1.6 a8c6d8acc -> 1562ef10f [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Original PR is reverted due to document build error. https://github.com/apache/spark/pull/9722 mengxr feynmanliang yinxusen Sorry for the troubling. Author: Yuhao YangCloses #9974 from hhbyyh/ldaMLExample. (cherry picked from commit e232720a65dfb9ae6135cbb7674e35eddd88d625) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1562ef10 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1562ef10 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1562ef10 Branch: refs/heads/branch-1.6 Commit: 1562ef10f5d1722a6c275726083684e6d0463a4f Parents: a8c6d8a Author: Yuhao Yang Authored: Mon Nov 30 14:56:51 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 14:56:58 2015 -0800 -- docs/ml-clustering.md | 31 +++ docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 97 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 208 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 000..cfefb5d --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,31 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. 
* [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts. It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/1562ef10/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new
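The example sources referenced by the new guide page are truncated in this digest. A self-contained sketch of the spark.ml LDA Estimator/Model flow it documents might look like the following; the toy in-memory corpus and parameter values are placeholders, and the authoritative versions are LDAExample.scala and JavaLDAExample.java added by this commit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SQLContext

object LDASketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LDASketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Tiny toy corpus: each row is a term-count vector over a 5-word vocabulary.
    val corpus = Seq(
      Vectors.dense(1.0, 2.0, 0.0, 0.0, 1.0),
      Vectors.dense(0.0, 0.0, 3.0, 1.0, 0.0),
      Vectors.dense(2.0, 1.0, 0.0, 1.0, 0.0)
    ).map(Tuple1.apply).toDF("features")

    // LDA is an Estimator: fit() produces an LDAModel (a Transformer).
    val lda = new LDA().setK(2).setMaxIter(10)
    val model = lda.fit(corpus)

    model.describeTopics(3).show(false)  // top terms per topic
    model.transform(corpus).show(false)  // per-document topic distributions
    sc.stop()
  }
}
```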
spark git commit: fix Maven build
Repository: spark Updated Branches: refs/heads/branch-1.6 ebf87ebc0 -> 436151780 fix Maven build Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43615178 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43615178 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43615178 Branch: refs/heads/branch-1.6 Commit: 436151780fa6ba2c318ad725c43fcd8a4dcbf00b Parents: ebf87eb Author: Davies LiuAuthored: Mon Nov 30 15:42:10 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 15:45:29 2015 -0800 -- .../src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/43615178/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala index 507641f..a781777 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala @@ -43,7 +43,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ * populated by the query planning infrastructure. */ @transient - protected[spark] final val sqlContext = SQLContext.getActive().get + protected[spark] final val sqlContext = SQLContext.getActive().getOrElse(null) protected def sparkContext = sqlContext.sparkContext - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
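The fix here is the single getActive().get to getActive().getOrElse(null) change above. A tiny sketch of the behavioural difference (generic Option, not SQLContext): .get throws as soon as it is evaluated on an empty Option, while getOrElse(null) defers any failure to the point where the value is actually used.

```scala
object OptionGetSketch {
  def main(args: Array[String]): Unit = {
    val active: Option[String] = None        // stands in for an absent "active" context

    val deferred = active.getOrElse(null)    // fine for now: deferred == null
    println(s"deferred = $deferred")

    try {
      val eager = active.get                 // throws NoSuchElementException immediately
      println(eager)
    } catch {
      case e: NoSuchElementException => println(s"eager access failed: $e")
    }
  }
}
```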
spark git commit: [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python)
Repository: spark Updated Branches: refs/heads/master e232720a6 -> de64b65f7 [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python) Remove duplicate mllib example (DT/RF/GBT in Java/Python). Since we have tutorial code for DT/RF/GBT classification/regression in Scala/Java/Python and example applications for DT/RF/GBT in Scala, so we mark these as duplicated and remove them. mengxr Author: Yanbo LiangCloses #9954 from yanboliang/SPARK-11975. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/de64b65f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/de64b65f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/de64b65f Branch: refs/heads/master Commit: de64b65f7cf2ac58c1abc310ba547637fdbb8557 Parents: e232720 Author: Yanbo Liang Authored: Mon Nov 30 15:01:08 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 15:01:08 2015 -0800 -- .../spark/examples/mllib/JavaDecisionTree.java | 116 --- .../mllib/JavaGradientBoostedTreesRunner.java | 126 .../examples/mllib/JavaRandomForestExample.java | 139 -- .../main/python/mllib/decision_tree_runner.py | 144 --- .../main/python/mllib/gradient_boosted_trees.py | 77 -- .../main/python/mllib/random_forest_example.py | 90 6 files changed, 692 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/de64b65f/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java b/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java deleted file mode 100644 index 1f82e3f..000 --- a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java +++ /dev/null @@ -1,116 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.examples.mllib; - -import java.util.HashMap; - -import scala.Tuple2; - -import org.apache.spark.api.java.function.Function2; -import org.apache.spark.api.java.JavaPairRDD; -import org.apache.spark.api.java.JavaRDD; -import org.apache.spark.api.java.JavaSparkContext; -import org.apache.spark.api.java.function.Function; -import org.apache.spark.api.java.function.PairFunction; -import org.apache.spark.mllib.regression.LabeledPoint; -import org.apache.spark.mllib.tree.DecisionTree; -import org.apache.spark.mllib.tree.model.DecisionTreeModel; -import org.apache.spark.mllib.util.MLUtils; -import org.apache.spark.SparkConf; - -/** - * Classification and regression using decision trees. 
- */ -public final class JavaDecisionTree { - - public static void main(String[] args) { -String datapath = "data/mllib/sample_libsvm_data.txt"; -if (args.length == 1) { - datapath = args[0]; -} else if (args.length > 1) { - System.err.println("Usage: JavaDecisionTree "); - System.exit(1); -} -SparkConf sparkConf = new SparkConf().setAppName("JavaDecisionTree"); -JavaSparkContext sc = new JavaSparkContext(sparkConf); - -JavaRDD data = MLUtils.loadLibSVMFile(sc.sc(), datapath).toJavaRDD().cache(); - -// Compute the number of classes from the data. -Integer numClasses = data.map(new Function () { - @Override public Double call(LabeledPoint p) { -return p.label(); - } -}).countByValue().size(); - -// Set parameters. -// Empty categoricalFeaturesInfo indicates all features are continuous. -HashMap categoricalFeaturesInfo = new HashMap (); -String impurity = "gini"; -Integer maxDepth = 5; -Integer maxBins = 32; - -// Train a DecisionTree model for classification. -final DecisionTreeModel model = DecisionTree.trainClassifier(data, numClasses, - categoricalFeaturesInfo, impurity, maxDepth, maxBins); - -//
spark git commit: [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python)
Repository: spark Updated Branches: refs/heads/branch-1.6 1562ef10f -> a387cef3a [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python) Remove duplicate mllib example (DT/RF/GBT in Java/Python). Since we have tutorial code for DT/RF/GBT classification/regression in Scala/Java/Python and example applications for DT/RF/GBT in Scala, so we mark these as duplicated and remove them. mengxr Author: Yanbo LiangCloses #9954 from yanboliang/SPARK-11975. (cherry picked from commit de64b65f7cf2ac58c1abc310ba547637fdbb8557) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a387cef3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a387cef3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a387cef3 Branch: refs/heads/branch-1.6 Commit: a387cef3a40d47a8ca7fa9c6aa2842318700df49 Parents: 1562ef1 Author: Yanbo Liang Authored: Mon Nov 30 15:01:08 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 15:01:16 2015 -0800 -- .../spark/examples/mllib/JavaDecisionTree.java | 116 --- .../mllib/JavaGradientBoostedTreesRunner.java | 126 .../examples/mllib/JavaRandomForestExample.java | 139 -- .../main/python/mllib/decision_tree_runner.py | 144 --- .../main/python/mllib/gradient_boosted_trees.py | 77 -- .../main/python/mllib/random_forest_example.py | 90 6 files changed, 692 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a387cef3/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java b/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java deleted file mode 100644 index 1f82e3f..000 --- a/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTree.java +++ /dev/null @@ -1,116 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.examples.mllib; - -import java.util.HashMap; - -import scala.Tuple2; - -import org.apache.spark.api.java.function.Function2; -import org.apache.spark.api.java.JavaPairRDD; -import org.apache.spark.api.java.JavaRDD; -import org.apache.spark.api.java.JavaSparkContext; -import org.apache.spark.api.java.function.Function; -import org.apache.spark.api.java.function.PairFunction; -import org.apache.spark.mllib.regression.LabeledPoint; -import org.apache.spark.mllib.tree.DecisionTree; -import org.apache.spark.mllib.tree.model.DecisionTreeModel; -import org.apache.spark.mllib.util.MLUtils; -import org.apache.spark.SparkConf; - -/** - * Classification and regression using decision trees. 
- */ -public final class JavaDecisionTree { - - public static void main(String[] args) { -String datapath = "data/mllib/sample_libsvm_data.txt"; -if (args.length == 1) { - datapath = args[0]; -} else if (args.length > 1) { - System.err.println("Usage: JavaDecisionTree "); - System.exit(1); -} -SparkConf sparkConf = new SparkConf().setAppName("JavaDecisionTree"); -JavaSparkContext sc = new JavaSparkContext(sparkConf); - -JavaRDD data = MLUtils.loadLibSVMFile(sc.sc(), datapath).toJavaRDD().cache(); - -// Compute the number of classes from the data. -Integer numClasses = data.map(new Function () { - @Override public Double call(LabeledPoint p) { -return p.label(); - } -}).countByValue().size(); - -// Set parameters. -// Empty categoricalFeaturesInfo indicates all features are continuous. -HashMap categoricalFeaturesInfo = new HashMap (); -String impurity = "gini"; -Integer maxDepth = 5; -Integer maxBins = 32; - -// Train a DecisionTree model for classification. -final DecisionTreeModel
spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark Updated Branches: refs/heads/master a8ceec5e8 -> e232720a6 [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Original PR is reverted due to document build error. https://github.com/apache/spark/pull/9722 mengxr feynmanliang yinxusen Sorry for the troubling. Author: Yuhao YangCloses #9974 from hhbyyh/ldaMLExample. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e232720a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e232720a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e232720a Branch: refs/heads/master Commit: e232720a65dfb9ae6135cbb7674e35eddd88d625 Parents: a8ceec5 Author: Yuhao Yang Authored: Mon Nov 30 14:56:51 2015 -0800 Committer: Xiangrui Meng Committed: Mon Nov 30 14:56:51 2015 -0800 -- docs/ml-clustering.md | 31 +++ docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 97 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 208 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 000..cfefb5d --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,31 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts. 
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/e232720a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new file mode 100644 index 000..3a5d323 --- /dev/null +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java
spark git commit: fix Maven build
Repository: spark Updated Branches: refs/heads/master 553588893 -> ecc00ec3f fix Maven build Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ecc00ec3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ecc00ec3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ecc00ec3 Branch: refs/heads/master Commit: ecc00ec3faf09b0e21bc4b468cb45e45d05cf482 Parents: 5535888 Author: Davies LiuAuthored: Mon Nov 30 15:42:10 2015 -0800 Committer: Davies Liu Committed: Mon Nov 30 15:42:10 2015 -0800 -- .../src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ecc00ec3/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala index 507641f..a781777 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala @@ -43,7 +43,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ * populated by the query planning infrastructure. */ @transient - protected[spark] final val sqlContext = SQLContext.getActive().get + protected[spark] final val sqlContext = SQLContext.getActive().getOrElse(null) protected def sparkContext = sqlContext.sparkContext - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-12000] Fix API doc generation issues
Repository: spark Updated Branches: refs/heads/master edb26e7f4 -> d3ca8cfac [SPARK-12000] Fix API doc generation issues This pull request fixes multiple issues with API doc generation. - Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures. - Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation. - Incorporate doc changes from thunterdb (in #10048). Closes #10048. Author: Josh RosenAuthor: Timothy Hunter Closes #10049 from JoshRosen/fix-doc-build. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3ca8cfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3ca8cfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3ca8cfa Branch: refs/heads/master Commit: d3ca8cfac286ae19f8bedc736877ea9d0a0a072c Parents: edb26e7 Author: Josh Rosen Authored: Mon Nov 30 16:37:27 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 16:37:27 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 6 +++--- .../org/apache/spark/network/client/StreamCallback.java | 4 ++-- .../java/org/apache/spark/network/server/RpcHandler.java | 2 +- project/SparkBuild.scala | 11 --- 4 files changed, 14 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3ca8cfa/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index 01718d9..f2f3e2e 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -27,7 +27,7 @@ if not (ENV['SKIP_API'] == '1') cd("..") puts "Running 'build/sbt -Pkinesis-asl clean compile unidoc' from " + pwd + "; this may take a few minutes..." -puts `build/sbt -Pkinesis-asl clean compile unidoc` +system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed") puts "Moving back into docs dir." cd("docs") @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - puts `make html` + system(make html) || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") @@ -131,7 +131,7 @@ if not (ENV['SKIP_API'] == '1') # Build SparkR API docs puts "Moving to R directory and building roxygen docs." cd("R") - puts `./create-docs.sh` + system("./create-docs.sh") || raise("R doc generation failed") puts "Moving back into home dir." cd("../") http://git-wip-us.apache.org/repos/asf/spark/blob/d3ca8cfa/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java index 093fada..51d34ca 100644 --- a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java +++ b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java @@ -21,8 +21,8 @@ import java.io.IOException; import java.nio.ByteBuffer; /** - * Callback for streaming data. Stream data will be offered to the {@link onData(ByteBuffer)} - * method as it arrives. Once all the stream data is received, {@link onComplete()} will be + * Callback for streaming data. Stream data will be offered to the {@link onData(String, ByteBuffer)} + * method as it arrives. Once all the stream data is received, {@link onComplete(String)} will be * called. 
* * The network library guarantees that a single thread will call these methods at a time, but http://git-wip-us.apache.org/repos/asf/spark/blob/d3ca8cfa/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java index 65109dd..1a11f7b 100644 --- a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java +++ b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java @@ -55,7 +55,7 @@ public abstract class RpcHandler { /** * Receives an RPC message that does not expect a reply. The default implementation will - * call "{@link receive(TransportClient, byte[],
spark git commit: [SPARK-12000] Fix API doc generation issues
Repository: spark Updated Branches: refs/heads/branch-1.6 436151780 -> 43ffa0373 [SPARK-12000] Fix API doc generation issues This pull request fixes multiple issues with API doc generation. - Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures. - Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation. - Incorporate doc changes from thunterdb (in #10048). Closes #10048. Author: Josh RosenAuthor: Timothy Hunter Closes #10049 from JoshRosen/fix-doc-build. (cherry picked from commit d3ca8cfac286ae19f8bedc736877ea9d0a0a072c) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43ffa037 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43ffa037 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43ffa037 Branch: refs/heads/branch-1.6 Commit: 43ffa03738d1ffdf99604ac18de137c60c930550 Parents: 4361517 Author: Josh Rosen Authored: Mon Nov 30 16:37:27 2015 -0800 Committer: Josh Rosen Committed: Mon Nov 30 16:37:53 2015 -0800 -- docs/_plugins/copy_api_dirs.rb | 6 +++--- .../org/apache/spark/network/client/StreamCallback.java | 4 ++-- .../java/org/apache/spark/network/server/RpcHandler.java | 2 +- project/SparkBuild.scala | 11 --- 4 files changed, 14 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/43ffa037/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index 01718d9..f2f3e2e 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -27,7 +27,7 @@ if not (ENV['SKIP_API'] == '1') cd("..") puts "Running 'build/sbt -Pkinesis-asl clean compile unidoc' from " + pwd + "; this may take a few minutes..." -puts `build/sbt -Pkinesis-asl clean compile unidoc` +system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed") puts "Moving back into docs dir." cd("docs") @@ -117,7 +117,7 @@ if not (ENV['SKIP_API'] == '1') puts "Moving to python/docs directory and building sphinx." cd("../python/docs") - puts `make html` + system(make html) || raise("Python doc generation failed") puts "Moving back into home dir." cd("../../") @@ -131,7 +131,7 @@ if not (ENV['SKIP_API'] == '1') # Build SparkR API docs puts "Moving to R directory and building roxygen docs." cd("R") - puts `./create-docs.sh` + system("./create-docs.sh") || raise("R doc generation failed") puts "Moving back into home dir." cd("../") http://git-wip-us.apache.org/repos/asf/spark/blob/43ffa037/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java index 093fada..51d34ca 100644 --- a/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java +++ b/network/common/src/main/java/org/apache/spark/network/client/StreamCallback.java @@ -21,8 +21,8 @@ import java.io.IOException; import java.nio.ByteBuffer; /** - * Callback for streaming data. Stream data will be offered to the {@link onData(ByteBuffer)} - * method as it arrives. Once all the stream data is received, {@link onComplete()} will be + * Callback for streaming data. 
Stream data will be offered to the {@link onData(String, ByteBuffer)} + * method as it arrives. Once all the stream data is received, {@link onComplete(String)} will be * called. * * The network library guarantees that a single thread will call these methods at a time, but http://git-wip-us.apache.org/repos/asf/spark/blob/43ffa037/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java -- diff --git a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java index 65109dd..1a11f7b 100644 --- a/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java +++ b/network/common/src/main/java/org/apache/spark/network/server/RpcHandler.java @@ -55,7 +55,7 @@ public abstract class RpcHandler { /** * Receives an RPC
spark git commit: [SPARK-12058][HOTFIX] Disable KinesisStreamTests
Repository: spark Updated Branches: refs/heads/master ecc00ec3f -> edb26e7f4 [SPARK-12058][HOTFIX] Disable KinesisStreamTests KinesisStreamTests in test.py is broken because of #9403. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46896/testReport/(root)/KinesisStreamTests/test_kinesis_stream/ Because Streaming Python didn't work when merging https://github.com/apache/spark/pull/9403, the PR build didn't actually report the Python test failure. This PR just disables the test to unblock #10039. Author: Shixiong Zhu. Closes #10047 from zsxwing/disable-python-kinesis-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/edb26e7f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/edb26e7f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/edb26e7f Branch: refs/heads/master Commit: edb26e7f4e1164645971c9a139eb29ddec8acc5d Parents: ecc00ec Author: Shixiong Zhu Authored: Mon Nov 30 16:31:59 2015 -0800 Committer: Shixiong Zhu Committed: Mon Nov 30 16:31:59 2015 -0800 -- python/pyspark/streaming/tests.py | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/edb26e7f/python/pyspark/streaming/tests.py -- diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py index d380d69..a647e6b 100644 --- a/python/pyspark/streaming/tests.py +++ b/python/pyspark/streaming/tests.py @@ -1409,6 +1409,7 @@ class KinesisStreamTests(PySparkStreamingTestCase): InitialPositionInStream.LATEST, 2, StorageLevel.MEMORY_AND_DISK_2, "awsAccessKey", "awsSecretKey") +@unittest.skip("Enable it when we fix SPAKR-12058") def test_kinesis_stream(self): if not are_kinesis_tests_enabled: sys.stderr.write( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-12018][SQL] Refactor common subexpression elimination code
Repository: spark Updated Branches: refs/heads/master f73379be2 -> 9693b0d5a [SPARK-12018][SQL] Refactor common subexpression elimination code JIRA: https://issues.apache.org/jira/browse/SPARK-12018 The code of common subexpression elimination can be factored and simplified. Some unnecessary variables can be removed. Author: Liang-Chi HsiehCloses #10009 from viirya/refactor-subexpr-eliminate. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9693b0d5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9693b0d5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9693b0d5 Branch: refs/heads/master Commit: 9693b0d5a55bc1d2da96f04fe2c6de59a8dfcc1b Parents: f73379b Author: Liang-Chi Hsieh Authored: Mon Nov 30 20:56:42 2015 -0800 Committer: Michael Armbrust Committed: Mon Nov 30 20:56:42 2015 -0800 -- .../sql/catalyst/expressions/Expression.scala | 10 ++ .../expressions/codegen/CodeGenerator.scala | 34 ++-- .../codegen/GenerateUnsafeProjection.scala | 4 +-- 3 files changed, 14 insertions(+), 34 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9693b0d5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala index 169435a..b55d365 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala @@ -94,13 +94,9 @@ abstract class Expression extends TreeNode[Expression] { def gen(ctx: CodeGenContext): GeneratedExpressionCode = { ctx.subExprEliminationExprs.get(this).map { subExprState => // This expression is repeated meaning the code to evaluated has already been added - // as a function, `subExprState.fnName`. Just call that. - val code = -s""" - |/* $this */ - |${subExprState.fnName}(${ctx.INPUT_ROW}); - """.stripMargin.trim - GeneratedExpressionCode(code, subExprState.code.isNull, subExprState.code.value) + // as a function and called in advance. Just use it. + val code = s"/* $this */" + GeneratedExpressionCode(code, subExprState.isNull, subExprState.value) }.getOrElse { val isNull = ctx.freshName("isNull") val primitive = ctx.freshName("primitive") http://git-wip-us.apache.org/repos/asf/spark/blob/9693b0d5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 2f3d6ae..440c7d2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -104,16 +104,13 @@ class CodeGenContext { val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions // State used for subexpression elimination. - case class SubExprEliminationState( - isLoaded: String, - code: GeneratedExpressionCode, - fnName: String) + case class SubExprEliminationState(isNull: String, value: String) // Foreach expression that is participating in subexpression elimination, the state to use. 
val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState] - // The collection of isLoaded variables that need to be reset on each row. - val subExprIsLoadedVariables = mutable.ArrayBuffer.empty[String] + // The collection of sub-exression result resetting methods that need to be called on each row. + val subExprResetVariables = mutable.ArrayBuffer.empty[String] final val JAVA_BOOLEAN = "boolean" final val JAVA_BYTE = "byte" @@ -408,7 +405,6 @@ class CodeGenContext { val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1) commonExprs.foreach(e => { val expr = e.head - val isLoaded = freshName("isLoaded") val isNull = freshName("isNull") val value = freshName("value") val fnName = freshName("evalExpr") @@ -417,18 +413,12 @@ class CodeGenContext { val code = expr.gen(this) val fn = s""" - |private void $fnName(InternalRow ${INPUT_ROW})
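For readers skimming the diff above, the gist of the refactor is that a common subexpression now carries only the names of its generated isNull/value variables, while the calls that evaluate it are collected and run once per input row up front; repeated occurrences of the expression then simply reuse those variables instead of calling a function again. The following is a minimal, self-contained sketch of that shape only — the names (SubExprState, registerCommonSubExpr, Ctx) are invented for illustration and are not Spark's actual CodeGenContext API:

import scala.collection.mutable

object SubExprEliminationSketch {
  // Per eliminated subexpression we keep only the generated variable names.
  final case class SubExprState(isNull: String, value: String)

  final class Ctx {
    private var counter = 0
    def freshName(prefix: String): String = { counter += 1; s"$prefix$counter" }

    // expression text -> state whose variables later occurrences reuse
    val states = mutable.HashMap.empty[String, SubExprState]
    // function declarations emitted once into the generated class ...
    val functions = mutable.ArrayBuffer.empty[String]
    // ... and the calls that must run once per input row, before any reuse
    val perRowCalls = mutable.ArrayBuffer.empty[String]
  }

  def registerCommonSubExpr(ctx: Ctx, exprText: String, evalJava: String): SubExprState = {
    val isNull = ctx.freshName("isNull")
    val value  = ctx.freshName("value")
    val fnName = ctx.freshName("evalExpr")
    // The generated Java below is purely illustrative (assumes a non-null result).
    ctx.functions +=
      s"private void $fnName(InternalRow i) { $evalJava this.$isNull = false; this.$value = result; }"
    ctx.perRowCalls += s"$fnName(i);"
    val state = SubExprState(isNull, value)
    ctx.states(exprText) = state
    state
  }

  def main(args: Array[String]): Unit = {
    val ctx = new Ctx
    val st = registerCommonSubExpr(ctx, "a + b", "int result = i.getInt(0) + i.getInt(1);")
    println(s"per-row calls : ${ctx.perRowCalls.mkString(" ")}")
    println(s"reused vars   : isNull=${st.isNull}, value=${st.value}")
  }
}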
spark git commit: [SPARK-12018][SQL] Refactor common subexpression elimination code
Repository: spark Updated Branches: refs/heads/branch-1.6 86a46ce68 -> 1aa39bdb1 [SPARK-12018][SQL] Refactor common subexpression elimination code JIRA: https://issues.apache.org/jira/browse/SPARK-12018 The code of common subexpression elimination can be factored and simplified. Some unnecessary variables can be removed. Author: Liang-Chi HsiehCloses #10009 from viirya/refactor-subexpr-eliminate. (cherry picked from commit 9693b0d5a55bc1d2da96f04fe2c6de59a8dfcc1b) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1aa39bdb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1aa39bdb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1aa39bdb Branch: refs/heads/branch-1.6 Commit: 1aa39bdb136ace5dcd4f7fdf62b8f324c14261f2 Parents: 86a46ce Author: Liang-Chi Hsieh Authored: Mon Nov 30 20:56:42 2015 -0800 Committer: Michael Armbrust Committed: Mon Nov 30 20:56:54 2015 -0800 -- .../sql/catalyst/expressions/Expression.scala | 10 ++ .../expressions/codegen/CodeGenerator.scala | 34 ++-- .../codegen/GenerateUnsafeProjection.scala | 4 +-- 3 files changed, 14 insertions(+), 34 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1aa39bdb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala index 169435a..b55d365 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala @@ -94,13 +94,9 @@ abstract class Expression extends TreeNode[Expression] { def gen(ctx: CodeGenContext): GeneratedExpressionCode = { ctx.subExprEliminationExprs.get(this).map { subExprState => // This expression is repeated meaning the code to evaluated has already been added - // as a function, `subExprState.fnName`. Just call that. - val code = -s""" - |/* $this */ - |${subExprState.fnName}(${ctx.INPUT_ROW}); - """.stripMargin.trim - GeneratedExpressionCode(code, subExprState.code.isNull, subExprState.code.value) + // as a function and called in advance. Just use it. + val code = s"/* $this */" + GeneratedExpressionCode(code, subExprState.isNull, subExprState.value) }.getOrElse { val isNull = ctx.freshName("isNull") val primitive = ctx.freshName("primitive") http://git-wip-us.apache.org/repos/asf/spark/blob/1aa39bdb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 2f3d6ae..440c7d2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -104,16 +104,13 @@ class CodeGenContext { val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions // State used for subexpression elimination. 
- case class SubExprEliminationState( - isLoaded: String, - code: GeneratedExpressionCode, - fnName: String) + case class SubExprEliminationState(isNull: String, value: String) // Foreach expression that is participating in subexpression elimination, the state to use. val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState] - // The collection of isLoaded variables that need to be reset on each row. - val subExprIsLoadedVariables = mutable.ArrayBuffer.empty[String] + // The collection of sub-exression result resetting methods that need to be called on each row. + val subExprResetVariables = mutable.ArrayBuffer.empty[String] final val JAVA_BOOLEAN = "boolean" final val JAVA_BYTE = "byte" @@ -408,7 +405,6 @@ class CodeGenContext { val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1) commonExprs.foreach(e => { val expr = e.head - val isLoaded = freshName("isLoaded") val isNull = freshName("isNull") val value = freshName("value") val fnName = freshName("evalExpr") @@ -417,18 +413,12 @@ class
spark git commit: [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions
Repository: spark Updated Branches: refs/heads/master bf0e85a70 -> 2db4662fe [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions Fixes [SPARK-11989](https://issues.apache.org/jira/browse/SPARK-11989) Author: CK50Author: Christian Kurz Closes #9973 from CK50/branch-1.6_non-transactional. (cherry picked from commit a589736a1b237ef2f3bd59fbaeefe143ddcc8f4e) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2db4662f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2db4662f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2db4662f Branch: refs/heads/master Commit: 2db4662fe2d72749c06ad5f11f641a388343f77c Parents: bf0e85a Author: CK50 Authored: Mon Nov 30 20:08:49 2015 +0800 Committer: Reynold Xin Committed: Mon Nov 30 20:09:05 2015 +0800 -- .../execution/datasources/jdbc/JdbcUtils.scala | 22 +--- 1 file changed, 19 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2db4662f/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 7375a5c..252f1cf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -21,6 +21,7 @@ import java.sql.{Connection, PreparedStatement} import java.util.Properties import scala.util.Try +import scala.util.control.NonFatal import org.apache.spark.Logging import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType, JdbcDialects} @@ -125,8 +126,19 @@ object JdbcUtils extends Logging { dialect: JdbcDialect): Iterator[Byte] = { val conn = getConnection() var committed = false +val supportsTransactions = try { + conn.getMetaData().supportsDataManipulationTransactionsOnly() || + conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions() +} catch { + case NonFatal(e) => +logWarning("Exception while detecting transaction support", e) +true +} + try { - conn.setAutoCommit(false) // Everything in the same db transaction. + if (supportsTransactions) { +conn.setAutoCommit(false) // Everything in the same db transaction. + } val stmt = insertStatement(conn, table, rddSchema) try { var rowCount = 0 @@ -175,14 +187,18 @@ object JdbcUtils extends Logging { } finally { stmt.close() } - conn.commit() + if (supportsTransactions) { +conn.commit() + } committed = true } finally { if (!committed) { // The stage must fail. We got here through an exception path, so // let the exception through unless rollback() or close() want to // tell the user about another problem. -conn.rollback() +if (supportsTransactions) { + conn.rollback() +} conn.close() } else { // The stage must succeed. We cannot propagate any exception close() might throw. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
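For context, the pattern this patch introduces can be exercised outside Spark with plain JDBC: probe DatabaseMetaData for transaction support once, fall back to assuming support if the probe itself fails, and guard setAutoCommit/commit/rollback on the result. The sketch below is a hedged illustration of that pattern, not Spark code; the JDBC URL, table name and column are placeholders.

import java.sql.{Connection, DriverManager}
import scala.util.control.NonFatal

object NonTransactionalWriteSketch {
  def writeBatch(url: String, rows: Seq[Int]): Unit = {
    val conn: Connection = DriverManager.getConnection(url)
    var committed = false
    // If metadata probing itself fails, assume transactions are supported,
    // mirroring the behaviour added to JdbcUtils.
    val supportsTransactions =
      try {
        conn.getMetaData.supportsDataManipulationTransactionsOnly() ||
        conn.getMetaData.supportsDataDefinitionAndDataManipulationTransactions()
      } catch {
        case NonFatal(e) =>
          System.err.println(s"Exception while detecting transaction support: $e")
          true
      }
    try {
      if (supportsTransactions) conn.setAutoCommit(false) // everything in one db transaction
      val stmt = conn.prepareStatement("INSERT INTO demo_table (x) VALUES (?)") // placeholder SQL
      try {
        rows.foreach { x => stmt.setInt(1, x); stmt.executeUpdate() }
      } finally {
        stmt.close()
      }
      if (supportsTransactions) conn.commit()
      committed = true
    } finally {
      if (!committed) {
        // Exception path: only roll back where the database supports it.
        if (supportsTransactions) conn.rollback()
        conn.close()
      } else {
        conn.close()
      }
    }
  }
}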
spark git commit: [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions
Repository: spark Updated Branches: refs/heads/branch-1.6 33cd171b2 -> a589736a1 [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions Fixes [SPARK-11989](https://issues.apache.org/jira/browse/SPARK-11989) Author: CK50Author: Christian Kurz Closes #9973 from CK50/branch-1.6_non-transactional. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a589736a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a589736a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a589736a Branch: refs/heads/branch-1.6 Commit: a589736a1b237ef2f3bd59fbaeefe143ddcc8f4e Parents: 33cd171 Author: CK50 Authored: Mon Nov 30 20:08:49 2015 +0800 Committer: Reynold Xin Committed: Mon Nov 30 20:08:49 2015 +0800 -- .../execution/datasources/jdbc/JdbcUtils.scala | 22 +--- 1 file changed, 19 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a589736a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 7375a5c..252f1cf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -21,6 +21,7 @@ import java.sql.{Connection, PreparedStatement} import java.util.Properties import scala.util.Try +import scala.util.control.NonFatal import org.apache.spark.Logging import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType, JdbcDialects} @@ -125,8 +126,19 @@ object JdbcUtils extends Logging { dialect: JdbcDialect): Iterator[Byte] = { val conn = getConnection() var committed = false +val supportsTransactions = try { + conn.getMetaData().supportsDataManipulationTransactionsOnly() || + conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions() +} catch { + case NonFatal(e) => +logWarning("Exception while detecting transaction support", e) +true +} + try { - conn.setAutoCommit(false) // Everything in the same db transaction. + if (supportsTransactions) { +conn.setAutoCommit(false) // Everything in the same db transaction. + } val stmt = insertStatement(conn, table, rddSchema) try { var rowCount = 0 @@ -175,14 +187,18 @@ object JdbcUtils extends Logging { } finally { stmt.close() } - conn.commit() + if (supportsTransactions) { +conn.commit() + } committed = true } finally { if (!committed) { // The stage must fail. We got here through an exception path, so // let the exception through unless rollback() or close() want to // tell the user about another problem. -conn.rollback() +if (supportsTransactions) { + conn.rollback() +} conn.close() } else { // The stage must succeed. We cannot propagate any exception close() might throw. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [DOC] Explicitly state that top maintains the order of elements
Repository: spark Updated Branches: refs/heads/branch-1.6 e9653921e -> aaf835f1d [DOC] Explicitly state that top maintains the order of elements Top is implemented in terms of takeOrdered, which already maintains the order, so top should, too. Author: Wieland HoffmannCloses #10013 from mineo/top-order. (cherry picked from commit 26c3581f17f475fab2f3b5301b8f253ff2fa6438) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aaf835f1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aaf835f1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aaf835f1 Branch: refs/heads/branch-1.6 Commit: aaf835f1d67470d859fae3c0cc3143e4ccaaf3d0 Parents: e965392 Author: Wieland Hoffmann Authored: Mon Nov 30 09:32:48 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:32:58 2015 + -- core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala | 4 ++-- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/aaf835f1/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala index 871be0b..1e9d4f1 100644 --- a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala +++ b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala @@ -556,7 +556,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD as defined by - * the specified Comparator[T]. + * the specified Comparator[T] and maintains the order. * @param num k, the number of top elements to return * @param comp the comparator that defines the order * @return an array of top elements @@ -567,7 +567,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD using the - * natural ordering for T. + * natural ordering for T and maintains the order. * @param num k, the number of top elements to return * @return an array of top elements */ http://git-wip-us.apache.org/repos/asf/spark/blob/aaf835f1/core/src/main/scala/org/apache/spark/rdd/RDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala index 2aeb5ee..8b3731d 100644 --- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala @@ -1327,7 +1327,8 @@ abstract class RDD[T: ClassTag]( /** * Returns the top k (largest) elements from this RDD as defined by the specified - * implicit Ordering[T]. This does the opposite of [[takeOrdered]]. For example: + * implicit Ordering[T] and maintains the ordering. This does the opposite of + * [[takeOrdered]]. For example: * {{{ * sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) * // returns Array(12) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
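Since top is built on takeOrdered, the ordering guarantee the doc now states can be checked directly in spark-shell (this snippet assumes an active SparkContext bound to sc, as in the scaladoc example):

val nums = sc.parallelize(Seq(10, 4, 2, 12, 3))

nums.top(2)          // Array(12, 10) -- largest first, in descending order
nums.takeOrdered(2)  // Array(2, 3)   -- smallest first, in ascending order

// With an explicit ordering, top still returns results ordered by that
// ordering, descending; under the reversed ordering "top" yields the minima:
nums.top(2)(Ordering[Int].reverse)   // Array(2, 3)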
spark git commit: [DOC] Explicitly state that top maintains the order of elements
Repository: spark Updated Branches: refs/heads/master 953e8e6dc -> 26c3581f1 [DOC] Explicitly state that top maintains the order of elements Top is implemented in terms of takeOrdered, which already maintains the order, so top should, too. Author: Wieland HoffmannCloses #10013 from mineo/top-order. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/26c3581f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/26c3581f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/26c3581f Branch: refs/heads/master Commit: 26c3581f17f475fab2f3b5301b8f253ff2fa6438 Parents: 953e8e6 Author: Wieland Hoffmann Authored: Mon Nov 30 09:32:48 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:32:48 2015 + -- core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala | 4 ++-- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/26c3581f/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala index 871be0b..1e9d4f1 100644 --- a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala +++ b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala @@ -556,7 +556,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD as defined by - * the specified Comparator[T]. + * the specified Comparator[T] and maintains the order. * @param num k, the number of top elements to return * @param comp the comparator that defines the order * @return an array of top elements @@ -567,7 +567,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable { /** * Returns the top k (largest) elements from this RDD using the - * natural ordering for T. + * natural ordering for T and maintains the order. * @param num k, the number of top elements to return * @return an array of top elements */ http://git-wip-us.apache.org/repos/asf/spark/blob/26c3581f/core/src/main/scala/org/apache/spark/rdd/RDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala index 2aeb5ee..8b3731d 100644 --- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala @@ -1327,7 +1327,8 @@ abstract class RDD[T: ClassTag]( /** * Returns the top k (largest) elements from this RDD as defined by the specified - * implicit Ordering[T]. This does the opposite of [[takeOrdered]]. For example: + * implicit Ordering[T] and maintains the ordering. This does the opposite of + * [[takeOrdered]]. For example: * {{{ * sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) * // returns Array(12) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only.
Repository: spark Updated Branches: refs/heads/master e07494420 -> 953e8e6dc [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. Author: Prashant SharmaCloses #10012 from ScrapCodes/minor-build-comment. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/953e8e6d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/953e8e6d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/953e8e6d Branch: refs/heads/master Commit: 953e8e6dcb32cd88005834e9c3720740e201826c Parents: e074944 Author: Prashant Sharma Authored: Mon Nov 30 09:30:58 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:30:58 2015 + -- project/project/SparkPluginBuild.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/953e8e6d/project/project/SparkPluginBuild.scala -- diff --git a/project/project/SparkPluginBuild.scala b/project/project/SparkPluginBuild.scala index 471d00b..cbb88dc 100644 --- a/project/project/SparkPluginBuild.scala +++ b/project/project/SparkPluginBuild.scala @@ -19,9 +19,8 @@ import sbt._ import sbt.Keys._ /** - * This plugin project is there to define new scala style rules for spark. This is - * a plugin project so that this gets compiled first and is put on the classpath and - * becomes available for scalastyle sbt plugin. + * This plugin project is there because we use our custom fork of sbt-pom-reader plugin. This is + * a plugin project so that this gets compiled first and is available on the classpath for SBT build. */ object SparkPluginDef extends Build { lazy val root = Project("plugins", file(".")) dependsOn(sbtPomReader) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only.
Repository: spark Updated Branches: refs/heads/branch-1.6 12d97b0c5 -> e9653921e [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. Author: Prashant SharmaCloses #10012 from ScrapCodes/minor-build-comment. (cherry picked from commit 953e8e6dcb32cd88005834e9c3720740e201826c) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9653921 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9653921 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9653921 Branch: refs/heads/branch-1.6 Commit: e9653921e84092f36988e3d73618c58b0c10e938 Parents: 12d97b0 Author: Prashant Sharma Authored: Mon Nov 30 09:30:58 2015 + Committer: Sean Owen Committed: Mon Nov 30 09:31:09 2015 + -- project/project/SparkPluginBuild.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e9653921/project/project/SparkPluginBuild.scala -- diff --git a/project/project/SparkPluginBuild.scala b/project/project/SparkPluginBuild.scala index 471d00b..cbb88dc 100644 --- a/project/project/SparkPluginBuild.scala +++ b/project/project/SparkPluginBuild.scala @@ -19,9 +19,8 @@ import sbt._ import sbt.Keys._ /** - * This plugin project is there to define new scala style rules for spark. This is - * a plugin project so that this gets compiled first and is put on the classpath and - * becomes available for scalastyle sbt plugin. + * This plugin project is there because we use our custom fork of sbt-pom-reader plugin. This is + * a plugin project so that this gets compiled first and is available on the classpath for SBT build. */ object SparkPluginDef extends Build { lazy val root = Project("plugins", file(".")) dependsOn(sbtPomReader) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org