spark git commit: Fixing a few basic typos in the Programming Guide.
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 a1d896b85 -> 0748263a2

Fixing a few basic typos in the Programming Guide.

Just a few minor fixes in the guide, so a new JIRA issue was not created per the guidelines.

Author: Mike Dusenberry <dusenberr...@gmail.com>

Closes #6240 from dusenberrymw/Fix_Programming_Guide_Typos and squashes the following commits:

ffa76eb [Mike Dusenberry] Fixing a few basic typos in the Programming Guide.

(cherry picked from commit 61f164d3fdd1c8dcdba8c9d66df05ff4069aa6e6)
Signed-off-by: Sean Owen <so...@cloudera.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0748263a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0748263a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0748263a

Branch: refs/heads/branch-1.4
Commit: 0748263a2e36e9aef172808e3b6208a1f4d4fdb8
Parents: a1d896b
Author: Mike Dusenberry <dusenberr...@gmail.com>
Authored: Tue May 19 08:59:45 2015 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Tue May 19 09:00:19 2015 +0100

----------------------------------------------------------------------
 docs/programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/0748263a/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2781651..0c27376 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1071,7 +1071,7 @@ for details.
 </tr>
 <tr>
   <td> <b>saveAsSequenceFile</b>(<i>path</i>) <br /> (Java and Scala) </td>
-  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that either implement Hadoop's Writable interface. In Scala, it is also
+  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's Writable interface. In Scala, it is also
   available on types that are implicitly convertible to Writable (Spark includes conversions for basic types like Int, Double, String, etc). </td>
 </tr>
 <tr>
@@ -1122,7 +1122,7 @@ ordered data following shuffle then it's possible to use:
 * `sortBy` to make a globally ordered RDD

 Operations which can cause a shuffle include **repartition** operations like
-[`repartition`](#RepartitionLink), and [`coalesce`](#CoalesceLink), **'ByKey** operations
+[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'ByKey** operations
 (except for counting) like [`groupByKey`](#GroupByLink) and [`reduceByKey`](#ReduceByLink),
 and **join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink).
@@ -1138,7 +1138,7 @@ read the relevant sorted blocks.

 Certain shuffle operations can consume significant amounts of heap memory since they employ
 in-memory data structures to organize records before or after transferring them. Specifically,
-`reduceByKey` and `aggregateByKey` create these structures on the map side and `'ByKey` operations
+`reduceByKey` and `aggregateByKey` create these structures on the map side, and `'ByKey` operations
 generate these on the reduce side. When data does not fit in memory Spark will spill these tables
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
spark git commit: Fixing a few basic typos in the Programming Guide.
Repository: spark
Updated Branches:
  refs/heads/master 6008ec14e -> 61f164d3f

Fixing a few basic typos in the Programming Guide.

Just a few minor fixes in the guide, so a new JIRA issue was not created per the guidelines.

Author: Mike Dusenberry <dusenberr...@gmail.com>

Closes #6240 from dusenberrymw/Fix_Programming_Guide_Typos and squashes the following commits:

ffa76eb [Mike Dusenberry] Fixing a few basic typos in the Programming Guide.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/61f164d3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/61f164d3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/61f164d3

Branch: refs/heads/master
Commit: 61f164d3fdd1c8dcdba8c9d66df05ff4069aa6e6
Parents: 6008ec1
Author: Mike Dusenberry <dusenberr...@gmail.com>
Authored: Tue May 19 08:59:45 2015 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Tue May 19 08:59:45 2015 +0100

----------------------------------------------------------------------
 docs/programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/61f164d3/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2781651..0c27376 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1071,7 +1071,7 @@ for details.
 </tr>
 <tr>
   <td> <b>saveAsSequenceFile</b>(<i>path</i>) <br /> (Java and Scala) </td>
-  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that either implement Hadoop's Writable interface. In Scala, it is also
+  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's Writable interface. In Scala, it is also
   available on types that are implicitly convertible to Writable (Spark includes conversions for basic types like Int, Double, String, etc). </td>
 </tr>
 <tr>
@@ -1122,7 +1122,7 @@ ordered data following shuffle then it's possible to use:
 * `sortBy` to make a globally ordered RDD

 Operations which can cause a shuffle include **repartition** operations like
-[`repartition`](#RepartitionLink), and [`coalesce`](#CoalesceLink), **'ByKey** operations
+[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'ByKey** operations
 (except for counting) like [`groupByKey`](#GroupByLink) and [`reduceByKey`](#ReduceByLink),
 and **join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink).
@@ -1138,7 +1138,7 @@ read the relevant sorted blocks.

 Certain shuffle operations can consume significant amounts of heap memory since they employ
 in-memory data structures to organize records before or after transferring them. Specifically,
-`reduceByKey` and `aggregateByKey` create these structures on the map side and `'ByKey` operations
+`reduceByKey` and `aggregateByKey` create these structures on the map side, and `'ByKey` operations
 generate these on the reduce side. When data does not fit in memory Spark will spill these tables
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
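The guide text corrected above explains which operations trigger a shuffle and when `saveAsSequenceFile` applies. As a minimal sketch of both points (assuming a Spark 1.x `SparkContext`; the app name, local master, and output path below are placeholders, not from the patch):

    import org.apache.spark.{SparkConf, SparkContext}

    object ShuffleSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("shuffle-sketch").setMaster("local[2]"))

        // Key-value pairs; String and Int are among the basic types Spark
        // implicitly converts to Writable in Scala.
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

        // A 'ByKey operation: it shuffles, building map-side structures first.
        val counts = pairs.reduceByKey(_ + _)

        // Writable-convertible key-value pairs can be written as a SequenceFile.
        counts.saveAsSequenceFile("/tmp/shuffle-sketch-output")

        sc.stop()
      }
    }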
spark git commit: [HOTFIX] Revert [SPARK-7092] Update spark scala version to 2.11.6
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 586ede6b3 -> 31f5d53e9

[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"

This reverts commit a11c8683c76c67f45749a1b50a0912a731fd2487.

For more information see: https://issues.apache.org/jira/browse/SPARK-7726

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/31f5d53e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/31f5d53e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/31f5d53e

Branch: refs/heads/branch-1.4
Commit: 31f5d53e9efea3c9728a51fe65e8baa589ddfa6f
Parents: 586ede6
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 02:28:41 2015 -0700
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 02:28:41 2015 -0700

----------------------------------------------------------------------
 pom.xml                                                      | 4 ++--
 .../src/main/scala/org/apache/spark/repl/SparkIMain.scala    | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/31f5d53e/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 6f525b6..68edf03 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1799,9 +1799,9 @@
         <property><name>scala-2.11</name></property>
       </activation>
       <properties>
-        <scala.version>2.11.6</scala.version>
+        <scala.version>2.11.2</scala.version>
         <scala.binary.version>2.11</scala.binary.version>
-        <jline.version>2.12.1</jline.version>
+        <jline.version>2.12</jline.version>
         <jline.groupid>jline</jline.groupid>
       </properties>
     </profile>

http://git-wip-us.apache.org/repos/asf/spark/blob/31f5d53e/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
----------------------------------------------------------------------
diff --git a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
index 1cb910f..1bb62c8 100644
--- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
@@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings
   def apply(line: String): Result = debugging(s"""parse("$line")""") {
     var isIncomplete = false
-    currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {
+    currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
       reporter.reset()
       val trees = newUnitParser(line).parseStats()
       if (reporter.hasErrors) Error
spark git commit: [HOTFIX] Revert [SPARK-7092] Update spark scala version to 2.11.6
Repository: spark
Updated Branches:
  refs/heads/master 61f164d3f -> 27fa88b9b

[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"

This reverts commit a11c8683c76c67f45749a1b50a0912a731fd2487.

For more information see: https://issues.apache.org/jira/browse/SPARK-7726

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27fa88b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27fa88b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27fa88b9

Branch: refs/heads/master
Commit: 27fa88b9ba320cd0d95703aa3437151ba7c86f98
Parents: 61f164d
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 02:28:41 2015 -0700
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 02:29:38 2015 -0700

----------------------------------------------------------------------
 pom.xml                                                      | 4 ++--
 .../src/main/scala/org/apache/spark/repl/SparkIMain.scala    | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/27fa88b9/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index c72d7cb..d903f02 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1799,9 +1799,9 @@
         <property><name>scala-2.11</name></property>
       </activation>
       <properties>
-        <scala.version>2.11.6</scala.version>
+        <scala.version>2.11.2</scala.version>
         <scala.binary.version>2.11</scala.binary.version>
-        <jline.version>2.12.1</jline.version>
+        <jline.version>2.12</jline.version>
         <jline.groupid>jline</jline.groupid>
       </properties>
     </profile>

http://git-wip-us.apache.org/repos/asf/spark/blob/27fa88b9/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
----------------------------------------------------------------------
diff --git a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
index 1cb910f..1bb62c8 100644
--- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
@@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings
   def apply(line: String): Result = debugging(s"""parse("$line")""") {
     var isIncomplete = false
-    currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {
+    currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
       reporter.reset()
       val trees = newUnitParser(line).parseStats()
       if (reporter.hasErrors) Error
spark git commit: CHANGES.txt updates
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 6834d1af4 -> f9f2aafbf

CHANGES.txt updates

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f9f2aafb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f9f2aafb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f9f2aafb

Branch: refs/heads/branch-1.4
Commit: f9f2aafbf1f208344e0efd78893d0b6c9932293c
Parents: 6834d1a
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 02:32:32 2015 -0700
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 02:32:53 2015 -0700

----------------------------------------------------------------------
 CHANGES.txt | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/f9f2aafb/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 6660580..8c99404 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -3,6 +3,41 @@ Spark Change Log

 Release 1.4.0

+  [HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"
+  Patrick Wendell <patr...@databricks.com>
+  2015-05-19 02:28:41 -0700
+  Commit: 31f5d53
+
+  Revert "Preparing Spark release v1.4.0-rc1"
+  Patrick Wendell <patr...@databricks.com>
+  2015-05-19 02:27:14 -0700
+  Commit: 586ede6
+
+  Revert "Preparing development version 1.4.1-SNAPSHOT"
+  Patrick Wendell <patr...@databricks.com>
+  2015-05-19 02:27:07 -0700
+  Commit: e7309ec
+
+  Fixing a few basic typos in the Programming Guide.
+  Mike Dusenberry <dusenberr...@gmail.com>
+  2015-05-19 08:59:45 +0100
+  Commit: 0748263, github.com/apache/spark/pull/6240
+
+  Preparing development version 1.4.1-SNAPSHOT
+  Patrick Wendell <patr...@databricks.com>
+  2015-05-19 07:13:24 +0000
+  Commit: a1d896b
+
+  Preparing Spark release v1.4.0-rc1
+  Patrick Wendell <patr...@databricks.com>
+  2015-05-19 07:13:24 +0000
+  Commit: 79fb01a
+
+  Updating CHANGES.txt for Spark 1.4
+  Patrick Wendell <patr...@databricks.com>
+  2015-05-19 00:12:20 -0700
+  Commit: 30bf333
+
   Revert "Preparing Spark release v1.4.0-rc1"
   Patrick Wendell <patr...@databricks.com>
   2015-05-19 00:10:39 -0700
[2/2] spark git commit: Revert Preparing Spark release v1.4.0-rc1
Revert "Preparing Spark release v1.4.0-rc1"

This reverts commit 79fb01a3be07b5086134a6fe103248e9a33a9500.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/586ede6b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/586ede6b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/586ede6b

Branch: refs/heads/branch-1.4
Commit: 586ede6b32790dc15b6111836bd5955c61b53bac
Parents: e7309ec
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 02:27:14 2015 -0700
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 02:27:14 2015 -0700

----------------------------------------------------------------------
 assembly/pom.xml                  | 2 +-
 bagel/pom.xml                     | 2 +-
 core/pom.xml                      | 2 +-
 examples/pom.xml                  | 2 +-
 external/flume-sink/pom.xml       | 2 +-
 external/flume/pom.xml            | 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml            | 2 +-
 external/mqtt/pom.xml             | 2 +-
 external/twitter/pom.xml          | 2 +-
 external/zeromq/pom.xml           | 2 +-
 extras/java8-tests/pom.xml        | 2 +-
 extras/kinesis-asl/pom.xml        | 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml                    | 2 +-
 launcher/pom.xml                  | 2 +-
 mllib/pom.xml                     | 2 +-
 network/common/pom.xml            | 2 +-
 network/shuffle/pom.xml           | 2 +-
 network/yarn/pom.xml              | 2 +-
 pom.xml                           | 2 +-
 repl/pom.xml                      | 2 +-
 sql/catalyst/pom.xml              | 2 +-
 sql/core/pom.xml                  | 2 +-
 sql/hive-thriftserver/pom.xml     | 2 +-
 sql/hive/pom.xml                  | 2 +-
 streaming/pom.xml                 | 2 +-
 tools/pom.xml                     | 2 +-
 unsafe/pom.xml                    | 2 +-
 yarn/pom.xml                      | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/assembly/pom.xml
----------------------------------------------------------------------
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b8a821d..626c857 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/bagel/pom.xml
----------------------------------------------------------------------
diff --git a/bagel/pom.xml b/bagel/pom.xml
index c1aa32b..1f3dec9 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
    <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/core/pom.xml
----------------------------------------------------------------------
diff --git a/core/pom.xml b/core/pom.xml
index 8acb923..bfa49d0 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/examples/pom.xml
----------------------------------------------------------------------
diff --git a/examples/pom.xml b/examples/pom.xml
index 706a97d..5b04b4f 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/external/flume-sink/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index e8784eb..1f3e619 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/external/flume/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 1794f3e..8df7edb 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
[1/2] spark git commit: Revert Preparing development version 1.4.1-SNAPSHOT
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 0748263a2 -> 586ede6b3

Revert "Preparing development version 1.4.1-SNAPSHOT"

This reverts commit a1d896b85bd3fb88284f8b6758d7e5f0a1bb9eb3.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e7309ec7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e7309ec7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e7309ec7

Branch: refs/heads/branch-1.4
Commit: e7309ec729607e485525c90166a56bfac18b625e
Parents: 0748263
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 02:27:07 2015 -0700
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 02:27:07 2015 -0700

----------------------------------------------------------------------
 assembly/pom.xml                  | 2 +-
 bagel/pom.xml                     | 2 +-
 core/pom.xml                      | 2 +-
 examples/pom.xml                  | 2 +-
 external/flume-sink/pom.xml       | 2 +-
 external/flume/pom.xml            | 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml            | 2 +-
 external/mqtt/pom.xml             | 2 +-
 external/twitter/pom.xml          | 2 +-
 external/zeromq/pom.xml           | 2 +-
 extras/java8-tests/pom.xml        | 2 +-
 extras/kinesis-asl/pom.xml        | 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml                    | 2 +-
 launcher/pom.xml                  | 2 +-
 mllib/pom.xml                     | 2 +-
 network/common/pom.xml            | 2 +-
 network/shuffle/pom.xml           | 2 +-
 network/yarn/pom.xml              | 2 +-
 pom.xml                           | 2 +-
 repl/pom.xml                      | 2 +-
 sql/catalyst/pom.xml              | 2 +-
 sql/core/pom.xml                  | 2 +-
 sql/hive-thriftserver/pom.xml     | 2 +-
 sql/hive/pom.xml                  | 2 +-
 streaming/pom.xml                 | 2 +-
 tools/pom.xml                     | 2 +-
 unsafe/pom.xml                    | 2 +-
 yarn/pom.xml                      | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/assembly/pom.xml
----------------------------------------------------------------------
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b53d7c3..b8a821d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/bagel/pom.xml
----------------------------------------------------------------------
diff --git a/bagel/pom.xml b/bagel/pom.xml
index d631ff5..c1aa32b 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/core/pom.xml
----------------------------------------------------------------------
diff --git a/core/pom.xml b/core/pom.xml
index adbb7c2..8acb923 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/examples/pom.xml
----------------------------------------------------------------------
diff --git a/examples/pom.xml b/examples/pom.xml
index bf804bb..706a97d 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/external/flume-sink/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 076ddaa..e8784eb 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/external/flume/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 2491c97..1794f3e 100644
--- a/external/flume/pom.xml
+++
[2/2] spark git commit: Preparing Spark release v1.4.0-rc1
Preparing Spark release v1.4.0-rc1

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/777a0816
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/777a0816
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/777a0816

Branch: refs/heads/branch-1.4
Commit: 777a08166f1fb144146ba32581d4632c3466541e
Parents: f9f2aaf
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 09:35:12 2015 +0000
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 09:35:12 2015 +0000

----------------------------------------------------------------------
 assembly/pom.xml                  | 2 +-
 bagel/pom.xml                     | 2 +-
 core/pom.xml                      | 2 +-
 examples/pom.xml                  | 2 +-
 external/flume-sink/pom.xml       | 2 +-
 external/flume/pom.xml            | 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml            | 2 +-
 external/mqtt/pom.xml             | 2 +-
 external/twitter/pom.xml          | 2 +-
 external/zeromq/pom.xml           | 2 +-
 extras/java8-tests/pom.xml        | 2 +-
 extras/kinesis-asl/pom.xml        | 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml                    | 2 +-
 launcher/pom.xml                  | 2 +-
 mllib/pom.xml                     | 2 +-
 network/common/pom.xml            | 2 +-
 network/shuffle/pom.xml           | 2 +-
 network/yarn/pom.xml              | 2 +-
 pom.xml                           | 2 +-
 repl/pom.xml                      | 2 +-
 sql/catalyst/pom.xml              | 2 +-
 sql/core/pom.xml                  | 2 +-
 sql/hive-thriftserver/pom.xml     | 2 +-
 sql/hive/pom.xml                  | 2 +-
 streaming/pom.xml                 | 2 +-
 tools/pom.xml                     | 2 +-
 unsafe/pom.xml                    | 2 +-
 yarn/pom.xml                      | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/assembly/pom.xml
----------------------------------------------------------------------
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 626c857..b8a821d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/bagel/pom.xml
----------------------------------------------------------------------
diff --git a/bagel/pom.xml b/bagel/pom.xml
index 1f3dec9..c1aa32b 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/core/pom.xml
----------------------------------------------------------------------
diff --git a/core/pom.xml b/core/pom.xml
index bfa49d0..8acb923 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/examples/pom.xml
----------------------------------------------------------------------
diff --git a/examples/pom.xml b/examples/pom.xml
index 5b04b4f..706a97d 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/external/flume-sink/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 1f3e619..e8784eb 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/external/flume/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 8df7edb..1794f3e 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+
[1/2] spark git commit: Preparing development version 1.4.1-SNAPSHOT
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 f9f2aafbf -> ac3197e1b

Preparing development version 1.4.1-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac3197e1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac3197e1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac3197e1

Branch: refs/heads/branch-1.4
Commit: ac3197e1b94f25508a21b5de81d1ff47e6293ab1
Parents: 777a081
Author: Patrick Wendell <patr...@databricks.com>
Authored: Tue May 19 09:35:12 2015 +0000
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Tue May 19 09:35:12 2015 +0000

----------------------------------------------------------------------
 assembly/pom.xml                  | 2 +-
 bagel/pom.xml                     | 2 +-
 core/pom.xml                      | 2 +-
 examples/pom.xml                  | 2 +-
 external/flume-sink/pom.xml       | 2 +-
 external/flume/pom.xml            | 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml            | 2 +-
 external/mqtt/pom.xml             | 2 +-
 external/twitter/pom.xml          | 2 +-
 external/zeromq/pom.xml           | 2 +-
 extras/java8-tests/pom.xml        | 2 +-
 extras/kinesis-asl/pom.xml        | 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml                    | 2 +-
 launcher/pom.xml                  | 2 +-
 mllib/pom.xml                     | 2 +-
 network/common/pom.xml            | 2 +-
 network/shuffle/pom.xml           | 2 +-
 network/yarn/pom.xml              | 2 +-
 pom.xml                           | 2 +-
 repl/pom.xml                      | 2 +-
 sql/catalyst/pom.xml              | 2 +-
 sql/core/pom.xml                  | 2 +-
 sql/hive-thriftserver/pom.xml     | 2 +-
 sql/hive/pom.xml                  | 2 +-
 streaming/pom.xml                 | 2 +-
 tools/pom.xml                     | 2 +-
 unsafe/pom.xml                    | 2 +-
 yarn/pom.xml                      | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/assembly/pom.xml
----------------------------------------------------------------------
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b8a821d..b53d7c3 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/bagel/pom.xml
----------------------------------------------------------------------
diff --git a/bagel/pom.xml b/bagel/pom.xml
index c1aa32b..d631ff5 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/core/pom.xml
----------------------------------------------------------------------
diff --git a/core/pom.xml b/core/pom.xml
index 8acb923..adbb7c2 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/examples/pom.xml
----------------------------------------------------------------------
diff --git a/examples/pom.xml b/examples/pom.xml
index 706a97d..bf804bb 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/external/flume-sink/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index e8784eb..076ddaa 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/external/flume/pom.xml
----------------------------------------------------------------------
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 1794f3e..2491c97 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
Git Push Summary
Repository: spark
Updated Tags: refs/tags/v1.4.0-rc1 [created] 777a08166
spark git commit: [SPARK-7723] Fix string interpolation in pipeline examples
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 31f5d53e9 -> 6834d1af4

[SPARK-7723] Fix string interpolation in pipeline examples

https://issues.apache.org/jira/browse/SPARK-7723

Author: Saleem Ansari <tux...@gmail.com>

Closes #6258 from tuxdna/master and squashes the following commits:

2bb5a42 [Saleem Ansari] Merge branch 'master' into mllib-pipeline
e39db9c [Saleem Ansari] Fix string interpolation in pipeline examples

(cherry picked from commit df34793ad4e76214fc4c0a22af1eb89b171a32e4)
Signed-off-by: Sean Owen <so...@cloudera.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6834d1af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6834d1af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6834d1af

Branch: refs/heads/branch-1.4
Commit: 6834d1af4c370d6e5aa98d8d91d0cfff24e4a594
Parents: 31f5d53
Author: Saleem Ansari <tux...@gmail.com>
Authored: Tue May 19 10:31:11 2015 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Tue May 19 10:31:20 2015 +0100

----------------------------------------------------------------------
 docs/ml-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/6834d1af/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index b7b6376..cac7056 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -237,7 +237,7 @@ model2.transform(test.toDF)
   .select("features", "label", "myProbability", "prediction")
   .collect()
   .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
-    println("($features, $label) -> prob=$prob, prediction=$prediction")
+    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
   }

 sc.stop()
@@ -391,7 +391,7 @@ model.transform(test.toDF)
   .select("id", "text", "probability", "prediction")
   .collect()
   .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
-    println("($id, $text) --> prob=$prob, prediction=$prediction")
+    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
   }

 sc.stop()
spark git commit: [SPARK-7723] Fix string interpolation in pipeline examples
Repository: spark
Updated Branches:
  refs/heads/master 27fa88b9b -> df34793ad

[SPARK-7723] Fix string interpolation in pipeline examples

https://issues.apache.org/jira/browse/SPARK-7723

Author: Saleem Ansari <tux...@gmail.com>

Closes #6258 from tuxdna/master and squashes the following commits:

2bb5a42 [Saleem Ansari] Merge branch 'master' into mllib-pipeline
e39db9c [Saleem Ansari] Fix string interpolation in pipeline examples

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df34793a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df34793a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df34793a

Branch: refs/heads/master
Commit: df34793ad4e76214fc4c0a22af1eb89b171a32e4
Parents: 27fa88b
Author: Saleem Ansari <tux...@gmail.com>
Authored: Tue May 19 10:31:11 2015 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Tue May 19 10:31:11 2015 +0100

----------------------------------------------------------------------
 docs/ml-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/df34793a/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index b7b6376..cac7056 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -237,7 +237,7 @@ model2.transform(test.toDF)
   .select("features", "label", "myProbability", "prediction")
   .collect()
   .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
-    println("($features, $label) -> prob=$prob, prediction=$prediction")
+    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
   }

 sc.stop()
@@ -391,7 +391,7 @@ model.transform(test.toDF)
   .select("id", "text", "probability", "prediction")
   .collect()
   .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
-    println("($id, $text) --> prob=$prob, prediction=$prediction")
+    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
   }

 sc.stop()
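The underlying Scala behavior is easy to check in isolation: a plain string literal leaves `$`-references uninterpolated, and only the `s`-prefixed interpolator substitutes them, which is what the patch adds. A standalone sketch:

    object InterpolationSketch extends App {
      val label = 1.0
      val prediction = 0.0

      // Without the `s` prefix, `$` references are printed literally -- this was the bug:
      println("($label) -> prediction=$prediction")   // prints ($label) -> prediction=$prediction

      // With the interpolator the values are substituted, as in the corrected examples:
      println(s"($label) -> prediction=$prediction")  // prints (1.0) -> prediction=0.0
    }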
spark git commit: [SPARK-6246] [EC2] fixed support for more than 100 nodes
Repository: spark
Updated Branches:
  refs/heads/master bcb1ff814 -> 2bc5e0616

[SPARK-6246] [EC2] fixed support for more than 100 nodes

This is a small fix, but an important one for Amazon users because, as the ticket states, spark-ec2 currently can't handle clusters with more than 100 nodes.

Author: alyaxey <oleksii.sliusare...@grammarly.com>

Closes #6267 from alyaxey/ec2_100_nodes_fix and squashes the following commits:

1e0d747 [alyaxey] [SPARK-6246] fixed support for more than 100 nodes

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2bc5e061
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2bc5e061
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2bc5e061

Branch: refs/heads/master
Commit: 2bc5e0616d878b09daa8e31a7a1fdb7127bca079
Parents: bcb1ff8
Author: alyaxey <oleksii.sliusare...@grammarly.com>
Authored: Tue May 19 16:45:52 2015 -0700
Committer: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Committed: Tue May 19 16:45:52 2015 -0700

----------------------------------------------------------------------
 ec2/spark_ec2.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/2bc5e061/ec2/spark_ec2.py
----------------------------------------------------------------------
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index be92d5f..c6d5a1f 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -864,7 +864,11 @@ def wait_for_cluster_state(conn, opts, cluster_instances, cluster_state):
         for i in cluster_instances:
             i.update()

-        statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
+        max_batch = 100
+        statuses = []
+        for j in xrange(0, len(cluster_instances), max_batch):
+            batch = [i.id for i in cluster_instances[j:j + max_batch]]
+            statuses.extend(conn.get_all_instance_status(instance_ids=batch))

         if cluster_state == 'ssh-ready':
             if all(i.state == 'running' for i in cluster_instances) and \
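The fix batches the `get_all_instance_status` call because the EC2 API limits how many instance IDs one request may carry (the patch uses batches of 100). The same batching pattern, sketched in Scala with a hypothetical `describeStatus` standing in for the boto call; all names here are illustrative, not part of spark-ec2:

    object BatchingSketch {
      // Hypothetical stand-in for a service call that rejects requests carrying
      // more than 100 IDs, mirroring the limit the patch works around.
      def describeStatus(ids: Seq[String]): Seq[String] = {
        require(ids.size <= 100, "at most 100 instance IDs per request")
        ids.map(id => s"$id:running")
      }

      // Issue the call in groups of at most maxBatch IDs and concatenate the
      // results, the same shape as the Python loop in the patch.
      def describeAll(ids: Seq[String], maxBatch: Int = 100): Seq[String] =
        ids.grouped(maxBatch).flatMap(describeStatus).toSeq

      def main(args: Array[String]): Unit = {
        val ids = (1 to 250).map(i => s"i-$i")
        println(describeAll(ids).size)  // 250, gathered across three underlying calls
      }
    }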
spark git commit: [SPARK-7656] [SQL] use CatalystConf in FunctionRegistry
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 2ef04a162 -> 86893390c

[SPARK-7656] [SQL] use CatalystConf in FunctionRegistry

Follow-up for #5806.

Author: scwf <wangf...@huawei.com>

Closes #6164 from scwf/FunctionRegistry and squashes the following commits:

15e6697 [scwf] use CatalystConf in FunctionRegistry

(cherry picked from commit 60336e3bc02a2587fdf315f9011bbe7c9d3a58c4)
Signed-off-by: Michael Armbrust <mich...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86893390
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86893390
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86893390

Branch: refs/heads/branch-1.4
Commit: 86893390cfd31d36ff03c2e062a13196a1f7a6fa
Parents: 2ef04a1
Author: scwf <wangf...@huawei.com>
Authored: Tue May 19 17:36:00 2015 -0700
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Tue May 19 17:36:33 2015 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/analysis/FunctionRegistry.scala | 12 +++++++-----
 .../main/scala/org/apache/spark/sql/SQLContext.scala   |  2 +-
 .../scala/org/apache/spark/sql/hive/HiveContext.scala  |  2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/86893390/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 16ca5bc..0849faa 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -17,6 +17,7 @@

 package org.apache.spark.sql.catalyst.analysis

+import org.apache.spark.sql.catalyst.CatalystConf
 import org.apache.spark.sql.catalyst.expressions.Expression

 import scala.collection.mutable
@@ -28,12 +29,12 @@ trait FunctionRegistry {

   def lookupFunction(name: String, children: Seq[Expression]): Expression

-  def caseSensitive: Boolean
+  def conf: CatalystConf
 }

 trait OverrideFunctionRegistry extends FunctionRegistry {

-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+  val functionBuilders = StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)

   override def registerFunction(name: String, builder: FunctionBuilder): Unit = {
     functionBuilders.put(name, builder)
@@ -44,8 +45,9 @@ trait OverrideFunctionRegistry extends FunctionRegistry {
   }
 }

-class SimpleFunctionRegistry(val caseSensitive: Boolean) extends FunctionRegistry {
-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+class SimpleFunctionRegistry(val conf: CatalystConf) extends FunctionRegistry {
+
+  val functionBuilders = StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)

   override def registerFunction(name: String, builder: FunctionBuilder): Unit = {
     functionBuilders.put(name, builder)
@@ -69,7 +71,7 @@ object EmptyFunctionRegistry extends FunctionRegistry {
     throw new UnsupportedOperationException
   }

-  override def caseSensitive: Boolean = throw new UnsupportedOperationException
+  override def conf: CatalystConf = throw new UnsupportedOperationException
 }

 /**

http://git-wip-us.apache.org/repos/asf/spark/blob/86893390/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 316ef7d..304e958 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -121,7 +121,7 @@ class SQLContext(@transient val sparkContext: SparkContext)

   // TODO how to handle the temp function per user session?
   @transient
-  protected[sql] lazy val functionRegistry: FunctionRegistry = new SimpleFunctionRegistry(true)
+  protected[sql] lazy val functionRegistry: FunctionRegistry = new SimpleFunctionRegistry(conf)

   @transient
   protected[sql] lazy val analyzer: Analyzer =

http://git-wip-us.apache.org/repos/asf/spark/blob/86893390/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index 2733ebd..863a5db 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++
spark git commit: [SPARK-7656] [SQL] use CatalystConf in FunctionRegistry
Repository: spark
Updated Branches:
  refs/heads/master 386052063 -> 60336e3bc

[SPARK-7656] [SQL] use CatalystConf in FunctionRegistry

Follow-up for #5806.

Author: scwf <wangf...@huawei.com>

Closes #6164 from scwf/FunctionRegistry and squashes the following commits:

15e6697 [scwf] use CatalystConf in FunctionRegistry

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60336e3b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60336e3b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60336e3b

Branch: refs/heads/master
Commit: 60336e3bc02a2587fdf315f9011bbe7c9d3a58c4
Parents: 3860520
Author: scwf <wangf...@huawei.com>
Authored: Tue May 19 17:36:00 2015 -0700
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Tue May 19 17:36:00 2015 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/analysis/FunctionRegistry.scala | 12 +++++++-----
 .../main/scala/org/apache/spark/sql/SQLContext.scala   |  2 +-
 .../scala/org/apache/spark/sql/hive/HiveContext.scala  |  2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/60336e3b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 16ca5bc..0849faa 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -17,6 +17,7 @@

 package org.apache.spark.sql.catalyst.analysis

+import org.apache.spark.sql.catalyst.CatalystConf
 import org.apache.spark.sql.catalyst.expressions.Expression

 import scala.collection.mutable
@@ -28,12 +29,12 @@ trait FunctionRegistry {

   def lookupFunction(name: String, children: Seq[Expression]): Expression

-  def caseSensitive: Boolean
+  def conf: CatalystConf
 }

 trait OverrideFunctionRegistry extends FunctionRegistry {

-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+  val functionBuilders = StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)

   override def registerFunction(name: String, builder: FunctionBuilder): Unit = {
     functionBuilders.put(name, builder)
@@ -44,8 +45,9 @@ trait OverrideFunctionRegistry extends FunctionRegistry {
   }
 }

-class SimpleFunctionRegistry(val caseSensitive: Boolean) extends FunctionRegistry {
-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+class SimpleFunctionRegistry(val conf: CatalystConf) extends FunctionRegistry {
+
+  val functionBuilders = StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)

   override def registerFunction(name: String, builder: FunctionBuilder): Unit = {
     functionBuilders.put(name, builder)
@@ -69,7 +71,7 @@ object EmptyFunctionRegistry extends FunctionRegistry {
     throw new UnsupportedOperationException
   }

-  override def caseSensitive: Boolean = throw new UnsupportedOperationException
+  override def conf: CatalystConf = throw new UnsupportedOperationException
 }

 /**

http://git-wip-us.apache.org/repos/asf/spark/blob/60336e3b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 316ef7d..304e958 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -121,7 +121,7 @@ class SQLContext(@transient val sparkContext: SparkContext)

   // TODO how to handle the temp function per user session?
   @transient
-  protected[sql] lazy val functionRegistry: FunctionRegistry = new SimpleFunctionRegistry(true)
+  protected[sql] lazy val functionRegistry: FunctionRegistry = new SimpleFunctionRegistry(conf)

   @transient
   protected[sql] lazy val analyzer: Analyzer =

http://git-wip-us.apache.org/repos/asf/spark/blob/60336e3b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index 2733ebd..863a5db 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -357,7 +357,7 @@ class HiveContext(sc: SparkContext) extends
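The substance of the patch is swapping a boolean frozen at the call site (`new SimpleFunctionRegistry(true)`) for a conf object the registry consults, so case sensitivity follows configuration. A minimal sketch of that pattern, with illustrative names rather than Spark's actual classes:

    import scala.collection.mutable

    // Illustrative names only: the registry derives case sensitivity from a
    // shared conf object instead of a boolean fixed at construction time.
    trait Conf { def caseSensitiveAnalysis: Boolean }

    class Registry[T](conf: Conf) {
      private val builders = mutable.Map.empty[String, T]

      private def normalize(name: String): String =
        if (conf.caseSensitiveAnalysis) name else name.toLowerCase

      def register(name: String, builder: T): Unit = builders(normalize(name)) = builder
      def lookup(name: String): Option[T] = builders.get(normalize(name))
    }

    object RegistrySketch {
      def main(args: Array[String]): Unit = {
        val registry = new Registry[Int](new Conf { def caseSensitiveAnalysis = false })
        registry.register("Upper", 1)
        println(registry.lookup("UPPER"))  // Some(1) under case-insensitive analysis
      }
    }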
spark git commit: [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 fc1b4a414 -> a64e097f1

[SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.

The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.

Author: Mike Dusenberry <dusenberr...@gmail.com>

Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs and squashes the following commits:

6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.

(cherry picked from commit 3860520633770cc5719b2cdebe6dc3608798386d)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a64e097f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a64e097f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a64e097f

Branch: refs/heads/branch-1.3
Commit: a64e097f128d3638fdc507ba4b62d93862ca69d1
Parents: fc1b4a4
Author: Mike Dusenberry <dusenberr...@gmail.com>
Authored: Tue May 19 17:18:08 2015 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Tue May 19 17:18:29 2015 -0700

----------------------------------------------------------------------
 docs/mllib-data-types.md | 128 ++++++++++++++++++++--------------------
 1 file changed, 64 insertions(+), 64 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/a64e097f/docs/mllib-data-types.md
----------------------------------------------------------------------
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 4f2a2f7..5f448e7 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix
 size. In general the use of non-deterministic RDDs can lead to errors.

-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another `BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check whether the
-`BlockMatrix` is set up properly.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform
spark git commit: [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 62b4c7392 -> 2ef04a162

[SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.

The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.

Author: Mike Dusenberry <dusenberr...@gmail.com>

Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs and squashes the following commits:

6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.

(cherry picked from commit 3860520633770cc5719b2cdebe6dc3608798386d)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ef04a16
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2ef04a16
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2ef04a16

Branch: refs/heads/branch-1.4
Commit: 2ef04a1627bd0c377dde642ac7ce140429755cca
Parents: 62b4c73
Author: Mike Dusenberry <dusenberr...@gmail.com>
Authored: Tue May 19 17:18:08 2015 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Tue May 19 17:18:20 2015 -0700

----------------------------------------------------------------------
 docs/mllib-data-types.md | 128 ++++++++++++++++++++--------------------
 1 file changed, 64 insertions(+), 64 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/2ef04a16/docs/mllib-data-types.md
----------------------------------------------------------------------
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index acec042..d824dab 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix
 size. In general the use of non-deterministic RDDs can lead to errors.

-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another `BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check whether the
-`BlockMatrix` is set up properly.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform
spark git commit: [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.
Repository: spark Updated Branches: refs/heads/master 2bc5e0616 - 386052063 [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered. The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve the comprehensibility of the Distributed matrix section, especially for new readers. Author: Mike Dusenberry dusenberr...@gmail.com Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs and squashes the following commits: 6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve the comprehensibility of the Distributed matrix section, especially for new readers. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/38605206 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/38605206 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/38605206 Branch: refs/heads/master Commit: 3860520633770cc5719b2cdebe6dc3608798386d Parents: 2bc5e06 Author: Mike Dusenberry dusenberr...@gmail.com Authored: Tue May 19 17:18:08 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 17:18:08 2015 -0700 -- docs/mllib-data-types.md | 128 +- 1 file changed, 64 insertions(+), 64 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/38605206/docs/mllib-data-types.md --
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md index acec042..d824dab 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size. In general the use of non-deterministic RDDs can lead to errors.
-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another `BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check whether the
-`BlockMatrix` is set up properly.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform the CoordinateMatrix to a BlockMatrix
-BlockMatrix matA = coordMat.toBlockMatrix().cache();
-
-// Validate whether the
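The digest truncates the Java tab here; presumably it mirrored the remainder of the Scala example above. A hedged sketch of the likely continuation:

{% highlight java %}
// Validate whether the BlockMatrix is set up properly. Throws an Exception when
// it is not valid; nothing happens if it is valid.
matA.validate();

// Calculate A^T A.
BlockMatrix ata = matA.transpose().multiply(matA);
{% endhighlight %}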
spark git commit: [SPARK-7681] [MLLIB] remove mima excludes for 1.3
Repository: spark Updated Branches: refs/heads/branch-1.4 ac3197e1b - 2cce6bfea [SPARK-7681] [MLLIB] remove mima excludes for 1.3 These excludes are unnecessary for 1.3 because the changes were made in 1.4.x. Author: Xiangrui Meng m...@databricks.com Closes #6254 from mengxr/SPARK-7681-mima and squashes the following commits: 7f0cea0 [Xiangrui Meng] remove mima excludes for 1.3 (cherry picked from commit 6845cb2ff475fd794b30b01af5ebc80714b880f0) Signed-off-by: Xiangrui Meng m...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cce6bfe Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cce6bfe Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cce6bfe Branch: refs/heads/branch-1.4 Commit: 2cce6bfeab1713bd5ea90064df4987496595aedd Parents: ac3197e Author: Xiangrui Meng m...@databricks.com Authored: Tue May 19 08:24:57 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 08:25:06 2015 -0700 -- project/MimaExcludes.scala | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2cce6bfe/project/MimaExcludes.scala --
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index f8d0160..03e93a2 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -187,14 +187,7 @@ object MimaExcludes {
 ProblemFilters.exclude[MissingMethodProblem](
   "org.apache.spark.mllib.linalg.Matrix.isTransposed"),
 ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.Matrix.foreachActive"),
-// SPARK-7681 add SparseVector support for gemv
-ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.Matrix.multiply"),
-ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.DenseMatrix.multiply"),
-ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.SparseMatrix.multiply")
+  "org.apache.spark.mllib.linalg.Matrix.foreachActive")
 ) ++ Seq(
 // SPARK-5540
 ProblemFilters.exclude[MissingMethodProblem](
- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-7681] [MLLIB] remove mima excludes for 1.3
Repository: spark Updated Branches: refs/heads/master df34793ad - 6845cb2ff [SPARK-7681] [MLLIB] remove mima excludes for 1.3 These excludes are unnecessary for 1.3 because the changes were made in 1.4.x. Author: Xiangrui Meng m...@databricks.com Closes #6254 from mengxr/SPARK-7681-mima and squashes the following commits: 7f0cea0 [Xiangrui Meng] remove mima excludes for 1.3 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6845cb2f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6845cb2f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6845cb2f Branch: refs/heads/master Commit: 6845cb2ff475fd794b30b01af5ebc80714b880f0 Parents: df34793 Author: Xiangrui Meng m...@databricks.com Authored: Tue May 19 08:24:57 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 08:24:57 2015 -0700 -- project/MimaExcludes.scala | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6845cb2f/project/MimaExcludes.scala --
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index f8d0160..03e93a2 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -187,14 +187,7 @@ object MimaExcludes {
 ProblemFilters.exclude[MissingMethodProblem](
   "org.apache.spark.mllib.linalg.Matrix.isTransposed"),
 ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.Matrix.foreachActive"),
-// SPARK-7681 add SparseVector support for gemv
-ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.Matrix.multiply"),
-ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.DenseMatrix.multiply"),
-ProblemFilters.exclude[MissingMethodProblem](
-  "org.apache.spark.mllib.linalg.SparseMatrix.multiply")
+  "org.apache.spark.mllib.linalg.Matrix.foreachActive")
 ) ++ Seq(
 // SPARK-5540
 ProblemFilters.exclude[MissingMethodProblem](
- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-7704] Updating Programming Guides per SPARK-4397
Repository: spark Updated Branches: refs/heads/branch-1.4 2cce6bfea - 8567d29ef [SPARK-7704] Updating Programming Guides per SPARK-4397 The change per SPARK-4397 makes the implicit objects in SparkContext visible to the compiler automatically, so we no longer need to import o.a.s.SparkContext._ explicitly and can remove some statements about the implicit conversions from the latest Programming Guides (1.3.0 and higher). Author: Dice poleon...@gmail.com Closes #6234 from daisukebe/patch-1 and squashes the following commits: b77ecd9 [Dice] fix a typo 45dfcd3 [Dice] rewording per Sean's advice a094bcf [Dice] Adding a note for users on any previous releases a29be5f [Dice] Updating Programming Guides per SPARK-4397 (cherry picked from commit 32fa611b19c6b95d4563be631c5a8ff0cdf3438f) Signed-off-by: Sean Owen so...@cloudera.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8567d29e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8567d29e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8567d29e Branch: refs/heads/branch-1.4 Commit: 8567d29ef03f49f8d3d18b8c858cca3dd7dfeb04 Parents: 2cce6bf Author: Dice poleon...@gmail.com Authored: Tue May 19 18:12:05 2015 +0100 Committer: Sean Owen so...@cloudera.com Committed: Tue May 19 18:14:47 2015 +0100 -- docs/programming-guide.md | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8567d29e/docs/programming-guide.md --
diff --git a/docs/programming-guide.md b/docs/programming-guide.md index 0c27376..07a4d29 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -41,14 +41,15 @@ In addition, if you wish to access an HDFS cluster, you need to add a dependency
 artifactId = hadoop-client
 version = your-hdfs-version
-Finally, you need to import some Spark classes and implicit conversions into your program. Add the following lines:
+Finally, you need to import some Spark classes into your program. Add the following lines:
 {% highlight scala %}
 import org.apache.spark.SparkContext
-import org.apache.spark.SparkContext._
 import org.apache.spark.SparkConf
 {% endhighlight %}
+(Before Spark 1.3.0, you need to explicitly `import org.apache.spark.SparkContext._` to enable essential implicit conversions.)
+
 </div>
 <div data-lang="java" markdown="1">
@@ -821,11 +822,9 @@ by a key.
 In Scala, these operations are automatically available on RDDs containing
 [Tuple2](http://www.scala-lang.org/api/{{site.SCALA_VERSION}}/index.html#scala.Tuple2) objects
-(the built-in tuples in the language, created by simply writing `(a, b)`), as long as you
-import `org.apache.spark.SparkContext._` in your program to enable Spark's implicit
-conversions. The key-value pair operations are available in the
+(the built-in tuples in the language, created by simply writing `(a, b)`). The key-value pair operations are available in the
 [PairRDDFunctions](api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions) class,
-which automatically wraps around an RDD of tuples if you import the conversions.
+which automatically wraps around an RDD of tuples.
 For example, the following code uses the `reduceByKey` operation on key-value pairs to count how many times each line of text occurs in a file:
- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
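For readers updating their own code along with the guide, a minimal sketch of the simplified pattern on Spark 1.3.0+ (the app name and input path are placeholders):

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

object PairRddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PairRddExample"))
    // No `import org.apache.spark.SparkContext._` needed on 1.3.0+: the implicit
    // conversion to PairRDDFunctions is found automatically, so key-value
    // operations such as reduceByKey work directly on an RDD of tuples.
    val lines = sc.textFile("data.txt") // placeholder path
    val counts = lines.map(line => (line, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}
{% endhighlight %}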
spark git commit: [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package
Repository: spark Updated Branches: refs/heads/master 3c4c1f964 - 68fb2a46e [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package CC jkbradley. JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586). Author: Xusen Yin yinxu...@gmail.com Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits: 77014c5 [Xusen Yin] comment fix 57a4c07 [Xusen Yin] small fix for docs 1178c8f [Xusen Yin] remove the correctness check in java suite 1c3f389 [Xusen Yin] delete sbt commit 1af152b [Xusen Yin] check python example code 1b5369e [Xusen Yin] add docs of word2vec Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68fb2a46 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68fb2a46 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68fb2a46 Branch: refs/heads/master Commit: 68fb2a46edc95f867d4b28597d20da2597f008c1 Parents: 3c4c1f9 Author: Xusen Yin yinxu...@gmail.com Authored: Tue May 19 13:43:48 2015 -0700 Committer: Joseph K. Bradley jos...@databricks.com Committed: Tue May 19 13:43:48 2015 -0700 -- docs/ml-features.md | 89 .../spark/ml/feature/JavaWord2VecSuite.java | 76 + 2 files changed, 165 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/68fb2a46/docs/ml-features.md --
diff --git a/docs/ml-features.md b/docs/ml-features.md index e86f9ed..63ea3e5 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -106,6 +106,95 @@ for features_label in featurized.select("features", "label").take(3):
 </div>
 </div>
+## Word2Vec
+
+`Word2Vec` is an `Estimator` which takes sequences of words that represent documents and trains a `Word2VecModel`. The model is essentially a `Map(String, Vector)`, which maps each word to a unique fixed-size vector. The `Word2VecModel` transforms each document into a vector using the average of all words in the document; that vector can then be used for other computations on documents, such as similarity calculations. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
+
+Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+import org.apache.spark.ml.feature.Word2Vec
+
+// Input data: Each row is a bag of words from a sentence or document.
+val documentDF = sqlContext.createDataFrame(Seq(
+  "Hi I heard about Spark".split(" "),
+  "I wish Java could use case classes".split(" "),
+  "Logistic regression models are neat".split(" ")
+).map(Tuple1.apply)).toDF("text")
+
+// Learn a mapping from words to Vectors.
+val word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0)
+val model = word2Vec.fit(documentDF)
+val result = model.transform(documentDF)
+result.select("result").take(3).foreach(println)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% highlight java %}
+import com.google.common.collect.Lists;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.ml.feature.Word2Vec;
+import org.apache.spark.ml.feature.Word2VecModel;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.*;
+
+JavaSparkContext jsc = ...
+SQLContext sqlContext = ...
+
+// Input data: Each row is a bag of words from a sentence or document.
+JavaRDD<Row> jrdd = jsc.parallelize(Lists.newArrayList(
+  RowFactory.create(Lists.newArrayList("Hi I heard about Spark".split(" "))),
+  RowFactory.create(Lists.newArrayList("I wish Java could use case classes".split(" "))),
+  RowFactory.create(Lists.newArrayList("Logistic regression models are neat".split(" ")))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
+});
+DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema);
+
+// Learn a mapping from words to Vectors.
+Word2Vec word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0);
+Word2VecModel model = word2Vec.fit(documentDF);
+DataFrame result = model.transform(documentDF);
+for (Row r: result.select("result").take(3)) {
+  System.out.println(r);
+}
+{% endhighlight %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% highlight python %}
+from pyspark.ml.feature import Word2Vec
+
+#
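+# (The digest truncates the Python tab here; the remainder presumably mirrors the
+# Scala and Java tabs above. A hedged sketch, not the verbatim committed text:)
+# Input data: Each row is a bag of words from a sentence or document.
+documentDF = sqlContext.createDataFrame([
+  ("Hi I heard about Spark".split(" "), ),
+  ("I wish Java could use case classes".split(" "), ),
+  ("Logistic regression models are neat".split(" "), )
+], ["text"])
+# Learn a mapping from words to Vectors.
+word2Vec = Word2Vec(vectorSize=3, minCount=0, inputCol="text", outputCol="result")
+model = word2Vec.fit(documentDF)
+result = model.transform(documentDF)
+for feature in result.select("result").take(3):
+  print(feature)
+{% endhighlight %}
+</div>
+</div>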
spark git commit: [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package
Repository: spark Updated Branches: refs/heads/branch-1.4 ee012e0ed - c3871eeb2 [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package CC jkbradley. JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586). Author: Xusen Yin yinxu...@gmail.com Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits: 77014c5 [Xusen Yin] comment fix 57a4c07 [Xusen Yin] small fix for docs 1178c8f [Xusen Yin] remove the correctness check in java suite 1c3f389 [Xusen Yin] delete sbt commit 1af152b [Xusen Yin] check python example code 1b5369e [Xusen Yin] add docs of word2vec (cherry picked from commit 68fb2a46edc95f867d4b28597d20da2597f008c1) Signed-off-by: Joseph K. Bradley jos...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3871eeb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3871eeb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3871eeb Branch: refs/heads/branch-1.4 Commit: c3871eeb25ca9e1547385148025981372e14ea53 Parents: ee012e0 Author: Xusen Yin yinxu...@gmail.com Authored: Tue May 19 13:43:48 2015 -0700 Committer: Joseph K. Bradley jos...@databricks.com Committed: Tue May 19 13:44:06 2015 -0700 -- docs/ml-features.md | 89 .../spark/ml/feature/JavaWord2VecSuite.java | 76 + 2 files changed, 165 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c3871eeb/docs/ml-features.md --
diff --git a/docs/ml-features.md b/docs/ml-features.md index e86f9ed..63ea3e5 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -106,6 +106,95 @@ for features_label in featurized.select("features", "label").take(3):
 </div>
 </div>
+## Word2Vec
+
+`Word2Vec` is an `Estimator` which takes sequences of words that represent documents and trains a `Word2VecModel`. The model is essentially a `Map(String, Vector)`, which maps each word to a unique fixed-size vector. The `Word2VecModel` transforms each document into a vector using the average of all words in the document; that vector can then be used for other computations on documents, such as similarity calculations. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
+
+Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+import org.apache.spark.ml.feature.Word2Vec
+
+// Input data: Each row is a bag of words from a sentence or document.
+val documentDF = sqlContext.createDataFrame(Seq(
+  "Hi I heard about Spark".split(" "),
+  "I wish Java could use case classes".split(" "),
+  "Logistic regression models are neat".split(" ")
+).map(Tuple1.apply)).toDF("text")
+
+// Learn a mapping from words to Vectors.
+val word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0)
+val model = word2Vec.fit(documentDF)
+val result = model.transform(documentDF)
+result.select("result").take(3).foreach(println)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% highlight java %}
+import com.google.common.collect.Lists;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.ml.feature.Word2Vec;
+import org.apache.spark.ml.feature.Word2VecModel;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.*;
+
+JavaSparkContext jsc = ...
+SQLContext sqlContext = ...
+
+// Input data: Each row is a bag of words from a sentence or document.
+JavaRDD<Row> jrdd = jsc.parallelize(Lists.newArrayList(
+  RowFactory.create(Lists.newArrayList("Hi I heard about Spark".split(" "))),
+  RowFactory.create(Lists.newArrayList("I wish Java could use case classes".split(" "))),
+  RowFactory.create(Lists.newArrayList("Logistic regression models are neat".split(" ")))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
+});
+DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema);
+
+// Learn a mapping from words to Vectors.
+Word2Vec word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0);
+Word2VecModel model = word2Vec.fit(documentDF);
+DataFrame result = model.transform(documentDF);
+for (Row r: result.select("result").take(3)) {
+  System.out.println(r);
+}
spark git commit: [SPARK-7662] [SQL] Resolve correct names for generator in projection
Repository: spark Updated Branches: refs/heads/branch-1.4 87fa8ccd2 - 62b4c7392 [SPARK-7662] [SQL] Resolve correct names for generator in projection ``` select explode(map(value, key)) from src; ``` Throws exception ``` org.apache.spark.sql.AnalysisException: The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 2 aliases but got _c0 ; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:43) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGenerate$$makeGeneratorOutput(Analyzer.scala:605) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16$$anonfun$22.apply(Analyzer.scala:562) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16$$anonfun$22.apply(Analyzer.scala:548) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16.applyOrElse(Analyzer.scala:548) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16.applyOrElse(Analyzer.scala:538) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) ``` Author: Cheng Hao hao.ch...@intel.com Closes #6178 from chenghao-intel/explode and squashes the following commits: 916fbe9 [Cheng Hao] add more strict rules for TGF alias 5c3f2c5 [Cheng Hao] fix bug in unit test e1d93ab [Cheng Hao] Add more unit test 19db09e [Cheng Hao] resolve names for generator in projection (cherry picked from commit bcb1ff81468eb4afc7c03b2bca18e99cc1ccf6b8) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/62b4c739 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/62b4c739 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/62b4c739 Branch: refs/heads/branch-1.4 Commit: 62b4c7392ad8711b9b0f20dba95dfce2a4864de2 Parents: 87fa8cc Author: Cheng Hao hao.ch...@intel.com Authored: Tue May 19 15:20:46 2015 -0700 Committer: Michael Armbrust mich...@databricks.com Committed: Tue May 19 15:21:03 2015 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 15 .../sql/hive/execution/HiveQuerySuite.scala | 6 ++--- .../sql/hive/execution/SQLQuerySuite.scala | 25 +++- 3 files changed, 42 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/62b4c739/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index dfa4215..c239e83 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -561,6 +561,21 @@ class Analyzer( /** Extracts 
a [[Generator]] expression and any names assigned by aliases to their output. */
private object AliasedGenerator {
  def unapply(e: Expression): Option[(Generator, Seq[String])] = e match {
+    case Alias(g: Generator, name)
+      if g.elementTypes.size > 1 && java.util.regex.Pattern.matches("_c[0-9]+", name) => {
+      // Assume the default name given by the parser is _c[0-9]+,
+      // TODO: in the long term, move the naming logic from the Parser to the Analyzer.
+      // In a projection, the Parser gives a TGF the same default name as a normal UDF,
+      // but the TGF probably has multiple output columns/names.
+      // e.g. SELECT explode(map(key, value)) FROM src;
+      // Let's simply ignore the default given name for this case.
+      Some((g, Nil))
+    }
+    case Alias(g: Generator, name) if g.elementTypes.size > 1 =>
+      // A TGF with multiple output columns was given a single, non-default name
+      failAnalysis(
+        s"""Expect multiple names given for ${g.getClass.getName},
+           |but only single name '${name}' specified""".stripMargin)
    case Alias(g:
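The diff is cut off above. A hedged sketch of the user-facing behavior after this fix (assuming a HiveContext named `sqlContext` and Hive's usual two-column `src` demo table; the alias names are illustrative):

{% highlight scala %}
// explode over a map produces two output columns, so either accept the
// parser's default names or supply exactly two aliases.
sqlContext.sql("SELECT explode(map(key, value)) FROM src")             // default names
sqlContext.sql("SELECT explode(map(key, value)) AS (k1, k2) FROM src") // two explicit aliases
// A single alias for a two-column generator now fails analysis with a clear message:
// sqlContext.sql("SELECT explode(map(key, value)) AS kv FROM src")
{% endhighlight %}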
spark git commit: [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python
Repository: spark Updated Branches: refs/heads/master c12dff9b8 - 4de74d260 [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python cc rxin, please take a quick look, I'm working on tests. Author: Davies Liu dav...@databricks.com Closes #6238 from davies/readwrite and squashes the following commits: c7200eb [Davies Liu] update tests 9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite f0c5a04 [Davies Liu] use sqlContext.read.load 5f68bc8 [Davies Liu] update tests 6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite bcc6668 [Davies Liu] add reader and writer API in Python Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4de74d26 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4de74d26 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4de74d26 Branch: refs/heads/master Commit: 4de74d2602f6577c3c8458aa85377e89c19724ca Parents: c12dff9 Author: Davies Liu dav...@databricks.com Authored: Tue May 19 14:23:28 2015 -0700 Committer: Reynold Xin r...@databricks.com Committed: Tue May 19 14:23:28 2015 -0700 -- .../apache/spark/api/python/PythonUtils.scala | 11 +- python/pyspark/sql/__init__.py | 1 + python/pyspark/sql/context.py | 28 +- python/pyspark/sql/dataframe.py | 67 ++-- python/pyspark/sql/readwriter.py| 338 +++ python/pyspark/sql/tests.py | 77 ++--- 6 files changed, 430 insertions(+), 92 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4de74d26/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala --
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index efb6b93..90dacae 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
@@ -50,8 +50,15 @@ private[spark] object PythonUtils {
   /**
    * Convert list of T into seq of T (for calling API with varargs)
    */
-  def toSeq[T](cols: JList[T]): Seq[T] = {
-    cols.toList.toSeq
+  def toSeq[T](vs: JList[T]): Seq[T] = {
+    vs.toList.toSeq
+  }
+
+  /**
+   * Convert list of T into array of T (for calling API with array)
+   */
+  def toArray[T](vs: JList[T]): Array[T] = {
+    vs.toArray().asInstanceOf[Array[T]]
   }
   /**
http://git-wip-us.apache.org/repos/asf/spark/blob/4de74d26/python/pyspark/sql/__init__.py --
diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py index 19805e2..634c575 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -58,6 +58,7 @@ from pyspark.sql.context import SQLContext, HiveContext
 from pyspark.sql.column import Column
 from pyspark.sql.dataframe import DataFrame, SchemaRDD, DataFrameNaFunctions, DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
+from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
 __all__ = [
   'SQLContext', 'HiveContext', 'DataFrame', 'GroupedData', 'Column', 'Row',
http://git-wip-us.apache.org/repos/asf/spark/blob/4de74d26/python/pyspark/sql/context.py --
diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py index 9f26d13..7543475 100644
--- a/python/pyspark/sql/context.py
+++ b/python/pyspark/sql/context.py
@@ -31,6 +31,7 @@ from pyspark.serializers import AutoBatchedSerializer, PickleSerializer
 from pyspark.sql.types import Row, StringType, StructType, _verify_type, \
     _infer_schema, _has_nulltype, _merge_type, _create_converter, _python_to_sql_converter
 from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.readwriter import DataFrameReader
 try:
     import pandas
@@ -457,19 +458,7 @@ class SQLContext(object):
         Optionally, a schema can be provided as the schema of the returned DataFrame.
-        if path is not None:
-            options["path"] = path
-        if source is None:
-            source = self.getConf("spark.sql.sources.default",
-                                  "org.apache.spark.sql.parquet")
-        if schema is None:
-            df = self._ssql_ctx.load(source, options)
-        else:
-            if not isinstance(schema, StructType):
-                raise TypeError("schema should be StructType")
-            scala_datatype = self._ssql_ctx.parseDataType(schema.json())
-            df = self._ssql_ctx.load(source, scala_datatype, options)
-        return DataFrame(df, self)
+        return
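The diff is cut off above (the truncated `+return` presumably continues as a delegation to the new `sqlContext.read`). A hedged sketch of the API this patch adds, with placeholder file paths:

{% highlight python %}
# New-style reader/writer chains (sketch):
df = sqlContext.read.format("json").load("examples/src/main/resources/people.json")
df.select("name", "age").write.format("parquet").mode("overwrite").save("people.parquet")

# The old-style entry points remain for compatibility:
df2 = sqlContext.load("people.parquet", source="parquet")
{% endhighlight %}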
spark git commit: [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python
Repository: spark Updated Branches: refs/heads/branch-1.4 5643499d2 - 87fa8ccd2 [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python cc rxin, please take a quick look, I'm working on tests. Author: Davies Liu dav...@databricks.com Closes #6238 from davies/readwrite and squashes the following commits: c7200eb [Davies Liu] update tests 9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite f0c5a04 [Davies Liu] use sqlContext.read.load 5f68bc8 [Davies Liu] update tests 6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite bcc6668 [Davies Liu] add reader and writer API in Python (cherry picked from commit 4de74d2602f6577c3c8458aa85377e89c19724ca) Signed-off-by: Reynold Xin r...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87fa8ccd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87fa8ccd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87fa8ccd Branch: refs/heads/branch-1.4 Commit: 87fa8ccd2bd245ee16bb7e3577c1afcd7dc9730d Parents: 5643499 Author: Davies Liu dav...@databricks.com Authored: Tue May 19 14:23:28 2015 -0700 Committer: Reynold Xin r...@databricks.com Committed: Tue May 19 14:23:35 2015 -0700 -- .../apache/spark/api/python/PythonUtils.scala | 11 +- python/pyspark/sql/__init__.py | 1 + python/pyspark/sql/context.py | 28 +- python/pyspark/sql/dataframe.py | 67 ++-- python/pyspark/sql/readwriter.py| 338 +++ python/pyspark/sql/tests.py | 77 ++--- 6 files changed, 430 insertions(+), 92 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/87fa8ccd/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala --
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index efb6b93..90dacae 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
@@ -50,8 +50,15 @@ private[spark] object PythonUtils {
   /**
    * Convert list of T into seq of T (for calling API with varargs)
    */
-  def toSeq[T](cols: JList[T]): Seq[T] = {
-    cols.toList.toSeq
+  def toSeq[T](vs: JList[T]): Seq[T] = {
+    vs.toList.toSeq
+  }
+
+  /**
+   * Convert list of T into array of T (for calling API with array)
+   */
+  def toArray[T](vs: JList[T]): Array[T] = {
+    vs.toArray().asInstanceOf[Array[T]]
   }
   /**
http://git-wip-us.apache.org/repos/asf/spark/blob/87fa8ccd/python/pyspark/sql/__init__.py --
diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py index 19805e2..634c575 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -58,6 +58,7 @@ from pyspark.sql.context import SQLContext, HiveContext
 from pyspark.sql.column import Column
 from pyspark.sql.dataframe import DataFrame, SchemaRDD, DataFrameNaFunctions, DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
+from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
 __all__ = [
   'SQLContext', 'HiveContext', 'DataFrame', 'GroupedData', 'Column', 'Row',
http://git-wip-us.apache.org/repos/asf/spark/blob/87fa8ccd/python/pyspark/sql/context.py --
diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py index 9f26d13..7543475 100644
--- a/python/pyspark/sql/context.py
+++ b/python/pyspark/sql/context.py
@@ -31,6 +31,7 @@ from pyspark.serializers import AutoBatchedSerializer, PickleSerializer
 from pyspark.sql.types import Row, StringType, StructType, _verify_type, \
     _infer_schema, _has_nulltype, _merge_type, _create_converter, _python_to_sql_converter
 from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.readwriter import DataFrameReader
 try:
     import pandas
@@ -457,19 +458,7 @@ class SQLContext(object):
         Optionally, a schema can be provided as the schema of the returned DataFrame.
-        if path is not None:
-            options["path"] = path
-        if source is None:
-            source = self.getConf("spark.sql.sources.default",
-                                  "org.apache.spark.sql.parquet")
-        if schema is None:
-            df = self._ssql_ctx.load(source, options)
-        else:
-            if not isinstance(schema, StructType):
-                raise TypeError("schema should be StructType")
-            scala_datatype = self._ssql_ctx.parseDataType(schema.json())
-
spark git commit: [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS
Repository: spark Updated Branches: refs/heads/master 68fb2a46e - c12dff9b8 [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS JIRA: https://issues.apache.org/jira/browse/SPARK-7652 Author: Liang-Chi Hsieh vii...@gmail.com Closes #6189 from viirya/naive_bayes_blas_prediction and squashes the following commits: ab611fd [Liang-Chi Hsieh] Remove unnecessary space. ddc48b9 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into naive_bayes_blas_prediction b5772b4 [Liang-Chi Hsieh] Fix binary compatibility. 2f65186 [Liang-Chi Hsieh] Remove toDense. 1b6cdfe [Liang-Chi Hsieh] Update the implementation of naive Bayes prediction with BLAS. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c12dff9b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c12dff9b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c12dff9b Branch: refs/heads/master Commit: c12dff9b82e4869f866a9b96ce0bf05503dd7dda Parents: 68fb2a4 Author: Liang-Chi Hsieh vii...@gmail.com Authored: Tue May 19 13:53:08 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 13:53:08 2015 -0700 -- .../spark/mllib/classification/NaiveBayes.scala | 41 1 file changed, 24 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c12dff9b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala index ac0ebec..53fb2cb 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala @@ -21,13 +21,11 @@ import java.lang.{Iterable = JIterable} import scala.collection.JavaConverters._ -import breeze.linalg.{Axis, DenseMatrix = BDM, DenseVector = BDV, argmax = brzArgmax, sum = brzSum} -import breeze.numerics.{exp = brzExp, log = brzLog} import org.json4s.JsonDSL._ import org.json4s.jackson.JsonMethods._ import org.apache.spark.{Logging, SparkContext, SparkException} -import org.apache.spark.mllib.linalg.{BLAS, DenseVector, SparseVector, Vector} +import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix, DenseVector, SparseVector, Vector, Vectors} import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.mllib.util.{Loader, Saveable} import org.apache.spark.rdd.RDD @@ -50,6 +48,9 @@ class NaiveBayesModel private[mllib] ( val modelType: String) extends ClassificationModel with Serializable with Saveable { + private val piVector = new DenseVector(pi) + private val thetaMatrix = new DenseMatrix(labels.size, theta(0).size, theta.flatten, true) + private[mllib] def this(labels: Array[Double], pi: Array[Double], theta: Array[Array[Double]]) = this(labels, pi, theta, Multinomial) @@ -60,17 +61,18 @@ class NaiveBayesModel private[mllib] ( theta: JIterable[JIterable[Double]]) = this(labels.asScala.toArray, pi.asScala.toArray, theta.asScala.toArray.map(_.asScala.toArray)) - private val brzPi = new BDV[Double](pi) - private val brzTheta = new BDM(theta(0).length, theta.length, theta.flatten).t - // Bernoulli scoring requires log(condprob) if 1, log(1-condprob) if 0. 
- // This precomputes log(1.0 - exp(theta)) and its sum which are used for the linear algebra + // This precomputes log(1.0 - exp(theta)) and its sum which are used for the linear algebra // application of this condition (in predict function). - private val (brzNegTheta, brzNegThetaSum) = modelType match { + private val (thetaMinusNegTheta, negThetaSum) = modelType match { case Multinomial = (None, None) case Bernoulli = - val negTheta = brzLog((brzExp(brzTheta.copy) :*= (-1.0)) :+= 1.0) // log(1.0 - exp(x)) - (Option(negTheta), Option(brzSum(negTheta, Axis._1))) + val negTheta = thetaMatrix.map(value = math.log(1.0 - math.exp(value))) + val ones = new DenseVector(Array.fill(thetaMatrix.numCols){1.0}) + val thetaMinusNegTheta = thetaMatrix.map { value = +value - math.log(1.0 - math.exp(value)) + } + (Option(thetaMinusNegTheta), Option(negTheta.multiply(ones))) case _ = // This should never happen. throw new UnknownError(sNaiveBayesModel was created with an unknown ModelType: $modelType) @@ -85,17 +87,22 @@ class NaiveBayesModel private[mllib] ( } override def predict(testData: Vector): Double = { -val brzData = testData.toBreeze modelType match { case Multinomial = -labels(brzArgmax(brzPi + brzTheta * brzData)) +val
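A self-contained sketch (plain Scala, helper name invented) of the algebraic identity the Bernoulli branch exploits: with `theta` holding log conditional probabilities, the log-likelihood splits into a dot product plus a data-independent sum, which is exactly what `thetaMinusNegTheta` and `negThetaSum` precompute:

{% highlight scala %}
// With t_j = log(theta_j):
//   log P(x | c) = sum_j [ x_j * t_j + (1 - x_j) * log(1 - exp(t_j)) ]
//                = x . (t - log(1 - exp(t))) + sum_j log(1 - exp(t_j))
def bernoulliLogLikelihood(x: Array[Double], logTheta: Array[Double]): Double = {
  val logNegTheta = logTheta.map(t => math.log(1.0 - math.exp(t)))  // log(1 - theta_j)
  val thetaMinusNegTheta = logTheta.zip(logNegTheta).map { case (t, n) => t - n }
  // In the real code this dot product is a single BLAS gemv over all classes.
  val dot = x.zip(thetaMinusNegTheta).map { case (xi, w) => xi * w }.sum
  dot + logNegTheta.sum                                             // + negThetaSum
}
{% endhighlight %}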
spark git commit: [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS
Repository: spark Updated Branches: refs/heads/branch-1.4 c3871eeb2 - 5643499d2 [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS JIRA: https://issues.apache.org/jira/browse/SPARK-7652 Author: Liang-Chi Hsieh vii...@gmail.com Closes #6189 from viirya/naive_bayes_blas_prediction and squashes the following commits: ab611fd [Liang-Chi Hsieh] Remove unnecessary space. ddc48b9 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into naive_bayes_blas_prediction b5772b4 [Liang-Chi Hsieh] Fix binary compatibility. 2f65186 [Liang-Chi Hsieh] Remove toDense. 1b6cdfe [Liang-Chi Hsieh] Update the implementation of naive Bayes prediction with BLAS. (cherry picked from commit c12dff9b82e4869f866a9b96ce0bf05503dd7dda) Signed-off-by: Xiangrui Meng m...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5643499d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5643499d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5643499d Branch: refs/heads/branch-1.4 Commit: 5643499d220d2f8ee67f405875ce878f4b8e029d Parents: c3871ee Author: Liang-Chi Hsieh vii...@gmail.com Authored: Tue May 19 13:53:08 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 13:53:16 2015 -0700 -- .../spark/mllib/classification/NaiveBayes.scala | 41 1 file changed, 24 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5643499d/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala index ac0ebec..53fb2cb 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala @@ -21,13 +21,11 @@ import java.lang.{Iterable = JIterable} import scala.collection.JavaConverters._ -import breeze.linalg.{Axis, DenseMatrix = BDM, DenseVector = BDV, argmax = brzArgmax, sum = brzSum} -import breeze.numerics.{exp = brzExp, log = brzLog} import org.json4s.JsonDSL._ import org.json4s.jackson.JsonMethods._ import org.apache.spark.{Logging, SparkContext, SparkException} -import org.apache.spark.mllib.linalg.{BLAS, DenseVector, SparseVector, Vector} +import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix, DenseVector, SparseVector, Vector, Vectors} import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.mllib.util.{Loader, Saveable} import org.apache.spark.rdd.RDD @@ -50,6 +48,9 @@ class NaiveBayesModel private[mllib] ( val modelType: String) extends ClassificationModel with Serializable with Saveable { + private val piVector = new DenseVector(pi) + private val thetaMatrix = new DenseMatrix(labels.size, theta(0).size, theta.flatten, true) + private[mllib] def this(labels: Array[Double], pi: Array[Double], theta: Array[Array[Double]]) = this(labels, pi, theta, Multinomial) @@ -60,17 +61,18 @@ class NaiveBayesModel private[mllib] ( theta: JIterable[JIterable[Double]]) = this(labels.asScala.toArray, pi.asScala.toArray, theta.asScala.toArray.map(_.asScala.toArray)) - private val brzPi = new BDV[Double](pi) - private val brzTheta = new BDM(theta(0).length, theta.length, theta.flatten).t - // Bernoulli scoring requires log(condprob) if 1, log(1-condprob) if 0. 
- // This precomputes log(1.0 - exp(theta)) and its sum which are used for the linear algebra + // This precomputes log(1.0 - exp(theta)) and its sum which are used for the linear algebra // application of this condition (in predict function). - private val (brzNegTheta, brzNegThetaSum) = modelType match { + private val (thetaMinusNegTheta, negThetaSum) = modelType match { case Multinomial = (None, None) case Bernoulli = - val negTheta = brzLog((brzExp(brzTheta.copy) :*= (-1.0)) :+= 1.0) // log(1.0 - exp(x)) - (Option(negTheta), Option(brzSum(negTheta, Axis._1))) + val negTheta = thetaMatrix.map(value = math.log(1.0 - math.exp(value))) + val ones = new DenseVector(Array.fill(thetaMatrix.numCols){1.0}) + val thetaMinusNegTheta = thetaMatrix.map { value = +value - math.log(1.0 - math.exp(value)) + } + (Option(thetaMinusNegTheta), Option(negTheta.multiply(ones))) case _ = // This should never happen. throw new UnknownError(sNaiveBayesModel was created with an unknown ModelType: $modelType) @@ -85,17 +87,22 @@ class NaiveBayesModel private[mllib] ( } override def predict(testData: Vector): Double = { -val brzData =
spark git commit: [SPARK-7047] [ML] ml.Model optional parent support
Repository: spark Updated Branches: refs/heads/branch-1.4 8567d29ef - 24cb323e7 [SPARK-7047] [ML] ml.Model optional parent support Made Model.parent transient. Added Model.hasParent to test for null parent CC: mengxr Author: Joseph K. Bradley jos...@databricks.com Closes #5914 from jkbradley/parent-optional and squashes the following commits: d501774 [Joseph K. Bradley] Made Model.parent transient. Added Model.hasParent to test for null parent (cherry picked from commit fb90273212dc7241c9a0c3446e25e0e0b9377750) Signed-off-by: Xiangrui Meng m...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24cb323e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24cb323e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24cb323e Branch: refs/heads/branch-1.4 Commit: 24cb323e767a342496cf24e0d06398b5af38ac80 Parents: 8567d29 Author: Joseph K. Bradley jos...@databricks.com Authored: Tue May 19 10:55:21 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 10:55:32 2015 -0700 -- mllib/src/main/scala/org/apache/spark/ml/Model.scala| 5 - .../spark/ml/classification/LogisticRegressionSuite.scala | 1 + .../spark/ml/classification/RandomForestClassifierSuite.scala | 2 ++ 3 files changed, 7 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/24cb323e/mllib/src/main/scala/org/apache/spark/ml/Model.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/Model.scala b/mllib/src/main/scala/org/apache/spark/ml/Model.scala index 7fd5153..70e7495 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/Model.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/Model.scala @@ -32,7 +32,7 @@ abstract class Model[M : Model[M]] extends Transformer { * The parent estimator that produced this model. * Note: For ensembles' component Models, this value can be null. */ - var parent: Estimator[M] = _ + @transient var parent: Estimator[M] = _ /** * Sets the parent of this model (Java API). @@ -42,6 +42,9 @@ abstract class Model[M : Model[M]] extends Transformer { this.asInstanceOf[M] } + /** Indicates whether this [[Model]] has a corresponding parent. */ + def hasParent: Boolean = parent != null + override def copy(extra: ParamMap): M = { // The default implementation of Params.copy doesn't work for models. 
throw new NotImplementedError(s${this.getClass} doesn't implement copy(extra: ParamMap)) http://git-wip-us.apache.org/repos/asf/spark/blob/24cb323e/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala index 4376524..97f9749 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala @@ -83,6 +83,7 @@ class LogisticRegressionSuite extends FunSuite with MLlibTestSparkContext { assert(model.getRawPredictionCol === rawPrediction) assert(model.getProbabilityCol === probability) assert(model.intercept !== 0.0) +assert(model.hasParent) } test(logistic regression doesn't fit intercept when fitIntercept is off) { http://git-wip-us.apache.org/repos/asf/spark/blob/24cb323e/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala index 08f86fa..cdbbaca 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala @@ -162,5 +162,7 @@ private object RandomForestClassifierSuite { val oldModelAsNew = RandomForestClassificationModel.fromOld( oldModel, newModel.parent.asInstanceOf[RandomForestClassifier], categoricalFeatures) TreeTests.checkEqual(oldModelAsNew, newModel) +assert(newModel.hasParent) + assert(!newModel.trees.head.asInstanceOf[DecisionTreeClassificationModel].hasParent) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
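A hedged usage sketch of the new null-safe check (`training` is an assumed DataFrame of labeled data):

{% highlight scala %}
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
val model = lr.fit(training)  // fit() wires the parent estimator into the model
assert(model.hasParent)
// Component models inside ensembles may legitimately have a null parent,
// so guard with hasParent before dereferencing model.parent.
if (model.hasParent) println(model.parent)
{% endhighlight %}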
spark git commit: [SPARK-7678] [ML] Fix default random seed in HasSeed
Repository: spark Updated Branches: refs/heads/master fb9027321 - 7b16e9f21 [SPARK-7678] [ML] Fix default random seed in HasSeed Changed shared param HasSeed to have default based on hashCode of class name, instead of random number. Also, removed fixed random seeds from Word2Vec and ALS. CC: mengxr Author: Joseph K. Bradley jos...@databricks.com Closes #6251 from jkbradley/scala-fixed-seed and squashes the following commits: 0e37184 [Joseph K. Bradley] Fixed Word2VecSuite, ALSSuite in spark.ml to use original fixed random seeds 678ec3a [Joseph K. Bradley] Removed fixed random seeds from Word2Vec and ALS. Changed shared param HasSeed to have default based on hashCode of class name, instead of random number. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b16e9f2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b16e9f2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b16e9f2 Branch: refs/heads/master Commit: 7b16e9f2118fbfbb1c0ba957161fe500c9aff82a Parents: fb90273 Author: Joseph K. Bradley jos...@databricks.com Authored: Tue May 19 10:57:47 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 10:57:47 2015 -0700 -- .../org/apache/spark/ml/feature/Word2Vec.scala | 1 - .../spark/ml/param/shared/SharedParamsCodeGen.scala | 2 +- .../apache/spark/ml/param/shared/sharedParams.scala | 4 ++-- .../org/apache/spark/ml/recommendation/ALS.scala| 2 +- .../org/apache/spark/ml/feature/Word2VecSuite.scala | 1 + .../apache/spark/ml/recommendation/ALSSuite.scala | 16 +--- 6 files changed, 14 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7b16e9f2/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala index 8ace8c5..90f0be7 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala @@ -68,7 +68,6 @@ private[feature] trait Word2VecBase extends Params setDefault(stepSize - 0.025) setDefault(maxIter - 1) - setDefault(seed - 42L) /** * Validate and transform the input schema. http://git-wip-us.apache.org/repos/asf/spark/blob/7b16e9f2/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala index 5085b79..8b8cb81 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala @@ -53,7 +53,7 @@ private[shared] object SharedParamsCodeGen { ParamDesc[Int](checkpointInterval, checkpoint interval (= 1), isValid = ParamValidators.gtEq(1)), ParamDesc[Boolean](fitIntercept, whether to fit an intercept term, Some(true)), - ParamDesc[Long](seed, random seed, Some(Utils.random.nextLong())), + ParamDesc[Long](seed, random seed, Some(this.getClass.getName.hashCode.toLong)), ParamDesc[Double](elasticNetParam, the ElasticNet mixing parameter, in range [0, 1]. + For alpha = 0, the penalty is an L2 penalty. 
For alpha = 1, it is an L1 penalty., isValid = ParamValidators.inRange(0, 1)), http://git-wip-us.apache.org/repos/asf/spark/blob/7b16e9f2/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala index 7525d37..3a4976d 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala @@ -232,7 +232,7 @@ private[ml] trait HasFitIntercept extends Params { } /** - * (private[ml]) Trait for shared param seed (default: Utils.random.nextLong()). + * (private[ml]) Trait for shared param seed (default: this.getClass.getName.hashCode.toLong). */ private[ml] trait HasSeed extends Params { @@ -242,7 +242,7 @@ private[ml] trait HasSeed extends Params { */ final val seed: LongParam = new LongParam(this, seed, random seed) - setDefault(seed, Utils.random.nextLong()) + setDefault(seed, this.getClass.getName.hashCode.toLong) /** @group getParam */ final def getSeed: Long = $(seed)
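A sketch of the practical effect: the default seed is now a stable function of the class name, so results are reproducible across runs by default, and code that relied on the old hard-coded seeds can set one explicitly:

{% highlight scala %}
import org.apache.spark.ml.feature.Word2Vec

val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setSeed(42L) // restore the previous fixed default if results must match old runs
{% endhighlight %}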
spark git commit: [SPARK-7678] [ML] Fix default random seed in HasSeed
Repository: spark Updated Branches: refs/heads/branch-1.4 24cb323e7 - cd3093e70 [SPARK-7678] [ML] Fix default random seed in HasSeed Changed shared param HasSeed to have default based on hashCode of class name, instead of random number. Also, removed fixed random seeds from Word2Vec and ALS. CC: mengxr Author: Joseph K. Bradley jos...@databricks.com Closes #6251 from jkbradley/scala-fixed-seed and squashes the following commits: 0e37184 [Joseph K. Bradley] Fixed Word2VecSuite, ALSSuite in spark.ml to use original fixed random seeds 678ec3a [Joseph K. Bradley] Removed fixed random seeds from Word2Vec and ALS. Changed shared param HasSeed to have default based on hashCode of class name, instead of random number. (cherry picked from commit 7b16e9f2118fbfbb1c0ba957161fe500c9aff82a) Signed-off-by: Xiangrui Meng m...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cd3093e7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cd3093e7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cd3093e7 Branch: refs/heads/branch-1.4 Commit: cd3093e705b184df1291cd8f03331a9618993693 Parents: 24cb323 Author: Joseph K. Bradley jos...@databricks.com Authored: Tue May 19 10:57:47 2015 -0700 Committer: Xiangrui Meng m...@databricks.com Committed: Tue May 19 10:57:54 2015 -0700 -- .../org/apache/spark/ml/feature/Word2Vec.scala | 1 - .../spark/ml/param/shared/SharedParamsCodeGen.scala | 2 +- .../apache/spark/ml/param/shared/sharedParams.scala | 4 ++-- .../org/apache/spark/ml/recommendation/ALS.scala| 2 +- .../org/apache/spark/ml/feature/Word2VecSuite.scala | 1 + .../apache/spark/ml/recommendation/ALSSuite.scala | 16 +--- 6 files changed, 14 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cd3093e7/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala index 8ace8c5..90f0be7 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala @@ -68,7 +68,6 @@ private[feature] trait Word2VecBase extends Params setDefault(stepSize - 0.025) setDefault(maxIter - 1) - setDefault(seed - 42L) /** * Validate and transform the input schema. http://git-wip-us.apache.org/repos/asf/spark/blob/cd3093e7/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala index 5085b79..8b8cb81 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala @@ -53,7 +53,7 @@ private[shared] object SharedParamsCodeGen { ParamDesc[Int](checkpointInterval, checkpoint interval (= 1), isValid = ParamValidators.gtEq(1)), ParamDesc[Boolean](fitIntercept, whether to fit an intercept term, Some(true)), - ParamDesc[Long](seed, random seed, Some(Utils.random.nextLong())), + ParamDesc[Long](seed, random seed, Some(this.getClass.getName.hashCode.toLong)), ParamDesc[Double](elasticNetParam, the ElasticNet mixing parameter, in range [0, 1]. + For alpha = 0, the penalty is an L2 penalty. 
For alpha = 1, it is an L1 penalty., isValid = ParamValidators.inRange(0, 1)), http://git-wip-us.apache.org/repos/asf/spark/blob/cd3093e7/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala index 7525d37..3a4976d 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala @@ -232,7 +232,7 @@ private[ml] trait HasFitIntercept extends Params { } /** - * (private[ml]) Trait for shared param seed (default: Utils.random.nextLong()). + * (private[ml]) Trait for shared param seed (default: this.getClass.getName.hashCode.toLong). */ private[ml] trait HasSeed extends Params { @@ -242,7 +242,7 @@ private[ml] trait HasSeed extends Params { */ final val seed: LongParam = new LongParam(this, seed, random seed) - setDefault(seed, Utils.random.nextLong()) +
spark git commit: [SPARK-7726] Fix Scaladoc false errors
Repository: spark Updated Branches: refs/heads/master 7b16e9f21 - 3c4c1f964 [SPARK-7726] Fix Scaladoc false errors Visibility rules for static members are different in Scala and Java, and this case requires an explicit static import. Even though these are Java files, they are run through scaladoc, which enforces Scala rules. Also reverted the commit that reverts the upgrade to 2.11.6. Author: Iulian Dragos jagua...@gmail.com Closes #6260 from dragos/issue/scaladoc-false-error and squashes the following commits: f2e998e [Iulian Dragos] Revert "[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"" 0bad052 [Iulian Dragos] Fix scaladoc faux-error. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3c4c1f96 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3c4c1f96 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3c4c1f96 Branch: refs/heads/master Commit: 3c4c1f96474b3e66fa1d44ac0177f548cf5a3a10 Parents: 7b16e9f Author: Iulian Dragos jagua...@gmail.com Authored: Tue May 19 12:14:48 2015 -0700 Committer: Patrick Wendell patr...@databricks.com Committed: Tue May 19 12:14:48 2015 -0700 -- .../org/apache/spark/network/shuffle/protocol/OpenBlocks.java| 3 +++ .../apache/spark/network/shuffle/protocol/RegisterExecutor.java | 3 +++ .../org/apache/spark/network/shuffle/protocol/StreamHandle.java | 3 +++ .../org/apache/spark/network/shuffle/protocol/UploadBlock.java | 3 +++ pom.xml | 4 ++-- .../src/main/scala/org/apache/spark/repl/SparkIMain.scala| 2 +- 6 files changed, 15 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3c4c1f96/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java --
diff --git a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java index 60485ba..ce954b8 100644
--- a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
+++ b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
@@ -24,6 +24,9 @@ import io.netty.buffer.ByteBuf;
 import org.apache.spark.network.protocol.Encoders;
+// Needed by ScalaDoc. See SPARK-7726
+import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
+
 /** Request to read a set of blocks. Returns {@link StreamHandle}. */
 public class OpenBlocks extends BlockTransferMessage {
   public final String appId;
http://git-wip-us.apache.org/repos/asf/spark/blob/3c4c1f96/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java --
diff --git a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java index 38acae3..cca8b17 100644
--- a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
+++ b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
@@ -22,6 +22,9 @@ import io.netty.buffer.ByteBuf;
 import org.apache.spark.network.protocol.Encoders;
+// Needed by ScalaDoc. See SPARK-7726
+import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
+
 /**
  * Initial registration message between an executor and its local shuffle server.
  * Returns nothing (empty byte array).
http://git-wip-us.apache.org/repos/asf/spark/blob/3c4c1f96/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java -- diff --git a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java index 9a92202..1915295 100644 --- a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java +++ b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java @@ -20,6 +20,9 @@ package org.apache.spark.network.shuffle.protocol; import com.google.common.base.Objects; import io.netty.buffer.ByteBuf; +// Needed by ScalaDoc. See SPARK-7726 +import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type; + /** * Identifier for a fixed number of chunks to read from a stream created by an open blocks * message. This is used by {@link org.apache.spark.network.shuffle.OneForOneBlockFetcher}.
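The digest cuts off here. A minimal, hypothetical illustration (class names invented) of the Scala/Java visibility mismatch the commit message describes:

{% highlight java %}
// Base.java
public class Base {
  public static class Type {}
}

// Sub.java
public class Sub extends Base {
  Type t; // javac: fine, inherited static nested types are in scope.
          // scaladoc applies Scala's rules, under which this does not resolve,
          // hence the explicit `import static ...BlockTransferMessage.Type;` in this patch.
}
{% endhighlight %}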