spark git commit: Fixing a few basic typos in the Programming Guide.

2015-05-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 a1d896b85 -> 0748263a2


Fixing a few basic typos in the Programming Guide.

Just a few minor fixes in the guide, so a new JIRA issue was not created per 
the guidelines.

Author: Mike Dusenberry dusenberr...@gmail.com

Closes #6240 from dusenberrymw/Fix_Programming_Guide_Typos and squashes the 
following commits:

ffa76eb [Mike Dusenberry] Fixing a few basic typos in the Programming Guide.

(cherry picked from commit 61f164d3fdd1c8dcdba8c9d66df05ff4069aa6e6)
Signed-off-by: Sean Owen so...@cloudera.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0748263a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0748263a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0748263a

Branch: refs/heads/branch-1.4
Commit: 0748263a2e36e9aef172808e3b6208a1f4d4fdb8
Parents: a1d896b
Author: Mike Dusenberry dusenberr...@gmail.com
Authored: Tue May 19 08:59:45 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Tue May 19 09:00:19 2015 +0100

--
 docs/programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0748263a/docs/programming-guide.md
--
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2781651..0c27376 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1071,7 +1071,7 @@ for details.
 </tr>
 <tr>
   <td> <b>saveAsSequenceFile</b>(<i>path</i>) <br /> (Java and Scala) </td>
-  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that either implement Hadoop's Writable interface. In Scala, it is also
+  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's Writable interface. In Scala, it is also
   available on types that are implicitly convertible to Writable (Spark includes conversions for basic types like Int, Double, String, etc). </td>
 </tr>
 <tr>
@@ -1122,7 +1122,7 @@ ordered data following shuffle then it's possible to use:
 * `sortBy` to make a globally ordered RDD
 
 Operations which can cause a shuffle include **repartition** operations like
-[`repartition`](#RepartitionLink), and [`coalesce`](#CoalesceLink), **'ByKey** 
operations
+[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'ByKey** 
operations
 (except for counting) like [`groupByKey`](#GroupByLink) and 
[`reduceByKey`](#ReduceByLink), and
 **join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink).
 
@@ -1138,7 +1138,7 @@ read the relevant sorted blocks.
 
 Certain shuffle operations can consume significant amounts of heap memory 
since they employ 
 in-memory data structures to organize records before or after transferring 
them. Specifically, 
-`reduceByKey` and `aggregateByKey` create these structures on the map side and 
`'ByKey` operations 
+`reduceByKey` and `aggregateByKey` create these structures on the map side, 
and `'ByKey` operations 
 generate these on the reduce side. When data does not fit in memory Spark will 
spill these tables 
 to disk, incurring the additional overhead of disk I/O and increased garbage 
collection.
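
A quick illustration of the sentence corrected above: saveAsSequenceFile works on pair RDDs whose key and value types implement, or are implicitly convertible to, Hadoop's Writable. A minimal Scala sketch, assuming a running SparkContext named `sc`; the output path is only illustrative:

// Int and String are implicitly convertible to IntWritable and Text,
// so this pair RDD can be written out as a Hadoop SequenceFile directly.
val pairs = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
pairs.saveAsSequenceFile("hdfs:///tmp/pairs-seqfile") // illustrative path

// Reading it back applies the same Writable conversions in reverse.
val restored = sc.sequenceFile[Int, String]("hdfs:///tmp/pairs-seqfile")
restored.collect().foreach(println)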
 





spark git commit: Fixing a few basic typos in the Programming Guide.

2015-05-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 6008ec14e -> 61f164d3f


Fixing a few basic typos in the Programming Guide.

Just a few minor fixes in the guide, so a new JIRA issue was not created per 
the guidelines.

Author: Mike Dusenberry dusenberr...@gmail.com

Closes #6240 from dusenberrymw/Fix_Programming_Guide_Typos and squashes the 
following commits:

ffa76eb [Mike Dusenberry] Fixing a few basic typos in the Programming Guide.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/61f164d3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/61f164d3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/61f164d3

Branch: refs/heads/master
Commit: 61f164d3fdd1c8dcdba8c9d66df05ff4069aa6e6
Parents: 6008ec1
Author: Mike Dusenberry dusenberr...@gmail.com
Authored: Tue May 19 08:59:45 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Tue May 19 08:59:45 2015 +0100

--
 docs/programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/61f164d3/docs/programming-guide.md
--
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2781651..0c27376 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1071,7 +1071,7 @@ for details.
 </tr>
 <tr>
   <td> <b>saveAsSequenceFile</b>(<i>path</i>) <br /> (Java and Scala) </td>
-  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that either implement Hadoop's Writable interface. In Scala, it is also
+  <td> Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's Writable interface. In Scala, it is also
   available on types that are implicitly convertible to Writable (Spark includes conversions for basic types like Int, Double, String, etc). </td>
 </tr>
 <tr>
@@ -1122,7 +1122,7 @@ ordered data following shuffle then it's possible to use:
 * `sortBy` to make a globally ordered RDD
 
 Operations which can cause a shuffle include **repartition** operations like
-[`repartition`](#RepartitionLink), and [`coalesce`](#CoalesceLink), **'ByKey** 
operations
+[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'ByKey** 
operations
 (except for counting) like [`groupByKey`](#GroupByLink) and 
[`reduceByKey`](#ReduceByLink), and
 **join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink).
 
@@ -1138,7 +1138,7 @@ read the relevant sorted blocks.
 
 Certain shuffle operations can consume significant amounts of heap memory 
since they employ 
 in-memory data structures to organize records before or after transferring 
them. Specifically, 
-`reduceByKey` and `aggregateByKey` create these structures on the map side and 
`'ByKey` operations 
+`reduceByKey` and `aggregateByKey` create these structures on the map side, 
and `'ByKey` operations 
 generate these on the reduce side. When data does not fit in memory Spark will 
spill these tables 
 to disk, incurring the additional overhead of disk I/O and increased garbage 
collection.
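
As a companion to the shuffle discussion in the hunks above, a small sketch (assuming a SparkContext named `sc` and an illustrative input path) of which operations shuffle and which do not:

val words = sc.textFile("hdfs:///tmp/input.txt") // illustrative path
  .flatMap(_.split(" "))  // narrow transformation: no shuffle
  .map(word => (word, 1)) // narrow transformation: no shuffle

// A 'ByKey operation: map-side combine builds in-memory structures per
// partition, then the shuffle brings matching keys together on the reduce side.
val counts = words.reduceByKey(_ + _)

// Explicit repartitioning also shuffles; sortBy shuffles to produce a global order.
val rebalanced = counts.repartition(8)
val ordered = counts.sortBy(_._2, ascending = false)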
 





spark git commit: [HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"

2015-05-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 586ede6b3 -> 31f5d53e9


[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"

This reverts commit a11c8683c76c67f45749a1b50a0912a731fd2487.

For more information see:
https://issues.apache.org/jira/browse/SPARK-7726


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/31f5d53e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/31f5d53e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/31f5d53e

Branch: refs/heads/branch-1.4
Commit: 31f5d53e9efea3c9728a51fe65e8baa589ddfa6f
Parents: 586ede6
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 02:28:41 2015 -0700
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 02:28:41 2015 -0700

--
 pom.xml  | 4 ++--
 .../src/main/scala/org/apache/spark/repl/SparkIMain.scala| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/31f5d53e/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 6f525b6..68edf03 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1799,9 +1799,9 @@
         <property><name>scala-2.11</name></property>
       </activation>
       <properties>
-        <scala.version>2.11.6</scala.version>
+        <scala.version>2.11.2</scala.version>
         <scala.binary.version>2.11</scala.binary.version>
-        <jline.version>2.12.1</jline.version>
+        <jline.version>2.12</jline.version>
         <jline.groupid>jline</jline.groupid>
       </properties>
     </profile>

http://git-wip-us.apache.org/repos/asf/spark/blob/31f5d53e/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
--
diff --git 
a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala 
b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
index 1cb910f..1bb62c8 100644
--- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
@@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings
 
     def apply(line: String): Result = debugging(s"""parse("$line")""")  {
       var isIncomplete = false
-      currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {
+      currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
        reporter.reset()
        val trees = newUnitParser(line).parseStats()
        if (reporter.hasErrors) Error





spark git commit: [HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"

2015-05-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 61f164d3f -> 27fa88b9b


[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"

This reverts commit a11c8683c76c67f45749a1b50a0912a731fd2487.

For more information see:
https://issues.apache.org/jira/browse/SPARK-7726


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27fa88b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27fa88b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27fa88b9

Branch: refs/heads/master
Commit: 27fa88b9ba320cd0d95703aa3437151ba7c86f98
Parents: 61f164d
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 02:28:41 2015 -0700
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 02:29:38 2015 -0700

--
 pom.xml  | 4 ++--
 .../src/main/scala/org/apache/spark/repl/SparkIMain.scala| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/27fa88b9/pom.xml
--
diff --git a/pom.xml b/pom.xml
index c72d7cb..d903f02 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1799,9 +1799,9 @@
         <property><name>scala-2.11</name></property>
       </activation>
       <properties>
-        <scala.version>2.11.6</scala.version>
+        <scala.version>2.11.2</scala.version>
         <scala.binary.version>2.11</scala.binary.version>
-        <jline.version>2.12.1</jline.version>
+        <jline.version>2.12</jline.version>
         <jline.groupid>jline</jline.groupid>
       </properties>
     </profile>

http://git-wip-us.apache.org/repos/asf/spark/blob/27fa88b9/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
--
diff --git 
a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala 
b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
index 1cb910f..1bb62c8 100644
--- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
@@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings
 
     def apply(line: String): Result = debugging(s"""parse("$line")""")  {
       var isIncomplete = false
-      currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {
+      currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
        reporter.reset()
        val trees = newUnitParser(line).parseStats()
        if (reporter.hasErrors) Error





spark git commit: CHANGES.txt updates

2015-05-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 6834d1af4 -> f9f2aafbf


CHANGES.txt updates


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f9f2aafb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f9f2aafb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f9f2aafb

Branch: refs/heads/branch-1.4
Commit: f9f2aafbf1f208344e0efd78893d0b6c9932293c
Parents: 6834d1a
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 02:32:32 2015 -0700
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 02:32:53 2015 -0700

--
 CHANGES.txt | 35 +++
 1 file changed, 35 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f9f2aafb/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 6660580..8c99404 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -3,6 +3,41 @@ Spark Change Log
 
 Release 1.4.0
 
+  [HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"
+  Patrick Wendell patr...@databricks.com
+  2015-05-19 02:28:41 -0700
+  Commit: 31f5d53
+
+  Revert "Preparing Spark release v1.4.0-rc1"
+  Patrick Wendell patr...@databricks.com
+  2015-05-19 02:27:14 -0700
+  Commit: 586ede6
+
+  Revert "Preparing development version 1.4.1-SNAPSHOT"
+  Patrick Wendell patr...@databricks.com
+  2015-05-19 02:27:07 -0700
+  Commit: e7309ec
+
+  Fixing a few basic typos in the Programming Guide.
+  Mike Dusenberry dusenberr...@gmail.com
+  2015-05-19 08:59:45 +0100
+  Commit: 0748263, github.com/apache/spark/pull/6240
+
+  Preparing development version 1.4.1-SNAPSHOT
+  Patrick Wendell patr...@databricks.com
+  2015-05-19 07:13:24 +
+  Commit: a1d896b
+
+  Preparing Spark release v1.4.0-rc1
+  Patrick Wendell patr...@databricks.com
+  2015-05-19 07:13:24 +
+  Commit: 79fb01a
+
+  Updating CHANGES.txt for Spark 1.4
+  Patrick Wendell patr...@databricks.com
+  2015-05-19 00:12:20 -0700
+  Commit: 30bf333
+
   Revert "Preparing Spark release v1.4.0-rc1"
   Patrick Wendell patr...@databricks.com
   2015-05-19 00:10:39 -0700





[2/2] spark git commit: Revert "Preparing Spark release v1.4.0-rc1"

2015-05-19 Thread pwendell
Revert "Preparing Spark release v1.4.0-rc1"

This reverts commit 79fb01a3be07b5086134a6fe103248e9a33a9500.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/586ede6b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/586ede6b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/586ede6b

Branch: refs/heads/branch-1.4
Commit: 586ede6b32790dc15b6111836bd5955c61b53bac
Parents: e7309ec
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 02:27:14 2015 -0700
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 02:27:14 2015 -0700

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 unsafe/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b8a821d..626c857 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index c1aa32b..1f3dec9 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 8acb923..bfa49d0 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 706a97d..5b04b4f 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index e8784eb..1f3e619 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.0-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/586ede6b/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 1794f3e..8df7edb 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
 

[1/2] spark git commit: Revert "Preparing development version 1.4.1-SNAPSHOT"

2015-05-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 0748263a2 -> 586ede6b3


Revert "Preparing development version 1.4.1-SNAPSHOT"

This reverts commit a1d896b85bd3fb88284f8b6758d7e5f0a1bb9eb3.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e7309ec7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e7309ec7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e7309ec7

Branch: refs/heads/branch-1.4
Commit: e7309ec729607e485525c90166a56bfac18b625e
Parents: 0748263
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 02:27:07 2015 -0700
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 02:27:07 2015 -0700

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 unsafe/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b53d7c3..b8a821d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index d631ff5..c1aa32b 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index adbb7c2..8acb923 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index bf804bb..706a97d 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 076ddaa..e8784eb 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.1-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e7309ec7/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 2491c97..1794f3e 100644
--- a/external/flume/pom.xml
+++ 

[2/2] spark git commit: Preparing Spark release v1.4.0-rc1

2015-05-19 Thread pwendell
Preparing Spark release v1.4.0-rc1


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/777a0816
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/777a0816
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/777a0816

Branch: refs/heads/branch-1.4
Commit: 777a08166f1fb144146ba32581d4632c3466541e
Parents: f9f2aaf
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 09:35:12 2015 +
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 09:35:12 2015 +

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 unsafe/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 626c857..b8a821d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index 1f3dec9..c1aa32b 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index bfa49d0..8acb923 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 5b04b4f..706a97d 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 1f3e619..e8784eb 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+    <version>1.4.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/777a0816/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 8df7edb..1794f3e 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0-SNAPSHOT</version>
+

[1/2] spark git commit: Preparing development version 1.4.1-SNAPSHOT

2015-05-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 f9f2aafbf -> ac3197e1b


Preparing development version 1.4.1-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac3197e1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac3197e1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac3197e1

Branch: refs/heads/branch-1.4
Commit: ac3197e1b94f25508a21b5de81d1ff47e6293ab1
Parents: 777a081
Author: Patrick Wendell patr...@databricks.com
Authored: Tue May 19 09:35:12 2015 +
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 09:35:12 2015 +

--
 assembly/pom.xml  | 2 +-
 bagel/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-assembly/pom.xml   | 2 +-
 external/kafka/pom.xml| 2 +-
 external/mqtt/pom.xml | 2 +-
 external/twitter/pom.xml  | 2 +-
 external/zeromq/pom.xml   | 2 +-
 extras/java8-tests/pom.xml| 2 +-
 extras/kinesis-asl/pom.xml| 2 +-
 extras/spark-ganglia-lgpl/pom.xml | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib/pom.xml | 2 +-
 network/common/pom.xml| 2 +-
 network/shuffle/pom.xml   | 2 +-
 network/yarn/pom.xml  | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 unsafe/pom.xml| 2 +-
 yarn/pom.xml  | 2 +-
 30 files changed, 30 insertions(+), 30 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b8a821d..b53d7c3 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index c1aa32b..d631ff5 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 8acb923..adbb7c2 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 706a97d..bf804bb 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index e8784eb..076ddaa 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.4.0</version>
+    <version>1.4.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ac3197e1/external/flume/pom.xml
--
diff --git a/external/flume/pom.xml b/external/flume/pom.xml
index 1794f3e..2491c97 100644
--- a/external/flume/pom.xml
+++ b/external/flume/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>

Git Push Summary

2015-05-19 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.4.0-rc1 [created] 777a08166




spark git commit: [SPARK-7723] Fix string interpolation in pipeline examples

2015-05-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 31f5d53e9 -> 6834d1af4


[SPARK-7723] Fix string interpolation in pipeline examples

https://issues.apache.org/jira/browse/SPARK-7723

Author: Saleem Ansari tux...@gmail.com

Closes #6258 from tuxdna/master and squashes the following commits:

2bb5a42 [Saleem Ansari] Merge branch 'master' into mllib-pipeline
e39db9c [Saleem Ansari] Fix string interpolation in pipeline examples

(cherry picked from commit df34793ad4e76214fc4c0a22af1eb89b171a32e4)
Signed-off-by: Sean Owen so...@cloudera.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6834d1af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6834d1af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6834d1af

Branch: refs/heads/branch-1.4
Commit: 6834d1af4c370d6e5aa98d8d91d0cfff24e4a594
Parents: 31f5d53
Author: Saleem Ansari tux...@gmail.com
Authored: Tue May 19 10:31:11 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Tue May 19 10:31:20 2015 +0100

--
 docs/ml-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6834d1af/docs/ml-guide.md
--
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index b7b6376..cac7056 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -237,7 +237,7 @@ model2.transform(test.toDF)
   .select("features", "label", "myProbability", "prediction")
   .collect()
   .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
-    println("($features, $label) -> prob=$prob, prediction=$prediction")
+    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
   }
 
 sc.stop()
@@ -391,7 +391,7 @@ model.transform(test.toDF)
   .select("id", "text", "probability", "prediction")
   .collect()
   .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
-    println("($id, $text) --> prob=$prob, prediction=$prediction")
+    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
   }
 
 sc.stop()
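
For readers unfamiliar with the bug being fixed: without the `s` prefix, Scala treats the literal as plain text and the `$...` placeholders are not substituted. A two-line illustration (not taken from the guide itself):

val label = 1.0
println("label=$label")  // prints the literal text: label=$label
println(s"label=$label") // interpolates: label=1.0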





spark git commit: [SPARK-7723] Fix string interpolation in pipeline examples

2015-05-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 27fa88b9b -> df34793ad


[SPARK-7723] Fix string interpolation in pipeline examples

https://issues.apache.org/jira/browse/SPARK-7723

Author: Saleem Ansari tux...@gmail.com

Closes #6258 from tuxdna/master and squashes the following commits:

2bb5a42 [Saleem Ansari] Merge branch 'master' into mllib-pipeline
e39db9c [Saleem Ansari] Fix string interpolation in pipeline examples


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df34793a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df34793a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df34793a

Branch: refs/heads/master
Commit: df34793ad4e76214fc4c0a22af1eb89b171a32e4
Parents: 27fa88b
Author: Saleem Ansari tux...@gmail.com
Authored: Tue May 19 10:31:11 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Tue May 19 10:31:11 2015 +0100

--
 docs/ml-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/df34793a/docs/ml-guide.md
--
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index b7b6376..cac7056 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -237,7 +237,7 @@ model2.transform(test.toDF)
   .select("features", "label", "myProbability", "prediction")
   .collect()
   .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
-    println("($features, $label) -> prob=$prob, prediction=$prediction")
+    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
   }
 
 sc.stop()
@@ -391,7 +391,7 @@ model.transform(test.toDF)
   .select("id", "text", "probability", "prediction")
   .collect()
   .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
-    println("($id, $text) --> prob=$prob, prediction=$prediction")
+    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
   }
 
 sc.stop()





spark git commit: [SPARK-6246] [EC2] fixed support for more than 100 nodes

2015-05-19 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master bcb1ff814 -> 2bc5e0616


[SPARK-6246] [EC2] fixed support for more than 100 nodes

This is a small fix, but it is important for Amazon users because, as the ticket states, spark-ec2 can't currently handle clusters with > 100 nodes.

Author: alyaxey oleksii.sliusare...@grammarly.com

Closes #6267 from alyaxey/ec2_100_nodes_fix and squashes the following commits:

1e0d747 [alyaxey] [SPARK-6246] fixed support for more than 100 nodes


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2bc5e061
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2bc5e061
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2bc5e061

Branch: refs/heads/master
Commit: 2bc5e0616d878b09daa8e31a7a1fdb7127bca079
Parents: bcb1ff8
Author: alyaxey oleksii.sliusare...@grammarly.com
Authored: Tue May 19 16:45:52 2015 -0700
Committer: Shivaram Venkataraman shiva...@cs.berkeley.edu
Committed: Tue May 19 16:45:52 2015 -0700

--
 ec2/spark_ec2.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2bc5e061/ec2/spark_ec2.py
--
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index be92d5f..c6d5a1f 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -864,7 +864,11 @@ def wait_for_cluster_state(conn, opts, cluster_instances, 
cluster_state):
 for i in cluster_instances:
 i.update()
 
-statuses = conn.get_all_instance_status(instance_ids=[i.id for i in 
cluster_instances])
+max_batch = 100
+statuses = []
+for j in xrange(0, len(cluster_instances), max_batch):
+batch = [i.id for i in cluster_instances[j:j + max_batch]]
+statuses.extend(conn.get_all_instance_status(instance_ids=batch))
 
 if cluster_state == 'ssh-ready':
 if all(i.state == 'running' for i in cluster_instances) and \
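
The patch works around the EC2 API limit by querying instance status in batches of at most 100 ids. The same chunking pattern, sketched in Scala purely for illustration (describeStatus is a hypothetical stand-in for the batched API call):

// Hypothetical stand-in for an API that accepts at most 100 ids per call.
def describeStatus(ids: Seq[String]): Seq[String] = ids.map(id => s"$id: running")

val instanceIds = (1 to 250).map(i => s"i-$i")
val maxBatch = 100

// grouped() splits the id list into chunks of at most maxBatch elements,
// mirroring the xrange-based batching in the Python fix above.
val statuses = instanceIds.grouped(maxBatch).flatMap(describeStatus).toSeq
println(statuses.size) // 250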





spark git commit: [SPARK-7656] [SQL] use CatalystConf in FunctionRegistry

2015-05-19 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 2ef04a162 -> 86893390c


[SPARK-7656] [SQL] use CatalystConf in FunctionRegistry

follow up for #5806

Author: scwf wangf...@huawei.com

Closes #6164 from scwf/FunctionRegistry and squashes the following commits:

15e6697 [scwf] use catalogconf in FunctionRegistry

(cherry picked from commit 60336e3bc02a2587fdf315f9011bbe7c9d3a58c4)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86893390
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86893390
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86893390

Branch: refs/heads/branch-1.4
Commit: 86893390cfd31d36ff03c2e062a13196a1f7a6fa
Parents: 2ef04a1
Author: scwf wangf...@huawei.com
Authored: Tue May 19 17:36:00 2015 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Tue May 19 17:36:33 2015 -0700

--
 .../spark/sql/catalyst/analysis/FunctionRegistry.scala  | 12 +++-
 .../main/scala/org/apache/spark/sql/SQLContext.scala|  2 +-
 .../scala/org/apache/spark/sql/hive/HiveContext.scala   |  2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/86893390/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 16ca5bc..0849faa 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
+import org.apache.spark.sql.catalyst.CatalystConf
 import org.apache.spark.sql.catalyst.expressions.Expression
 import scala.collection.mutable
 
@@ -28,12 +29,12 @@ trait FunctionRegistry {
 
   def lookupFunction(name: String, children: Seq[Expression]): Expression
 
-  def caseSensitive: Boolean
+  def conf: CatalystConf
 }
 
 trait OverrideFunctionRegistry extends FunctionRegistry {
 
-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+  val functionBuilders = 
StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)
 
   override def registerFunction(name: String, builder: FunctionBuilder): Unit 
= {
 functionBuilders.put(name, builder)
@@ -44,8 +45,9 @@ trait OverrideFunctionRegistry extends FunctionRegistry {
   }
 }
 
-class SimpleFunctionRegistry(val caseSensitive: Boolean) extends 
FunctionRegistry {
-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+class SimpleFunctionRegistry(val conf: CatalystConf) extends FunctionRegistry {
+
+  val functionBuilders = 
StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)
 
   override def registerFunction(name: String, builder: FunctionBuilder): Unit 
= {
 functionBuilders.put(name, builder)
@@ -69,7 +71,7 @@ object EmptyFunctionRegistry extends FunctionRegistry {
 throw new UnsupportedOperationException
   }
 
-  override def caseSensitive: Boolean = throw new UnsupportedOperationException
+  override def conf: CatalystConf = throw new UnsupportedOperationException
 }
 
 /**

http://git-wip-us.apache.org/repos/asf/spark/blob/86893390/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 316ef7d..304e958 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -121,7 +121,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
 
   // TODO how to handle the temp function per user session?
   @transient
-  protected[sql] lazy val functionRegistry: FunctionRegistry = new 
SimpleFunctionRegistry(true)
+  protected[sql] lazy val functionRegistry: FunctionRegistry = new 
SimpleFunctionRegistry(conf)
 
   @transient
   protected[sql] lazy val analyzer: Analyzer =

http://git-wip-us.apache.org/repos/asf/spark/blob/86893390/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index 2733ebd..863a5db 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ 
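
The gist of this change is to pass the whole CatalystConf into the registry and read `conf.caseSensitiveAnalysis` there, rather than threading a bare Boolean through. A simplified sketch of that design choice, with illustrative names that are not Spark's actual internals:

// Stand-in for a configuration object that carries more than one setting.
case class Conf(caseSensitiveAnalysis: Boolean)

class SimpleRegistry(conf: Conf) {
  private val builders = scala.collection.mutable.Map.empty[String, () => String]

  // Deriving key normalization from the conf keeps the registry consistent with
  // the session's analysis settings and avoids an easy-to-misuse positional flag.
  private def normalize(name: String): String =
    if (conf.caseSensitiveAnalysis) name else name.toLowerCase

  def register(name: String, builder: () => String): Unit =
    builders(normalize(name)) = builder

  def lookup(name: String): Option[() => String] = builders.get(normalize(name))
}

val registry = new SimpleRegistry(Conf(caseSensitiveAnalysis = false))
registry.register("Upper", () => "UPPER")
println(registry.lookup("upper").isDefined) // true: lookup is case-insensitive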

spark git commit: [SPARK-7656] [SQL] use CatalystConf in FunctionRegistry

2015-05-19 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 386052063 -> 60336e3bc


[SPARK-7656] [SQL] use CatalystConf in FunctionRegistry

follow up for #5806

Author: scwf wangf...@huawei.com

Closes #6164 from scwf/FunctionRegistry and squashes the following commits:

15e6697 [scwf] use catalogconf in FunctionRegistry


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60336e3b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60336e3b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60336e3b

Branch: refs/heads/master
Commit: 60336e3bc02a2587fdf315f9011bbe7c9d3a58c4
Parents: 3860520
Author: scwf wangf...@huawei.com
Authored: Tue May 19 17:36:00 2015 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Tue May 19 17:36:00 2015 -0700

--
 .../spark/sql/catalyst/analysis/FunctionRegistry.scala  | 12 +++-
 .../main/scala/org/apache/spark/sql/SQLContext.scala|  2 +-
 .../scala/org/apache/spark/sql/hive/HiveContext.scala   |  2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/60336e3b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 16ca5bc..0849faa 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
+import org.apache.spark.sql.catalyst.CatalystConf
 import org.apache.spark.sql.catalyst.expressions.Expression
 import scala.collection.mutable
 
@@ -28,12 +29,12 @@ trait FunctionRegistry {
 
   def lookupFunction(name: String, children: Seq[Expression]): Expression
 
-  def caseSensitive: Boolean
+  def conf: CatalystConf
 }
 
 trait OverrideFunctionRegistry extends FunctionRegistry {
 
-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+  val functionBuilders = 
StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)
 
   override def registerFunction(name: String, builder: FunctionBuilder): Unit 
= {
 functionBuilders.put(name, builder)
@@ -44,8 +45,9 @@ trait OverrideFunctionRegistry extends FunctionRegistry {
   }
 }
 
-class SimpleFunctionRegistry(val caseSensitive: Boolean) extends 
FunctionRegistry {
-  val functionBuilders = StringKeyHashMap[FunctionBuilder](caseSensitive)
+class SimpleFunctionRegistry(val conf: CatalystConf) extends FunctionRegistry {
+
+  val functionBuilders = 
StringKeyHashMap[FunctionBuilder](conf.caseSensitiveAnalysis)
 
   override def registerFunction(name: String, builder: FunctionBuilder): Unit 
= {
 functionBuilders.put(name, builder)
@@ -69,7 +71,7 @@ object EmptyFunctionRegistry extends FunctionRegistry {
 throw new UnsupportedOperationException
   }
 
-  override def caseSensitive: Boolean = throw new UnsupportedOperationException
+  override def conf: CatalystConf = throw new UnsupportedOperationException
 }
 
 /**

http://git-wip-us.apache.org/repos/asf/spark/blob/60336e3b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 316ef7d..304e958 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -121,7 +121,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
 
   // TODO how to handle the temp function per user session?
   @transient
-  protected[sql] lazy val functionRegistry: FunctionRegistry = new 
SimpleFunctionRegistry(true)
+  protected[sql] lazy val functionRegistry: FunctionRegistry = new 
SimpleFunctionRegistry(conf)
 
   @transient
   protected[sql] lazy val analyzer: Analyzer =

http://git-wip-us.apache.org/repos/asf/spark/blob/60336e3b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index 2733ebd..863a5db 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -357,7 +357,7 @@ class HiveContext(sc: SparkContext) extends 

spark git commit: [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 fc1b4a414 -> a64e097f1


[SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types 
documentation should be reordered.

The documentation for BlockMatrix should come after RowMatrix, 
IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the later 
three types, and RowMatrix is considered the basic distributed matrix.  This 
will improve comprehensibility of the Distributed matrix section, especially 
for the new reader.

Author: Mike Dusenberry dusenberr...@gmail.com

Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs 
and squashes the following commits:

6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after 
RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references 
the later three types, and RowMatrix is considered the basic distributed 
matrix.  This will improve comprehensibility of the Distributed matrix 
section, especially for the new reader.

(cherry picked from commit 3860520633770cc5719b2cdebe6dc3608798386d)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a64e097f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a64e097f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a64e097f

Branch: refs/heads/branch-1.3
Commit: a64e097f128d3638fdc507ba4b62d93862ca69d1
Parents: fc1b4a4
Author: Mike Dusenberry dusenberr...@gmail.com
Authored: Tue May 19 17:18:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 17:18:29 2015 -0700

--
 docs/mllib-data-types.md | 128 +-
 1 file changed, 64 insertions(+), 64 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a64e097f/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 4f2a2f7..5f448e7 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we 
cache the matrix size.
 In general the use of non-deterministic RDDs can lead to errors.
 
-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, 
where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the 
block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another 
`BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check 
whether the
-`BlockMatrix` is set up properly.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-A 
[`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix)
 can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by 
calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through 
`toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, 
CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception 
when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-A 
[`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html)
 can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by 
calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through 
`toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform 

spark git commit: [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 62b4c7392 -> 2ef04a162


[SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types 
documentation should be reordered.

The documentation for BlockMatrix should come after RowMatrix, 
IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the later 
three types, and RowMatrix is considered the basic distributed matrix.  This 
will improve comprehensibility of the Distributed matrix section, especially 
for the new reader.

Author: Mike Dusenberry dusenberr...@gmail.com

Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs 
and squashes the following commits:

6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after 
RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references 
the later three types, and RowMatrix is considered the basic distributed 
matrix.  This will improve comprehensibility of the Distributed matrix 
section, especially for the new reader.

(cherry picked from commit 3860520633770cc5719b2cdebe6dc3608798386d)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ef04a16
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2ef04a16
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2ef04a16

Branch: refs/heads/branch-1.4
Commit: 2ef04a1627bd0c377dde642ac7ce140429755cca
Parents: 62b4c73
Author: Mike Dusenberry dusenberr...@gmail.com
Authored: Tue May 19 17:18:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 17:18:20 2015 -0700

--
 docs/mllib-data-types.md | 128 +-
 1 file changed, 64 insertions(+), 64 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2ef04a16/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index acec042..d824dab 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we 
cache the matrix size.
 In general the use of non-deterministic RDDs can lead to errors.
 
-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, 
where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the 
block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another 
`BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check 
whether the
-`BlockMatrix` is set up properly.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-A 
[`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix)
 can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by 
calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through 
`toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, 
CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception 
when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-A 
[`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html)
 can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by 
calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through 
`toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform 

spark git commit: [SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types documentation should be reordered.

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 2bc5e0616 - 386052063


[SPARK-7744] [DOCS] [MLLIB] Distributed matrix section in MLlib Data Types 
documentation should be reordered.

The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, 
and CoordinateMatrix, as BlockMatrix references the latter three types, and 
RowMatrix is considered the basic distributed matrix.  This will improve the 
comprehensibility of the Distributed matrix section, especially for new readers.

Author: Mike Dusenberry dusenberr...@gmail.com

Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs 
and squashes the following commits:

6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after 
RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references 
the latter three types, and RowMatrix is considered the basic distributed 
matrix.  This will improve the comprehensibility of the Distributed matrix 
section, especially for new readers.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/38605206
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/38605206
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/38605206

Branch: refs/heads/master
Commit: 3860520633770cc5719b2cdebe6dc3608798386d
Parents: 2bc5e06
Author: Mike Dusenberry dusenberr...@gmail.com
Authored: Tue May 19 17:18:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 17:18:08 2015 -0700

--
 docs/mllib-data-types.md | 128 +-
 1 file changed, 64 insertions(+), 64 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/38605206/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index acec042..d824dab 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we 
cache the matrix size.
 In general the use of non-deterministic RDDs can lead to errors.
 
-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, 
where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the 
block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another 
`BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check 
whether the
-`BlockMatrix` is set up properly.
-
-div class=codetabs
-div data-lang=scala markdown=1
-
-A 
[`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix)
 can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by 
calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through 
`toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, 
CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception 
when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-/div
-
-div data-lang=java markdown=1
-
-A 
[`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html)
 can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by 
calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through 
`toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform the CoordinateMatrix to a BlockMatrix
-BlockMatrix matA = coordMat.toBlockMatrix().cache();
-
-// Validate whether the 

spark git commit: [SPARK-7681] [MLLIB] remove mima excludes for 1.3

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 ac3197e1b - 2cce6bfea


[SPARK-7681] [MLLIB] remove mima excludes for 1.3

These excludes are unnecessary for 1.3 because the changes were made in 1.4.x.

Author: Xiangrui Meng m...@databricks.com

Closes #6254 from mengxr/SPARK-7681-mima and squashes the following commits:

7f0cea0 [Xiangrui Meng] remove mima excludes for 1.3

(cherry picked from commit 6845cb2ff475fd794b30b01af5ebc80714b880f0)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cce6bfe
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cce6bfe
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cce6bfe

Branch: refs/heads/branch-1.4
Commit: 2cce6bfeab1713bd5ea90064df4987496595aedd
Parents: ac3197e
Author: Xiangrui Meng m...@databricks.com
Authored: Tue May 19 08:24:57 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 08:25:06 2015 -0700

--
 project/MimaExcludes.scala | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2cce6bfe/project/MimaExcludes.scala
--
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index f8d0160..03e93a2 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -187,14 +187,7 @@ object MimaExcludes {
 ProblemFilters.exclude[MissingMethodProblem](
   org.apache.spark.mllib.linalg.Matrix.isTransposed),
 ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.Matrix.foreachActive),
-// SPARK-7681 add SparseVector support for gemv
-ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.Matrix.multiply),
-ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.DenseMatrix.multiply),
-ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.SparseMatrix.multiply)
+  org.apache.spark.mllib.linalg.Matrix.foreachActive)
   ) ++ Seq(
 // SPARK-5540
 ProblemFilters.exclude[MissingMethodProblem](


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7681] [MLLIB] remove mima excludes for 1.3

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master df34793ad - 6845cb2ff


[SPARK-7681] [MLLIB] remove mima excludes for 1.3

These excludes are unnecessary for 1.3 because the changes were made in 1.4.x.

Author: Xiangrui Meng m...@databricks.com

Closes #6254 from mengxr/SPARK-7681-mima and squashes the following commits:

7f0cea0 [Xiangrui Meng] remove mima excludes for 1.3


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6845cb2f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6845cb2f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6845cb2f

Branch: refs/heads/master
Commit: 6845cb2ff475fd794b30b01af5ebc80714b880f0
Parents: df34793
Author: Xiangrui Meng m...@databricks.com
Authored: Tue May 19 08:24:57 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 08:24:57 2015 -0700

--
 project/MimaExcludes.scala | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6845cb2f/project/MimaExcludes.scala
--
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index f8d0160..03e93a2 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -187,14 +187,7 @@ object MimaExcludes {
 ProblemFilters.exclude[MissingMethodProblem](
   org.apache.spark.mllib.linalg.Matrix.isTransposed),
 ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.Matrix.foreachActive),
-// SPARK-7681 add SparseVector support for gemv
-ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.Matrix.multiply),
-ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.DenseMatrix.multiply),
-ProblemFilters.exclude[MissingMethodProblem](
-  org.apache.spark.mllib.linalg.SparseMatrix.multiply)
+  org.apache.spark.mllib.linalg.Matrix.foreachActive)
   ) ++ Seq(
 // SPARK-5540
 ProblemFilters.exclude[MissingMethodProblem](


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7704] Updating Programming Guides per SPARK-4397

2015-05-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 2cce6bfea - 8567d29ef


[SPARK-7704] Updating Programming Guides per SPARK-4397

The change per SPARK-4397 lets the compiler find the implicit objects in 
SparkContext automatically, so we no longer need to import 
o.a.s.SparkContext._ explicitly and can remove some statements about the 
implicit conversions from the latest Programming Guides (1.3.0 and higher)
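
In practice, the Scala setup section now boils down to the following (a sketch, assuming Spark 1.3.0 or later):

```scala
// Only these imports are needed on Spark 1.3.0+; the extra
// `import org.apache.spark.SparkContext._` recommended by older guides is no
// longer required, because the implicit conversions are found automatically.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
```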

Author: Dice poleon...@gmail.com

Closes #6234 from daisukebe/patch-1 and squashes the following commits:

b77ecd9 [Dice] fix a typo
45dfcd3 [Dice] rewording per Sean's advice
a094bcf [Dice] Adding a note for users on any previous releases
a29be5f [Dice] Updating Programming Guides per SPARK-4397

(cherry picked from commit 32fa611b19c6b95d4563be631c5a8ff0cdf3438f)
Signed-off-by: Sean Owen so...@cloudera.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8567d29e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8567d29e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8567d29e

Branch: refs/heads/branch-1.4
Commit: 8567d29ef03f49f8d3d18b8c858cca3dd7dfeb04
Parents: 2cce6bf
Author: Dice poleon...@gmail.com
Authored: Tue May 19 18:12:05 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Tue May 19 18:14:47 2015 +0100

--
 docs/programming-guide.md | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8567d29e/docs/programming-guide.md
--
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 0c27376..07a4d29 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -41,14 +41,15 @@ In addition, if you wish to access an HDFS cluster, you 
need to add a dependency
 artifactId = hadoop-client
 version = your-hdfs-version
 
-Finally, you need to import some Spark classes and implicit conversions into 
your program. Add the following lines:
+Finally, you need to import some Spark classes into your program. Add the 
following lines:
 
 {% highlight scala %}
 import org.apache.spark.SparkContext
-import org.apache.spark.SparkContext._
 import org.apache.spark.SparkConf
 {% endhighlight %}
 
+(Before Spark 1.3.0, you need to explicitly `import 
org.apache.spark.SparkContext._` to enable essential implicit conversions.)
+
 /div
 
 div data-lang=java  markdown=1
@@ -821,11 +822,9 @@ by a key.
 
 In Scala, these operations are automatically available on RDDs containing
 
[Tuple2](http://www.scala-lang.org/api/{{site.SCALA_VERSION}}/index.html#scala.Tuple2)
 objects
-(the built-in tuples in the language, created by simply writing `(a, b)`), as 
long as you
-import `org.apache.spark.SparkContext._` in your program to enable Spark's 
implicit
-conversions. The key-value pair operations are available in the
+(the built-in tuples in the language, created by simply writing `(a, b)`). The 
key-value pair operations are available in the
 [PairRDDFunctions](api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions) 
class,
-which automatically wraps around an RDD of tuples if you import the 
conversions.
+which automatically wraps around an RDD of tuples.
 
 For example, the following code uses the `reduceByKey` operation on key-value 
pairs to count how
 many times each line of text occurs in a file:
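
For reference, that example looks roughly like this (a sketch assuming a `SparkContext` named `sc` and a text file `data.txt`; not the verbatim guide snippet):

```scala
// Works without importing org.apache.spark.SparkContext._ on Spark 1.3.0+:
// the RDD of (String, Int) pairs is implicitly wrapped in PairRDDFunctions.
val lines = sc.textFile("data.txt")
val pairs = lines.map(s => (s, 1))
val counts = pairs.reduceByKey((a, b) => a + b)
```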


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package

2015-05-19 Thread jkbradley
Repository: spark
Updated Branches:
  refs/heads/master 3c4c1f964 - 68fb2a46e


[SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package

CC jkbradley.

JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586).

Author: Xusen Yin yinxu...@gmail.com

Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits:

77014c5 [Xusen Yin] comment fix
57a4c07 [Xusen Yin] small fix for docs
1178c8f [Xusen Yin] remove the correctness check in java suite
1c3f389 [Xusen Yin] delete sbt commit
1af152b [Xusen Yin] check python example code
1b5369e [Xusen Yin] add docs of word2vec


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68fb2a46
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68fb2a46
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68fb2a46

Branch: refs/heads/master
Commit: 68fb2a46edc95f867d4b28597d20da2597f008c1
Parents: 3c4c1f9
Author: Xusen Yin yinxu...@gmail.com
Authored: Tue May 19 13:43:48 2015 -0700
Committer: Joseph K. Bradley jos...@databricks.com
Committed: Tue May 19 13:43:48 2015 -0700

--
 docs/ml-features.md | 89 
 .../spark/ml/feature/JavaWord2VecSuite.java | 76 +
 2 files changed, 165 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/68fb2a46/docs/ml-features.md
--
diff --git a/docs/ml-features.md b/docs/ml-features.md
index e86f9ed..63ea3e5 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -106,6 +106,95 @@ for features_label in featurized.select(features, 
label).take(3):
 /div
 /div
 
+## Word2Vec
+
+`Word2Vec` is an `Estimator` which takes sequences of words representing documents and trains a `Word2VecModel`. The model is essentially a `Map(String, Vector)` that maps each word to a unique fixed-size vector. The `Word2VecModel` transforms each document into a vector by averaging the vectors of all of its words; this vector can then be used in further document-level computations such as similarity calculations. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
+
+Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
+
+div class=codetabs
+div data-lang=scala markdown=1
+{% highlight scala %}
+import org.apache.spark.ml.feature.Word2Vec
+
+// Input data: Each row is a bag of words from a sentence or document.
+val documentDF = sqlContext.createDataFrame(Seq(
+  "Hi I heard about Spark".split(" "),
+  "I wish Java could use case classes".split(" "),
+  "Logistic regression models are neat".split(" ")
+).map(Tuple1.apply)).toDF("text")
+
+// Learn a mapping from words to Vectors.
+val word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0)
+val model = word2Vec.fit(documentDF)
+val result = model.transform(documentDF)
+result.select("result").take(3).foreach(println)
+{% endhighlight %}
+/div
+
+div data-lang=java markdown=1
+{% highlight java %}
+import com.google.common.collect.Lists;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.*;
+
+JavaSparkContext jsc = ...
+SQLContext sqlContext = ...
+
+// Input data: Each row is a bag of words from a sentence or document.
+JavaRDD<Row> jrdd = jsc.parallelize(Lists.newArrayList(
+  RowFactory.create(Lists.newArrayList("Hi I heard about Spark".split(" "))),
+  RowFactory.create(Lists.newArrayList("I wish Java could use case classes".split(" "))),
+  RowFactory.create(Lists.newArrayList("Logistic regression models are neat".split(" ")))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
+});
+DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema);
+
+// Learn a mapping from words to Vectors.
+Word2Vec word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0);
+Word2VecModel model = word2Vec.fit(documentDF);
+DataFrame result = model.transform(documentDF);
+for (Row r: result.select("result").take(3)) {
+  System.out.println(r);
+}
+{% endhighlight %}
+/div
+
+div data-lang=python markdown=1
+{% highlight python %}
+from pyspark.ml.feature import Word2Vec
+
+# 

spark git commit: [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package

2015-05-19 Thread jkbradley
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 ee012e0ed - c3871eeb2


[SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package

CC jkbradley.

JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586).

Author: Xusen Yin yinxu...@gmail.com

Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits:

77014c5 [Xusen Yin] comment fix
57a4c07 [Xusen Yin] small fix for docs
1178c8f [Xusen Yin] remove the correctness check in java suite
1c3f389 [Xusen Yin] delete sbt commit
1af152b [Xusen Yin] check python example code
1b5369e [Xusen Yin] add docs of word2vec

(cherry picked from commit 68fb2a46edc95f867d4b28597d20da2597f008c1)
Signed-off-by: Joseph K. Bradley jos...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3871eeb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3871eeb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3871eeb

Branch: refs/heads/branch-1.4
Commit: c3871eeb25ca9e1547385148025981372e14ea53
Parents: ee012e0
Author: Xusen Yin yinxu...@gmail.com
Authored: Tue May 19 13:43:48 2015 -0700
Committer: Joseph K. Bradley jos...@databricks.com
Committed: Tue May 19 13:44:06 2015 -0700

--
 docs/ml-features.md | 89 
 .../spark/ml/feature/JavaWord2VecSuite.java | 76 +
 2 files changed, 165 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c3871eeb/docs/ml-features.md
--
diff --git a/docs/ml-features.md b/docs/ml-features.md
index e86f9ed..63ea3e5 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -106,6 +106,95 @@ for features_label in featurized.select(features, 
label).take(3):
 /div
 /div
 
+## Word2Vec
+
+`Word2Vec` is an `Estimator` which takes sequences of words representing documents and trains a `Word2VecModel`. The model is essentially a `Map(String, Vector)` that maps each word to a unique fixed-size vector. The `Word2VecModel` transforms each document into a vector by averaging the vectors of all of its words; this vector can then be used in further document-level computations such as similarity calculations. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
+
+Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
+
+div class=codetabs
+div data-lang=scala markdown=1
+{% highlight scala %}
+import org.apache.spark.ml.feature.Word2Vec
+
+// Input data: Each row is a bag of words from a sentence or document.
+val documentDF = sqlContext.createDataFrame(Seq(
+  "Hi I heard about Spark".split(" "),
+  "I wish Java could use case classes".split(" "),
+  "Logistic regression models are neat".split(" ")
+).map(Tuple1.apply)).toDF("text")
+
+// Learn a mapping from words to Vectors.
+val word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0)
+val model = word2Vec.fit(documentDF)
+val result = model.transform(documentDF)
+result.select("result").take(3).foreach(println)
+{% endhighlight %}
+/div
+
+div data-lang=java markdown=1
+{% highlight java %}
+import com.google.common.collect.Lists;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.*;
+
+JavaSparkContext jsc = ...
+SQLContext sqlContext = ...
+
+// Input data: Each row is a bag of words from a sentence or document.
+JavaRDD<Row> jrdd = jsc.parallelize(Lists.newArrayList(
+  RowFactory.create(Lists.newArrayList("Hi I heard about Spark".split(" "))),
+  RowFactory.create(Lists.newArrayList("I wish Java could use case classes".split(" "))),
+  RowFactory.create(Lists.newArrayList("Logistic regression models are neat".split(" ")))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
+});
+DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema);
+
+// Learn a mapping from words to Vectors.
+Word2Vec word2Vec = new Word2Vec()
+  .setInputCol("text")
+  .setOutputCol("result")
+  .setVectorSize(3)
+  .setMinCount(0);
+Word2VecModel model = word2Vec.fit(documentDF);
+DataFrame result = model.transform(documentDF);
+for (Row r: result.select("result").take(3)) {
+  System.out.println(r);
+}

spark git commit: [SPARK-7662] [SQL] Resolve correct names for generator in projection

2015-05-19 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 87fa8ccd2 - 62b4c7392


[SPARK-7662] [SQL] Resolve correct names for generator in projection

```
select explode(map(value, key)) from src;
```
Throws exception
```
org.apache.spark.sql.AnalysisException: The number of aliases supplied in the 
AS clause does not match the number of columns output by the UDTF expected 2 
aliases but got _c0 ;
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:43)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGenerate$$makeGeneratorOutput(Analyzer.scala:605)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16$$anonfun$22.apply(Analyzer.scala:562)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16$$anonfun$22.apply(Analyzer.scala:548)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16.applyOrElse(Analyzer.scala:548)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16.applyOrElse(Analyzer.scala:538)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
```
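
With this patch the behaviour is roughly as follows (a sketch; assumes a `HiveContext` named `hiveContext` and a Hive table `src(key, value)`):

```scala
// The parser still assigns the default alias _c0, but the analyzer now ignores
// it for a generator with multiple output columns, so this resolves to the
// generator's own output columns:
hiveContext.sql("SELECT explode(map(value, key)) FROM src")

// Explicit aliases continue to work as before:
hiveContext.sql("SELECT explode(map(value, key)) AS (k, v) FROM src")

// Supplying a single non-default alias for a multi-column generator now fails
// analysis with a clearer message ("Expect multiple names given for ..."):
hiveContext.sql("SELECT explode(map(value, key)) AS kv FROM src")
```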

Author: Cheng Hao hao.ch...@intel.com

Closes #6178 from chenghao-intel/explode and squashes the following commits:

916fbe9 [Cheng Hao] add more strict rules for TGF alias
5c3f2c5 [Cheng Hao] fix bug in unit test
e1d93ab [Cheng Hao] Add more unit test
19db09e [Cheng Hao] resolve names for generator in projection

(cherry picked from commit bcb1ff81468eb4afc7c03b2bca18e99cc1ccf6b8)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/62b4c739
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/62b4c739
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/62b4c739

Branch: refs/heads/branch-1.4
Commit: 62b4c7392ad8711b9b0f20dba95dfce2a4864de2
Parents: 87fa8cc
Author: Cheng Hao hao.ch...@intel.com
Authored: Tue May 19 15:20:46 2015 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Tue May 19 15:21:03 2015 -0700

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  | 15 
 .../sql/hive/execution/HiveQuerySuite.scala |  6 ++---
 .../sql/hive/execution/SQLQuerySuite.scala  | 25 +++-
 3 files changed, 42 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/62b4c739/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index dfa4215..c239e83 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -561,6 +561,21 @@ class Analyzer(
 /** Extracts a [[Generator]] expression and any names assigned by aliases 
to their output. */
 private object AliasedGenerator {
   def unapply(e: Expression): Option[(Generator, Seq[String])] = e match {
+case Alias(g: Generator, name)
+  if g.elementTypes.size  1  
java.util.regex.Pattern.matches(_c[0-9]+, name) = {
+  // Assume the default name given by parser is _c[0-9]+,
+  // TODO in long term, move the naming logic from Parser to Analyzer.
+  // In projection, Parser gave default name for TGF as does for 
normal UDF,
+  // but the TGF probably have multiple output columns/names.
+  //e.g. SELECT explode(map(key, value)) FROM src;
+  // Let's simply ignore the default given name for this case.
+  Some((g, Nil))
+}
+case Alias(g: Generator, name) if g.elementTypes.size  1 =
+  // If not given the default names, and the TGF with multiple output 
columns
+  failAnalysis(
+sExpect multiple names given for ${g.getClass.getName},
+   |but only single name '${name}' specified.stripMargin)
 case Alias(g: 

spark git commit: [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python

2015-05-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master c12dff9b8 - 4de74d260


[SPARK-7738] [SQL] [PySpark] add reader and writer API in Python

cc rxin, please take a quick look, I'm working on tests.

Author: Davies Liu dav...@databricks.com

Closes #6238 from davies/readwrite and squashes the following commits:

c7200eb [Davies Liu] update tests
9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
readwrite
f0c5a04 [Davies Liu] use sqlContext.read.load
5f68bc8 [Davies Liu] update tests
6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
readwrite
bcc6668 [Davies Liu] add reader and writer API in Python


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4de74d26
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4de74d26
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4de74d26

Branch: refs/heads/master
Commit: 4de74d2602f6577c3c8458aa85377e89c19724ca
Parents: c12dff9
Author: Davies Liu dav...@databricks.com
Authored: Tue May 19 14:23:28 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Tue May 19 14:23:28 2015 -0700

--
 .../apache/spark/api/python/PythonUtils.scala   |  11 +-
 python/pyspark/sql/__init__.py  |   1 +
 python/pyspark/sql/context.py   |  28 +-
 python/pyspark/sql/dataframe.py |  67 ++--
 python/pyspark/sql/readwriter.py| 338 +++
 python/pyspark/sql/tests.py |  77 ++---
 6 files changed, 430 insertions(+), 92 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4de74d26/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala 
b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
index efb6b93..90dacae 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
@@ -50,8 +50,15 @@ private[spark] object PythonUtils {
   /**
* Convert list of T into seq of T (for calling API with varargs)
*/
-  def toSeq[T](cols: JList[T]): Seq[T] = {
-cols.toList.toSeq
+  def toSeq[T](vs: JList[T]): Seq[T] = {
+vs.toList.toSeq
+  }
+
+  /**
+   * Convert list of T into array of T (for calling API with array)
+   */
+  def toArray[T](vs: JList[T]): Array[T] = {
+vs.toArray().asInstanceOf[Array[T]]
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/4de74d26/python/pyspark/sql/__init__.py
--
diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py
index 19805e2..634c575 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -58,6 +58,7 @@ from pyspark.sql.context import SQLContext, HiveContext
 from pyspark.sql.column import Column
 from pyspark.sql.dataframe import DataFrame, SchemaRDD, DataFrameNaFunctions, 
DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
+from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
 
 __all__ = [
 'SQLContext', 'HiveContext', 'DataFrame', 'GroupedData', 'Column', 'Row',

http://git-wip-us.apache.org/repos/asf/spark/blob/4de74d26/python/pyspark/sql/context.py
--
diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py
index 9f26d13..7543475 100644
--- a/python/pyspark/sql/context.py
+++ b/python/pyspark/sql/context.py
@@ -31,6 +31,7 @@ from pyspark.serializers import AutoBatchedSerializer, 
PickleSerializer
 from pyspark.sql.types import Row, StringType, StructType, _verify_type, \
 _infer_schema, _has_nulltype, _merge_type, _create_converter, 
_python_to_sql_converter
 from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.readwriter import DataFrameReader
 
 try:
 import pandas
@@ -457,19 +458,7 @@ class SQLContext(object):
 
 Optionally, a schema can be provided as the schema of the returned 
DataFrame.
 
-if path is not None:
-options[path] = path
-if source is None:
-source = self.getConf(spark.sql.sources.default,
-  org.apache.spark.sql.parquet)
-if schema is None:
-df = self._ssql_ctx.load(source, options)
-else:
-if not isinstance(schema, StructType):
-raise TypeError(schema should be StructType)
-scala_datatype = self._ssql_ctx.parseDataType(schema.json())
-df = self._ssql_ctx.load(source, scala_datatype, options)
-return DataFrame(df, self)
+return 

spark git commit: [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python

2015-05-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 5643499d2 - 87fa8ccd2


[SPARK-7738] [SQL] [PySpark] add reader and writer API in Python

cc rxin, please take a quick look, I'm working on tests.

Author: Davies Liu dav...@databricks.com

Closes #6238 from davies/readwrite and squashes the following commits:

c7200eb [Davies Liu] update tests
9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
readwrite
f0c5a04 [Davies Liu] use sqlContext.read.load
5f68bc8 [Davies Liu] update tests
6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
readwrite
bcc6668 [Davies Liu] add reader and writer API in Python

(cherry picked from commit 4de74d2602f6577c3c8458aa85377e89c19724ca)
Signed-off-by: Reynold Xin r...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87fa8ccd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87fa8ccd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87fa8ccd

Branch: refs/heads/branch-1.4
Commit: 87fa8ccd2bd245ee16bb7e3577c1afcd7dc9730d
Parents: 5643499
Author: Davies Liu dav...@databricks.com
Authored: Tue May 19 14:23:28 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Tue May 19 14:23:35 2015 -0700

--
 .../apache/spark/api/python/PythonUtils.scala   |  11 +-
 python/pyspark/sql/__init__.py  |   1 +
 python/pyspark/sql/context.py   |  28 +-
 python/pyspark/sql/dataframe.py |  67 ++--
 python/pyspark/sql/readwriter.py| 338 +++
 python/pyspark/sql/tests.py |  77 ++---
 6 files changed, 430 insertions(+), 92 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/87fa8ccd/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala 
b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
index efb6b93..90dacae 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
@@ -50,8 +50,15 @@ private[spark] object PythonUtils {
   /**
* Convert list of T into seq of T (for calling API with varargs)
*/
-  def toSeq[T](cols: JList[T]): Seq[T] = {
-cols.toList.toSeq
+  def toSeq[T](vs: JList[T]): Seq[T] = {
+vs.toList.toSeq
+  }
+
+  /**
+   * Convert list of T into array of T (for calling API with array)
+   */
+  def toArray[T](vs: JList[T]): Array[T] = {
+vs.toArray().asInstanceOf[Array[T]]
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/87fa8ccd/python/pyspark/sql/__init__.py
--
diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py
index 19805e2..634c575 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -58,6 +58,7 @@ from pyspark.sql.context import SQLContext, HiveContext
 from pyspark.sql.column import Column
 from pyspark.sql.dataframe import DataFrame, SchemaRDD, DataFrameNaFunctions, 
DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
+from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
 
 __all__ = [
 'SQLContext', 'HiveContext', 'DataFrame', 'GroupedData', 'Column', 'Row',

http://git-wip-us.apache.org/repos/asf/spark/blob/87fa8ccd/python/pyspark/sql/context.py
--
diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py
index 9f26d13..7543475 100644
--- a/python/pyspark/sql/context.py
+++ b/python/pyspark/sql/context.py
@@ -31,6 +31,7 @@ from pyspark.serializers import AutoBatchedSerializer, 
PickleSerializer
 from pyspark.sql.types import Row, StringType, StructType, _verify_type, \
 _infer_schema, _has_nulltype, _merge_type, _create_converter, 
_python_to_sql_converter
 from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.readwriter import DataFrameReader
 
 try:
 import pandas
@@ -457,19 +458,7 @@ class SQLContext(object):
 
 Optionally, a schema can be provided as the schema of the returned 
DataFrame.
 
-if path is not None:
-options[path] = path
-if source is None:
-source = self.getConf(spark.sql.sources.default,
-  org.apache.spark.sql.parquet)
-if schema is None:
-df = self._ssql_ctx.load(source, options)
-else:
-if not isinstance(schema, StructType):
-raise TypeError(schema should be StructType)
-scala_datatype = self._ssql_ctx.parseDataType(schema.json())
- 

spark git commit: [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 68fb2a46e - c12dff9b8


[SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with 
BLAS

JIRA: https://issues.apache.org/jira/browse/SPARK-7652

Author: Liang-Chi Hsieh vii...@gmail.com

Closes #6189 from viirya/naive_bayes_blas_prediction and squashes the following 
commits:

ab611fd [Liang-Chi Hsieh] Remove unnecessary space.
ddc48b9 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into 
naive_bayes_blas_prediction
b5772b4 [Liang-Chi Hsieh] Fix binary compatibility.
2f65186 [Liang-Chi Hsieh] Remove toDense.
1b6cdfe [Liang-Chi Hsieh] Update the implementation of naive Bayes prediction 
with BLAS.
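
For context, the Bernoulli scoring that this patch reformulates for BLAS can be written as follows, where theta_ij = log P(x_j = 1 | c_i) and pi_i is the log class prior (a sketch of the algebra, not text taken from the patch):

```latex
\log P(c_i \mid x) \;\propto\; \pi_i + \sum_j \left[ x_j\,\theta_{ij} + (1 - x_j)\log\!\left(1 - e^{\theta_{ij}}\right) \right]
                   \;=\; \pi_i + \sum_j x_j\left(\theta_{ij} - \log\!\left(1 - e^{\theta_{ij}}\right)\right) + \sum_j \log\!\left(1 - e^{\theta_{ij}}\right)
```

so the per-class scores reduce to a single matrix-vector product, `thetaMinusNegTheta * x + negThetaSum + pi`, which is what the values precomputed in the diff below are used for.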


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c12dff9b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c12dff9b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c12dff9b

Branch: refs/heads/master
Commit: c12dff9b82e4869f866a9b96ce0bf05503dd7dda
Parents: 68fb2a4
Author: Liang-Chi Hsieh vii...@gmail.com
Authored: Tue May 19 13:53:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 13:53:08 2015 -0700

--
 .../spark/mllib/classification/NaiveBayes.scala | 41 
 1 file changed, 24 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c12dff9b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
index ac0ebec..53fb2cb 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
@@ -21,13 +21,11 @@ import java.lang.{Iterable = JIterable}
 
 import scala.collection.JavaConverters._
 
-import breeze.linalg.{Axis, DenseMatrix = BDM, DenseVector = BDV, argmax = 
brzArgmax, sum = brzSum}
-import breeze.numerics.{exp = brzExp, log = brzLog}
 import org.json4s.JsonDSL._
 import org.json4s.jackson.JsonMethods._
 
 import org.apache.spark.{Logging, SparkContext, SparkException}
-import org.apache.spark.mllib.linalg.{BLAS, DenseVector, SparseVector, Vector}
+import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix, DenseVector, 
SparseVector, Vector, Vectors}
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
@@ -50,6 +48,9 @@ class NaiveBayesModel private[mllib] (
 val modelType: String)
   extends ClassificationModel with Serializable with Saveable {
 
+  private val piVector = new DenseVector(pi)
+  private val thetaMatrix = new DenseMatrix(labels.size, theta(0).size, 
theta.flatten, true)
+
   private[mllib] def this(labels: Array[Double], pi: Array[Double], theta: 
Array[Array[Double]]) =
 this(labels, pi, theta, Multinomial)
 
@@ -60,17 +61,18 @@ class NaiveBayesModel private[mllib] (
   theta: JIterable[JIterable[Double]]) =
 this(labels.asScala.toArray, pi.asScala.toArray, 
theta.asScala.toArray.map(_.asScala.toArray))
 
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM(theta(0).length, theta.length, 
theta.flatten).t
-
   // Bernoulli scoring requires log(condprob) if 1, log(1-condprob) if 0.
-  // This precomputes log(1.0 - exp(theta)) and its sum  which are used for 
the  linear algebra
+  // This precomputes log(1.0 - exp(theta)) and its sum which are used for the 
linear algebra
   // application of this condition (in predict function).
-  private val (brzNegTheta, brzNegThetaSum) = modelType match {
+  private val (thetaMinusNegTheta, negThetaSum) = modelType match {
 case Multinomial = (None, None)
 case Bernoulli =
-  val negTheta = brzLog((brzExp(brzTheta.copy) :*= (-1.0)) :+= 1.0) // 
log(1.0 - exp(x))
-  (Option(negTheta), Option(brzSum(negTheta, Axis._1)))
+  val negTheta = thetaMatrix.map(value = math.log(1.0 - math.exp(value)))
+  val ones = new DenseVector(Array.fill(thetaMatrix.numCols){1.0})
+  val thetaMinusNegTheta = thetaMatrix.map { value =
+value - math.log(1.0 - math.exp(value))
+  }
+  (Option(thetaMinusNegTheta), Option(negTheta.multiply(ones)))
 case _ =
   // This should never happen.
   throw new UnknownError(sNaiveBayesModel was created with an unknown 
ModelType: $modelType)
@@ -85,17 +87,22 @@ class NaiveBayesModel private[mllib] (
   }
 
   override def predict(testData: Vector): Double = {
-val brzData = testData.toBreeze
 modelType match {
   case Multinomial =
-labels(brzArgmax(brzPi + brzTheta * brzData))
+val 

spark git commit: [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 c3871eeb2 - 5643499d2


[SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with 
BLAS

JIRA: https://issues.apache.org/jira/browse/SPARK-7652

Author: Liang-Chi Hsieh vii...@gmail.com

Closes #6189 from viirya/naive_bayes_blas_prediction and squashes the following 
commits:

ab611fd [Liang-Chi Hsieh] Remove unnecessary space.
ddc48b9 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into 
naive_bayes_blas_prediction
b5772b4 [Liang-Chi Hsieh] Fix binary compatibility.
2f65186 [Liang-Chi Hsieh] Remove toDense.
1b6cdfe [Liang-Chi Hsieh] Update the implementation of naive Bayes prediction 
with BLAS.

(cherry picked from commit c12dff9b82e4869f866a9b96ce0bf05503dd7dda)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5643499d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5643499d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5643499d

Branch: refs/heads/branch-1.4
Commit: 5643499d220d2f8ee67f405875ce878f4b8e029d
Parents: c3871ee
Author: Liang-Chi Hsieh vii...@gmail.com
Authored: Tue May 19 13:53:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 13:53:16 2015 -0700

--
 .../spark/mllib/classification/NaiveBayes.scala | 41 
 1 file changed, 24 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5643499d/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
index ac0ebec..53fb2cb 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
@@ -21,13 +21,11 @@ import java.lang.{Iterable = JIterable}
 
 import scala.collection.JavaConverters._
 
-import breeze.linalg.{Axis, DenseMatrix = BDM, DenseVector = BDV, argmax = 
brzArgmax, sum = brzSum}
-import breeze.numerics.{exp = brzExp, log = brzLog}
 import org.json4s.JsonDSL._
 import org.json4s.jackson.JsonMethods._
 
 import org.apache.spark.{Logging, SparkContext, SparkException}
-import org.apache.spark.mllib.linalg.{BLAS, DenseVector, SparseVector, Vector}
+import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix, DenseVector, 
SparseVector, Vector, Vectors}
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
@@ -50,6 +48,9 @@ class NaiveBayesModel private[mllib] (
 val modelType: String)
   extends ClassificationModel with Serializable with Saveable {
 
+  private val piVector = new DenseVector(pi)
+  private val thetaMatrix = new DenseMatrix(labels.size, theta(0).size, 
theta.flatten, true)
+
   private[mllib] def this(labels: Array[Double], pi: Array[Double], theta: 
Array[Array[Double]]) =
 this(labels, pi, theta, Multinomial)
 
@@ -60,17 +61,18 @@ class NaiveBayesModel private[mllib] (
   theta: JIterable[JIterable[Double]]) =
 this(labels.asScala.toArray, pi.asScala.toArray, 
theta.asScala.toArray.map(_.asScala.toArray))
 
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM(theta(0).length, theta.length, 
theta.flatten).t
-
   // Bernoulli scoring requires log(condprob) if 1, log(1-condprob) if 0.
-  // This precomputes log(1.0 - exp(theta)) and its sum  which are used for 
the  linear algebra
+  // This precomputes log(1.0 - exp(theta)) and its sum which are used for the 
linear algebra
   // application of this condition (in predict function).
-  private val (brzNegTheta, brzNegThetaSum) = modelType match {
+  private val (thetaMinusNegTheta, negThetaSum) = modelType match {
 case Multinomial = (None, None)
 case Bernoulli =
-  val negTheta = brzLog((brzExp(brzTheta.copy) :*= (-1.0)) :+= 1.0) // 
log(1.0 - exp(x))
-  (Option(negTheta), Option(brzSum(negTheta, Axis._1)))
+  val negTheta = thetaMatrix.map(value = math.log(1.0 - math.exp(value)))
+  val ones = new DenseVector(Array.fill(thetaMatrix.numCols){1.0})
+  val thetaMinusNegTheta = thetaMatrix.map { value =
+value - math.log(1.0 - math.exp(value))
+  }
+  (Option(thetaMinusNegTheta), Option(negTheta.multiply(ones)))
 case _ =
   // This should never happen.
   throw new UnknownError(sNaiveBayesModel was created with an unknown 
ModelType: $modelType)
@@ -85,17 +87,22 @@ class NaiveBayesModel private[mllib] (
   }
 
   override def predict(testData: Vector): Double = {
-val brzData = 

spark git commit: [SPARK-7047] [ML] ml.Model optional parent support

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 8567d29ef - 24cb323e7


[SPARK-7047] [ML] ml.Model optional parent support

Made Model.parent transient.  Added Model.hasParent to test for null parent

CC: mengxr

Author: Joseph K. Bradley jos...@databricks.com

Closes #5914 from jkbradley/parent-optional and squashes the following commits:

d501774 [Joseph K. Bradley] Made Model.parent transient.  Added Model.hasParent 
to test for null parent

(cherry picked from commit fb90273212dc7241c9a0c3446e25e0e0b9377750)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24cb323e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24cb323e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24cb323e

Branch: refs/heads/branch-1.4
Commit: 24cb323e767a342496cf24e0d06398b5af38ac80
Parents: 8567d29
Author: Joseph K. Bradley jos...@databricks.com
Authored: Tue May 19 10:55:21 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 10:55:32 2015 -0700

--
 mllib/src/main/scala/org/apache/spark/ml/Model.scala| 5 -
 .../spark/ml/classification/LogisticRegressionSuite.scala   | 1 +
 .../spark/ml/classification/RandomForestClassifierSuite.scala   | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/24cb323e/mllib/src/main/scala/org/apache/spark/ml/Model.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/Model.scala 
b/mllib/src/main/scala/org/apache/spark/ml/Model.scala
index 7fd5153..70e7495 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/Model.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/Model.scala
@@ -32,7 +32,7 @@ abstract class Model[M : Model[M]] extends Transformer {
* The parent estimator that produced this model.
* Note: For ensembles' component Models, this value can be null.
*/
-  var parent: Estimator[M] = _
+  @transient var parent: Estimator[M] = _
 
   /**
* Sets the parent of this model (Java API).
@@ -42,6 +42,9 @@ abstract class Model[M : Model[M]] extends Transformer {
 this.asInstanceOf[M]
   }
 
+  /** Indicates whether this [[Model]] has a corresponding parent. */
+  def hasParent: Boolean = parent != null
+
   override def copy(extra: ParamMap): M = {
 // The default implementation of Params.copy doesn't work for models.
 throw new NotImplementedError(s${this.getClass} doesn't implement 
copy(extra: ParamMap))

http://git-wip-us.apache.org/repos/asf/spark/blob/24cb323e/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
index 4376524..97f9749 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
@@ -83,6 +83,7 @@ class LogisticRegressionSuite extends FunSuite with 
MLlibTestSparkContext {
 assert(model.getRawPredictionCol === rawPrediction)
 assert(model.getProbabilityCol === probability)
 assert(model.intercept !== 0.0)
+assert(model.hasParent)
   }
 
   test(logistic regression doesn't fit intercept when fitIntercept is off) {

http://git-wip-us.apache.org/repos/asf/spark/blob/24cb323e/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
index 08f86fa..cdbbaca 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
@@ -162,5 +162,7 @@ private object RandomForestClassifierSuite {
 val oldModelAsNew = RandomForestClassificationModel.fromOld(
   oldModel, newModel.parent.asInstanceOf[RandomForestClassifier], 
categoricalFeatures)
 TreeTests.checkEqual(oldModelAsNew, newModel)
+assert(newModel.hasParent)
+
assert(!newModel.trees.head.asInstanceOf[DecisionTreeClassificationModel].hasParent)
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7678] [ML] Fix default random seed in HasSeed

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master fb9027321 - 7b16e9f21


[SPARK-7678] [ML] Fix default random seed in HasSeed

Changed shared param HasSeed to have a default based on the hashCode of the 
class name, instead of a random number.
Also, removed fixed random seeds from Word2Vec and ALS.

CC: mengxr

Author: Joseph K. Bradley jos...@databricks.com

Closes #6251 from jkbradley/scala-fixed-seed and squashes the following commits:

0e37184 [Joseph K. Bradley] Fixed Word2VecSuite, ALSSuite in spark.ml to use 
original fixed random seeds
678ec3a [Joseph K. Bradley] Removed fixed random seeds from Word2Vec and ALS. 
Changed shared param HasSeed to have default based on hashCode of class name, 
instead of random number.
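
A sketch of what the new default means in practice (illustrative only; `Word2Vec` stands in for any Params class that mixes in `HasSeed`):

```scala
import org.apache.spark.ml.feature.Word2Vec

// Old default: Utils.random.nextLong(), a fresh random value per JVM, so two
// runs of the same pipeline could differ unless setSeed was called explicitly.
// New default: a deterministic function of the concrete class name.
val defaultSeed = classOf[Word2Vec].getName.hashCode.toLong
// Every Word2Vec instance now shares this default, while different estimator
// classes still get different (but stable) defaults.
```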


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b16e9f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b16e9f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b16e9f2

Branch: refs/heads/master
Commit: 7b16e9f2118fbfbb1c0ba957161fe500c9aff82a
Parents: fb90273
Author: Joseph K. Bradley jos...@databricks.com
Authored: Tue May 19 10:57:47 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 10:57:47 2015 -0700

--
 .../org/apache/spark/ml/feature/Word2Vec.scala  |  1 -
 .../spark/ml/param/shared/SharedParamsCodeGen.scala |  2 +-
 .../apache/spark/ml/param/shared/sharedParams.scala |  4 ++--
 .../org/apache/spark/ml/recommendation/ALS.scala|  2 +-
 .../org/apache/spark/ml/feature/Word2VecSuite.scala |  1 +
 .../apache/spark/ml/recommendation/ALSSuite.scala   | 16 +---
 6 files changed, 14 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7b16e9f2/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
index 8ace8c5..90f0be7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
@@ -68,7 +68,6 @@ private[feature] trait Word2VecBase extends Params
 
   setDefault(stepSize - 0.025)
   setDefault(maxIter - 1)
-  setDefault(seed - 42L)
 
   /**
* Validate and transform the input schema.

http://git-wip-us.apache.org/repos/asf/spark/blob/7b16e9f2/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index 5085b79..8b8cb81 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -53,7 +53,7 @@ private[shared] object SharedParamsCodeGen {
   ParamDesc[Int](checkpointInterval, checkpoint interval (= 1),
 isValid = ParamValidators.gtEq(1)),
   ParamDesc[Boolean](fitIntercept, whether to fit an intercept term, 
Some(true)),
-  ParamDesc[Long](seed, random seed, Some(Utils.random.nextLong())),
+  ParamDesc[Long](seed, random seed, 
Some(this.getClass.getName.hashCode.toLong)),
   ParamDesc[Double](elasticNetParam, the ElasticNet mixing parameter, 
in range [0, 1]. +
  For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an 
L1 penalty.,
 isValid = ParamValidators.inRange(0, 1)),

http://git-wip-us.apache.org/repos/asf/spark/blob/7b16e9f2/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
index 7525d37..3a4976d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
@@ -232,7 +232,7 @@ private[ml] trait HasFitIntercept extends Params {
 }
 
 /**
- * (private[ml]) Trait for shared param seed (default: 
Utils.random.nextLong()).
+ * (private[ml]) Trait for shared param seed (default: 
this.getClass.getName.hashCode.toLong).
  */
 private[ml] trait HasSeed extends Params {
 
@@ -242,7 +242,7 @@ private[ml] trait HasSeed extends Params {
*/
   final val seed: LongParam = new LongParam(this, seed, random seed)
 
-  setDefault(seed, Utils.random.nextLong())
+  setDefault(seed, this.getClass.getName.hashCode.toLong)
 
   /** @group getParam */
   final def getSeed: Long = $(seed)


spark git commit: [SPARK-7678] [ML] Fix default random seed in HasSeed

2015-05-19 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 24cb323e7 - cd3093e70


[SPARK-7678] [ML] Fix default random seed in HasSeed

Changed shared param HasSeed to have a default based on the hashCode of the 
class name, instead of a random number.
Also, removed fixed random seeds from Word2Vec and ALS.

CC: mengxr

Author: Joseph K. Bradley jos...@databricks.com

Closes #6251 from jkbradley/scala-fixed-seed and squashes the following commits:

0e37184 [Joseph K. Bradley] Fixed Word2VecSuite, ALSSuite in spark.ml to use 
original fixed random seeds
678ec3a [Joseph K. Bradley] Removed fixed random seeds from Word2Vec and ALS. 
Changed shared param HasSeed to have default based on hashCode of class name, 
instead of random number.

(cherry picked from commit 7b16e9f2118fbfbb1c0ba957161fe500c9aff82a)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cd3093e7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cd3093e7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cd3093e7

Branch: refs/heads/branch-1.4
Commit: cd3093e705b184df1291cd8f03331a9618993693
Parents: 24cb323
Author: Joseph K. Bradley jos...@databricks.com
Authored: Tue May 19 10:57:47 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue May 19 10:57:54 2015 -0700

--
 .../org/apache/spark/ml/feature/Word2Vec.scala  |  1 -
 .../spark/ml/param/shared/SharedParamsCodeGen.scala |  2 +-
 .../apache/spark/ml/param/shared/sharedParams.scala |  4 ++--
 .../org/apache/spark/ml/recommendation/ALS.scala|  2 +-
 .../org/apache/spark/ml/feature/Word2VecSuite.scala |  1 +
 .../apache/spark/ml/recommendation/ALSSuite.scala   | 16 +---
 6 files changed, 14 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cd3093e7/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
index 8ace8c5..90f0be7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
@@ -68,7 +68,6 @@ private[feature] trait Word2VecBase extends Params
 
   setDefault(stepSize -> 0.025)
   setDefault(maxIter -> 1)
-  setDefault(seed -> 42L)
 
   /**
* Validate and transform the input schema.

http://git-wip-us.apache.org/repos/asf/spark/blob/cd3093e7/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index 5085b79..8b8cb81 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -53,7 +53,7 @@ private[shared] object SharedParamsCodeGen {
       ParamDesc[Int]("checkpointInterval", "checkpoint interval (>= 1)",
         isValid = "ParamValidators.gtEq(1)"),
       ParamDesc[Boolean]("fitIntercept", "whether to fit an intercept term", Some("true")),
-      ParamDesc[Long]("seed", "random seed", Some("Utils.random.nextLong()")),
+      ParamDesc[Long]("seed", "random seed", Some("this.getClass.getName.hashCode.toLong")),
       ParamDesc[Double]("elasticNetParam", "the ElasticNet mixing parameter, in range [0, 1]." +
         " For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.",
         isValid = "ParamValidators.inRange(0, 1)"),

http://git-wip-us.apache.org/repos/asf/spark/blob/cd3093e7/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
index 7525d37..3a4976d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
@@ -232,7 +232,7 @@ private[ml] trait HasFitIntercept extends Params {
 }
 
 /**
- * (private[ml]) Trait for shared param seed (default: Utils.random.nextLong()).
+ * (private[ml]) Trait for shared param seed (default: this.getClass.getName.hashCode.toLong).
  */
 private[ml] trait HasSeed extends Params {
 
@@ -242,7 +242,7 @@ private[ml] trait HasSeed extends Params {
*/
  final val seed: LongParam = new LongParam(this, "seed", "random seed")
 
-  setDefault(seed, Utils.random.nextLong())
+  setDefault(seed, this.getClass.getName.hashCode.toLong)

spark git commit: [SPARK-7726] Fix Scaladoc false errors

2015-05-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 7b16e9f21 - 3c4c1f964


[SPARK-7726] Fix Scaladoc false errors

Visibility rules for static members are different in Scala and Java, and this 
case requires an explicit static import. Even though these are Java files, they 
are run through scaladoc, which enforces Scala rules.

Also reverted the commit that reverts the upgrade to 2.11.6
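
Roughly, the pattern described above, as a self-contained Java sketch (hypothetical BaseMessage, Kind, and OpenRequest names, not the real protocol classes): javac resolves an inherited nested static type without qualification, but scaladoc applies Scala's rules, under which static members are not inherited, so the reference only resolves once the explicit static import is added.

// Sketch with hypothetical names; the actual fix adds the analogous import to each protocol file.
package demo;

// Without this import, javac still compiles OpenRequest, but scaladoc
// (applying Scala visibility rules) rejects the unqualified "Kind".
import static demo.BaseMessage.Kind;

class BaseMessage {
  enum Kind { OPEN, REGISTER }
}

class OpenRequest extends BaseMessage {
  Kind kind() {
    return Kind.OPEN;
  }
}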

Author: Iulian Dragos jagua...@gmail.com

Closes #6260 from dragos/issue/scaladoc-false-error and squashes the following 
commits:

f2e998e [Iulian Dragos] Revert [HOTFIX] Revert [SPARK-7092] Update spark 
scala version to 2.11.6
0bad052 [Iulian Dragos] Fix scaladoc faux-error.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3c4c1f96
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3c4c1f96
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3c4c1f96

Branch: refs/heads/master
Commit: 3c4c1f96474b3e66fa1d44ac0177f548cf5a3a10
Parents: 7b16e9f
Author: Iulian Dragos jagua...@gmail.com
Authored: Tue May 19 12:14:48 2015 -0700
Committer: Patrick Wendell patr...@databricks.com
Committed: Tue May 19 12:14:48 2015 -0700

--
 .../org/apache/spark/network/shuffle/protocol/OpenBlocks.java| 3 +++
 .../apache/spark/network/shuffle/protocol/RegisterExecutor.java  | 3 +++
 .../org/apache/spark/network/shuffle/protocol/StreamHandle.java  | 3 +++
 .../org/apache/spark/network/shuffle/protocol/UploadBlock.java   | 3 +++
 pom.xml  | 4 ++--
 .../src/main/scala/org/apache/spark/repl/SparkIMain.scala| 2 +-
 6 files changed, 15 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3c4c1f96/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
--
diff --git 
a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
 
b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
index 60485ba..ce954b8 100644
--- 
a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
+++ 
b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java
@@ -24,6 +24,9 @@ import io.netty.buffer.ByteBuf;
 
 import org.apache.spark.network.protocol.Encoders;
 
+// Needed by ScalaDoc. See SPARK-7726
+import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
+
 /** Request to read a set of blocks. Returns {@link StreamHandle}. */
 public class OpenBlocks extends BlockTransferMessage {
   public final String appId;

http://git-wip-us.apache.org/repos/asf/spark/blob/3c4c1f96/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
--
diff --git 
a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
 
b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
index 38acae3..cca8b17 100644
--- 
a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
+++ 
b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java
@@ -22,6 +22,9 @@ import io.netty.buffer.ByteBuf;
 
 import org.apache.spark.network.protocol.Encoders;
 
+// Needed by ScalaDoc. See SPARK-7726
+import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
+
 /**
  * Initial registration message between an executor and its local shuffle server.
  * Returns nothing (empty byte array).

http://git-wip-us.apache.org/repos/asf/spark/blob/3c4c1f96/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java
--
diff --git 
a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java
 
b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java
index 9a92202..1915295 100644
--- 
a/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java
+++ 
b/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java
@@ -20,6 +20,9 @@ package org.apache.spark.network.shuffle.protocol;
 import com.google.common.base.Objects;
 import io.netty.buffer.ByteBuf;
 
+// Needed by ScalaDoc. See SPARK-7726
+import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
+
 /**
  * Identifier for a fixed number of chunks to read from a stream created by an 
open blocks
  * message. This is used by {@link 
org.apache.spark.network.shuffle.OneForOneBlockFetcher}.