spark git commit: [MINOR] [SQL] Fix sphinx warnings in PySpark SQL

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 5be517584 -> 257e9d727


[MINOR] [SQL] Fix sphinx warnings in PySpark SQL

Author: MechCoder manojkumarsivaraj...@gmail.com

Closes #8171 from MechCoder/sql_sphinx.

(cherry picked from commit 52c60537a274af5414f6b0340a4bd7488ef35280)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/257e9d72
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/257e9d72
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/257e9d72

Branch: refs/heads/branch-1.5
Commit: 257e9d727874332fd192f6a993f9ea8bf464abf5
Parents: 5be5175
Author: MechCoder manojkumarsivaraj...@gmail.com
Authored: Thu Aug 20 10:05:31 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 10:05:39 2015 -0700

--
 python/pyspark/context.py   | 8 
 python/pyspark/sql/types.py | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/257e9d72/python/pyspark/context.py
--
diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index eb5b0bb..1b2a52a 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -302,10 +302,10 @@ class SparkContext(object):
 
 A unique identifier for the Spark application.
 Its format depends on the scheduler implementation.
-(i.e.
-in case of local spark app something like 'local-1433865536131'
-in case of YARN something like 'application_1433865536131_34483'
-)
+
+* in case of local spark app something like 'local-1433865536131'
+* in case of YARN something like 'application_1433865536131_34483'
+
        >>> sc.applicationId  # doctest: +ELLIPSIS
 u'local-...'
 

http://git-wip-us.apache.org/repos/asf/spark/blob/257e9d72/python/pyspark/sql/types.py
--
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index c083bf8..ed4e5b5 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -467,9 +467,11 @@ class StructType(DataType):
 
         Construct a StructType by adding new elements to it to define the schema. The method accepts
         either:
+
             a) A single parameter which is a StructField object.
             b) Between 2 and 4 parameters as (name, data_type, nullable (optional),
-               metadata(optional). The data_type parameter may be either a String or a DataType object
+               metadata(optional). The data_type parameter may be either a String or a
+               DataType object.
 
        >>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
        >>> struct2 = StructType([StructField("f1", StringType(), True),\


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR] [SQL] Fix sphinx warnings in PySpark SQL

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master b4f4e91c3 -> 52c60537a


[MINOR] [SQL] Fix sphinx warnings in PySpark SQL

Author: MechCoder manojkumarsivaraj...@gmail.com

Closes #8171 from MechCoder/sql_sphinx.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52c60537
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52c60537
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52c60537

Branch: refs/heads/master
Commit: 52c60537a274af5414f6b0340a4bd7488ef35280
Parents: b4f4e91
Author: MechCoder manojkumarsivaraj...@gmail.com
Authored: Thu Aug 20 10:05:31 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 10:05:31 2015 -0700

--
 python/pyspark/context.py   | 8 
 python/pyspark/sql/types.py | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/52c60537/python/pyspark/context.py
--
diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index eb5b0bb..1b2a52a 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -302,10 +302,10 @@ class SparkContext(object):
 
 A unique identifier for the Spark application.
 Its format depends on the scheduler implementation.
-(i.e.
-in case of local spark app something like 'local-1433865536131'
-in case of YARN something like 'application_1433865536131_34483'
-)
+
+* in case of local spark app something like 'local-1433865536131'
+* in case of YARN something like 'application_1433865536131_34483'
+
        >>> sc.applicationId  # doctest: +ELLIPSIS
 u'local-...'
 

http://git-wip-us.apache.org/repos/asf/spark/blob/52c60537/python/pyspark/sql/types.py
--
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index c083bf8..ed4e5b5 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -467,9 +467,11 @@ class StructType(DataType):
 
         Construct a StructType by adding new elements to it to define the schema. The method accepts
         either:
+
             a) A single parameter which is a StructField object.
             b) Between 2 and 4 parameters as (name, data_type, nullable (optional),
-               metadata(optional). The data_type parameter may be either a String or a DataType object
+               metadata(optional). The data_type parameter may be either a String or a
+               DataType object.
 
        >>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
        >>> struct2 = StructType([StructField("f1", StringType(), True),\


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9982] [SPARKR] SparkR DataFrame fail to return data of Decimal type

2015-08-20 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 257e9d727 -> a7027e6d3


[SPARK-9982] [SPARKR] SparkR DataFrame fail to return data of Decimal type

Author: Alex Shkurenko ashkure...@enova.com

Closes #8239 from ashkurenko/master.

(cherry picked from commit 39e91fe2fd43044cc734d55625a3c03284b69f09)
Signed-off-by: Shivaram Venkataraman shiva...@cs.berkeley.edu


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7027e6d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7027e6d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7027e6d

Branch: refs/heads/branch-1.5
Commit: a7027e6d3369a1157c53557c8215273606086d84
Parents: 257e9d7
Author: Alex Shkurenko ashkure...@enova.com
Authored: Thu Aug 20 10:16:38 2015 -0700
Committer: Shivaram Venkataraman shiva...@cs.berkeley.edu
Committed: Thu Aug 20 10:16:57 2015 -0700

--
 core/src/main/scala/org/apache/spark/api/r/SerDe.scala | 5 +
 1 file changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a7027e6d/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala 
b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
index d5b4260..3c89f24 100644
--- a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
+++ b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
@@ -181,6 +181,7 @@ private[spark] object SerDe {
   // Boolean -> logical
   // Float -> double
   // Double -> double
+  // Decimal -> double
   // Long -> double
   // Array[Byte] -> raw
   // Date -> Date
@@ -219,6 +220,10 @@ private[spark] object SerDe {
     case "float" | "java.lang.Float" =>
       writeType(dos, "double")
       writeDouble(dos, value.asInstanceOf[Float].toDouble)
+    case "decimal" | "java.math.BigDecimal" =>
+      writeType(dos, "double")
+      val javaDecimal = value.asInstanceOf[java.math.BigDecimal]
+      writeDouble(dos, scala.math.BigDecimal(javaDecimal).toDouble)
     case "double" | "java.lang.Double" =>
       writeType(dos, "double")
       writeDouble(dos, value.asInstanceOf[Double])
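
For readers following along, a minimal standalone Scala sketch of the conversion the patch adds (hypothetical values, independent of Spark's SerDe internals):

import java.math.{BigDecimal => JBigDecimal}

object DecimalToDoubleSketch {
  // Same idea as the new SerDe case above: wrap the Java BigDecimal in a
  // scala.math.BigDecimal and take its double value before writing it out.
  def toRDouble(value: Any): Double = value match {
    case d: JBigDecimal => scala.math.BigDecimal(d).toDouble
    case d: Double      => d
    case other          => throw new IllegalArgumentException(s"unsupported value: $other")
  }

  def main(args: Array[String]): Unit = {
    // A DECIMAL(10, 2) column entry, for example, ends up on the R side as a double.
    println(toRDouble(new JBigDecimal("1234.56")))  // 1234.56
  }
}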


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9982] [SPARKR] SparkR DataFrame fail to return data of Decimal type

2015-08-20 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 52c60537a -> 39e91fe2f


[SPARK-9982] [SPARKR] SparkR DataFrame fail to return data of Decimal type

Author: Alex Shkurenko ashkure...@enova.com

Closes #8239 from ashkurenko/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/39e91fe2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/39e91fe2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/39e91fe2

Branch: refs/heads/master
Commit: 39e91fe2fd43044cc734d55625a3c03284b69f09
Parents: 52c6053
Author: Alex Shkurenko ashkure...@enova.com
Authored: Thu Aug 20 10:16:38 2015 -0700
Committer: Shivaram Venkataraman shiva...@cs.berkeley.edu
Committed: Thu Aug 20 10:16:38 2015 -0700

--
 core/src/main/scala/org/apache/spark/api/r/SerDe.scala | 5 +
 1 file changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/39e91fe2/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala 
b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
index d5b4260..3c89f24 100644
--- a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
+++ b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
@@ -181,6 +181,7 @@ private[spark] object SerDe {
   // Boolean -> logical
   // Float -> double
   // Double -> double
+  // Decimal -> double
   // Long -> double
   // Array[Byte] -> raw
   // Date -> Date
@@ -219,6 +220,10 @@ private[spark] object SerDe {
     case "float" | "java.lang.Float" =>
       writeType(dos, "double")
       writeDouble(dos, value.asInstanceOf[Float].toDouble)
+    case "decimal" | "java.math.BigDecimal" =>
+      writeType(dos, "double")
+      val javaDecimal = value.asInstanceOf[java.math.BigDecimal]
+      writeDouble(dos, scala.math.BigDecimal(javaDecimal).toDouble)
     case "double" | "java.lang.Double" =>
       writeType(dos, "double")
       writeDouble(dos, value.asInstanceOf[Double])


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10092] [SQL] Multi-DB support follow up.

2015-08-20 Thread lian
Repository: spark
Updated Branches:
  refs/heads/master b762f9920 -> 43e013542


[SPARK-10092] [SQL] Multi-DB support follow up.

https://issues.apache.org/jira/browse/SPARK-10092

This PR is a follow-up for Multi-DB support. It makes the following changes:

* `HiveContext.refreshTable` now accepts `dbName.tableName`.
* `HiveContext.analyze` now accepts `dbName.tableName`.
* `CreateTableUsing`, `CreateTableUsingAsSelect`, `CreateTempTableUsing`, 
`CreateTempTableUsingAsSelect`, `CreateMetastoreDataSource`, and 
`CreateMetastoreDataSourceAsSelect` all take `TableIdentifier` instead of the 
string representation of the table name.
* When you call `saveAsTable` with a specified database, the data is saved to 
the correct location (see the sketch after this list).
* Explicitly disallow creating a temporary table with a specified database 
name (users could not do this before either).
* When we save a table to the metastore, we also check whether the database name 
and table name are acceptable to Hive (using `MetaStoreUtils.validateName`).
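
A minimal Scala sketch of the user-facing behavior described above (assuming Spark 1.5 built with Hive support; the database and table names are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object MultiDbSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-db-sketch").setMaster("local[2]"))
    val sqlContext = new HiveContext(sc)

    sqlContext.sql("CREATE DATABASE IF NOT EXISTS mydb")

    // saveAsTable with a database-qualified name now writes to mydb's location.
    sqlContext.range(0, 10).write.mode(SaveMode.Overwrite).saveAsTable("mydb.my_table")

    // refreshTable also accepts dbName.tableName.
    sqlContext.refreshTable("mydb.my_table")

    sc.stop()
  }
}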

Author: Yin Huai yh...@databricks.com

Closes #8324 from yhuai/saveAsTableDB.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43e01354
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43e01354
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43e01354

Branch: refs/heads/master
Commit: 43e0135421b2262cbb0e06aae53523f663b4f959
Parents: b762f99
Author: Yin Huai yh...@databricks.com
Authored: Thu Aug 20 15:30:31 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Thu Aug 20 15:30:31 2015 +0800

--
 .../spark/sql/catalyst/TableIdentifier.scala|   4 +-
 .../spark/sql/catalyst/analysis/Catalog.scala   |  63 +---
 .../org/apache/spark/sql/DataFrameWriter.scala  |   2 +-
 .../scala/org/apache/spark/sql/SQLContext.scala |  15 +-
 .../spark/sql/execution/SparkStrategies.scala   |  10 +-
 .../sql/execution/datasources/DDLParser.scala   |  32 ++--
 .../spark/sql/execution/datasources/ddl.scala   |  22 +--
 .../spark/sql/execution/datasources/rules.scala |   8 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala|  35 
 .../org/apache/spark/sql/hive/HiveContext.scala |  14 +-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   |  22 ++-
 .../apache/spark/sql/hive/HiveStrategies.scala  |  12 +-
 .../spark/sql/hive/execution/commands.scala |  54 +--
 .../apache/spark/sql/hive/ListTablesSuite.scala |   6 -
 .../spark/sql/hive/MultiDatabaseSuite.scala | 158 ++-
 .../sql/hive/execution/SQLQuerySuite.scala  |  35 
 16 files changed, 398 insertions(+), 94 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/43e01354/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
index aebcdeb..d701559 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
@@ -25,7 +25,9 @@ private[sql] case class TableIdentifier(table: String, database: Option[String]
 
   def toSeq: Seq[String] = database.toSeq :+ table
 
-  override def toString: String = toSeq.map("`" + _ + "`").mkString(".")
+  override def toString: String = quotedString
+
+  def quotedString: String = toSeq.map("`" + _ + "`").mkString(".")
 
   def unquotedString: String = toSeq.mkString(".")
 }
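
As a quick illustration of the two forms above (a hypothetical snippet; since TableIdentifier is private[sql], it only compiles inside Spark's own sql packages):

package org.apache.spark.sql.catalyst  // hypothetical location inside the Spark tree

object TableIdentifierSketch {
  def main(args: Array[String]): Unit = {
    val ident = TableIdentifier("my_table", Some("mydb"))
    println(ident.quotedString)    // `mydb`.`my_table`  (also what toString now returns)
    println(ident.unquotedString)  // mydb.my_table
  }
}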

http://git-wip-us.apache.org/repos/asf/spark/blob/43e01354/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
index 5766e6a..503c4f4 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
@@ -23,6 +23,7 @@ import scala.collection.JavaConversions._
 import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer
 
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.{TableIdentifier, CatalystConf, EmptyConf}
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery}
 
@@ -55,12 +56,15 @@ trait Catalog {
 
   def refreshTable(tableIdent: TableIdentifier): Unit
 
+  // TODO: Refactor it in the work of SPARK-10104
   def registerTable(tableIdentifier: Seq[String], plan: LogicalPlan): Unit
 
+  // TODO: Refactor it in the work of SPARK-10104
   def unregisterTable(tableIdentifier: 

spark git commit: [SPARK-10092] [SQL] Backports #8324 to branch-1.5

2015-08-20 Thread lian
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 71aa54755 -> 675e22494


[SPARK-10092] [SQL] Backports #8324 to branch-1.5

Author: Yin Huai yh...@databricks.com

Closes #8336 from liancheng/spark-10092/for-branch-1.5.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/675e2249
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/675e2249
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/675e2249

Branch: refs/heads/branch-1.5
Commit: 675e2249472fbadecb5c8f8da6ae8ff7a1f05305
Parents: 71aa547
Author: Yin Huai yh...@databricks.com
Authored: Thu Aug 20 18:43:24 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Thu Aug 20 18:43:24 2015 +0800

--
 .../spark/sql/catalyst/TableIdentifier.scala|   4 +-
 .../spark/sql/catalyst/analysis/Catalog.scala   |  63 +---
 .../org/apache/spark/sql/DataFrameWriter.scala  |   2 +-
 .../scala/org/apache/spark/sql/SQLContext.scala |  15 +-
 .../spark/sql/execution/SparkStrategies.scala   |  10 +-
 .../sql/execution/datasources/DDLParser.scala   |  32 ++--
 .../spark/sql/execution/datasources/ddl.scala   |  22 +--
 .../spark/sql/execution/datasources/rules.scala |   8 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala|  35 
 .../org/apache/spark/sql/hive/HiveContext.scala |  14 +-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   |  22 ++-
 .../apache/spark/sql/hive/HiveStrategies.scala  |  12 +-
 .../spark/sql/hive/execution/commands.scala |  54 +--
 .../apache/spark/sql/hive/ListTablesSuite.scala |   6 -
 .../spark/sql/hive/MultiDatabaseSuite.scala | 158 ++-
 .../sql/hive/execution/SQLQuerySuite.scala  |  35 
 16 files changed, 398 insertions(+), 94 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/675e2249/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
index aebcdeb..d701559 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/TableIdentifier.scala
@@ -25,7 +25,9 @@ private[sql] case class TableIdentifier(table: String, database: Option[String]
 
   def toSeq: Seq[String] = database.toSeq :+ table
 
-  override def toString: String = toSeq.map("`" + _ + "`").mkString(".")
+  override def toString: String = quotedString
+
+  def quotedString: String = toSeq.map("`" + _ + "`").mkString(".")
 
   def unquotedString: String = toSeq.mkString(".")
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/675e2249/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
index 5766e6a..503c4f4 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
@@ -23,6 +23,7 @@ import scala.collection.JavaConversions._
 import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer
 
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.{TableIdentifier, CatalystConf, EmptyConf}
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery}
 
@@ -55,12 +56,15 @@ trait Catalog {
 
   def refreshTable(tableIdent: TableIdentifier): Unit
 
+  // TODO: Refactor it in the work of SPARK-10104
   def registerTable(tableIdentifier: Seq[String], plan: LogicalPlan): Unit
 
+  // TODO: Refactor it in the work of SPARK-10104
   def unregisterTable(tableIdentifier: Seq[String]): Unit
 
   def unregisterAllTables(): Unit
 
+  // TODO: Refactor it in the work of SPARK-10104
   protected def processTableIdentifier(tableIdentifier: Seq[String]): Seq[String] = {
     if (conf.caseSensitiveAnalysis) {
       tableIdentifier
@@ -69,6 +73,7 @@ trait Catalog {
     }
   }
 
+  // TODO: Refactor it in the work of SPARK-10104
   protected def getDbTableName(tableIdent: Seq[String]): String = {
     val size = tableIdent.size
     if (size <= 2) {
@@ -78,9 +83,22 @@ trait Catalog {
     }
   }
 
+  // TODO: Refactor it in the work of SPARK-10104
   protected def getDBTable(tableIdent: Seq[String]) : (Option[String], String) = {
     (tableIdent.lift(tableIdent.size - 2), tableIdent.last)
   }
+
+  /**
+   * It is not allowed to specify database name for tables stored in [[SimpleCatalog]].
+   * We use 

Git Push Summary

2015-08-20 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.5.0-rc1 [created] 4c56ad772

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[1/2] spark git commit: Preparing Spark release v1.5.0-rc1

2015-08-20 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 175c1d9c9 -> 988e838a2


Preparing Spark release v1.5.0-rc1


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c56ad77
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c56ad77
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c56ad77

Branch: refs/heads/branch-1.5
Commit: 4c56ad772637615cc1f4f88d619fac6c372c8552
Parents: 175c1d9
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 16:24:07 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 16:24:07 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4c56ad77/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index e9c6d26..3ef7d6f 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4c56ad77/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index ed5c37e..684e07b 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4c56ad77/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 4f79d71..6b082ad 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4c56ad77/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index e6884b0..9ef1eda 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4c56ad77/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index e05e431..fe878e6 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4c56ad77/external/flume-sink/pom.xml
--
diff --git 

Git Push Summary

2015-08-20 Thread rxin
Repository: spark
Updated Tags:  refs/tags/v1.5.0-rc1 [deleted] d837d51d5

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10140] [DOC] add target fields to @Since

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master afe9f03fd -> cdd9a2bb1


[SPARK-10140] [DOC] add target fields to @Since

so constructor parameters and public fields can be annotated. rxin MechCoder

Author: Xiangrui Meng m...@databricks.com

Closes #8344 from mengxr/SPARK-10140.2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdd9a2bb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdd9a2bb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdd9a2bb

Branch: refs/heads/master
Commit: cdd9a2bb10e20556003843a0f7aaa33acd55f6d2
Parents: afe9f03
Author: Xiangrui Meng m...@databricks.com
Authored: Thu Aug 20 20:01:13 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 20:01:13 2015 -0700

--
 core/src/main/scala/org/apache/spark/annotation/Since.scala | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cdd9a2bb/core/src/main/scala/org/apache/spark/annotation/Since.scala
--
diff --git a/core/src/main/scala/org/apache/spark/annotation/Since.scala 
b/core/src/main/scala/org/apache/spark/annotation/Since.scala
index fa59393..af483e3 100644
--- a/core/src/main/scala/org/apache/spark/annotation/Since.scala
+++ b/core/src/main/scala/org/apache/spark/annotation/Since.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.annotation
 
 import scala.annotation.StaticAnnotation
+import scala.annotation.meta._
 
 /**
  * A Scala annotation that specifies the Spark version when a definition was 
added.
@@ -25,4 +26,5 @@ import scala.annotation.StaticAnnotation
  * hence works for overridden methods that inherit API documentation directly 
from parents.
  * The limitation is that it does not show up in the generated Java API 
documentation.
  */
+@param @field @getter @setter @beanGetter @beanSetter
 private[spark] class Since(version: String) extends StaticAnnotation
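
A minimal sketch of what the added annotation targets enable (hypothetical class; it assumes the code lives in Spark's own source tree, since Since is private[spark]):

package org.apache.spark.example  // hypothetical package inside the Spark tree

import org.apache.spark.annotation.Since

@Since("1.5.0")
private[spark] class ExampleSummary @Since("1.5.0") (   // annotation on the constructor
    @Since("1.5.0") val count: Long) {                  // ...and on a constructor parameter/field

  // Public fields can now carry @Since as well, thanks to the @field/@getter targets.
  @Since("1.5.0")
  val isEmpty: Boolean = count == 0L
}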


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master cdd9a2bb1 -> dcfe0c5cd


[SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier

Added user guide for multilayer perceptron classifier:
  - Simplified description of the multilayer perceptron classifier
  - Example code for Scala and Java

Author: Alexander Ulanov na...@yandex.ru

Closes #8262 from avulanov/SPARK-9846-mlpc-docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dcfe0c5c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dcfe0c5c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dcfe0c5c

Branch: refs/heads/master
Commit: dcfe0c5cde953b31c5bfeb6e41d1fc9b333241eb
Parents: cdd9a2b
Author: Alexander Ulanov na...@yandex.ru
Authored: Thu Aug 20 20:02:27 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 20:02:27 2015 -0700

--
 docs/ml-ann.md   | 123 ++
 docs/ml-guide.md |   1 +
 2 files changed, 124 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dcfe0c5c/docs/ml-ann.md
--
diff --git a/docs/ml-ann.md b/docs/ml-ann.md
new file mode 100644
index 000..d5ddd92
--- /dev/null
+++ b/docs/ml-ann.md
@@ -0,0 +1,123 @@
+---
+layout: global
+title: Multilayer perceptron classifier - ML
+displayTitle: <a href="ml-guide.html">ML</a> - Multilayer perceptron classifier
+---
+
+
+`\[
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\E}{\mathbb{E}}
+\newcommand{\x}{\mathbf{x}}
+\newcommand{\y}{\mathbf{y}}
+\newcommand{\wv}{\mathbf{w}}
+\newcommand{\av}{\mathbf{\alpha}}
+\newcommand{\bv}{\mathbf{b}}
+\newcommand{\N}{\mathbb{N}}
+\newcommand{\id}{\mathbf{I}}
+\newcommand{\ind}{\mathbf{1}}
+\newcommand{\0}{\mathbf{0}}
+\newcommand{\unit}{\mathbf{e}}
+\newcommand{\one}{\mathbf{1}}
+\newcommand{\zero}{\mathbf{0}}
+\]`
+
+
+Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). 
+MLPC consists of multiple layers of nodes. 
+Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs 
+by performing a linear combination of the inputs with the node's weights `$\wv$` and bias `$\bv$` and applying an activation function. 
+It can be written in matrix form for MLPC with `$K+1$` layers as follows:
+`\[
+\mathrm{y}(\x) = \mathrm{f_K}(...\mathrm{f_2}(\wv_2^T\mathrm{f_1}(\wv_1^T \x+b_1)+b_2)...+b_K)
+\]`
+Nodes in intermediate layers use the sigmoid (logistic) function:
+`\[
+\mathrm{f}(z_i) = \frac{1}{1 + e^{-z_i}}
+\]`
+Nodes in the output layer use the softmax function:
+`\[
+\mathrm{f}(z_i) = \frac{e^{z_i}}{\sum_{k=1}^N e^{z_k}}
+\]`
+The number of nodes `$N$` in the output layer corresponds to the number of classes. 
+
+MLPC employs backpropagation for learning the model. We use the logistic loss function for optimization and L-BFGS as the optimization routine.
+
+**Examples**
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
+import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.sql.Row
+
+// Load training data
+val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_multiclass_classification_data.txt").toDF()
+// Split the data into train and test
+val splits = data.randomSplit(Array(0.6, 0.4), seed = 1234L)
+val train = splits(0)
+val test = splits(1)
+// specify layers for the neural network:
+// input layer of size 4 (features), two intermediate of size 5 and 4 and output of size 3 (classes)
+val layers = Array[Int](4, 5, 4, 3)
+// create the trainer and set its parameters
+val trainer = new MultilayerPerceptronClassifier()
+  .setLayers(layers)
+  .setBlockSize(128)
+  .setSeed(1234L)
+  .setMaxIter(100)
+// train the model
+val model = trainer.fit(train)
+// compute precision on the test set
+val result = model.transform(test)
+val predictionAndLabels = result.select("prediction", "label")
+val evaluator = new MulticlassClassificationEvaluator()
+  .setMetricName("precision")
+println("Precision:" + evaluator.evaluate(predictionAndLabels))
+{% endhighlight %}
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;
+import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;
+import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+
+// 

spark git commit: [SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 04ef52a5b -> e5e601739


[SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier

Added user guide for multilayer perceptron classifier:
  - Simplified description of the multilayer perceptron classifier
  - Example code for Scala and Java

Author: Alexander Ulanov na...@yandex.ru

Closes #8262 from avulanov/SPARK-9846-mlpc-docs.

(cherry picked from commit dcfe0c5cde953b31c5bfeb6e41d1fc9b333241eb)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e5e60173
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e5e60173
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e5e60173

Branch: refs/heads/branch-1.5
Commit: e5e601739b1ec49da95f30657bfcfc691e35d9be
Parents: 04ef52a
Author: Alexander Ulanov na...@yandex.ru
Authored: Thu Aug 20 20:02:27 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 20:02:34 2015 -0700

--
 docs/ml-ann.md   | 123 ++
 docs/ml-guide.md |   1 +
 2 files changed, 124 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e5e60173/docs/ml-ann.md
--
diff --git a/docs/ml-ann.md b/docs/ml-ann.md
new file mode 100644
index 000..d5ddd92
--- /dev/null
+++ b/docs/ml-ann.md
@@ -0,0 +1,123 @@
+---
+layout: global
+title: Multilayer perceptron classifier - ML
+displayTitle: <a href="ml-guide.html">ML</a> - Multilayer perceptron classifier
+---
+
+
+`\[
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\E}{\mathbb{E}}
+\newcommand{\x}{\mathbf{x}}
+\newcommand{\y}{\mathbf{y}}
+\newcommand{\wv}{\mathbf{w}}
+\newcommand{\av}{\mathbf{\alpha}}
+\newcommand{\bv}{\mathbf{b}}
+\newcommand{\N}{\mathbb{N}}
+\newcommand{\id}{\mathbf{I}}
+\newcommand{\ind}{\mathbf{1}}
+\newcommand{\0}{\mathbf{0}}
+\newcommand{\unit}{\mathbf{e}}
+\newcommand{\one}{\mathbf{1}}
+\newcommand{\zero}{\mathbf{0}}
+\]`
+
+
+Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). 
+MLPC consists of multiple layers of nodes. 
+Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs 
+by performing a linear combination of the inputs with the node's weights `$\wv$` and bias `$\bv$` and applying an activation function. 
+It can be written in matrix form for MLPC with `$K+1$` layers as follows:
+`\[
+\mathrm{y}(\x) = \mathrm{f_K}(...\mathrm{f_2}(\wv_2^T\mathrm{f_1}(\wv_1^T \x+b_1)+b_2)...+b_K)
+\]`
+Nodes in intermediate layers use the sigmoid (logistic) function:
+`\[
+\mathrm{f}(z_i) = \frac{1}{1 + e^{-z_i}}
+\]`
+Nodes in the output layer use the softmax function:
+`\[
+\mathrm{f}(z_i) = \frac{e^{z_i}}{\sum_{k=1}^N e^{z_k}}
+\]`
+The number of nodes `$N$` in the output layer corresponds to the number of classes. 
+
+MLPC employs backpropagation for learning the model. We use the logistic loss function for optimization and L-BFGS as the optimization routine.
+
+**Examples**
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
+import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.sql.Row
+
+// Load training data
+val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_multiclass_classification_data.txt").toDF()
+// Split the data into train and test
+val splits = data.randomSplit(Array(0.6, 0.4), seed = 1234L)
+val train = splits(0)
+val test = splits(1)
+// specify layers for the neural network:
+// input layer of size 4 (features), two intermediate of size 5 and 4 and output of size 3 (classes)
+val layers = Array[Int](4, 5, 4, 3)
+// create the trainer and set its parameters
+val trainer = new MultilayerPerceptronClassifier()
+  .setLayers(layers)
+  .setBlockSize(128)
+  .setSeed(1234L)
+  .setMaxIter(100)
+// train the model
+val model = trainer.fit(train)
+// compute precision on the test set
+val result = model.transform(test)
+val predictionAndLabels = result.select("prediction", "label")
+val evaluator = new MulticlassClassificationEvaluator()
+  .setMetricName("precision")
+println("Precision:" + evaluator.evaluate(predictionAndLabels))
+{% endhighlight %}
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;
+import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;
+import 

spark git commit: [SPARK-10140] [DOC] add target fields to @Since

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 988e838a2 -> 04ef52a5b


[SPARK-10140] [DOC] add target fields to @Since

so constructor parameters and public fields can be annotated. rxin MechCoder

Author: Xiangrui Meng m...@databricks.com

Closes #8344 from mengxr/SPARK-10140.2.

(cherry picked from commit cdd9a2bb10e20556003843a0f7aaa33acd55f6d2)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/04ef52a5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/04ef52a5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/04ef52a5

Branch: refs/heads/branch-1.5
Commit: 04ef52a5bcbe8fba2941af235b8cda1255d4af8d
Parents: 988e838
Author: Xiangrui Meng m...@databricks.com
Authored: Thu Aug 20 20:01:13 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 20:01:27 2015 -0700

--
 core/src/main/scala/org/apache/spark/annotation/Since.scala | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/04ef52a5/core/src/main/scala/org/apache/spark/annotation/Since.scala
--
diff --git a/core/src/main/scala/org/apache/spark/annotation/Since.scala 
b/core/src/main/scala/org/apache/spark/annotation/Since.scala
index fa59393..af483e3 100644
--- a/core/src/main/scala/org/apache/spark/annotation/Since.scala
+++ b/core/src/main/scala/org/apache/spark/annotation/Since.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.annotation
 
 import scala.annotation.StaticAnnotation
+import scala.annotation.meta._
 
 /**
  * A Scala annotation that specifies the Spark version when a definition was 
added.
@@ -25,4 +26,5 @@ import scala.annotation.StaticAnnotation
  * hence works for overridden methods that inherit API documentation directly 
from parents.
  * The limitation is that it does not show up in the generated Java API 
documentation.
  */
+@param @field @getter @setter @beanGetter @beanSetter
 private[spark] class Since(version: String) extends StaticAnnotation


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[2/2] spark git commit: Preparing development version 1.5.1-SNAPSHOT

2015-08-20 Thread pwendell
Preparing development version 1.5.1-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/988e838a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/988e838a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/988e838a

Branch: refs/heads/branch-1.5
Commit: 988e838a2fe381052f3018df4f31a55434be75ea
Parents: 4c56ad7
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 16:24:12 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 16:24:12 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/988e838a/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 3ef7d6f..7b41ebb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/988e838a/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index 684e07b..16bf17c 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/988e838a/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 6b082ad..beb547f 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/988e838a/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 9ef1eda..3926b79 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/988e838a/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index fe878e6..bdd5037 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/988e838a/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 

[1/2] spark git commit: Preparing Spark release v1.5.0-rc1

2015-08-20 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 2f47e099d -> a1785e3f5


Preparing Spark release v1.5.0-rc1


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/19b92c87
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/19b92c87
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/19b92c87

Branch: refs/heads/branch-1.5
Commit: 19b92c87a38fd3594e60e96dbf1f85e92163be36
Parents: 2f47e09
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 11:06:31 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 11:06:31 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/19b92c87/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index e9c6d26..1567a0e 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/19b92c87/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index ed5c37e..a0e68c1 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/19b92c87/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 4f79d71..b76aab4 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/19b92c87/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index e6884b0..e0afb4a 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/19b92c87/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index e05e431..a4093bb 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/19b92c87/external/flume-sink/pom.xml

[2/2] spark git commit: [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array

2015-08-20 Thread marmbrus
[SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array

I caught SPARK-10136 while adding more test cases to 
`ParquetAvroCompatibilitySuite`. Actual bug fix code lies in 
`CatalystRowConverter.scala`.

Author: Cheng Lian l...@databricks.com

Closes #8341 from liancheng/spark-10136/parquet-avro-nested-primitive-array.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/85f9a613
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/85f9a613
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/85f9a613

Branch: refs/heads/master
Commit: 85f9a61357994da5023b08b0a8a2eb09388ce7f8
Parents: 39e91fe
Author: Cheng Lian l...@databricks.com
Authored: Thu Aug 20 11:00:24 2015 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Thu Aug 20 11:00:29 2015 -0700

--
 .../parquet/CatalystReadSupport.scala   |   1 -
 .../parquet/CatalystRowConverter.scala  |  24 +-
 sql/core/src/test/avro/parquet-compat.avdl  |  19 +-
 sql/core/src/test/avro/parquet-compat.avpr  |  54 +-
 .../parquet/test/avro/AvroArrayOfArray.java | 142 
 .../parquet/test/avro/AvroMapOfArray.java   | 142 
 .../test/avro/AvroNonNullableArrays.java| 196 +
 .../test/avro/AvroOptionalPrimitives.java   | 466 +++
 .../parquet/test/avro/AvroPrimitives.java   | 461 +++
 .../parquet/test/avro/CompatibilityTest.java|   2 +-
 .../parquet/test/avro/ParquetAvroCompat.java| 821 +--
 .../parquet/ParquetAvroCompatibilitySuite.scala | 227 +++--
 .../parquet/ParquetCompatibilityTest.scala  |   7 +
 13 files changed, 1718 insertions(+), 844 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/85f9a613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
index a4679bb..3f8353a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
@@ -61,7 +61,6 @@ private[parquet] class CatalystReadSupport extends ReadSupport[InternalRow] with
          |
          |Parquet form:
          |$parquetRequestedSchema
-         |
          |Catalyst form:
          |$catalystRequestedSchema
        """.stripMargin

http://git-wip-us.apache.org/repos/asf/spark/blob/85f9a613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
index 18c5b50..d2c2db5 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
@@ -25,11 +25,12 @@ import scala.collection.mutable.ArrayBuffer
 
 import org.apache.parquet.column.Dictionary
 import org.apache.parquet.io.api.{Binary, Converter, GroupConverter, 
PrimitiveConverter}
-import org.apache.parquet.schema.OriginalType.{LIST, INT_32, UTF8}
+import org.apache.parquet.schema.OriginalType.{INT_32, LIST, UTF8}
 import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.DOUBLE
 import org.apache.parquet.schema.Type.Repetition
 import org.apache.parquet.schema.{GroupType, MessageType, PrimitiveType, Type}
 
+import org.apache.spark.Logging
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
@@ -145,7 +146,16 @@ private[parquet] class CatalystRowConverter(
     parquetType: GroupType,
     catalystType: StructType,
     updater: ParentContainerUpdater)
-  extends CatalystGroupConverter(updater) {
+  extends CatalystGroupConverter(updater) with Logging {
+
+  logDebug(
+    s"""Building row converter for the following schema:
+       |
+       |Parquet form:
+       |$parquetType
+       |Catalyst form:
+       |${catalystType.prettyJson}
+     """.stripMargin)
 
   /**
* Updater used together with field converters within a 
[[CatalystRowConverter]].  It propagates
@@ -464,9 +474,15 @@ private[parquet] class CatalystRowConverter(
 
   override def getConverter(fieldIndex: Int): Converter = converter
 
-  

Git Push Summary

2015-08-20 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.5.0-rc1 [created] 19b92c87a

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke snapshot publishing for Scala 2.11

2015-08-20 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master 85f9a6135 -> 12de34833


[SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke snapshot 
publishing for Scala 2.11

The current `release-build.sh` has a typo which breaks snapshot publication for 
Scala 2.11. We should change the Scala version to 2.11 and clean before 
building a 2.11 snapshot.

Author: Josh Rosen joshro...@databricks.com

Closes #8325 from JoshRosen/fix-2.11-snapshots.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12de3483
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12de3483
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12de3483

Branch: refs/heads/master
Commit: 12de348332108f8c0c5bdad1d4cfac89b952b0f8
Parents: 85f9a61
Author: Josh Rosen joshro...@databricks.com
Authored: Thu Aug 20 11:31:03 2015 -0700
Committer: Josh Rosen joshro...@databricks.com
Committed: Thu Aug 20 11:31:03 2015 -0700

--
 dev/create-release/release-build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/12de3483/dev/create-release/release-build.sh
--
diff --git a/dev/create-release/release-build.sh 
b/dev/create-release/release-build.sh
index 399c73e..d0b3a54 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -225,9 +225,9 @@ if [[ "$1" == "publish-snapshot" ]]; then
 
   $MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests $PUBLISH_PROFILES \
     -Phive-thriftserver deploy
-  ./dev/change-scala-version.sh 2.10
+  ./dev/change-scala-version.sh 2.11
   $MVN -DzincPort=$ZINC_PORT -Dscala-2.11 --settings $tmp_settings \
-    -DskipTests $PUBLISH_PROFILES deploy
+    -DskipTests $PUBLISH_PROFILES clean deploy
 
   # Clean-up Zinc nailgun process
   /usr/sbin/lsof -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke snapshot publishing for Scala 2.11

2015-08-20 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 a1785e3f5 -> 6026f4fd7


[SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke snapshot 
publishing for Scala 2.11

The current `release-build.sh` has a typo which breaks snapshot publication for 
Scala 2.11. We should change the Scala version to 2.11 and clean before 
building a 2.11 snapshot.

Author: Josh Rosen joshro...@databricks.com

Closes #8325 from JoshRosen/fix-2.11-snapshots.

(cherry picked from commit 12de348332108f8c0c5bdad1d4cfac89b952b0f8)
Signed-off-by: Josh Rosen joshro...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6026f4fd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6026f4fd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6026f4fd

Branch: refs/heads/branch-1.5
Commit: 6026f4fd729f4c7158a87c5c706fde866d7aae60
Parents: a1785e3
Author: Josh Rosen joshro...@databricks.com
Authored: Thu Aug 20 11:31:03 2015 -0700
Committer: Josh Rosen joshro...@databricks.com
Committed: Thu Aug 20 11:31:21 2015 -0700

--
 dev/create-release/release-build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6026f4fd/dev/create-release/release-build.sh
--
diff --git a/dev/create-release/release-build.sh 
b/dev/create-release/release-build.sh
index 399c73e..d0b3a54 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -225,9 +225,9 @@ if [[ "$1" == "publish-snapshot" ]]; then
 
   $MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests $PUBLISH_PROFILES \
     -Phive-thriftserver deploy
-  ./dev/change-scala-version.sh 2.10
+  ./dev/change-scala-version.sh 2.11
   $MVN -DzincPort=$ZINC_PORT -Dscala-2.11 --settings $tmp_settings \
-    -DskipTests $PUBLISH_PROFILES deploy
+    -DskipTests $PUBLISH_PROFILES clean deploy
 
   # Clean-up Zinc nailgun process
   /usr/sbin/lsof -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[2/2] spark git commit: [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array

2015-08-20 Thread marmbrus
[SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array

I caught SPARK-10136 while adding more test cases to 
`ParquetAvroCompatibilitySuite`. Actual bug fix code lies in 
`CatalystRowConverter.scala`.

Author: Cheng Lian l...@databricks.com

Closes #8341 from liancheng/spark-10136/parquet-avro-nested-primitive-array.

(cherry picked from commit 85f9a61357994da5023b08b0a8a2eb09388ce7f8)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f47e099
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2f47e099
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2f47e099

Branch: refs/heads/branch-1.5
Commit: 2f47e099d31275f03ad372483e1bb23a322044f5
Parents: a7027e6
Author: Cheng Lian l...@databricks.com
Authored: Thu Aug 20 11:00:24 2015 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Thu Aug 20 11:02:02 2015 -0700

--
 .../parquet/CatalystReadSupport.scala   |   1 -
 .../parquet/CatalystRowConverter.scala  |  24 +-
 sql/core/src/test/avro/parquet-compat.avdl  |  19 +-
 sql/core/src/test/avro/parquet-compat.avpr  |  54 +-
 .../parquet/test/avro/AvroArrayOfArray.java | 142 
 .../parquet/test/avro/AvroMapOfArray.java   | 142 
 .../test/avro/AvroNonNullableArrays.java| 196 +
 .../test/avro/AvroOptionalPrimitives.java   | 466 +++
 .../parquet/test/avro/AvroPrimitives.java   | 461 +++
 .../parquet/test/avro/CompatibilityTest.java|   2 +-
 .../parquet/test/avro/ParquetAvroCompat.java| 821 +--
 .../parquet/ParquetAvroCompatibilitySuite.scala | 227 +++--
 .../parquet/ParquetCompatibilityTest.scala  |   7 +
 13 files changed, 1718 insertions(+), 844 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2f47e099/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
index a4679bb..3f8353a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
@@ -61,7 +61,6 @@ private[parquet] class CatalystReadSupport extends 
ReadSupport[InternalRow] with
  |
  |Parquet form:
  |$parquetRequestedSchema
- |
  |Catalyst form:
  |$catalystRequestedSchema
.stripMargin

http://git-wip-us.apache.org/repos/asf/spark/blob/2f47e099/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
index 18c5b50..d2c2db5 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala
@@ -25,11 +25,12 @@ import scala.collection.mutable.ArrayBuffer
 
 import org.apache.parquet.column.Dictionary
 import org.apache.parquet.io.api.{Binary, Converter, GroupConverter, 
PrimitiveConverter}
-import org.apache.parquet.schema.OriginalType.{LIST, INT_32, UTF8}
+import org.apache.parquet.schema.OriginalType.{INT_32, LIST, UTF8}
 import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.DOUBLE
 import org.apache.parquet.schema.Type.Repetition
 import org.apache.parquet.schema.{GroupType, MessageType, PrimitiveType, Type}
 
+import org.apache.spark.Logging
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
@@ -145,7 +146,16 @@ private[parquet] class CatalystRowConverter(
 parquetType: GroupType,
 catalystType: StructType,
 updater: ParentContainerUpdater)
-  extends CatalystGroupConverter(updater) {
+  extends CatalystGroupConverter(updater) with Logging {
+
+  logDebug(
+    s"""Building row converter for the following schema:
+       |
+       |Parquet form:
+       |$parquetType
+       |Catalyst form:
+       |${catalystType.prettyJson}
+     """.stripMargin)
 
   /**
* Updater used together with field converters within a 
[[CatalystRowConverter]].  It propagates
@@ -464,9 +474,15 @@ 

[1/2] spark git commit: [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array

2015-08-20 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 39e91fe2f -> 85f9a6135


http://git-wip-us.apache.org/repos/asf/spark/blob/85f9a613/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
--
diff --git 
a/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
 
b/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
index 681cacb..ef12d19 100644
--- 
a/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
+++ 
b/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
@@ -7,22 +7,8 @@ package 
org.apache.spark.sql.execution.datasources.parquet.test.avro;
 @SuppressWarnings("all")
 @org.apache.avro.specific.AvroGenerated
 public class ParquetAvroCompat extends 
org.apache.avro.specific.SpecificRecordBase implements 
org.apache.avro.specific.SpecificRecord {
-  public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse({\type\:\record\,\name\:\ParquetAvroCompat\,\namespace\:\org.apache.spark.sql.execution.datasources.parquet.test.avro\,\fields\:[{\name\:\bool_column\,\type\:\boolean\},{\name\:\int_column\,\type\:\int\},{\name\:\long_column\,\type\:\long\},{\name\:\float_column\,\type\:\float\},{\name\:\double_column\,\type\:\double\},{\name\:\binary_column\,\type\:\bytes\},{\name\:\string_column\,\type\:{\type\:\string\,\avro.java.string\:\String\}},{\name\:\maybe_bool_column\,\type\:[\null\,\boolean\]},{\name\:\maybe_int_column\,\type\:[\null\,\int\]},{\name\:\maybe_long_column\,\type\:[\null\,\long\]},{\name\:\maybe_float_column\,\type\:[\null\,\float\]},{\name\:\maybe_double_column\,\type\:[\null\,\double\]},{\name\:\maybe_binary_column\,\type\:[\null\,\bytes\]},{\
 
name\:\maybe_string_column\,\type\:[\null\,{\type\:\string\,\avro.java.string\:\String\}]},{\name\:\strings_column\,\type\:{\type\:\array\,\items\:{\type\:\string\,\avro.java.string\:\String\}}},{\name\:\string_to_int_column\,\type\:{\type\:\map\,\values\:\int\,\avro.java.string\:\String\}},{\name\:\complex_column\,\type\:{\type\:\map\,\values\:{\type\:\array\,\items\:{\type\:\record\,\name\:\Nested\,\fields\:[{\name\:\nested_ints_column\,\type\:{\type\:\array\,\items\:\int\}},{\name\:\nested_string_column\,\type\:{\type\:\string\,\avro.java.string\:\String\}}]}},\avro.java.string\:\String\}}]});
+  public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse({\type\:\record\,\name\:\ParquetAvroCompat\,\namespace\:\org.apache.spark.sql.execution.datasources.parquet.test.avro\,\fields\:[{\name\:\strings_column\,\type\:{\type\:\array\,\items\:{\type\:\string\,\avro.java.string\:\String\}}},{\name\:\string_to_int_column\,\type\:{\type\:\map\,\values\:\int\,\avro.java.string\:\String\}},{\name\:\complex_column\,\type\:{\type\:\map\,\values\:{\type\:\array\,\items\:{\type\:\record\,\name\:\Nested\,\fields\:[{\name\:\nested_ints_column\,\type\:{\type\:\array\,\items\:\int\}},{\name\:\nested_string_column\,\type\:{\type\:\string\,\avro.java.string\:\String\}}]}},\avro.java.string\:\String\}}]});
   public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
-  @Deprecated public boolean bool_column;
-  @Deprecated public int int_column;
-  @Deprecated public long long_column;
-  @Deprecated public float float_column;
-  @Deprecated public double double_column;
-  @Deprecated public java.nio.ByteBuffer binary_column;
-  @Deprecated public java.lang.String string_column;
-  @Deprecated public java.lang.Boolean maybe_bool_column;
-  @Deprecated public java.lang.Integer maybe_int_column;
-  @Deprecated public java.lang.Long maybe_long_column;
-  @Deprecated public java.lang.Float maybe_float_column;
-  @Deprecated public java.lang.Double maybe_double_column;
-  @Deprecated public java.nio.ByteBuffer maybe_binary_column;
-  @Deprecated public java.lang.String maybe_string_column;
   @Deprecated public java.util.List<java.lang.String> strings_column;
   @Deprecated public java.util.Map<java.lang.String,java.lang.Integer> string_to_int_column;
   @Deprecated public java.util.Map<java.lang.String,java.util.List<org.apache.spark.sql.execution.datasources.parquet.test.avro.Nested>> complex_column;
@@ -37,21 +23,7 @@ public class ParquetAvroCompat extends 
org.apache.avro.specific.SpecificRecordBa
   /**
* All-args constructor.
*/
-  public ParquetAvroCompat(java.lang.Boolean bool_column, java.lang.Integer 
int_column, java.lang.Long long_column, java.lang.Float float_column, 
java.lang.Double double_column, java.nio.ByteBuffer binary_column, 
java.lang.String string_column, java.lang.Boolean maybe_bool_column, 
java.lang.Integer maybe_int_column, java.lang.Long maybe_long_column, 
java.lang.Float maybe_float_column, 

[1/2] spark git commit: [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array

2015-08-20 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 a7027e6d3 -> 2f47e099d


http://git-wip-us.apache.org/repos/asf/spark/blob/2f47e099/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
--
diff --git 
a/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
 
b/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
index 681cacb..ef12d19 100644
--- 
a/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
+++ 
b/sql/core/src/test/gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro/ParquetAvroCompat.java
@@ -7,22 +7,8 @@ package 
org.apache.spark.sql.execution.datasources.parquet.test.avro;
 @SuppressWarnings("all")
 @org.apache.avro.specific.AvroGenerated
 public class ParquetAvroCompat extends 
org.apache.avro.specific.SpecificRecordBase implements 
org.apache.avro.specific.SpecificRecord {
-  public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse({\type\:\record\,\name\:\ParquetAvroCompat\,\namespace\:\org.apache.spark.sql.execution.datasources.parquet.test.avro\,\fields\:[{\name\:\bool_column\,\type\:\boolean\},{\name\:\int_column\,\type\:\int\},{\name\:\long_column\,\type\:\long\},{\name\:\float_column\,\type\:\float\},{\name\:\double_column\,\type\:\double\},{\name\:\binary_column\,\type\:\bytes\},{\name\:\string_column\,\type\:{\type\:\string\,\avro.java.string\:\String\}},{\name\:\maybe_bool_column\,\type\:[\null\,\boolean\]},{\name\:\maybe_int_column\,\type\:[\null\,\int\]},{\name\:\maybe_long_column\,\type\:[\null\,\long\]},{\name\:\maybe_float_column\,\type\:[\null\,\float\]},{\name\:\maybe_double_column\,\type\:[\null\,\double\]},{\name\:\maybe_binary_column\,\type\:[\null\,\bytes\]},{\
 
name\:\maybe_string_column\,\type\:[\null\,{\type\:\string\,\avro.java.string\:\String\}]},{\name\:\strings_column\,\type\:{\type\:\array\,\items\:{\type\:\string\,\avro.java.string\:\String\}}},{\name\:\string_to_int_column\,\type\:{\type\:\map\,\values\:\int\,\avro.java.string\:\String\}},{\name\:\complex_column\,\type\:{\type\:\map\,\values\:{\type\:\array\,\items\:{\type\:\record\,\name\:\Nested\,\fields\:[{\name\:\nested_ints_column\,\type\:{\type\:\array\,\items\:\int\}},{\name\:\nested_string_column\,\type\:{\type\:\string\,\avro.java.string\:\String\}}]}},\avro.java.string\:\String\}}]});
+  public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse({\type\:\record\,\name\:\ParquetAvroCompat\,\namespace\:\org.apache.spark.sql.execution.datasources.parquet.test.avro\,\fields\:[{\name\:\strings_column\,\type\:{\type\:\array\,\items\:{\type\:\string\,\avro.java.string\:\String\}}},{\name\:\string_to_int_column\,\type\:{\type\:\map\,\values\:\int\,\avro.java.string\:\String\}},{\name\:\complex_column\,\type\:{\type\:\map\,\values\:{\type\:\array\,\items\:{\type\:\record\,\name\:\Nested\,\fields\:[{\name\:\nested_ints_column\,\type\:{\type\:\array\,\items\:\int\}},{\name\:\nested_string_column\,\type\:{\type\:\string\,\avro.java.string\:\String\}}]}},\avro.java.string\:\String\}}]});
   public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
-  @Deprecated public boolean bool_column;
-  @Deprecated public int int_column;
-  @Deprecated public long long_column;
-  @Deprecated public float float_column;
-  @Deprecated public double double_column;
-  @Deprecated public java.nio.ByteBuffer binary_column;
-  @Deprecated public java.lang.String string_column;
-  @Deprecated public java.lang.Boolean maybe_bool_column;
-  @Deprecated public java.lang.Integer maybe_int_column;
-  @Deprecated public java.lang.Long maybe_long_column;
-  @Deprecated public java.lang.Float maybe_float_column;
-  @Deprecated public java.lang.Double maybe_double_column;
-  @Deprecated public java.nio.ByteBuffer maybe_binary_column;
-  @Deprecated public java.lang.String maybe_string_column;
   @Deprecated public java.util.List<java.lang.String> strings_column;
   @Deprecated public java.util.Map<java.lang.String,java.lang.Integer> string_to_int_column;
   @Deprecated public java.util.Map<java.lang.String,java.util.List<org.apache.spark.sql.execution.datasources.parquet.test.avro.Nested>> complex_column;
@@ -37,21 +23,7 @@ public class ParquetAvroCompat extends 
org.apache.avro.specific.SpecificRecordBa
   /**
* All-args constructor.
*/
-  public ParquetAvroCompat(java.lang.Boolean bool_column, java.lang.Integer 
int_column, java.lang.Long long_column, java.lang.Float float_column, 
java.lang.Double double_column, java.nio.ByteBuffer binary_column, 
java.lang.String string_column, java.lang.Boolean maybe_bool_column, 
java.lang.Integer maybe_int_column, java.lang.Long maybe_long_column, 
java.lang.Float maybe_float_column, 

spark git commit: [SQL] [MINOR] remove unnecessary class

2015-08-20 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 12de34833 -> 907df2fce


[SQL] [MINOR] remove unnecessary class

This class is identical to `org.apache.spark.sql.execution.datasources.jdbc.DefaultSource` 
and is not needed.

Author: Wenchen Fan cloud0...@outlook.com

Closes #8334 from cloud-fan/minor.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/907df2fc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/907df2fc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/907df2fc

Branch: refs/heads/master
Commit: 907df2fce00d2cbc9fae371344f05f800e0d2726
Parents: 12de348
Author: Wenchen Fan cloud0...@outlook.com
Authored: Thu Aug 20 13:51:54 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Thu Aug 20 13:51:54 2015 -0700

--
 .../execution/datasources/DefaultSource.scala   | 64 
 1 file changed, 64 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/907df2fc/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DefaultSource.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DefaultSource.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DefaultSource.scala
deleted file mode 100644
index 6e4cc4d..000
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DefaultSource.scala
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.datasources
-
-import java.util.Properties
-
-import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.execution.datasources.jdbc.{JDBCRelation, JDBCPartitioningInfo, DriverRegistry}
-import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
-
-
-class DefaultSource extends RelationProvider with DataSourceRegister {
-
-  override def shortName(): String = "jdbc"
-
-  /** Returns a new base relation with the given parameters. */
-  override def createRelation(
-      sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
-    val driver = parameters.getOrElse("driver", null)
-    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
-    val partitionColumn = parameters.getOrElse("partitionColumn", null)
-    val lowerBound = parameters.getOrElse("lowerBound", null)
-    val upperBound = parameters.getOrElse("upperBound", null)
-    val numPartitions = parameters.getOrElse("numPartitions", null)
-
-    if (driver != null) DriverRegistry.register(driver)
-
-    if (partitionColumn != null
-      && (lowerBound == null || upperBound == null || numPartitions == null)) {
-      sys.error("Partitioning incompletely specified")
-    }
-
-    val partitionInfo = if (partitionColumn == null) {
-      null
-    } else {
-      JDBCPartitioningInfo(
-        partitionColumn,
-        lowerBound.toLong,
-        upperBound.toLong,
-        numPartitions.toInt)
-    }
-    val parts = JDBCRelation.columnPartition(partitionInfo)
-    val properties = new Properties() // Additional properties that we will pass to getConnection
-    parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
-    JDBCRelation(url, table, parts, properties)(sqlContext)
-  }
-}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2015-08-20 Thread rxin
Repository: spark
Updated Tags:  refs/tags/v1.5.0-rc1 [deleted] 19b92c87a

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2015-08-20 Thread rxin
Repository: spark
Updated Tags:  refs/tags/v1.5.0-snapshot-20150810 [deleted] 3369ad9bc

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2015-08-20 Thread rxin
Repository: spark
Updated Tags:  refs/tags/v1.5.0-snapshot-20150811 [deleted] 158b2ea7a

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2015-08-20 Thread rxin
Repository: spark
Updated Tags:  refs/tags/v1.5.0-preview-20150812 [deleted] cedce9bdb

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation.

2015-08-20 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 675e22494 -> 5be517584


[SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in 
aggregation.

This improves performance by ~20-30% in one of my local tests and should fix 
the performance regression from 1.4 to 1.5 on ss_max.
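
A minimal sketch of the idea, separate from the patch itself: with no grouping expressions there is exactly one aggregation buffer, so the per-row hash-map lookup can be hoisted out of the loop. All names below are illustrative:

    // One buffer, fetched once; every row updates it without a key lookup.
    def sumAll(rows: Iterator[Long]): Long = {
      var buffer = 0L                 // stands in for the single aggregation buffer
      while (rows.hasNext) {
        buffer += rows.next()         // processRow(buffer, row), no hashing per row
      }
      buffer
    }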

Author: Reynold Xin r...@databricks.com

Closes #8332 from rxin/SPARK-10100.

(cherry picked from commit b4f4e91c395cb69ced61d9ff1492d1b814f96828)
Signed-off-by: Yin Huai yh...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5be51758
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5be51758
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5be51758

Branch: refs/heads/branch-1.5
Commit: 5be517584be0c78dc4641a4aa14ea9da05ed344d
Parents: 675e224
Author: Reynold Xin r...@databricks.com
Authored: Thu Aug 20 07:53:27 2015 -0700
Committer: Yin Huai yh...@databricks.com
Committed: Thu Aug 20 07:53:40 2015 -0700

--
 .../execution/aggregate/TungstenAggregate.scala |  2 +-
 .../aggregate/TungstenAggregationIterator.scala | 30 ++--
 2 files changed, 22 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5be51758/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
index 99f51ba..ba379d3 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
@@ -104,7 +104,7 @@ case class TungstenAggregate(
 } else {
   // This is a grouped aggregate and the input iterator is empty,
   // so return an empty iterator.
-  Iterator[UnsafeRow]()
+  Iterator.empty
 }
   } else {
 aggregationIterator.start(parentIterator)

http://git-wip-us.apache.org/repos/asf/spark/blob/5be51758/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
index af7e0fc..26fdbc8 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
@@ -357,18 +357,30 @@ class TungstenAggregationIterator(
   // sort-based aggregation (by calling switchToSortBasedAggregation).
   private def processInputs(): Unit = {
     assert(inputIter != null, "attempted to process input when iterator was null")
-    while (!sortBased && inputIter.hasNext) {
-  val newInput = inputIter.next()
-  numInputRows += 1
-  val groupingKey = groupProjection.apply(newInput)
+if (groupingExpressions.isEmpty) {
+  // If there is no grouping expressions, we can just reuse the same 
buffer over and over again.
+  // Note that it would be better to eliminate the hash map entirely in 
the future.
+  val groupingKey = groupProjection.apply(null)
   val buffer: UnsafeRow = 
hashMap.getAggregationBufferFromUnsafeRow(groupingKey)
-  if (buffer == null) {
-// buffer == null means that we could not allocate more memory.
-// Now, we need to spill the map and switch to sort-based aggregation.
-switchToSortBasedAggregation(groupingKey, newInput)
-  } else {
+  while (inputIter.hasNext) {
+val newInput = inputIter.next()
+numInputRows += 1
 processRow(buffer, newInput)
   }
+} else {
+      while (!sortBased && inputIter.hasNext) {
+val newInput = inputIter.next()
+numInputRows += 1
+val groupingKey = groupProjection.apply(newInput)
+val buffer: UnsafeRow = 
hashMap.getAggregationBufferFromUnsafeRow(groupingKey)
+if (buffer == null) {
+  // buffer == null means that we could not allocate more memory.
+  // Now, we need to spill the map and switch to sort-based 
aggregation.
+  switchToSortBasedAggregation(groupingKey, newInput)
+} else {
+  processRow(buffer, newInput)
+}
+  }
 }
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: 

spark git commit: [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation.

2015-08-20 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master 43e013542 -> b4f4e91c3


[SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in 
aggregation.

This improves performance by ~20-30% in one of my local tests and should fix 
the performance regression from 1.4 to 1.5 on ss_max.

Author: Reynold Xin r...@databricks.com

Closes #8332 from rxin/SPARK-10100.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b4f4e91c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b4f4e91c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b4f4e91c

Branch: refs/heads/master
Commit: b4f4e91c395cb69ced61d9ff1492d1b814f96828
Parents: 43e0135
Author: Reynold Xin r...@databricks.com
Authored: Thu Aug 20 07:53:27 2015 -0700
Committer: Yin Huai yh...@databricks.com
Committed: Thu Aug 20 07:53:27 2015 -0700

--
 .../execution/aggregate/TungstenAggregate.scala |  2 +-
 .../aggregate/TungstenAggregationIterator.scala | 30 ++--
 2 files changed, 22 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b4f4e91c/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
index 99f51ba..ba379d3 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
@@ -104,7 +104,7 @@ case class TungstenAggregate(
 } else {
   // This is a grouped aggregate and the input iterator is empty,
   // so return an empty iterator.
-  Iterator[UnsafeRow]()
+  Iterator.empty
 }
   } else {
 aggregationIterator.start(parentIterator)

http://git-wip-us.apache.org/repos/asf/spark/blob/b4f4e91c/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
index af7e0fc..26fdbc8 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
@@ -357,18 +357,30 @@ class TungstenAggregationIterator(
   // sort-based aggregation (by calling switchToSortBasedAggregation).
   private def processInputs(): Unit = {
     assert(inputIter != null, "attempted to process input when iterator was null")
-    while (!sortBased && inputIter.hasNext) {
-  val newInput = inputIter.next()
-  numInputRows += 1
-  val groupingKey = groupProjection.apply(newInput)
+if (groupingExpressions.isEmpty) {
+  // If there is no grouping expressions, we can just reuse the same 
buffer over and over again.
+  // Note that it would be better to eliminate the hash map entirely in 
the future.
+  val groupingKey = groupProjection.apply(null)
   val buffer: UnsafeRow = 
hashMap.getAggregationBufferFromUnsafeRow(groupingKey)
-  if (buffer == null) {
-// buffer == null means that we could not allocate more memory.
-// Now, we need to spill the map and switch to sort-based aggregation.
-switchToSortBasedAggregation(groupingKey, newInput)
-  } else {
+  while (inputIter.hasNext) {
+val newInput = inputIter.next()
+numInputRows += 1
 processRow(buffer, newInput)
   }
+} else {
+      while (!sortBased && inputIter.hasNext) {
+val newInput = inputIter.next()
+numInputRows += 1
+val groupingKey = groupProjection.apply(newInput)
+val buffer: UnsafeRow = 
hashMap.getAggregationBufferFromUnsafeRow(groupingKey)
+if (buffer == null) {
+  // buffer == null means that we could not allocate more memory.
+  // Now, we need to spill the map and switch to sort-based 
aggregation.
+  switchToSortBasedAggregation(groupingKey, newInput)
+} else {
+  processRow(buffer, newInput)
+}
+  }
 }
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[1/2] spark git commit: Preparing Spark release v1.5.0-rc1

2015-08-20 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 6026f4fd7 -> eac31abdf


Preparing Spark release v1.5.0-rc1


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99eeac8c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99eeac8c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99eeac8c

Branch: refs/heads/branch-1.5
Commit: 99eeac8cca176cfb64d5fd354a0a7c279613bbc9
Parents: 6026f4f
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 12:43:08 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 12:43:08 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/99eeac8c/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index e9c6d26..1567a0e 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/99eeac8c/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index ed5c37e..a0e68c1 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/99eeac8c/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 4f79d71..b76aab4 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/99eeac8c/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index e6884b0..e0afb4a 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/99eeac8c/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index e05e431..a4093bb 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0-rc1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/99eeac8c/external/flume-sink/pom.xml

[2/2] spark git commit: Preparing development version 1.5.0-SNAPSHOT

2015-08-20 Thread pwendell
Preparing development version 1.5.0-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eac31abd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eac31abd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eac31abd

Branch: refs/heads/branch-1.5
Commit: eac31abdf2fe89abb3dec2fa9285f918ae682d58
Parents: 99eeac8
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 12:43:13 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 12:43:13 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/eac31abd/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 1567a0e..e9c6d26 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-rc1</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/eac31abd/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index a0e68c1..ed5c37e 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-rc1</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/eac31abd/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index b76aab4..4f79d71 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-rc1</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/eac31abd/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index e0afb4a..e6884b0 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-rc1</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/eac31abd/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index a4093bb..e05e431 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-rc1</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/eac31abd/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml 

Git Push Summary

2015-08-20 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.5.0-rc1 [created] 99eeac8cc

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10138] [ML] move setters to MultilayerPerceptronClassifier and add Java test suite

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 eac31abdf -> 2e0d2a9cc


[SPARK-10138] [ML] move setters to MultilayerPerceptronClassifier and add Java 
test suite

Otherwise, setters do not return self type. jkbradley avulanov
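
A hedged usage sketch of why the move matters, assuming the Spark 1.5 ML API shown in this diff: with the setters declared on the concrete class, each call below keeps the MultilayerPerceptronClassifier type, so the chain also works from Java. The layer sizes and seed are illustrative only:

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

    val trainer = new MultilayerPerceptronClassifier()
      .setLayers(Array(4, 5, 4, 3))   // input layer, two hidden layers, output layer
      .setBlockSize(128)
      .setMaxIter(100)
      .setSeed(1234L)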

Author: Xiangrui Meng m...@databricks.com

Closes #8342 from mengxr/SPARK-10138.

(cherry picked from commit 2a3d98aae285aba39786e9809f96de412a130f39)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e0d2a9c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e0d2a9c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e0d2a9c

Branch: refs/heads/branch-1.5
Commit: 2e0d2a9cc3cb7021e3bdd032d079cf6c8916c725
Parents: eac31ab
Author: Xiangrui Meng m...@databricks.com
Authored: Thu Aug 20 14:47:04 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 14:47:11 2015 -0700

--
 .../MultilayerPerceptronClassifier.scala| 54 +++---
 ...JavaMultilayerPerceptronClassifierSuite.java | 74 
 2 files changed, 101 insertions(+), 27 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2e0d2a9c/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
index ccca4ec..1e5b0bc 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
@@ -42,9 +42,6 @@ private[ml] trait MultilayerPerceptronParams extends 
PredictorParams
 ParamValidators.arrayLengthGt(1)
   )
 
-  /** @group setParam */
-  def setLayers(value: Array[Int]): this.type = set(layers, value)
-
   /** @group getParam */
   final def getLayers: Array[Int] = $(layers)
 
@@ -61,33 +58,9 @@ private[ml] trait MultilayerPerceptronParams extends 
PredictorParams
   it is adjusted to the size of this data. Recommended size is between 10 
and 1000,
 ParamValidators.gt(0))
 
-  /** @group setParam */
-  def setBlockSize(value: Int): this.type = set(blockSize, value)
-
   /** @group getParam */
   final def getBlockSize: Int = $(blockSize)
 
-  /**
-   * Set the maximum number of iterations.
-   * Default is 100.
-   * @group setParam
-   */
-  def setMaxIter(value: Int): this.type = set(maxIter, value)
-
-  /**
-   * Set the convergence tolerance of iterations.
-   * Smaller value will lead to higher accuracy with the cost of more 
iterations.
-   * Default is 1E-4.
-   * @group setParam
-   */
-  def setTol(value: Double): this.type = set(tol, value)
-
-  /**
-   * Set the seed for weights initialization.
-   * @group setParam
-   */
-  def setSeed(value: Long): this.type = set(seed, value)
-
   setDefault(maxIter -> 100, tol -> 1e-4, layers -> Array(1, 1), blockSize -> 128)
 }
 
@@ -136,6 +109,33 @@ class MultilayerPerceptronClassifier(override val uid: 
String)
 
   def this() = this(Identifiable.randomUID("mlpc"))
 
+  /** @group setParam */
+  def setLayers(value: Array[Int]): this.type = set(layers, value)
+
+  /** @group setParam */
+  def setBlockSize(value: Int): this.type = set(blockSize, value)
+
+  /**
+   * Set the maximum number of iterations.
+   * Default is 100.
+   * @group setParam
+   */
+  def setMaxIter(value: Int): this.type = set(maxIter, value)
+
+  /**
+   * Set the convergence tolerance of iterations.
+   * Smaller value will lead to higher accuracy with the cost of more 
iterations.
+   * Default is 1E-4.
+   * @group setParam
+   */
+  def setTol(value: Double): this.type = set(tol, value)
+
+  /**
+   * Set the seed for weights initialization.
+   * @group setParam
+   */
+  def setSeed(value: Long): this.type = set(seed, value)
+
   override def copy(extra: ParamMap): MultilayerPerceptronClassifier = 
defaultCopy(extra)
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/2e0d2a9c/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
--
diff --git 
a/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
 
b/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
new file mode 100644
index 000..ec6b4bf
--- /dev/null
+++ 
b/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or 

spark git commit: [SPARK-10108] Add since tags to mllib.feature

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 2a3d98aae -> 7cfc0750e


[SPARK-10108] Add since tags to mllib.feature

Author: MechCoder manojkumarsivaraj...@gmail.com

Closes #8309 from MechCoder/tags_feature.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7cfc0750
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7cfc0750
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7cfc0750

Branch: refs/heads/master
Commit: 7cfc0750e14f2c1b3847e4720cc02150253525a9
Parents: 2a3d98a
Author: MechCoder manojkumarsivaraj...@gmail.com
Authored: Thu Aug 20 14:56:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 14:56:08 2015 -0700

--
 .../spark/mllib/feature/ChiSqSelector.scala  | 12 +---
 .../spark/mllib/feature/ElementwiseProduct.scala |  4 +++-
 .../apache/spark/mllib/feature/HashingTF.scala   | 11 ++-
 .../org/apache/spark/mllib/feature/IDF.scala |  8 +++-
 .../apache/spark/mllib/feature/Normalizer.scala  |  5 -
 .../org/apache/spark/mllib/feature/PCA.scala |  9 -
 .../spark/mllib/feature/StandardScaler.scala | 13 -
 .../spark/mllib/feature/VectorTransformer.scala  |  6 +-
 .../apache/spark/mllib/feature/Word2Vec.scala| 19 ++-
 9 files changed, 76 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7cfc0750/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
index 5f8c1de..fdd974d 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
@@ -19,7 +19,7 @@ package org.apache.spark.mllib.feature
 
 import scala.collection.mutable.ArrayBuilder
 
-import org.apache.spark.annotation.Experimental
+import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, 
Vectors}
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.stat.Statistics
@@ -31,8 +31,10 @@ import org.apache.spark.rdd.RDD
  *
  * @param selectedFeatures list of indices to select (filter). Must be ordered 
asc
  */
+@Since("1.3.0")
 @Experimental
-class ChiSqSelectorModel (val selectedFeatures: Array[Int]) extends VectorTransformer {
+class ChiSqSelectorModel (
+  @Since("1.3.0") val selectedFeatures: Array[Int]) extends VectorTransformer {
 
   require(isSorted(selectedFeatures), "Array has to be sorted asc")
 
@@ -52,6 +54,7 @@ class ChiSqSelectorModel (val selectedFeatures: Array[Int]) 
extends VectorTransf
* @param vector vector to be transformed.
* @return transformed vector.
*/
+  @Since("1.3.0")
   override def transform(vector: Vector): Vector = {
 compress(vector, selectedFeatures)
   }
@@ -107,8 +110,10 @@ class ChiSqSelectorModel (val selectedFeatures: 
Array[Int]) extends VectorTransf
  * @param numTopFeatures number of features that selector will select
  *   (ordered by statistic value descending)
  */
+@Since("1.3.0")
 @Experimental
-class ChiSqSelector (val numTopFeatures: Int) extends Serializable {
+class ChiSqSelector (
+  @Since("1.3.0") val numTopFeatures: Int) extends Serializable {
 
   /**
* Returns a ChiSquared feature selector.
@@ -117,6 +122,7 @@ class ChiSqSelector (val numTopFeatures: Int) extends 
Serializable {
* Real-valued features will be treated as categorical for each 
distinct value.
* Apply feature discretizer before using this function.
*/
+  @Since("1.3.0")
   def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = {
 val indices = Statistics.chiSqTest(data)
       .zipWithIndex.sortBy { case (res, _) => -res.statistic }

http://git-wip-us.apache.org/repos/asf/spark/blob/7cfc0750/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
index d67fe6c..33e2d17 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.mllib.feature
 
-import org.apache.spark.annotation.Experimental
+import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.mllib.linalg._
 
 /**
@@ -27,6 +27,7 @@ import 

spark git commit: [SPARK-9245] [MLLIB] LDA topic assignments

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 7cfc0750e -> eaafe139f


[SPARK-9245] [MLLIB] LDA topic assignments

For each (document, term) pair, return top topic.  Note that instances of (doc, 
term) pairs within a document (a.k.a. tokens) are exchangeable, so we should 
provide an estimate per document-term, rather than per token.
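
A hedged usage sketch, assuming `model` is a trained DistributedLDAModel (the new field added by this change lives on the distributed model only):

    // Each element is (doc ID, term indices, topic indices); the two arrays zip together.
    model.topicAssignments.take(2).foreach { case (docId, terms, topics) =>
      val pairs = terms.zip(topics).map { case (t, k) => s"term $t -> topic $k" }
      println(s"doc $docId: " + pairs.mkString(", "))
    }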

CC: rotationsymmetry mengxr

Author: Joseph K. Bradley jos...@databricks.com

Closes #8329 from jkbradley/lda-topic-assignments.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eaafe139
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eaafe139
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eaafe139

Branch: refs/heads/master
Commit: eaafe139f881d6105996373c9b11f2ccd91b5b3e
Parents: 7cfc075
Author: Joseph K. Bradley jos...@databricks.com
Authored: Thu Aug 20 15:01:31 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 15:01:31 2015 -0700

--
 .../spark/mllib/clustering/LDAModel.scala   | 51 ++--
 .../spark/mllib/clustering/LDAOptimizer.scala   |  2 +-
 .../spark/mllib/clustering/JavaLDASuite.java|  7 +++
 .../spark/mllib/clustering/LDASuite.scala   | 21 +++-
 4 files changed, 74 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/eaafe139/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
index b70e380..6bc68a4 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.mllib.clustering
 
-import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, argtopk, normalize, sum}
+import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, argmax, argtopk, normalize, sum}
 import breeze.numerics.{exp, lgamma}
 import org.apache.hadoop.fs.Path
 import org.json4s.DefaultFormats
@@ -438,7 +438,7 @@ object LocalLDAModel extends Loader[LocalLDAModel] {
   Loader.checkSchema[Data](dataFrame.schema)
   val topics = dataFrame.collect()
   val vocabSize = topics(0).getAs[Vector](0).size
-  val k = topics.size
+  val k = topics.length
 
   val brzTopics = BDM.zeros[Double](vocabSize, k)
   topics.foreach { case Row(vec: Vector, ind: Int) =>
@@ -610,6 +610,50 @@ class DistributedLDAModel private[clustering] (
 }
   }
 
+  /**
+   * Return the top topic for each (doc, term) pair.  I.e., for each document, 
what is the most
+   * likely topic generating each term?
+   *
+   * @return RDD of (doc ID, assignment of top topic index for each term),
+   * where the assignment is specified via a pair of zippable arrays
+   * (term indices, topic indices).  Note that terms will be omitted 
if not present in
+   * the document.
+   */
+  lazy val topicAssignments: RDD[(Long, Array[Int], Array[Int])] = {
+// For reference, compare the below code with the core part of 
EMLDAOptimizer.next().
+val eta = topicConcentration
+val W = vocabSize
+val alpha = docConcentration(0)
+val N_k = globalTopicTotals
+    val sendMsg: EdgeContext[TopicCounts, TokenCount, (Array[Int], Array[Int])] => Unit =
+      (edgeContext) => {
+// E-STEP: Compute gamma_{wjk} (smoothed topic distributions).
+val scaledTopicDistribution: TopicCounts =
+  computePTopic(edgeContext.srcAttr, edgeContext.dstAttr, N_k, W, eta, 
alpha)
+// For this (doc j, term w), send top topic k to doc vertex.
+val topTopic: Int = argmax(scaledTopicDistribution)
+val term: Int = index2term(edgeContext.dstId)
+edgeContext.sendToSrc((Array(term), Array(topTopic)))
+  }
+    val mergeMsg: ((Array[Int], Array[Int]), (Array[Int], Array[Int])) => (Array[Int], Array[Int]) =
+      (terms_topics0, terms_topics1) => {
+(terms_topics0._1 ++ terms_topics1._1, terms_topics0._2 ++ 
terms_topics1._2)
+  }
+// M-STEP: Aggregation computes new N_{kj}, N_{wk} counts.
+val perDocAssignments =
+  graph.aggregateMessages[(Array[Int], Array[Int])](sendMsg, 
mergeMsg).filter(isDocumentVertex)
+    perDocAssignments.map { case (docID: Long, (terms: Array[Int], topics: Array[Int])) =>
+  // TODO: Avoid zip, which is inefficient.
+  val (sortedTerms, sortedTopics) = terms.zip(topics).sortBy(_._1).unzip
+  (docID, sortedTerms.toArray, sortedTopics.toArray)
+}
+  }
+
+  /** Java-friendly version of [[topicAssignments]] */
+  lazy val javaTopicAssignments: JavaRDD[(java.lang.Long, Array[Int], 
Array[Int])] = 

spark git commit: [SPARK-9400] [SQL] codegen for StringLocate

2015-08-20 Thread davies
Repository: spark
Updated Branches:
  refs/heads/master eaafe139f -> afe9f03fd


[SPARK-9400] [SQL] codegen for StringLocate

This is based on #7779, thanks to tarekauel. It fixes the merge conflict and the expression's nullability.
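
A hedged example of the expression that now gets generated code, assuming a SQLContext named `sqlContext` and that Spark 1.5 registers the expression under the SQL name `locate`; with the nullability fix, the result is NULL when either argument is NULL:

    // 1-based position of the substring, or 0 when it is absent.
    sqlContext.sql("SELECT locate('bar', 'foobarbar')").show()   // 4
    sqlContext.sql("SELECT locate('xyz', 'foobarbar')").show()   // 0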

Closes #7779 and #8274.

Author: Tarek Auel tarek.a...@googlemail.com
Author: Davies Liu dav...@databricks.com

Closes #8330 from davies/stringLocate.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/afe9f03f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/afe9f03f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/afe9f03f

Branch: refs/heads/master
Commit: afe9f03fd964d1e8604d02feee8d6970efbe6009
Parents: eaafe13
Author: Tarek Auel tarek.a...@googlemail.com
Authored: Thu Aug 20 15:10:13 2015 -0700
Committer: Davies Liu davies@gmail.com
Committed: Thu Aug 20 15:10:13 2015 -0700

--
 .../expressions/stringExpressions.scala | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/afe9f03f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
index 3c23f2e..b60d318 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
@@ -409,13 +409,14 @@ case class SubstringIndex(strExpr: Expression, delimExpr: 
Expression, countExpr:
  * in given string after position pos.
  */
 case class StringLocate(substr: Expression, str: Expression, start: Expression)
-  extends TernaryExpression with ImplicitCastInputTypes with CodegenFallback {
+  extends TernaryExpression with ImplicitCastInputTypes {
 
   def this(substr: Expression, str: Expression) = {
 this(substr, str, Literal(0))
   }
 
   override def children: Seq[Expression] = substr :: str :: start :: Nil
+  override def nullable: Boolean = substr.nullable || str.nullable
   override def dataType: DataType = IntegerType
   override def inputTypes: Seq[DataType] = Seq(StringType, StringType, 
IntegerType)
 
@@ -441,6 +442,31 @@ case class StringLocate(substr: Expression, str: 
Expression, start: Expression)
 }
   }
 
+  override protected def genCode(ctx: CodeGenContext, ev: 
GeneratedExpressionCode): String = {
+val substrGen = substr.gen(ctx)
+val strGen = str.gen(ctx)
+val startGen = start.gen(ctx)
+    s"""
+  int ${ev.primitive} = 0;
+  boolean ${ev.isNull} = false;
+  ${startGen.code}
+  if (!${startGen.isNull}) {
+${substrGen.code}
+if (!${substrGen.isNull}) {
+  ${strGen.code}
+  if (!${strGen.isNull}) {
+${ev.primitive} = 
${strGen.primitive}.indexOf(${substrGen.primitive},
+  ${startGen.primitive}) + 1;
+  } else {
+${ev.isNull} = true;
+  }
+} else {
+  ${ev.isNull} = true;
+}
+  }
+     """
+  }
+
   override def prettyName: String = locate
 }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-10138] [ML] move setters to MultilayerPerceptronClassifier and add Java test suite

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 907df2fce -> 2a3d98aae


[SPARK-10138] [ML] move setters to MultilayerPerceptronClassifier and add Java 
test suite

Otherwise, setters do not return self type. jkbradley avulanov

Author: Xiangrui Meng m...@databricks.com

Closes #8342 from mengxr/SPARK-10138.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2a3d98aa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2a3d98aa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2a3d98aa

Branch: refs/heads/master
Commit: 2a3d98aae285aba39786e9809f96de412a130f39
Parents: 907df2f
Author: Xiangrui Meng m...@databricks.com
Authored: Thu Aug 20 14:47:04 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 14:47:04 2015 -0700

--
 .../MultilayerPerceptronClassifier.scala| 54 +++---
 ...JavaMultilayerPerceptronClassifierSuite.java | 74 
 2 files changed, 101 insertions(+), 27 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2a3d98aa/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
index ccca4ec..1e5b0bc 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
@@ -42,9 +42,6 @@ private[ml] trait MultilayerPerceptronParams extends 
PredictorParams
 ParamValidators.arrayLengthGt(1)
   )
 
-  /** @group setParam */
-  def setLayers(value: Array[Int]): this.type = set(layers, value)
-
   /** @group getParam */
   final def getLayers: Array[Int] = $(layers)
 
@@ -61,33 +58,9 @@ private[ml] trait MultilayerPerceptronParams extends 
PredictorParams
   it is adjusted to the size of this data. Recommended size is between 10 
and 1000,
 ParamValidators.gt(0))
 
-  /** @group setParam */
-  def setBlockSize(value: Int): this.type = set(blockSize, value)
-
   /** @group getParam */
   final def getBlockSize: Int = $(blockSize)
 
-  /**
-   * Set the maximum number of iterations.
-   * Default is 100.
-   * @group setParam
-   */
-  def setMaxIter(value: Int): this.type = set(maxIter, value)
-
-  /**
-   * Set the convergence tolerance of iterations.
-   * Smaller value will lead to higher accuracy with the cost of more 
iterations.
-   * Default is 1E-4.
-   * @group setParam
-   */
-  def setTol(value: Double): this.type = set(tol, value)
-
-  /**
-   * Set the seed for weights initialization.
-   * @group setParam
-   */
-  def setSeed(value: Long): this.type = set(seed, value)
-
   setDefault(maxIter -> 100, tol -> 1e-4, layers -> Array(1, 1), blockSize -> 128)
 }
 
@@ -136,6 +109,33 @@ class MultilayerPerceptronClassifier(override val uid: 
String)
 
   def this() = this(Identifiable.randomUID("mlpc"))
 
+  /** @group setParam */
+  def setLayers(value: Array[Int]): this.type = set(layers, value)
+
+  /** @group setParam */
+  def setBlockSize(value: Int): this.type = set(blockSize, value)
+
+  /**
+   * Set the maximum number of iterations.
+   * Default is 100.
+   * @group setParam
+   */
+  def setMaxIter(value: Int): this.type = set(maxIter, value)
+
+  /**
+   * Set the convergence tolerance of iterations.
+   * Smaller value will lead to higher accuracy with the cost of more 
iterations.
+   * Default is 1E-4.
+   * @group setParam
+   */
+  def setTol(value: Double): this.type = set(tol, value)
+
+  /**
+   * Set the seed for weights initialization.
+   * @group setParam
+   */
+  def setSeed(value: Long): this.type = set(seed, value)
+
   override def copy(extra: ParamMap): MultilayerPerceptronClassifier = 
defaultCopy(extra)
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/2a3d98aa/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
--
diff --git 
a/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
 
b/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
new file mode 100644
index 000..ec6b4bf
--- /dev/null
+++ 
b/mllib/src/test/java/org/apache/spark/ml/classification/JavaMultilayerPerceptronClassifierSuite.java
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding 

spark git commit: [SPARK-10108] Add since tags to mllib.feature

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 2e0d2a9cc -> 560ec1268


[SPARK-10108] Add since tags to mllib.feature

Author: MechCoder manojkumarsivaraj...@gmail.com

Closes #8309 from MechCoder/tags_feature.
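
A rough sketch of the annotation pattern this patch applies across mllib.feature
(the ExampleTransformer class below is hypothetical, not part of MLlib): the
version string records the release in which a public class, constructor
parameter, or method first appeared, and the generated API docs pick it up.

import org.apache.spark.annotation.{Experimental, Since}
import org.apache.spark.mllib.linalg.Vector

@Since("1.3.0")
@Experimental
class ExampleTransformer(@Since("1.3.0") val scale: Double) extends Serializable {

  // Placeholder transform; a real transformer would scale the input vector.
  @Since("1.3.0")
  def transform(vector: Vector): Vector = vector
}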

(cherry picked from commit 7cfc0750e14f2c1b3847e4720cc02150253525a9)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/560ec126
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/560ec126
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/560ec126

Branch: refs/heads/branch-1.5
Commit: 560ec1268b824acc01d347a3fbc78ac16216a9b0
Parents: 2e0d2a9
Author: MechCoder manojkumarsivaraj...@gmail.com
Authored: Thu Aug 20 14:56:08 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 14:59:55 2015 -0700

--
 .../spark/mllib/feature/ChiSqSelector.scala  | 12 +---
 .../spark/mllib/feature/ElementwiseProduct.scala |  4 +++-
 .../apache/spark/mllib/feature/HashingTF.scala   | 11 ++-
 .../org/apache/spark/mllib/feature/IDF.scala |  8 +++-
 .../apache/spark/mllib/feature/Normalizer.scala  |  5 -
 .../org/apache/spark/mllib/feature/PCA.scala |  9 -
 .../spark/mllib/feature/StandardScaler.scala | 13 -
 .../spark/mllib/feature/VectorTransformer.scala  |  6 +-
 .../apache/spark/mllib/feature/Word2Vec.scala| 19 ++-
 9 files changed, 76 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/560ec126/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
index 5f8c1de..fdd974d 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
@@ -19,7 +19,7 @@ package org.apache.spark.mllib.feature
 
 import scala.collection.mutable.ArrayBuilder
 
-import org.apache.spark.annotation.Experimental
+import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, 
Vectors}
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.stat.Statistics
@@ -31,8 +31,10 @@ import org.apache.spark.rdd.RDD
  *
  * @param selectedFeatures list of indices to select (filter). Must be ordered 
asc
  */
+@Since("1.3.0")
 @Experimental
-class ChiSqSelectorModel (val selectedFeatures: Array[Int]) extends VectorTransformer {
+class ChiSqSelectorModel (
+  @Since("1.3.0") val selectedFeatures: Array[Int]) extends VectorTransformer {
 
   require(isSorted(selectedFeatures), "Array has to be sorted asc")
 
@@ -52,6 +54,7 @@ class ChiSqSelectorModel (val selectedFeatures: Array[Int]) 
extends VectorTransf
* @param vector vector to be transformed.
* @return transformed vector.
*/
+  @Since("1.3.0")
   override def transform(vector: Vector): Vector = {
 compress(vector, selectedFeatures)
   }
@@ -107,8 +110,10 @@ class ChiSqSelectorModel (val selectedFeatures: 
Array[Int]) extends VectorTransf
  * @param numTopFeatures number of features that selector will select
  *   (ordered by statistic value descending)
  */
+@Since("1.3.0")
 @Experimental
-class ChiSqSelector (val numTopFeatures: Int) extends Serializable {
+class ChiSqSelector (
+  @Since("1.3.0") val numTopFeatures: Int) extends Serializable {
 
   /**
* Returns a ChiSquared feature selector.
@@ -117,6 +122,7 @@ class ChiSqSelector (val numTopFeatures: Int) extends 
Serializable {
* Real-valued features will be treated as categorical for each 
distinct value.
* Apply feature discretizer before using this function.
*/
+  @Since("1.3.0")
   def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = {
 val indices = Statistics.chiSqTest(data)
   .zipWithIndex.sortBy { case (res, _) => -res.statistic }

http://git-wip-us.apache.org/repos/asf/spark/blob/560ec126/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
index d67fe6c..33e2d17 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/feature/ElementwiseProduct.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.mllib.feature
 
-import org.apache.spark.annotation.Experimental
+import 

spark git commit: [SPARK-9245] [MLLIB] LDA topic assignments

2015-08-20 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 560ec1268 -> 2beea65bf


[SPARK-9245] [MLLIB] LDA topic assignments

For each (document, term) pair, return the top topic. Note that occurrences of a
(doc, term) pair within a document (a.k.a. tokens) are exchangeable, so we provide
an estimate per document-term pair rather than per token.

CC: rotationsymmetry mengxr

Author: Joseph K. Bradley jos...@databricks.com

Closes #8329 from jkbradley/lda-topic-assignments.
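
A minimal usage sketch, assuming a corpus RDD of (document ID, term-count vector)
pairs and k = 10 topics; topicAssignments is exposed on the EM-trained
DistributedLDAModel only, so the optimizer is set to "em" here.

import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

def printTopAssignments(corpus: RDD[(Long, Vector)]): Unit = {
  val model = new LDA()
    .setK(10)
    .setOptimizer("em")  // EM training yields a DistributedLDAModel
    .run(corpus)
    .asInstanceOf[DistributedLDAModel]

  // One record per document: zippable arrays of term indices and the most
  // likely topic index for each of those terms.
  model.topicAssignments.take(3).foreach { case (docId, terms, topics) =>
    println(s"doc $docId: " + terms.zip(topics).mkString(", "))
  }
}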

(cherry picked from commit eaafe139f881d6105996373c9b11f2ccd91b5b3e)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2beea65b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2beea65b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2beea65b

Branch: refs/heads/branch-1.5
Commit: 2beea65bfbbf4a94ad6b7ca5e4c24f59089f6099
Parents: 560ec12
Author: Joseph K. Bradley jos...@databricks.com
Authored: Thu Aug 20 15:01:31 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Thu Aug 20 15:01:37 2015 -0700

--
 .../spark/mllib/clustering/LDAModel.scala   | 51 ++--
 .../spark/mllib/clustering/LDAOptimizer.scala   |  2 +-
 .../spark/mllib/clustering/JavaLDASuite.java|  7 +++
 .../spark/mllib/clustering/LDASuite.scala   | 21 +++-
 4 files changed, 74 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2beea65b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
index b70e380..6bc68a4 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.mllib.clustering
 
-import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, argtopk, normalize, sum}
+import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, argmax, argtopk, normalize, sum}
 import breeze.numerics.{exp, lgamma}
 import org.apache.hadoop.fs.Path
 import org.json4s.DefaultFormats
@@ -438,7 +438,7 @@ object LocalLDAModel extends Loader[LocalLDAModel] {
   Loader.checkSchema[Data](dataFrame.schema)
   val topics = dataFrame.collect()
   val vocabSize = topics(0).getAs[Vector](0).size
-  val k = topics.size
+  val k = topics.length
 
   val brzTopics = BDM.zeros[Double](vocabSize, k)
   topics.foreach { case Row(vec: Vector, ind: Int) =>
@@ -610,6 +610,50 @@ class DistributedLDAModel private[clustering] (
 }
   }
 
+  /**
+   * Return the top topic for each (doc, term) pair.  I.e., for each document, 
what is the most
+   * likely topic generating each term?
+   *
+   * @return RDD of (doc ID, assignment of top topic index for each term),
+   * where the assignment is specified via a pair of zippable arrays
+   * (term indices, topic indices).  Note that terms will be omitted 
if not present in
+   * the document.
+   */
+  lazy val topicAssignments: RDD[(Long, Array[Int], Array[Int])] = {
+// For reference, compare the below code with the core part of 
EMLDAOptimizer.next().
+val eta = topicConcentration
+val W = vocabSize
+val alpha = docConcentration(0)
+val N_k = globalTopicTotals
+val sendMsg: EdgeContext[TopicCounts, TokenCount, (Array[Int], Array[Int])] => Unit =
+  (edgeContext) => {
+// E-STEP: Compute gamma_{wjk} (smoothed topic distributions).
+val scaledTopicDistribution: TopicCounts =
+  computePTopic(edgeContext.srcAttr, edgeContext.dstAttr, N_k, W, eta, 
alpha)
+// For this (doc j, term w), send top topic k to doc vertex.
+val topTopic: Int = argmax(scaledTopicDistribution)
+val term: Int = index2term(edgeContext.dstId)
+edgeContext.sendToSrc((Array(term), Array(topTopic)))
+  }
+val mergeMsg: ((Array[Int], Array[Int]), (Array[Int], Array[Int])) => (Array[Int], Array[Int]) =
+  (terms_topics0, terms_topics1) => {
+(terms_topics0._1 ++ terms_topics1._1, terms_topics0._2 ++ terms_topics1._2)
+  }
+// M-STEP: Aggregation computes new N_{kj}, N_{wk} counts.
+val perDocAssignments =
+  graph.aggregateMessages[(Array[Int], Array[Int])](sendMsg, 
mergeMsg).filter(isDocumentVertex)
+perDocAssignments.map { case (docID: Long, (terms: Array[Int], topics: Array[Int])) =>
+  // TODO: Avoid zip, which is inefficient.
+  val (sortedTerms, sortedTopics) = terms.zip(topics).sortBy(_._1).unzip
+  (docID, sortedTerms.toArray, sortedTopics.toArray)
+}
+  }
+
+  /** 

Git Push Summary

2015-08-20 Thread rxin
Repository: spark
Updated Tags:  refs/tags/v1.5.0-rc1 [deleted] 99eeac8cc




[2/2] spark git commit: Preparing development version 1.5.0-SNAPSHOT

2015-08-20 Thread pwendell
Preparing development version 1.5.0-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/175c1d9c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/175c1d9c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/175c1d9c

Branch: refs/heads/branch-1.5
Commit: 175c1d9c90d47a469568e14b4b90a440b8d9e95c
Parents: d837d51
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 15:33:10 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 15:33:10 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/175c1d9c/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 3ef7d6f..e9c6d26 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/175c1d9c/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index 684e07b..ed5c37e 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/175c1d9c/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 6b082ad..4f79d71 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/175c1d9c/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 9ef1eda..e6884b0 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/175c1d9c/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index fe878e6..e05e431 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0</version>
+    <version>1.5.0-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/175c1d9c/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index 

[1/2] spark git commit: Preparing Spark release v1.5.0-rc1

2015-08-20 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 2beea65bf -> 175c1d9c9


Preparing Spark release v1.5.0-rc1
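
Once the 1.5.0 artifacts tagged here are published, a downstream sbt build could
depend on them along these lines (a sketch; the Scala version and provided scope
are assumptions about the consuming project, not something this commit dictates):

// build.sbt
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.5.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.5.0" % "provided"
)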


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d837d51d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d837d51d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d837d51d

Branch: refs/heads/branch-1.5
Commit: d837d51d54510310f7bae05cd331c0c23946404c
Parents: 2beea65
Author: Patrick Wendell pwend...@gmail.com
Authored: Thu Aug 20 15:33:04 2015 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Thu Aug 20 15:33:04 2015 -0700

--
 assembly/pom.xml| 2 +-
 bagel/pom.xml   | 2 +-
 core/pom.xml| 2 +-
 examples/pom.xml| 2 +-
 external/flume-assembly/pom.xml | 2 +-
 external/flume-sink/pom.xml | 2 +-
 external/flume/pom.xml  | 2 +-
 external/kafka-assembly/pom.xml | 2 +-
 external/kafka/pom.xml  | 2 +-
 external/mqtt-assembly/pom.xml  | 2 +-
 external/mqtt/pom.xml   | 2 +-
 external/twitter/pom.xml| 2 +-
 external/zeromq/pom.xml | 2 +-
 extras/java8-tests/pom.xml  | 2 +-
 extras/kinesis-asl-assembly/pom.xml | 2 +-
 extras/kinesis-asl/pom.xml  | 2 +-
 extras/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml  | 2 +-
 launcher/pom.xml| 2 +-
 mllib/pom.xml   | 2 +-
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 pom.xml | 2 +-
 repl/pom.xml| 2 +-
 sql/catalyst/pom.xml| 2 +-
 sql/core/pom.xml| 2 +-
 sql/hive-thriftserver/pom.xml   | 2 +-
 sql/hive/pom.xml| 2 +-
 streaming/pom.xml   | 2 +-
 tools/pom.xml   | 2 +-
 unsafe/pom.xml  | 2 +-
 yarn/pom.xml| 2 +-
 33 files changed, 33 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d837d51d/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index e9c6d26..3ef7d6f 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d837d51d/bagel/pom.xml
--
diff --git a/bagel/pom.xml b/bagel/pom.xml
index ed5c37e..684e07b 100644
--- a/bagel/pom.xml
+++ b/bagel/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d837d51d/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 4f79d71..6b082ad 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d837d51d/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index e6884b0..9ef1eda 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d837d51d/external/flume-assembly/pom.xml
--
diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml
index e05e431..fe878e6 100644
--- a/external/flume-assembly/pom.xml
+++ b/external/flume-assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.10</artifactId>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d837d51d/external/flume-sink/pom.xml
--
diff --git