spark git commit: [SPARK-7975] Add style checker to disallow overriding equals covariantly.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 8764dcceb -> 7896e99b2


[SPARK-7975] Add style checker to disallow overriding equals covariantly.

Author: Reynold Xin 

This patch had conflicts when merged, resolved by
Committer: Reynold Xin 

Closes #6527 from rxin/covariant-equals and squashes the following commits:

e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker
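
The rule being enabled is Scalastyle's org.scalastyle.scalariform.CovariantEqualsChecker. A covariant equals (one whose parameter type is the class itself rather than Any) only overloads AnyRef.equals(Any), so ==, Set membership and Map lookups silently fall back to reference equality. The sketch below is not part of the patch; Point and SafePoint are made-up names showing the pattern the checker rejects and the fix it expects. The check is syntactic, which presumably also explains why the two relations in this diff respell the parameter type scala.Any as Any.

class Point(val x: Int, val y: Int) {
  // Covariant "equals": compiles fine, but it only OVERLOADS AnyRef.equals(Any),
  // so ==, Sets and Maps never call it.
  def equals(other: Point): Boolean = x == other.x && y == other.y
}

class SafePoint(val x: Int, val y: Int) {
  // What the checker wants: override equals(Any), and keep hashCode consistent with it.
  override def equals(other: Any): Boolean = other match {
    case that: SafePoint => x == that.x && y == that.y
    case _ => false
  }
  override def hashCode(): Int = 31 * x + y
}

object CovariantEqualsDemo extends App {
  println(new Point(1, 2) == new Point(1, 2))         // false: reference equality
  println(new SafePoint(1, 2) == new SafePoint(1, 2)) // true: value equality
}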


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7896e99b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7896e99b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7896e99b

Branch: refs/heads/master
Commit: 7896e99b2a0a160bd0b6c5c11cf40b6cbf4a65cf
Parents: 8764dcc
Author: Reynold Xin 
Authored: Sun May 31 00:05:55 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:05:55 2015 -0700

--
 scalastyle-config.xml  | 2 +-
 .../src/main/scala/org/apache/spark/sql/parquet/newParquet.scala   | 2 +-
 .../scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7896e99b/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 072c480..3a98422 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -97,7 +97,7 @@
  
  
   
- 
+  
  
  
  

http://git-wip-us.apache.org/repos/asf/spark/blob/7896e99b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 8b3e1b2..e439a18 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -155,7 +155,7 @@ private[sql] class ParquetRelation2(
 meta
   }
 
-  override def equals(other: scala.Any): Boolean = other match {
+  override def equals(other: Any): Boolean = other match {
 case that: ParquetRelation2 =>
   val schemaEquality = if (shouldMergeSchemas) {
 this.shouldMergeSchemas == that.shouldMergeSchemas

http://git-wip-us.apache.org/repos/asf/spark/blob/7896e99b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
index 47b8573..ca1f49b 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -596,7 +596,7 @@ private[hive] case class MetastoreRelation
 
   self: Product =>
 
-  override def equals(other: scala.Any): Boolean = other match {
+  override def equals(other: Any): Boolean = other match {
 case relation: MetastoreRelation =>
   databaseName == relation.databaseName &&
 tableName == relation.tableName &&





spark git commit: [SPARK-7975] Add style checker to disallow overriding equals covariantly.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 0d093d6e7 -> 2016927f7


[SPARK-7975] Add style checker to disallow overriding equals covariantly.

Author: Reynold Xin 

This patch had conflicts when merged, resolved by
Committer: Reynold Xin 

Closes #6527 from rxin/covariant-equals and squashes the following commits:

e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker

(cherry picked from commit 7896e99b2a0a160bd0b6c5c11cf40b6cbf4a65cf)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2016927f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2016927f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2016927f

Branch: refs/heads/branch-1.4
Commit: 2016927f70fdcbd33a7863fa6c2542f159ad43aa
Parents: 0d093d6
Author: Reynold Xin 
Authored: Sun May 31 00:05:55 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:06:02 2015 -0700

--
 scalastyle-config.xml  | 2 +-
 .../src/main/scala/org/apache/spark/sql/parquet/newParquet.scala   | 2 +-
 .../scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2016927f/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 6e27035..5498947 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -97,7 +97,7 @@
  
  
   
- 
+  
  
  
  

http://git-wip-us.apache.org/repos/asf/spark/blob/2016927f/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 8b3e1b2..e439a18 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -155,7 +155,7 @@ private[sql] class ParquetRelation2(
 meta
   }
 
-  override def equals(other: scala.Any): Boolean = other match {
+  override def equals(other: Any): Boolean = other match {
 case that: ParquetRelation2 =>
   val schemaEquality = if (shouldMergeSchemas) {
 this.shouldMergeSchemas == that.shouldMergeSchemas

http://git-wip-us.apache.org/repos/asf/spark/blob/2016927f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
index 47b8573..ca1f49b 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -596,7 +596,7 @@ private[hive] case class MetastoreRelation
 
   self: Product =>
 
-  override def equals(other: scala.Any): Boolean = other match {
+  override def equals(other: Any): Boolean = other match {
 case relation: MetastoreRelation =>
   databaseName == relation.databaseName &&
 tableName == relation.tableName &&





spark git commit: [SPARK-3850] Trim trailing spaces for core.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 7896e99b2 -> 74fdc97c7


[SPARK-3850] Trim trailing spaces for core.

Author: Reynold Xin 

Closes #6533 from rxin/whitespace-2 and squashes the following commits:

038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.
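
The cleanup itself is mechanical: remove spaces and tabs at the end of every source line so that a whitespace style rule can reject regressions later. Below is a stand-alone sketch of that idea, not part of this patch (the object name, file handling and regex are illustrative only):

import java.io.{File, PrintWriter}
import scala.io.Source

object TrimTrailingSpaces {
  def main(args: Array[String]): Unit = {
    val file = new File(args(0))
    val src  = Source.fromFile(file)
    // Drop spaces and tabs at the end of each line; keep everything else as-is.
    val cleaned =
      try src.getLines().map(_.replaceAll("""[ \t]+$""", "")).mkString("\n") + "\n"
      finally src.close()
    val out = new PrintWriter(file)
    try out.write(cleaned) finally out.close()
  }
}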


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/74fdc97c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/74fdc97c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/74fdc97c

Branch: refs/heads/master
Commit: 74fdc97c7206c6d715f128ef7c46055e0bb90760
Parents: 7896e99
Author: Reynold Xin 
Authored: Sun May 31 00:16:22 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:16:22 2015 -0700

--
 .../scala/org/apache/spark/Aggregator.scala |  4 +--
 .../scala/org/apache/spark/FutureAction.scala   |  2 +-
 .../org/apache/spark/HeartbeatReceiver.scala| 20 +++---
 .../scala/org/apache/spark/HttpFileServer.scala |  4 +--
 .../main/scala/org/apache/spark/SparkConf.scala | 12 -
 .../main/scala/org/apache/spark/TestUtils.scala |  2 +-
 .../apache/spark/api/java/JavaDoubleRDD.scala   |  2 +-
 .../org/apache/spark/api/java/JavaRDD.scala |  6 ++---
 .../org/apache/spark/api/python/PythonRDD.scala |  4 +--
 .../scala/org/apache/spark/api/r/RBackend.scala |  4 +--
 .../apache/spark/api/r/RBackendHandler.scala|  2 +-
 .../org/apache/spark/deploy/SparkSubmit.scala   |  2 +-
 .../deploy/history/HistoryServerArguments.scala |  2 +-
 .../master/ZooKeeperPersistenceEngine.scala |  2 +-
 .../org/apache/spark/executor/TaskMetrics.scala | 16 +--
 .../apache/spark/metrics/sink/Slf4jSink.scala   |  4 +--
 .../org/apache/spark/metrics/sink/package.scala |  2 +-
 .../org/apache/spark/rdd/AsyncRDDActions.scala  |  4 +--
 .../org/apache/spark/rdd/NewHadoopRDD.scala |  2 +-
 .../org/apache/spark/rdd/PairRDDFunctions.scala |  2 +-
 .../spark/scheduler/ReplayListenerBus.scala |  4 +--
 .../scala/org/apache/spark/scheduler/Task.scala |  2 +-
 .../apache/spark/scheduler/TaskSetManager.scala |  4 +--
 .../cluster/CoarseGrainedSchedulerBackend.scala |  2 +-
 .../mesos/MesosSchedulerBackendUtil.scala   |  4 +--
 .../spark/serializer/KryoSerializer.scala   |  2 +-
 .../shuffle/hash/BlockStoreShuffleFetcher.scala |  2 +-
 .../spark/status/api/v1/OneStageResource.scala  |  2 +-
 .../storage/BlockManagerMasterEndpoint.scala|  8 +++---
 .../apache/spark/storage/DiskBlockManager.scala |  2 +-
 .../spark/storage/TachyonBlockManager.scala |  4 +--
 .../main/scala/org/apache/spark/ui/WebUI.scala  |  4 +--
 .../spark/ui/jobs/JobProgressListener.scala |  2 +-
 .../spark/util/AsynchronousListenerBus.scala|  2 +-
 .../org/apache/spark/util/SizeEstimator.scala   |  4 +--
 .../util/collection/ExternalAppendOnlyMap.scala |  4 +--
 .../spark/util/collection/ExternalSorter.scala  |  2 +-
 .../scala/org/apache/spark/FailureSuite.scala   |  6 ++---
 .../apache/spark/ImplicitOrderingSuite.scala| 28 ++--
 .../org/apache/spark/SparkContextSuite.scala| 10 +++
 .../org/apache/spark/rdd/JdbcRDDSuite.scala |  2 +-
 .../cluster/mesos/MemoryUtilsSuite.scala|  4 +--
 .../mesos/MesosSchedulerBackendSuite.scala  |  4 +--
 .../spark/serializer/KryoSerializerSuite.scala  |  6 ++---
 .../ProactiveClosureSerializationSuite.scala| 18 ++---
 .../apache/spark/util/ClosureCleanerSuite.scala |  2 +-
 .../spark/util/random/RandomSamplerSuite.scala  |  2 +-
 47 files changed, 117 insertions(+), 117 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/74fdc97c/core/src/main/scala/org/apache/spark/Aggregator.scala
--
diff --git a/core/src/main/scala/org/apache/spark/Aggregator.scala 
b/core/src/main/scala/org/apache/spark/Aggregator.scala
index b8a5f50..ceeb580 100644
--- a/core/src/main/scala/org/apache/spark/Aggregator.scala
+++ b/core/src/main/scala/org/apache/spark/Aggregator.scala
@@ -34,8 +34,8 @@ case class Aggregator[K, V, C] (
 mergeValue: (C, V) => C,
 mergeCombiners: (C, C) => C) {
 
-  // When spilling is enabled sorting will happen externally, but not necessarily with an 
-  // ExternalSorter. 
+  // When spilling is enabled sorting will happen externally, but not necessarily with an
+  // ExternalSorter.
   private val isSpillEnabled = SparkEnv.get.conf.getBoolean("spark.shuffle.spill", true)
 
   @deprecated("use combineValuesByKey with TaskContext argument", "0.9.0")

http://git-wip-us.apache.org/repos/asf/spark/blob/74fdc97c/core/src/main/scala/org/apache/spark/FutureAction.scala
--
diff --git a/core/src/main/scala/org/apache/spark/FutureAction.scala 
b/core/src/main/scala/org/apa

spark git commit: [SPARK-3850] Trim trailing spaces for core.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 2016927f7 -> a7c217166


[SPARK-3850] Trim trailing spaces for core.

Author: Reynold Xin 

Closes #6533 from rxin/whitespace-2 and squashes the following commits:

038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.

(cherry picked from commit 74fdc97c7206c6d715f128ef7c46055e0bb90760)
Signed-off-by: Reynold Xin 

Conflicts:
core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala

core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7c21716
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7c21716
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7c21716

Branch: refs/heads/branch-1.4
Commit: a7c217166b95e3207b2341dadffb43870603624f
Parents: 2016927
Author: Reynold Xin 
Authored: Sun May 31 00:16:22 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:17:47 2015 -0700

--
 .../scala/org/apache/spark/Aggregator.scala |  4 +--
 .../scala/org/apache/spark/FutureAction.scala   |  2 +-
 .../org/apache/spark/HeartbeatReceiver.scala| 20 +++---
 .../scala/org/apache/spark/HttpFileServer.scala |  4 +--
 .../main/scala/org/apache/spark/SparkConf.scala | 12 -
 .../main/scala/org/apache/spark/TestUtils.scala |  2 +-
 .../apache/spark/api/java/JavaDoubleRDD.scala   |  2 +-
 .../org/apache/spark/api/java/JavaRDD.scala |  6 ++---
 .../org/apache/spark/api/python/PythonRDD.scala |  4 +--
 .../scala/org/apache/spark/api/r/RBackend.scala |  4 +--
 .../apache/spark/api/r/RBackendHandler.scala|  2 +-
 .../org/apache/spark/deploy/SparkSubmit.scala   |  2 +-
 .../deploy/history/HistoryServerArguments.scala |  2 +-
 .../master/ZooKeeperPersistenceEngine.scala |  2 +-
 .../org/apache/spark/executor/TaskMetrics.scala | 16 +--
 .../apache/spark/metrics/sink/Slf4jSink.scala   |  4 +--
 .../org/apache/spark/metrics/sink/package.scala |  2 +-
 .../org/apache/spark/rdd/AsyncRDDActions.scala  |  4 +--
 .../org/apache/spark/rdd/NewHadoopRDD.scala |  2 +-
 .../org/apache/spark/rdd/PairRDDFunctions.scala |  2 +-
 .../spark/scheduler/ReplayListenerBus.scala |  4 +--
 .../scala/org/apache/spark/scheduler/Task.scala |  2 +-
 .../apache/spark/scheduler/TaskSetManager.scala |  4 +--
 .../cluster/CoarseGrainedSchedulerBackend.scala |  2 +-
 .../mesos/MesosSchedulerBackendUtil.scala   |  4 +--
 .../spark/serializer/KryoSerializer.scala   |  2 +-
 .../shuffle/hash/BlockStoreShuffleFetcher.scala |  2 +-
 .../spark/status/api/v1/OneStageResource.scala  |  2 +-
 .../storage/BlockManagerMasterEndpoint.scala|  8 +++---
 .../apache/spark/storage/DiskBlockManager.scala |  2 +-
 .../main/scala/org/apache/spark/ui/WebUI.scala  |  4 +--
 .../spark/ui/jobs/JobProgressListener.scala |  2 +-
 .../spark/util/AsynchronousListenerBus.scala|  2 +-
 .../org/apache/spark/util/SizeEstimator.scala   |  4 +--
 .../util/collection/ExternalAppendOnlyMap.scala |  4 +--
 .../spark/util/collection/ExternalSorter.scala  |  2 +-
 .../scala/org/apache/spark/FailureSuite.scala   |  6 ++---
 .../apache/spark/ImplicitOrderingSuite.scala| 28 ++--
 .../org/apache/spark/SparkContextSuite.scala| 10 +++
 .../org/apache/spark/rdd/JdbcRDDSuite.scala |  2 +-
 .../cluster/mesos/MemoryUtilsSuite.scala|  4 +--
 .../mesos/MesosSchedulerBackendSuite.scala  |  4 +--
 .../spark/serializer/KryoSerializerSuite.scala  |  2 +-
 .../ProactiveClosureSerializationSuite.scala| 18 ++---
 .../apache/spark/util/ClosureCleanerSuite.scala |  2 +-
 .../spark/util/random/RandomSamplerSuite.scala  |  2 +-
 46 files changed, 113 insertions(+), 113 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a7c21716/core/src/main/scala/org/apache/spark/Aggregator.scala
--
diff --git a/core/src/main/scala/org/apache/spark/Aggregator.scala 
b/core/src/main/scala/org/apache/spark/Aggregator.scala
index b8a5f50..ceeb580 100644
--- a/core/src/main/scala/org/apache/spark/Aggregator.scala
+++ b/core/src/main/scala/org/apache/spark/Aggregator.scala
@@ -34,8 +34,8 @@ case class Aggregator[K, V, C] (
 mergeValue: (C, V) => C,
 mergeCombiners: (C, C) => C) {
 
-  // When spilling is enabled sorting will happen externally, but not necessarily with an 
-  // ExternalSorter. 
+  // When spilling is enabled sorting will happen externally, but not necessarily with an
+  // ExternalSorter.
   private val isSpillEnabled = SparkEnv.get.conf.getBoolean("spark.shuffle.spill", true)
 
   @deprecated("use combineValuesByKey with TaskContext argument", "0.9.0")

http://git-wip-us.apache.org/repos/asf/spark/blob/a7c21716/core/sr

spark git commit: [SPARK-3850] Trim trailing spaces for SQL.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 564bc11e9 -> 63a50be13


[SPARK-3850] Trim trailing spaces for SQL.

Author: Reynold Xin 

Closes #6535 from rxin/whitespace-sql and squashes the following commits:

de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63a50be1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/63a50be1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/63a50be1

Branch: refs/heads/master
Commit: 63a50be13d32b9e5f3aad8d1a6ba5362f17a252f
Parents: 564bc11
Author: Reynold Xin 
Authored: Sun May 31 00:48:49 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:48:49 2015 -0700

--
 .../sql/catalyst/CatalystTypeConverters.scala |  2 +-
 .../sql/catalyst/analysis/HiveTypeCoercion.scala  |  4 ++--
 .../sql/catalyst/expressions/SortOrder.scala  |  2 +-
 .../sql/catalyst/expressions/aggregates.scala | 16 
 .../sql/catalyst/expressions/arithmetic.scala |  2 +-
 .../sql/catalyst/expressions/complexTypes.scala   |  2 +-
 .../catalyst/expressions/mathfuncs/binary.scala   |  2 +-
 .../spark/sql/catalyst/expressions/random.scala   |  2 +-
 .../catalyst/expressions/stringOperations.scala   |  6 +++---
 .../org/apache/spark/sql/types/StructType.scala   |  2 +-
 .../expressions/ExpressionEvaluationSuite.scala   |  4 ++--
 .../catalyst/optimizer/CombiningLimitsSuite.scala |  4 ++--
 .../catalyst/optimizer/ConstantFoldingSuite.scala |  2 +-
 .../catalyst/optimizer/FilterPushdownSuite.scala  |  4 ++--
 .../sql/catalyst/optimizer/OptimizeInSuite.scala  |  2 +-
 .../apache/spark/sql/types/DataTypeSuite.scala|  6 +++---
 .../scala/org/apache/spark/sql/GroupedData.scala  |  2 +-
 .../org/apache/spark/sql/api/r/SQLUtils.scala |  4 ++--
 .../spark/sql/execution/GeneratedAggregate.scala  |  8 
 .../spark/sql/execution/basicOperators.scala  |  2 +-
 .../spark/sql/execution/stat/FrequentItems.scala  |  6 +++---
 .../spark/sql/execution/stat/StatFunctions.scala  |  2 +-
 .../scala/org/apache/spark/sql/functions.scala|  2 +-
 .../scala/org/apache/spark/sql/jdbc/JDBCRDD.scala |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCRelation.scala  |  6 +++---
 .../scala/org/apache/spark/sql/jdbc/jdbc.scala|  4 ++--
 .../spark/sql/sources/SqlNewHadoopRDD.scala   |  2 +-
 .../org/apache/spark/sql/DataFrameStatSuite.scala |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala |  6 +++---
 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala| 12 ++--
 .../apache/spark/sql/hive/client/package.scala|  2 +-
 .../spark/sql/hive/execution/HiveTableScan.scala  |  4 ++--
 .../sql/hive/execution/ScriptTransformation.scala | 18 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala  | 10 +-
 .../apache/spark/sql/hive/CachedTableSuite.scala  |  2 +-
 .../spark/sql/hive/InsertIntoHiveTableSuite.scala |  2 +-
 .../spark/sql/hive/client/VersionsSuite.scala |  6 +++---
 .../sql/hive/execution/HiveTableScanSuite.scala   |  8 
 .../spark/sql/hive/execution/HiveUdfSuite.scala   |  2 +-
 39 files changed, 88 insertions(+), 88 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/63a50be1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
index 75a493b..1c0ddb5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
@@ -233,7 +233,7 @@ object CatalystTypeConverters {
 case other => other
   }
 
-  /** 
+  /**
    * Converts Catalyst types used internally in rows to standard Scala types
    * This method is slow, and for batch conversion you should be using converter
    * produced by createToScalaConverter.

http://git-wip-us.apache.org/repos/asf/spark/blob/63a50be1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index 195418d..96d7b96 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -296,8 +296,8 @@ trait HiveTypeCoercion {
   object InConve

spark git commit: [SPARK-3850] Trim trailing spaces for examples/streaming/yarn.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 a7c217166 -> f63eab950


[SPARK-3850] Trim trailing spaces for examples/streaming/yarn.

Author: Reynold Xin 

Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:

7b7b3a0 [Reynold Xin] Reset again.
dc14597 [Reynold Xin] Reset scalastyle.
cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.

(cherry picked from commit 564bc11e9827915c8652bc06f4bd591809dea4b1)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f63eab95
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f63eab95
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f63eab95

Branch: refs/heads/branch-1.4
Commit: f63eab950b1ec460fb1de2c2b6f123ef065e1c3f
Parents: a7c2171
Author: Reynold Xin 
Authored: Sun May 31 00:47:56 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:48:29 2015 -0700

--
 .../org/apache/spark/examples/LogQuery.scala|  2 +-
 .../examples/mllib/DenseGaussianMixture.scala   | 10 +++
 .../examples/streaming/MQTTWordCount.scala  | 10 +++
 .../streaming/flume/FlumeInputDStream.scala | 10 +++
 .../flume/FlumePollingInputDStream.scala|  2 +-
 .../streaming/flume/FlumeStreamSuite.scala  |  2 +-
 .../spark/streaming/kafka/KafkaCluster.scala|  2 +-
 .../spark/streaming/kafka/KafkaUtils.scala  |  8 +++---
 .../streaming/KinesisWordCountASL.scala | 12 -
 .../kinesis/KinesisCheckpointState.scala|  8 +++---
 .../streaming/kinesis/KinesisReceiver.scala |  8 +++---
 .../kinesis/KinesisRecordProcessor.scala| 28 ++--
 .../org/apache/spark/graphx/EdgeSuite.scala | 10 +++
 .../streaming/receiver/BlockGenerator.scala |  2 +-
 .../receiver/ReceivedBlockHandler.scala |  2 +-
 .../spark/streaming/UISeleniumSuite.scala   |  4 ---
 .../streaming/util/WriteAheadLogSuite.scala |  4 +--
 .../yarn/ClientDistributedCacheManager.scala| 26 +-
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala |  4 +--
 .../ClientDistributedCacheManagerSuite.scala| 24 -
 20 files changed, 87 insertions(+), 91 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f63eab95/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala 
b/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
index 32e02ea..75c8211 100644
--- a/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
@@ -22,7 +22,7 @@ import org.apache.spark.SparkContext._
 
 /**
  * Executes a roll up-style query against Apache logs.
- *  
+ *
  * Usage: LogQuery [logFile]
  */
 object LogQuery {

http://git-wip-us.apache.org/repos/asf/spark/blob/f63eab95/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
index 9a1aab0..f8c71cc 100644
--- 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
+++ 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
@@ -41,22 +41,22 @@ object DenseGaussianMixture {
   private def run(inputFile: String, k: Int, convergenceTol: Double, 
maxIterations: Int) {
 val conf = new SparkConf().setAppName("Gaussian Mixture Model EM example")
 val ctx = new SparkContext(conf)
-
+
 val data = ctx.textFile(inputFile).map { line =>
   Vectors.dense(line.trim.split(' ').map(_.toDouble))
 }.cache()
-  
+
 val clusters = new GaussianMixture()
   .setK(k)
   .setConvergenceTol(convergenceTol)
   .setMaxIterations(maxIterations)
   .run(data)
-
+
 for (i <- 0 until clusters.k) {
-  println("weight=%f\nmu=%s\nsigma=\n%s\n" format 
+  println("weight=%f\nmu=%s\nsigma=\n%s\n" format
 (clusters.weights(i), clusters.gaussians(i).mu, 
clusters.gaussians(i).sigma))
 }
-
+
 println("Cluster labels (first <= 100):")
 val clusterLabels = clusters.predict(data)
 clusterLabels.take(100).foreach { x =>

http://git-wip-us.apache.org/repos/asf/spark/blob/f63eab95/examples/src/main/scala/org/apache/spark/examples/streaming/MQTTWordCount.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examp

spark git commit: [SPARK-3850] Trim trailing spaces for examples/streaming/yarn.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 74fdc97c7 -> 564bc11e9


[SPARK-3850] Trim trailing spaces for examples/streaming/yarn.

Author: Reynold Xin 

Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:

7b7b3a0 [Reynold Xin] Reset again.
dc14597 [Reynold Xin] Reset scalastyle.
cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/564bc11e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/564bc11e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/564bc11e

Branch: refs/heads/master
Commit: 564bc11e9827915c8652bc06f4bd591809dea4b1
Parents: 74fdc97
Author: Reynold Xin 
Authored: Sun May 31 00:47:56 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:47:56 2015 -0700

--
 .../org/apache/spark/examples/LogQuery.scala|  2 +-
 .../examples/mllib/DenseGaussianMixture.scala   | 10 +++
 .../examples/streaming/MQTTWordCount.scala  | 10 +++
 .../streaming/flume/FlumeInputDStream.scala | 10 +++
 .../flume/FlumePollingInputDStream.scala|  2 +-
 .../streaming/flume/FlumeStreamSuite.scala  |  2 +-
 .../spark/streaming/kafka/KafkaCluster.scala|  2 +-
 .../spark/streaming/kafka/KafkaUtils.scala  |  8 +++---
 .../streaming/KinesisWordCountASL.scala | 12 -
 .../kinesis/KinesisCheckpointState.scala|  8 +++---
 .../streaming/kinesis/KinesisReceiver.scala |  8 +++---
 .../kinesis/KinesisRecordProcessor.scala| 28 ++--
 .../org/apache/spark/graphx/EdgeSuite.scala | 10 +++
 .../streaming/receiver/BlockGenerator.scala |  2 +-
 .../receiver/ReceivedBlockHandler.scala |  2 +-
 .../spark/streaming/UISeleniumSuite.scala   |  4 ---
 .../streaming/util/WriteAheadLogSuite.scala |  4 +--
 .../yarn/ClientDistributedCacheManager.scala| 26 +-
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala |  4 +--
 .../ClientDistributedCacheManagerSuite.scala| 24 -
 20 files changed, 87 insertions(+), 91 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/564bc11e/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala 
b/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
index 32e02ea..75c8211 100644
--- a/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
@@ -22,7 +22,7 @@ import org.apache.spark.SparkContext._
 
 /**
  * Executes a roll up-style query against Apache logs.
- *  
+ *
  * Usage: LogQuery [logFile]
  */
 object LogQuery {

http://git-wip-us.apache.org/repos/asf/spark/blob/564bc11e/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
index 9a1aab0..f8c71cc 100644
--- 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
+++ 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala
@@ -41,22 +41,22 @@ object DenseGaussianMixture {
   private def run(inputFile: String, k: Int, convergenceTol: Double, 
maxIterations: Int) {
 val conf = new SparkConf().setAppName("Gaussian Mixture Model EM example")
 val ctx = new SparkContext(conf)
-
+
 val data = ctx.textFile(inputFile).map { line =>
   Vectors.dense(line.trim.split(' ').map(_.toDouble))
 }.cache()
-  
+
 val clusters = new GaussianMixture()
   .setK(k)
   .setConvergenceTol(convergenceTol)
   .setMaxIterations(maxIterations)
   .run(data)
-
+
 for (i <- 0 until clusters.k) {
-  println("weight=%f\nmu=%s\nsigma=\n%s\n" format 
+  println("weight=%f\nmu=%s\nsigma=\n%s\n" format
 (clusters.weights(i), clusters.gaussians(i).mu, 
clusters.gaussians(i).sigma))
 }
-
+
 println("Cluster labels (first <= 100):")
 val clusterLabels = clusters.predict(data)
 clusterLabels.take(100).foreach { x =>

http://git-wip-us.apache.org/repos/asf/spark/blob/564bc11e/examples/src/main/scala/org/apache/spark/examples/streaming/MQTTWordCount.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/streaming/MQTTWordCount.scala
 
b/examples/src/main/scala/org/apache/spark/examples/streaming/MQTTWord

spark git commit: [SPARK-3850] Trim trailing spaces for SQL.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 f63eab950 -> a1904fa79


[SPARK-3850] Trim trailing spaces for SQL.

Author: Reynold Xin 

Closes #6535 from rxin/whitespace-sql and squashes the following commits:

de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.

(cherry picked from commit 63a50be13d32b9e5f3aad8d1a6ba5362f17a252f)
Signed-off-by: Reynold Xin 

Conflicts:

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala

sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1904fa7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1904fa7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1904fa7

Branch: refs/heads/branch-1.4
Commit: a1904fa79eb29d44d70278d01a443df101fdfd87
Parents: f63eab9
Author: Reynold Xin 
Authored: Sun May 31 00:48:49 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 00:52:02 2015 -0700

--
 .../sql/catalyst/CatalystTypeConverters.scala |  2 +-
 .../sql/catalyst/expressions/SortOrder.scala  |  2 +-
 .../sql/catalyst/expressions/aggregates.scala | 16 
 .../sql/catalyst/expressions/arithmetic.scala |  2 +-
 .../sql/catalyst/expressions/complexTypes.scala   |  2 +-
 .../catalyst/expressions/mathfuncs/binary.scala   |  2 +-
 .../spark/sql/catalyst/expressions/random.scala   |  2 +-
 .../catalyst/expressions/stringOperations.scala   |  6 +++---
 .../expressions/ExpressionEvaluationSuite.scala   |  4 ++--
 .../catalyst/optimizer/CombiningLimitsSuite.scala |  4 ++--
 .../catalyst/optimizer/ConstantFoldingSuite.scala |  2 +-
 .../catalyst/optimizer/FilterPushdownSuite.scala  |  4 ++--
 .../sql/catalyst/optimizer/OptimizeInSuite.scala  |  2 +-
 .../scala/org/apache/spark/sql/GroupedData.scala  |  2 +-
 .../org/apache/spark/sql/api/r/SQLUtils.scala |  4 ++--
 .../spark/sql/execution/GeneratedAggregate.scala  |  8 
 .../spark/sql/execution/basicOperators.scala  |  2 +-
 .../spark/sql/execution/stat/FrequentItems.scala  |  6 +++---
 .../spark/sql/execution/stat/StatFunctions.scala  |  2 +-
 .../scala/org/apache/spark/sql/functions.scala|  2 +-
 .../scala/org/apache/spark/sql/jdbc/JDBCRDD.scala |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCRelation.scala  |  6 +++---
 .../scala/org/apache/spark/sql/jdbc/jdbc.scala|  4 ++--
 .../spark/sql/sources/SqlNewHadoopRDD.scala   |  2 +-
 .../org/apache/spark/sql/DataFrameStatSuite.scala |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala |  6 +++---
 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala| 12 ++--
 .../apache/spark/sql/hive/client/package.scala|  2 +-
 .../spark/sql/hive/execution/HiveTableScan.scala  |  4 ++--
 .../sql/hive/execution/ScriptTransformation.scala | 18 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala  | 10 +-
 .../apache/spark/sql/hive/CachedTableSuite.scala  |  2 +-
 .../spark/sql/hive/InsertIntoHiveTableSuite.scala |  2 +-
 .../spark/sql/hive/client/VersionsSuite.scala |  6 +++---
 .../sql/hive/execution/HiveTableScanSuite.scala   |  8 
 .../spark/sql/hive/execution/HiveUdfSuite.scala   |  2 +-
 36 files changed, 82 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a1904fa7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
index 75a493b..1c0ddb5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
@@ -233,7 +233,7 @@ object CatalystTypeConverters {
 case other => other
   }
 
-  /** 
+  /**
    * Converts Catalyst types used internally in rows to standard Scala types
    * This method is slow, and for batch conversion you should be using converter
    * produced by createToScalaConverter.

http://git-wip-us.apache.org/repos/asf/spark/blob/a1904fa7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
index 83074eb..0dd8a4e

spark git commit: [SPARK-7979] Enforce structural type checker.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 a1904fa79 -> 01f38f75d


[SPARK-7979] Enforce structural type checker.

Author: Reynold Xin 

Closes #6536 from rxin/structural-type-checker and squashes the following 
commits:

f833151 [Reynold Xin] Fixed compilation.
633f9a1 [Reynold Xin] Fixed typo.
d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.

(cherry picked from commit 4b5f12bac939a2f47a3a61365b5325d849b7b51f)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/01f38f75
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/01f38f75
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/01f38f75

Branch: refs/heads/branch-1.4
Commit: 01f38f75d98b4f773b44c20e592ae4f23033d049
Parents: a1904fa
Author: Reynold Xin 
Authored: Sun May 31 01:37:56 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 01:40:57 2015 -0700

--
 .../org/apache/spark/util/random/XORShiftRandomSuite.scala | 2 +-
 .../org/apache/spark/examples/mllib/DecisionTreeRunner.scala   | 6 +-
 graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala| 4 +++-
 .../scala/org/apache/spark/ml/classification/OneVsRest.scala   | 2 ++
 scalastyle-config.xml  | 3 +++
 5 files changed, 14 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/01f38f75/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala 
b/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
index 03f5f2d..5eba208 100644
--- a/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
@@ -28,7 +28,7 @@ import scala.language.reflectiveCalls
 
 class XORShiftRandomSuite extends FunSuite with Matchers {
 
-  def fixture: Object {val seed: Long; val hundMil: Int; val xorRand: XORShiftRandom} = new {
+  private def fixture = new {
 val seed = 1L
 val xorRand = new XORShiftRandom(seed)
 val hundMil = 1e8.toInt

http://git-wip-us.apache.org/repos/asf/spark/blob/01f38f75/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
index b061363..3381941 100644
--- 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
+++ 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
@@ -22,7 +22,6 @@ import scala.language.reflectiveCalls
 import scopt.OptionParser
 
 import org.apache.spark.{SparkConf, SparkContext}
-import org.apache.spark.SparkContext._
 import org.apache.spark.mllib.evaluation.MulticlassMetrics
 import org.apache.spark.mllib.linalg.Vector
 import org.apache.spark.mllib.regression.LabeledPoint
@@ -354,7 +353,11 @@ object DecisionTreeRunner {
 
   /**
* Calculates the mean squared error for regression.
+   *
+   * This is just for demo purpose. In general, don't copy this code because it is NOT efficient
+   * due to the use of structural types, which leads to one reflection call per record.
*/
+  // scalastyle:off structural.type
   private[mllib] def meanSquaredError(
   model: { def predict(features: Vector): Double },
   data: RDD[LabeledPoint]): Double = {
@@ -363,4 +366,5 @@ object DecisionTreeRunner {
   err * err
 }.mean()
   }
+  // scalastyle:on structural.type
 }
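
The new comment above is the reason for the rule: a structural type compiles into a reflective call site, so the method lookup happens on every invocation (here, once per record). A small sketch, not from this patch and with made-up names, contrasting a structural type with the usual trait-based alternative:

import scala.language.reflectiveCalls

object StructuralTypeDemo {
  // Structural type: anything with predict(Double): Double is accepted, but each
  // call to model.predict below goes through Java reflection. This is the shape
  // of code the new rule flags unless wrapped in scalastyle:off/on structural.type.
  def meanAbsError(model: { def predict(x: Double): Double },
                   data: Seq[(Double, Double)]): Double = {
    val errs = data.map { case (x, y) => math.abs(model.predict(x) - y) }
    errs.sum / errs.size
  }

  // The usual alternative: a trait gives the same flexibility with plain virtual
  // dispatch and no per-call reflection.
  trait Predictor { def predict(x: Double): Double }

  def main(args: Array[String]): Unit = {
    object Doubler { def predict(x: Double): Double = 2 * x } // conforms structurally
    println(meanAbsError(Doubler, Seq((1.0, 2.0), (2.0, 5.0))))
  }
}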

http://git-wip-us.apache.org/repos/asf/spark/blob/01f38f75/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
--
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala 
b/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
index cc70b39..4611a3a 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
@@ -41,14 +41,16 @@ abstract class EdgeRDD[ED](
 @transient sc: SparkContext,
 @transient deps: Seq[Dependency[_]]) extends RDD[Edge[ED]](sc, deps) {
 
+  // scalastyle:off structural.type
   private[graphx] def partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])] 
forSome { type VD }
+  // scalastyle:on structural.type
 
   override protected def getPartitions: Array[Partition] = 
partitionsRDD.partitions
 
   override def compute(part: Partition, context: TaskContext): 
Iterator[Edge[ED]] = {
 val p = firstPar

spark git commit: [SPARK-7979] Enforce structural type checker.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 63a50be13 -> 4b5f12bac


[SPARK-7979] Enforce structural type checker.

Author: Reynold Xin 

Closes #6536 from rxin/structural-type-checker and squashes the following 
commits:

f833151 [Reynold Xin] Fixed compilation.
633f9a1 [Reynold Xin] Fixed typo.
d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b5f12ba
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b5f12ba
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b5f12ba

Branch: refs/heads/master
Commit: 4b5f12bac939a2f47a3a61365b5325d849b7b51f
Parents: 63a50be
Author: Reynold Xin 
Authored: Sun May 31 01:37:56 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 01:37:56 2015 -0700

--
 .../org/apache/spark/util/random/XORShiftRandomSuite.scala | 2 +-
 .../org/apache/spark/examples/mllib/DecisionTreeRunner.scala   | 6 +-
 graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala| 4 +++-
 .../scala/org/apache/spark/ml/classification/OneVsRest.scala   | 2 ++
 scalastyle-config.xml  | 3 +++
 5 files changed, 14 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4b5f12ba/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala 
b/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
index 6ca484c..d26667b 100644
--- a/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/random/XORShiftRandomSuite.scala
@@ -28,7 +28,7 @@ import scala.language.reflectiveCalls
 
 class XORShiftRandomSuite extends SparkFunSuite with Matchers {
 
-  def fixture: Object {val seed: Long; val hundMil: Int; val xorRand: XORShiftRandom} = new {
+  private def fixture = new {
 val seed = 1L
 val xorRand = new XORShiftRandom(seed)
 val hundMil = 1e8.toInt

http://git-wip-us.apache.org/repos/asf/spark/blob/4b5f12ba/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
index b061363..3381941 100644
--- 
a/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
+++ 
b/examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
@@ -22,7 +22,6 @@ import scala.language.reflectiveCalls
 import scopt.OptionParser
 
 import org.apache.spark.{SparkConf, SparkContext}
-import org.apache.spark.SparkContext._
 import org.apache.spark.mllib.evaluation.MulticlassMetrics
 import org.apache.spark.mllib.linalg.Vector
 import org.apache.spark.mllib.regression.LabeledPoint
@@ -354,7 +353,11 @@ object DecisionTreeRunner {
 
   /**
* Calculates the mean squared error for regression.
+   *
+   * This is just for demo purpose. In general, don't copy this code because it is NOT efficient
+   * due to the use of structural types, which leads to one reflection call per record.
*/
+  // scalastyle:off structural.type
   private[mllib] def meanSquaredError(
   model: { def predict(features: Vector): Double },
   data: RDD[LabeledPoint]): Double = {
@@ -363,4 +366,5 @@ object DecisionTreeRunner {
   err * err
 }.mean()
   }
+  // scalastyle:on structural.type
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/4b5f12ba/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
--
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala 
b/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
index cc70b39..4611a3a 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
@@ -41,14 +41,16 @@ abstract class EdgeRDD[ED](
 @transient sc: SparkContext,
 @transient deps: Seq[Dependency[_]]) extends RDD[Edge[ED]](sc, deps) {
 
+  // scalastyle:off structural.type
   private[graphx] def partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])] 
forSome { type VD }
+  // scalastyle:on structural.type
 
   override protected def getPartitions: Array[Partition] = 
partitionsRDD.partitions
 
   override def compute(part: Partition, context: TaskContext): 
Iterator[Edge[ED]] = {
 val p = firstParent[(PartitionID, EdgePartition[ED, _])].iterator(part, 
context)
 if (p.hasNext) {
-  p.next

spark git commit: [MINOR] Add license for dagre-d3 and graphlib-dot

2015-05-31 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 4b5f12bac -> d1d2def2f


[MINOR] Add license for dagre-d3 and graphlib-dot

Add license for dagre-d3 and graphlib-dot

Author: zsxwing 

Closes #6539 from zsxwing/LICENSE and squashes the following commits:

82b0475 [zsxwing] Add license for dagre-d3 and graphlib-dot


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d1d2def2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d1d2def2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d1d2def2

Branch: refs/heads/master
Commit: d1d2def2f5f91e86f340656421170d1097f14854
Parents: 4b5f12b
Author: zsxwing 
Authored: Sun May 31 11:18:12 2015 -0700
Committer: Andrew Or 
Committed: Sun May 31 11:18:12 2015 -0700

--
 LICENSE | 46 ++
 1 file changed, 46 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d1d2def2/LICENSE
--
diff --git a/LICENSE b/LICENSE
index 9d1b00b..d0cd0dc 100644
--- a/LICENSE
+++ b/LICENSE
@@ -854,6 +854,52 @@ and
 Vis.js may be distributed under either license.
 
 
+For dagre-d3 
(core/src/main/resources/org/apache/spark/ui/static/dagre-d3.min.js):
+
+Copyright (c) 2013 Chris Pettitt
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+
+For graphlib-dot 
(core/src/main/resources/org/apache/spark/ui/static/graphlib-dot.min.js):
+
+Copyright (c) 2012-2013 Chris Pettitt
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+
 BSD-style licenses
 
 





spark git commit: [MINOR] Add license for dagre-d3 and graphlib-dot

2015-05-31 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 01f38f75d -> 8a72bc917


[MINOR] Add license for dagre-d3 and graphlib-dot

Add license for dagre-d3 and graphlib-dot

Author: zsxwing 

Closes #6539 from zsxwing/LICENSE and squashes the following commits:

82b0475 [zsxwing] Add license for dagre-d3 and graphlib-dot

(cherry picked from commit d1d2def2f5f91e86f340656421170d1097f14854)
Signed-off-by: Andrew Or 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8a72bc91
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8a72bc91
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8a72bc91

Branch: refs/heads/branch-1.4
Commit: 8a72bc9170e7cc9fef78794278fab385f3d8a695
Parents: 01f38f7
Author: zsxwing 
Authored: Sun May 31 11:18:12 2015 -0700
Committer: Andrew Or 
Committed: Sun May 31 11:18:20 2015 -0700

--
 LICENSE | 46 ++
 1 file changed, 46 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8a72bc91/LICENSE
--
diff --git a/LICENSE b/LICENSE
index 9d1b00b..d0cd0dc 100644
--- a/LICENSE
+++ b/LICENSE
@@ -854,6 +854,52 @@ and
 Vis.js may be distributed under either license.
 
 
+For dagre-d3 
(core/src/main/resources/org/apache/spark/ui/static/dagre-d3.min.js):
+
+Copyright (c) 2013 Chris Pettitt
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+
+For graphlib-dot 
(core/src/main/resources/org/apache/spark/ui/static/graphlib-dot.min.js):
+
+Copyright (c) 2012-2013 Chris Pettitt
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+
 BSD-style licenses
 
 





spark git commit: [SPARK-3850] Trim trailing spaces for MLlib.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 8a72bc917 -> 70cf9c349


[SPARK-3850] Trim trailing spaces for MLlib.

Author: Reynold Xin 

Closes #6534 from rxin/whitespace-mllib and squashes the following commits:

38926e3 [Reynold Xin] [SPARK-3850] Trim trailing spaces for MLlib.

(cherry picked from commit e1067d0ad1c32c678c23d76d7653b51770795831)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/70cf9c34
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/70cf9c34
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/70cf9c34

Branch: refs/heads/branch-1.4
Commit: 70cf9c34954d9486cffc31d0cfa0ba280276eead
Parents: 8a72bc9
Author: Reynold Xin 
Authored: Sun May 31 11:35:30 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 11:35:46 2015 -0700

--
 .../spark/ml/feature/StandardScaler.scala   | 10 +--
 .../spark/ml/regression/LinearRegression.scala  |  2 +-
 .../spark/mllib/api/python/PythonMLLibAPI.scala |  4 +-
 .../mllib/clustering/GaussianMixture.scala  | 86 ++--
 .../mllib/clustering/GaussianMixtureModel.scala | 22 ++---
 .../clustering/PowerIterationClustering.scala   |  8 +-
 .../apache/spark/mllib/feature/Word2Vec.scala   | 50 ++--
 .../org/apache/spark/mllib/linalg/BLAS.scala|  8 +-
 .../mllib/linalg/EigenValueDecomposition.scala  |  2 +-
 .../BinaryClassificationPMMLModelExport.scala   | 10 +--
 .../mllib/pmml/export/PMMLModelExport.scala |  4 +-
 .../pmml/export/PMMLModelExportFactory.scala|  8 +-
 .../apache/spark/mllib/random/RandomRDDs.scala  |  6 +-
 .../apache/spark/mllib/recommendation/ALS.scala |  2 +-
 .../mllib/regression/IsotonicRegression.scala   | 10 +--
 .../distribution/MultivariateGaussian.scala | 54 ++--
 .../spark/mllib/tree/GradientBoostedTrees.scala |  2 +-
 .../apache/spark/mllib/tree/RandomForest.scala  |  2 +-
 .../org/apache/spark/mllib/util/MLUtils.scala   |  2 +-
 .../evaluation/RegressionEvaluatorSuite.scala   |  2 +-
 .../spark/ml/feature/BinarizerSuite.scala   |  2 +-
 .../mllib/clustering/GaussianMixtureSuite.scala |  4 +-
 .../PowerIterationClusteringSuite.scala |  2 +-
 .../apache/spark/mllib/linalg/BLASSuite.scala   | 34 
 .../spark/mllib/linalg/VectorsSuite.scala   |  6 +-
 ...naryClassificationPMMLModelExportSuite.scala |  8 +-
 .../export/KMeansPMMLModelExportSuite.scala |  2 +-
 .../export/PMMLModelExportFactorySuite.scala| 10 +--
 .../MultivariateGaussianSuite.scala | 14 ++--
 .../apache/spark/mllib/util/MLUtilsSuite.scala  |  2 +-
 30 files changed, 189 insertions(+), 189 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/70cf9c34/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
index fdd2494..b0fd06d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
@@ -35,13 +35,13 @@ private[feature] trait StandardScalerParams extends Params 
with HasInputCol with
 
   /**
* Centers the data with mean before scaling.
-   * It will build a dense output, so this does not work on sparse input 
+   * It will build a dense output, so this does not work on sparse input
* and will raise an exception.
* Default: false
* @group param
*/
   val withMean: BooleanParam = new BooleanParam(this, "withMean", "Center data 
with mean")
-  
+
   /**
* Scales the data to unit standard deviation.
* Default: true
@@ -68,13 +68,13 @@ class StandardScaler(override val uid: String) extends 
Estimator[StandardScalerM
 
   /** @group setParam */
   def setOutputCol(value: String): this.type = set(outputCol, value)
-  
+
   /** @group setParam */
   def setWithMean(value: Boolean): this.type = set(withMean, value)
-  
+
   /** @group setParam */
   def setWithStd(value: Boolean): this.type = set(withStd, value)
-  
+
   override def fit(dataset: DataFrame): StandardScalerModel = {
 transformSchema(dataset.schema, logging = true)
 val input = dataset.select($(inputCol)).map { case Row(v: Vector) => v }

http://git-wip-us.apache.org/repos/asf/spark/blob/70cf9c34/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala 
b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
index 7c40db1..fe2a71a 100644
--- a/mllib/src/main/scala/org/apache/spark/m

spark git commit: [SPARK-3850] Trim trailing spaces for MLlib.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master d1d2def2f -> e1067d0ad


[SPARK-3850] Trim trailing spaces for MLlib.

Author: Reynold Xin 

Closes #6534 from rxin/whitespace-mllib and squashes the following commits:

38926e3 [Reynold Xin] [SPARK-3850] Trim trailing spaces for MLlib.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e1067d0a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e1067d0a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e1067d0a

Branch: refs/heads/master
Commit: e1067d0ad1c32c678c23d76d7653b51770795831
Parents: d1d2def
Author: Reynold Xin 
Authored: Sun May 31 11:35:30 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 11:35:30 2015 -0700

--
 .../spark/ml/feature/StandardScaler.scala   | 10 +--
 .../spark/ml/regression/LinearRegression.scala  |  2 +-
 .../spark/mllib/api/python/PythonMLLibAPI.scala |  4 +-
 .../mllib/clustering/GaussianMixture.scala  | 86 ++--
 .../mllib/clustering/GaussianMixtureModel.scala | 22 ++---
 .../clustering/PowerIterationClustering.scala   |  8 +-
 .../apache/spark/mllib/feature/Word2Vec.scala   | 50 ++--
 .../org/apache/spark/mllib/linalg/BLAS.scala|  8 +-
 .../mllib/linalg/EigenValueDecomposition.scala  |  2 +-
 .../BinaryClassificationPMMLModelExport.scala   | 10 +--
 .../mllib/pmml/export/PMMLModelExport.scala |  4 +-
 .../pmml/export/PMMLModelExportFactory.scala|  8 +-
 .../apache/spark/mllib/random/RandomRDDs.scala  |  6 +-
 .../apache/spark/mllib/recommendation/ALS.scala |  2 +-
 .../mllib/regression/IsotonicRegression.scala   | 10 +--
 .../distribution/MultivariateGaussian.scala | 54 ++--
 .../spark/mllib/tree/GradientBoostedTrees.scala |  2 +-
 .../apache/spark/mllib/tree/RandomForest.scala  |  2 +-
 .../org/apache/spark/mllib/util/MLUtils.scala   |  2 +-
 .../evaluation/RegressionEvaluatorSuite.scala   |  2 +-
 .../spark/ml/feature/BinarizerSuite.scala   |  2 +-
 .../mllib/clustering/GaussianMixtureSuite.scala |  4 +-
 .../PowerIterationClusteringSuite.scala |  2 +-
 .../apache/spark/mllib/linalg/BLASSuite.scala   | 34 
 .../spark/mllib/linalg/VectorsSuite.scala   |  6 +-
 ...naryClassificationPMMLModelExportSuite.scala |  8 +-
 .../export/KMeansPMMLModelExportSuite.scala |  2 +-
 .../export/PMMLModelExportFactorySuite.scala| 10 +--
 .../MultivariateGaussianSuite.scala | 14 ++--
 .../apache/spark/mllib/util/MLUtilsSuite.scala  |  2 +-
 30 files changed, 189 insertions(+), 189 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e1067d0a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
index fdd2494..b0fd06d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala
@@ -35,13 +35,13 @@ private[feature] trait StandardScalerParams extends Params 
with HasInputCol with
 
   /**
* Centers the data with mean before scaling.
-   * It will build a dense output, so this does not work on sparse input 
+   * It will build a dense output, so this does not work on sparse input
* and will raise an exception.
* Default: false
* @group param
*/
   val withMean: BooleanParam = new BooleanParam(this, "withMean", "Center data 
with mean")
-  
+
   /**
* Scales the data to unit standard deviation.
* Default: true
@@ -68,13 +68,13 @@ class StandardScaler(override val uid: String) extends 
Estimator[StandardScalerM
 
   /** @group setParam */
   def setOutputCol(value: String): this.type = set(outputCol, value)
-  
+
   /** @group setParam */
   def setWithMean(value: Boolean): this.type = set(withMean, value)
-  
+
   /** @group setParam */
   def setWithStd(value: Boolean): this.type = set(withStd, value)
-  
+
   override def fit(dataset: DataFrame): StandardScalerModel = {
 transformSchema(dataset.schema, logging = true)
 val input = dataset.select($(inputCol)).map { case Row(v: Vector) => v }
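
The withMean/withStd parameters touched in the diff above control whether StandardScaler
centers and/or scales the input. As a quick orientation, a minimal usage sketch against
the 1.4 spark.ml API; the DataFrame `dataset` and its column names are hypothetical
placeholders, not part of the commit.

import org.apache.spark.ml.feature.StandardScaler

// `dataset` is assumed to already hold a vector column named "features".
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(false)  // keep false for sparse input; true densifies, as the param doc warns
  .setWithStd(true)

val scalerModel = scaler.fit(dataset)
val scaledData = scalerModel.transform(dataset)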

http://git-wip-us.apache.org/repos/asf/spark/blob/e1067d0a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala 
b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
index 7c40db1..fe2a71a 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegres

spark git commit: [SPARK-7949] [MLLIB] [DOC] update document with some missing save/load

2015-05-31 Thread jkbradley
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 70cf9c349 -> 4d5ce4677


[SPARK-7949] [MLLIB] [DOC] update document with some missing save/load

add save/load for examples:
KMeansModel
PowerIterationClusteringModel
Word2VecModel
IsotonicRegressionModel

Author: Yuhao Yang 

Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:

7f9f06d [Yuhao Yang] add missing imports
c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into 
docSaveLoad
1dd77cc [Yuhao Yang] update document with some missing save/load

(cherry picked from commit 0674700303da3e4737d73f5fabd2a925ec712f63)
Signed-off-by: Joseph K. Bradley 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4d5ce467
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4d5ce467
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4d5ce467

Branch: refs/heads/branch-1.4
Commit: 4d5ce46772ad0c9296fcfd24e867d396761437fd
Parents: 70cf9c3
Author: Yuhao Yang 
Authored: Sun May 31 11:51:49 2015 -0700
Committer: Joseph K. Bradley 
Committed: Sun May 31 11:52:04 2015 -0700

--
 docs/mllib-clustering.md  | 28 
 docs/mllib-feature-extraction.md  |  6 +-
 docs/mllib-isotonic-regression.md | 10 +-
 3 files changed, 38 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4d5ce467/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index f41ca70..dac22f7 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -47,7 +47,7 @@ Set Sum of Squared Error (WSSSE). You can reduce this error 
measure by increasin
 optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
 
 {% highlight scala %}
-import org.apache.spark.mllib.clustering.KMeans
+import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
 import org.apache.spark.mllib.linalg.Vectors
 
 // Load and parse the data
@@ -62,6 +62,10 @@ val clusters = KMeans.train(parsedData, numClusters, 
numIterations)
 // Evaluate clustering by computing Within Set Sum of Squared Errors
 val WSSSE = clusters.computeCost(parsedData)
 println("Within Set Sum of Squared Errors = " + WSSSE)
+
+// Save and load model
+clusters.save(sc, "myModelPath")
+val sameModel = KMeansModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 
@@ -110,6 +114,10 @@ public class KMeansExample {
 // Evaluate clustering by computing Within Set Sum of Squared Errors
 double WSSSE = clusters.computeCost(parsedData.rdd());
 System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+
+// Save and load model
+clusters.save(sc.sc(), "myModelPath");
+KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
   }
 }
 {% endhighlight %}
@@ -124,7 +132,7 @@ Within Set Sum of Squared Error (WSSSE). You can reduce 
this error measure by in
 fact the optimal *k* is usually one where there is an "elbow" in the WSSSE 
graph.
 
 {% highlight python %}
-from pyspark.mllib.clustering import KMeans
+from pyspark.mllib.clustering import KMeans, KMeansModel
 from numpy import array
 from math import sqrt
 
@@ -143,6 +151,10 @@ def error(point):
 
 WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
 print("Within Set Sum of Squared Error = " + str(WSSSE))
+
+# Save and load model
+clusters.save(sc, "myModelPath")
+sameModel = KMeansModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 
@@ -312,12 +324,12 @@ Calling `PowerIterationClustering.run` returns a
 which contains the computed clustering assignments.
 
 {% highlight scala %}
-import org.apache.spark.mllib.clustering.PowerIterationClustering
+import org.apache.spark.mllib.clustering.{PowerIterationClustering, 
PowerIterationClusteringModel}
 import org.apache.spark.mllib.linalg.Vectors
 
 val similarities: RDD[(Long, Long, Double)] = ...
 
-val pic = new PowerIteartionClustering()
+val pic = new PowerIterationClustering()
   .setK(3)
   .setMaxIterations(20)
 val model = pic.run(similarities)
@@ -325,6 +337,10 @@ val model = pic.run(similarities)
 model.assignments.foreach { a =>
   println(s"${a.id} -> ${a.cluster}")
 }
+
+// Save and load model
+model.save(sc, "myModelPath")
+val sameModel = PowerIterationClusteringModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 A full example that produces the experiment described in the PIC paper can be 
found under
@@ -360,6 +376,10 @@ PowerIterationClusteringModel model = 
pic.run(similarities);
 for (PowerIterationClustering.Assignment a: 
model.assignments().toJavaRDD().collect()) {
   System.out.println(a.id() + " -> " + a.cluster());
 }
+
+// Save and load model
+model.save(sc.sc(), "myModelPath");
+PowerIterationClusteringModel sameModel = 
Pow

spark git commit: [SPARK-7949] [MLLIB] [DOC] update document with some missing save/load

2015-05-31 Thread jkbradley
Repository: spark
Updated Branches:
  refs/heads/master e1067d0ad -> 067470030


[SPARK-7949] [MLLIB] [DOC] update document with some missing save/load

add save/load for examples:
KMeansModel
PowerIterationClusteringModel
Word2VecModel
IsotonicRegressionModel

Author: Yuhao Yang 

Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:

7f9f06d [Yuhao Yang] add missing imports
c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into 
docSaveLoad
1dd77cc [Yuhao Yang] update document with some missing save/load


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/06747003
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/06747003
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/06747003

Branch: refs/heads/master
Commit: 0674700303da3e4737d73f5fabd2a925ec712f63
Parents: e1067d0
Author: Yuhao Yang 
Authored: Sun May 31 11:51:49 2015 -0700
Committer: Joseph K. Bradley 
Committed: Sun May 31 11:51:49 2015 -0700

--
 docs/mllib-clustering.md  | 28 
 docs/mllib-feature-extraction.md  |  6 +-
 docs/mllib-isotonic-regression.md | 10 +-
 3 files changed, 38 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/06747003/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index f41ca70..dac22f7 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -47,7 +47,7 @@ Set Sum of Squared Error (WSSSE). You can reduce this error 
measure by increasin
 optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
 
 {% highlight scala %}
-import org.apache.spark.mllib.clustering.KMeans
+import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
 import org.apache.spark.mllib.linalg.Vectors
 
 // Load and parse the data
@@ -62,6 +62,10 @@ val clusters = KMeans.train(parsedData, numClusters, 
numIterations)
 // Evaluate clustering by computing Within Set Sum of Squared Errors
 val WSSSE = clusters.computeCost(parsedData)
 println("Within Set Sum of Squared Errors = " + WSSSE)
+
+// Save and load model
+clusters.save(sc, "myModelPath")
+val sameModel = KMeansModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 
@@ -110,6 +114,10 @@ public class KMeansExample {
 // Evaluate clustering by computing Within Set Sum of Squared Errors
 double WSSSE = clusters.computeCost(parsedData.rdd());
 System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+
+// Save and load model
+clusters.save(sc.sc(), "myModelPath");
+KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
   }
 }
 {% endhighlight %}
@@ -124,7 +132,7 @@ Within Set Sum of Squared Error (WSSSE). You can reduce 
this error measure by in
 fact the optimal *k* is usually one where there is an "elbow" in the WSSSE 
graph.
 
 {% highlight python %}
-from pyspark.mllib.clustering import KMeans
+from pyspark.mllib.clustering import KMeans, KMeansModel
 from numpy import array
 from math import sqrt
 
@@ -143,6 +151,10 @@ def error(point):
 
 WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
 print("Within Set Sum of Squared Error = " + str(WSSSE))
+
+# Save and load model
+clusters.save(sc, "myModelPath")
+sameModel = KMeansModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 
@@ -312,12 +324,12 @@ Calling `PowerIterationClustering.run` returns a
 which contains the computed clustering assignments.
 
 {% highlight scala %}
-import org.apache.spark.mllib.clustering.PowerIterationClustering
+import org.apache.spark.mllib.clustering.{PowerIterationClustering, 
PowerIterationClusteringModel}
 import org.apache.spark.mllib.linalg.Vectors
 
 val similarities: RDD[(Long, Long, Double)] = ...
 
-val pic = new PowerIteartionClustering()
+val pic = new PowerIterationClustering()
   .setK(3)
   .setMaxIterations(20)
 val model = pic.run(similarities)
@@ -325,6 +337,10 @@ val model = pic.run(similarities)
 model.assignments.foreach { a =>
   println(s"${a.id} -> ${a.cluster}")
 }
+
+// Save and load model
+model.save(sc, "myModelPath")
+val sameModel = PowerIterationClusteringModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 A full example that produces the experiment described in the PIC paper can be 
found under
@@ -360,6 +376,10 @@ PowerIterationClusteringModel model = 
pic.run(similarities);
 for (PowerIterationClustering.Assignment a: 
model.assignments().toJavaRDD().collect()) {
   System.out.println(a.id() + " -> " + a.cluster());
 }
+
+// Save and load model
+model.save(sc.sc(), "myModelPath");
+PowerIterationClusteringModel sameModel = 
PowerIterationClusteringModel.load(sc.sc(), "myModelPath");
 {% endhighlight %}
 
 

http://git-wip-us.apache.org/r
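
The commit message also lists Word2VecModel and IsotonicRegressionModel, whose new
snippets fall outside the truncated diff above. A minimal save/load sketch in the same
spirit, assuming the MLlib 1.4 APIs and an existing SparkContext `sc`; the input RDDs
and paths are hypothetical.

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.mllib.regression.{IsotonicRegression, IsotonicRegressionModel}

val corpus: RDD[Seq[String]] = ...  // tokenized text, one Seq[String] per document

val w2vModel = new Word2Vec().fit(corpus)
w2vModel.save(sc, "myWord2VecPath")
val sameW2v = Word2VecModel.load(sc, "myWord2VecPath")

val points: RDD[(Double, Double, Double)] = ...  // (label, feature, weight) tuples

val irModel = new IsotonicRegression().setIsotonic(true).run(points)
irModel.save(sc, "myIsotonicPath")
val sameIr = IsotonicRegressionModel.load(sc, "myIsotonicPath")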

svn commit: r1682772 - in /spark: faq.md site/faq.html site/streaming/index.html streaming/index.md

2015-05-31 Thread matei
Author: matei
Date: Sun May 31 19:04:53 2015
New Revision: 1682772

URL: http://svn.apache.org/r1682772
Log:
Some updates to FAQ on streaming

Modified:
spark/faq.md
spark/site/faq.html
spark/site/streaming/index.html
spark/streaming/index.md

Modified: spark/faq.md
URL: 
http://svn.apache.org/viewvc/spark/faq.md?rev=1682772&r1=1682771&r2=1682772&view=diff
==
--- spark/faq.md (original)
+++ spark/faq.md Sun May 31 19:04:53 2015
@@ -36,9 +36,6 @@ Spark is a fast and general processing e
 How can I access data in S3?
 Use the s3n:// URI scheme 
(s3n://bucket/path). You will also need to set your Amazon 
security credentials, either by setting the environment variables 
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before 
your program runs, or by setting fs.s3.awsAccessKeyId and 
fs.s3.awsSecretAccessKey in 
SparkContext.hadoopConfiguration.
 
-Which languages does Spark support?
-Spark supports Scala, Java and Python.
-
 Does Spark require modified versions of Scala or 
Python?
 No. Spark requires no changes to Scala or compiler plugins. 
The Python API uses the standard CPython implementation, and can call into 
existing C libraries for Python such as NumPy.
 
@@ -48,9 +45,9 @@ Spark is a fast and general processing e
 
 In addition, Spark also has Java and Python 
APIs.
 
-What license is Spark under?
+I understand Spark Streaming uses micro-batching. Does 
this increase latency?
 
-Starting in version 0.8, Spark is under the http://www.apache.org/licenses/LICENSE-2.0.html";>Apache 2.0 license. 
Previous versions used the https://github.com/mesos/spark/blob/branch-0.7/LICENSE";>BSD 
license.
+While Spark does use a micro-batch execution model, this does not have much 
impact on applications, because the batches can be as short as 0.5 seconds. In 
most applications of streaming big data, the analytics is done over a larger 
window (say 10 minutes), or the latency to get data in is higher (e.g. sensors 
collect readings every 10 seconds). The benefit of Spark's micro-batch model is 
that it enables http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf";>exactly-once
 semantics, meaning the system can recover all intermediate state and 
results on failure.
 
 How can I contribute to Spark?
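
For reference, the S3 answer retained above boils down to a few lines of setup. A short
sketch of the SparkContext.hadoopConfiguration route, with a hypothetical bucket path and
the credentials read from the environment variables named in the FAQ entry:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("S3Read"))

// Bucket and path are hypothetical placeholders.
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

val lines = sc.textFile("s3n://my-bucket/path/to/data.txt")
println(lines.count())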
 

Modified: spark/site/faq.html
URL: 
http://svn.apache.org/viewvc/spark/site/faq.html?rev=1682772&r1=1682771&r2=1682772&view=diff
==
--- spark/site/faq.html (original)
+++ spark/site/faq.html Sun May 31 19:04:53 2015
@@ -196,9 +196,6 @@ Spark is a fast and general processing e
 How can I access data in S3?
 Use the s3n:// URI scheme 
(s3n://bucket/path). You will also need to set your Amazon 
security credentials, either by setting the environment variables 
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before 
your program runs, or by setting fs.s3.awsAccessKeyId and 
fs.s3.awsSecretAccessKey in 
SparkContext.hadoopConfiguration.
 
-Which languages does Spark support?
-Spark supports Scala, Java and Python.
-
 Does Spark require modified versions of Scala or 
Python?
 No. Spark requires no changes to Scala or compiler plugins. 
The Python API uses the standard CPython implementation, and can call into 
existing C libraries for Python such as NumPy.
 
@@ -208,9 +205,9 @@ Spark is a fast and general processing e
 
 In addition, Spark also has Java and Python APIs.
 
-What license is Spark under?
+I understand Spark Streaming uses micro-batching. Does 
this increase latency?
 
-Starting in version 0.8, Spark is under the http://www.apache.org/licenses/LICENSE-2.0.html";>Apache 2.0 license. 
Previous versions used the https://github.com/mesos/spark/blob/branch-0.7/LICENSE";>BSD 
license.
+While Spark does use a micro-batch execution model, this does not have much 
impact on applications, because the batches can be as short as 0.5 seconds. In 
most applications of streaming big data, the analytics is done over a larger 
window (say 10 minutes), or the latency to get data in is higher (e.g. sensors 
collect readings every 10 seconds). The benefit of Spark's micro-batch model is 
that it enables http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf";>exactly-once
 semantics, meaning the system can recover all intermediate state and 
results on failure.
 
 How can I contribute to Spark?
 

Modified: spark/site/streaming/index.html
URL: 
http://svn.apache.org/viewvc/spark/site/streaming/index.html?rev=1682772&r1=1682771&r2=1682772&view=diff
==
--- spark/site/streaming/index.html (original)
+++ spark/site/streaming/index.html Sun May 31 19:04:53 2015
@@ -182,9 +182,9 @@
   Build applications through high-level operators.
 
 
-  Spark Streaming brings Spark's
-  language-integrated API to stream processing,
-  letting you write streaming 

svn commit: r1682773 - in /spark: _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2015-05-31 Thread matei
Author: matei
Date: Sun May 31 19:06:00 2015
New Revision: 1682773

URL: http://svn.apache.org/r1682773
Log:
Consistent wording of packages link

Modified:
spark/_layouts/global.html
spark/site/community.html
spark/site/documentation.html
spark/site/downloads.html
spark/site/examples.html
spark/site/faq.html
spark/site/graphx/index.html
spark/site/index.html
spark/site/mailing-lists.html
spark/site/mllib/index.html
spark/site/news/amp-camp-2013-registration-ope.html
spark/site/news/announcing-the-first-spark-summit.html
spark/site/news/fourth-spark-screencast-published.html
spark/site/news/index.html
spark/site/news/nsdi-paper.html
spark/site/news/one-month-to-spark-summit-2015.html
spark/site/news/proposals-open-for-spark-summit-east.html
spark/site/news/registration-open-for-spark-summit-east.html
spark/site/news/run-spark-and-shark-on-amazon-emr.html
spark/site/news/spark-0-6-1-and-0-5-2-released.html
spark/site/news/spark-0-6-2-released.html
spark/site/news/spark-0-7-0-released.html
spark/site/news/spark-0-7-2-released.html
spark/site/news/spark-0-7-3-released.html
spark/site/news/spark-0-8-0-released.html
spark/site/news/spark-0-8-1-released.html
spark/site/news/spark-0-9-0-released.html
spark/site/news/spark-0-9-1-released.html
spark/site/news/spark-0-9-2-released.html
spark/site/news/spark-1-0-0-released.html
spark/site/news/spark-1-0-1-released.html
spark/site/news/spark-1-0-2-released.html
spark/site/news/spark-1-1-0-released.html
spark/site/news/spark-1-1-1-released.html
spark/site/news/spark-1-2-0-released.html
spark/site/news/spark-1-2-1-released.html
spark/site/news/spark-1-2-2-released.html
spark/site/news/spark-1-3-0-released.html
spark/site/news/spark-accepted-into-apache-incubator.html
spark/site/news/spark-and-shark-in-the-news.html
spark/site/news/spark-becomes-tlp.html
spark/site/news/spark-featured-in-wired.html
spark/site/news/spark-mailing-lists-moving-to-apache.html
spark/site/news/spark-meetups.html
spark/site/news/spark-screencasts-published.html
spark/site/news/spark-summit-2013-is-a-wrap.html
spark/site/news/spark-summit-2014-videos-posted.html
spark/site/news/spark-summit-agenda-posted.html
spark/site/news/spark-summit-east-2015-videos-posted.html
spark/site/news/spark-summit-east-agenda-posted.html
spark/site/news/spark-summit-europe.html
spark/site/news/spark-tips-from-quantifind.html
spark/site/news/spark-user-survey-and-powered-by-page.html
spark/site/news/spark-version-0-6-0-released.html
spark/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html
spark/site/news/strata-exercises-now-available-online.html
spark/site/news/submit-talks-to-spark-summit-2014.html
spark/site/news/two-weeks-to-spark-summit-2014.html
spark/site/news/video-from-first-spark-development-meetup.html
spark/site/releases/spark-release-0-3.html
spark/site/releases/spark-release-0-5-0.html
spark/site/releases/spark-release-0-5-1.html
spark/site/releases/spark-release-0-5-2.html
spark/site/releases/spark-release-0-6-0.html
spark/site/releases/spark-release-0-6-1.html
spark/site/releases/spark-release-0-6-2.html
spark/site/releases/spark-release-0-7-0.html
spark/site/releases/spark-release-0-7-2.html
spark/site/releases/spark-release-0-7-3.html
spark/site/releases/spark-release-0-8-0.html
spark/site/releases/spark-release-0-8-1.html
spark/site/releases/spark-release-0-9-0.html
spark/site/releases/spark-release-0-9-1.html
spark/site/releases/spark-release-0-9-2.html
spark/site/releases/spark-release-1-0-0.html
spark/site/releases/spark-release-1-0-1.html
spark/site/releases/spark-release-1-0-2.html
spark/site/releases/spark-release-1-1-0.html
spark/site/releases/spark-release-1-1-1.html
spark/site/releases/spark-release-1-2-0.html
spark/site/releases/spark-release-1-2-1.html
spark/site/releases/spark-release-1-2-2.html
spark/site/releases/spark-release-1-3-0.html
spark/site/releases/spark-release-1-3-1.html
spark/site/research.html
spark/site/screencasts/1-first-steps-with-spark.html
spark/site/screencasts/2-spark-documentation-overview.html
spark/site/screencasts/3-transformations-and-caching.html
spark/site/screencasts/4-a-standalone-job-in-spark.html
spark/site/screencasts/index.html
spark/site/sql/index.html
spark/site/streaming/index.html

Modified: spark/_layouts/global.html
URL: 
http://svn.apache.org/viewvc/spark/_layouts/global.html?rev=1682773&r1=1682772&r2=1682773&view=diff
==
--- spark/_layouts/global.html (original)
+++ spark/_layouts/global.html Sun May 31 19:06:00 2015
@@ -110,7 +110,7 @@
   MLlib (machine learning)
   GraphX (graph)
 

spark git commit: [SPARK-3850] Turn style checker on for trailing whitespaces.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 4d5ce4677 -> bab0fab68


[SPARK-3850] Turn style checker on for trailing whitespaces.

Author: Reynold Xin 

Closes #6541 from rxin/trailing-whitespace-on and squashes the following 
commits:

f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing 
whitespaces.

(cherry picked from commit 866652c903d06d1cb4356283e0741119d84dcc21)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bab0fab6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bab0fab6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bab0fab6

Branch: refs/heads/branch-1.4
Commit: bab0fab68ffb75f5b7be23ae52e381ef4f5c3087
Parents: 4d5ce46
Author: Reynold Xin 
Authored: Sun May 31 14:23:42 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 14:23:48 2015 -0700

--
 scalastyle-config.xml | 3 +++
 .../main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala  | 2 +-
 .../org/apache/spark/sql/hive/execution/HiveQuerySuite.scala  | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bab0fab6/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index d544cfb..dd4eb8c 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -50,6 +50,9 @@
  */]]>
   
  
+
+  
+
  
  
  

http://git-wip-us.apache.org/repos/asf/spark/blob/bab0fab6/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index 5d106c1..b624eaa 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -43,7 +43,7 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
 
   /**
* Calculates the correlation of two columns of a DataFrame. Currently only 
supports the Pearson
-   * Correlation Coefficient. For Spearman Correlation, consider using RDD 
methods found in 
+   * Correlation Coefficient. For Spearman Correlation, consider using RDD 
methods found in
* MLlib's Statistics.
*
* @param col1 the name of the column

http://git-wip-us.apache.org/repos/asf/spark/blob/bab0fab6/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
index 4af31d4..440b7c8 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
@@ -57,7 +57,7 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 // https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
 sql(
   """
-|CREATE TEMPORARY FUNCTION udtf_count2 
+|CREATE TEMPORARY FUNCTION udtf_count2
 |AS 'org.apache.spark.sql.hive.execution.GenericUDTFCount2'
   """.stripMargin)
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-3850] Turn style checker on for trailing whitespaces.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 067470030 -> 866652c90


[SPARK-3850] Turn style checker on for trailing whitespaces.

Author: Reynold Xin 

Closes #6541 from rxin/trailing-whitespace-on and squashes the following 
commits:

f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing 
whitespaces.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/866652c9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/866652c9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/866652c9

Branch: refs/heads/master
Commit: 866652c903d06d1cb4356283e0741119d84dcc21
Parents: 0674700
Author: Reynold Xin 
Authored: Sun May 31 14:23:42 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 14:23:42 2015 -0700

--
 scalastyle-config.xml | 3 +++
 .../main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala  | 2 +-
 .../org/apache/spark/sql/hive/execution/HiveQuerySuite.scala  | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/866652c9/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 75ef1e9..f52b095 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -50,6 +50,9 @@
  */]]>
   
  
+
+  
+
  
  
  

http://git-wip-us.apache.org/repos/asf/spark/blob/866652c9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index 5d106c1..b624eaa 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -43,7 +43,7 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
 
   /**
* Calculates the correlation of two columns of a DataFrame. Currently only 
supports the Pearson
-   * Correlation Coefficient. For Spearman Correlation, consider using RDD 
methods found in 
+   * Correlation Coefficient. For Spearman Correlation, consider using RDD 
methods found in
* MLlib's Statistics.
*
* @param col1 the name of the column
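
The corr doc above points users at MLlib's Statistics for Spearman correlation. A minimal
sketch of both paths, assuming a DataFrame `df` with numeric columns "col1" and "col2"
(hypothetical names):

import org.apache.spark.mllib.stat.Statistics

// Pearson correlation through the DataFrame API
val pearson = df.stat.corr("col1", "col2")

// Spearman correlation through MLlib's RDD-based Statistics
val x = df.select("col1").map(_.getDouble(0))
val y = df.select("col2").map(_.getDouble(0))
val spearman = Statistics.corr(x, y, "spearman")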

http://git-wip-us.apache.org/repos/asf/spark/blob/866652c9/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
index 4af31d4..440b7c8 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
@@ -57,7 +57,7 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 // https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
 sql(
   """
-|CREATE TEMPORARY FUNCTION udtf_count2 
+|CREATE TEMPORARY FUNCTION udtf_count2
 |AS 'org.apache.spark.sql.hive.execution.GenericUDTFCount2'
   """.stripMargin)
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.

2015-05-31 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 bab0fab68 -> f1d4e7e31


[SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.

Author: Sun Rui 

Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:

dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for 
dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.

(cherry picked from commit 46576ab303e50c54c3bd464f8939953efe644574)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f1d4e7e3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f1d4e7e3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f1d4e7e3

Branch: refs/heads/branch-1.4
Commit: f1d4e7e3111a6a44358d405389180d6cf6406223
Parents: bab0fab
Author: Sun Rui 
Authored: Sun May 31 15:01:21 2015 -0700
Committer: Shivaram Venkataraman 
Committed: Sun May 31 15:02:16 2015 -0700

--
 R/pkg/NAMESPACE |   2 +
 R/pkg/R/DataFrame.R | 125 +++
 R/pkg/R/generics.R  |  18 +++
 R/pkg/R/serialize.R |  10 +-
 R/pkg/inst/tests/test_sparkSQL.R| 109 
 .../scala/org/apache/spark/api/r/SerDe.scala|   6 +-
 6 files changed, 267 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f1d4e7e3/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 411126a..f9447f6 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -19,9 +19,11 @@ exportMethods("arrange",
   "count",
   "describe",
   "distinct",
+  "dropna",
   "dtypes",
   "except",
   "explain",
+  "fillna",
   "filter",
   "first",
   "group_by",

http://git-wip-us.apache.org/repos/asf/spark/blob/f1d4e7e3/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index e79d324..0af5cb8 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1429,3 +1429,128 @@ setMethod("describe",
 sdf <- callJMethod(x@sdf, "describe", listToSeq(colList))
 dataFrame(sdf)
   })
+
+#' dropna
+#'
+#' Returns a new DataFrame omitting rows with null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param how "any" or "all".
+#'if "any", drop a row if it contains any nulls.
+#'if "all", drop a row only if all its values are null.
+#'if minNonNulls is specified, how is ignored.
+#' @param minNonNulls If specified, drop rows that have less than
+#'minNonNulls non-null values.
+#'This overwrites the how parameter.
+#' @param cols Optional list of column names to consider.
+#' @return A DataFrame
+#' 
+#' @rdname nafunctions
+#' @export
+#' @examples
+#'\dontrun{
+#' sc <- sparkR.init()
+#' sqlCtx <- sparkRSQL.init(sc)
+#' path <- "path/to/file.json"
+#' df <- jsonFile(sqlCtx, path)
+#' dropna(df)
+#' }
+setMethod("dropna",
+  signature(x = "DataFrame"),
+  function(x, how = c("any", "all"), minNonNulls = NULL, cols = NULL) {
+how <- match.arg(how)
+if (is.null(cols)) {
+  cols <- columns(x)
+}
+if (is.null(minNonNulls)) {
+  minNonNulls <- if (how == "any") { length(cols) } else { 1 }
+}
+
+naFunctions <- callJMethod(x@sdf, "na")
+sdf <- callJMethod(naFunctions, "drop",
+   as.integer(minNonNulls), 
listToSeq(as.list(cols)))
+dataFrame(sdf)
+  })
+
+#' @aliases dropna
+#' @export
+setMethod("na.omit",
+  signature(x = "DataFrame"),
+  function(x, how = c("any", "all"), minNonNulls = NULL, cols = NULL) {
+dropna(x, how, minNonNulls, cols)
+  })
+
+#' fillna
+#'
+#' Replace null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param value Value to replace null values with.
+#'  Should be an integer, numeric, character or named list.
+#'  If the value is a named list, then cols is ignored and
+#'  value must be a mapping from column name (character) to 
+#'  replacement value. The replacement value must be an
+#'  integer, numeric or character.
+#' @param cols optional list of column names to consider.
+#' Columns specified in cols that do not have matching data
+#' type are ignored. For example, if value is a character, and 
+#' s

spark git commit: [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.

2015-05-31 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 866652c90 -> 46576ab30


[SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.

Author: Sun Rui 

Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:

dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for 
dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/46576ab3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/46576ab3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/46576ab3

Branch: refs/heads/master
Commit: 46576ab303e50c54c3bd464f8939953efe644574
Parents: 866652c
Author: Sun Rui 
Authored: Sun May 31 15:01:21 2015 -0700
Committer: Shivaram Venkataraman 
Committed: Sun May 31 15:01:59 2015 -0700

--
 R/pkg/NAMESPACE |   2 +
 R/pkg/R/DataFrame.R | 125 +++
 R/pkg/R/generics.R  |  18 +++
 R/pkg/R/serialize.R |  10 +-
 R/pkg/inst/tests/test_sparkSQL.R| 109 
 .../scala/org/apache/spark/api/r/SerDe.scala|   6 +-
 6 files changed, 267 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/46576ab3/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 411126a..f9447f6 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -19,9 +19,11 @@ exportMethods("arrange",
   "count",
   "describe",
   "distinct",
+  "dropna",
   "dtypes",
   "except",
   "explain",
+  "fillna",
   "filter",
   "first",
   "group_by",

http://git-wip-us.apache.org/repos/asf/spark/blob/46576ab3/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index e79d324..0af5cb8 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1429,3 +1429,128 @@ setMethod("describe",
 sdf <- callJMethod(x@sdf, "describe", listToSeq(colList))
 dataFrame(sdf)
   })
+
+#' dropna
+#'
+#' Returns a new DataFrame omitting rows with null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param how "any" or "all".
+#'if "any", drop a row if it contains any nulls.
+#'if "all", drop a row only if all its values are null.
+#'if minNonNulls is specified, how is ignored.
+#' @param minNonNulls If specified, drop rows that have less than
+#'minNonNulls non-null values.
+#'This overwrites the how parameter.
+#' @param cols Optional list of column names to consider.
+#' @return A DataFrame
+#' 
+#' @rdname nafunctions
+#' @export
+#' @examples
+#'\dontrun{
+#' sc <- sparkR.init()
+#' sqlCtx <- sparkRSQL.init(sc)
+#' path <- "path/to/file.json"
+#' df <- jsonFile(sqlCtx, path)
+#' dropna(df)
+#' }
+setMethod("dropna",
+  signature(x = "DataFrame"),
+  function(x, how = c("any", "all"), minNonNulls = NULL, cols = NULL) {
+how <- match.arg(how)
+if (is.null(cols)) {
+  cols <- columns(x)
+}
+if (is.null(minNonNulls)) {
+  minNonNulls <- if (how == "any") { length(cols) } else { 1 }
+}
+
+naFunctions <- callJMethod(x@sdf, "na")
+sdf <- callJMethod(naFunctions, "drop",
+   as.integer(minNonNulls), 
listToSeq(as.list(cols)))
+dataFrame(sdf)
+  })
+
+#' @aliases dropna
+#' @export
+setMethod("na.omit",
+  signature(x = "DataFrame"),
+  function(x, how = c("any", "all"), minNonNulls = NULL, cols = NULL) {
+dropna(x, how, minNonNulls, cols)
+  })
+
+#' fillna
+#'
+#' Replace null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param value Value to replace null values with.
+#'  Should be an integer, numeric, character or named list.
+#'  If the value is a named list, then cols is ignored and
+#'  value must be a mapping from column name (character) to 
+#'  replacement value. The replacement value must be an
+#'  integer, numeric or character.
+#' @param cols optional list of column names to consider.
+#' Columns specified in cols that do not have matching data
+#' type are ignored. For example, if value is a character, and 
+#' subset contains a non-character column, then the non-character
+#' column is simply ignored.
+#' @return 
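
The R wrappers above delegate to DataFrame.na on the JVM side. A minimal sketch of the
equivalent Scala calls through DataFrameNaFunctions (Spark 1.4 API); the DataFrame `df`
and the column names are hypothetical.

// dropna(df, "any", cols = c("age", "height")): drop rows with any null in those columns
val dropped = df.na.drop("any", Seq("age", "height"))

// dropna(df, minNonNulls = 2): keep rows with at least two non-null values
val atLeastTwo = df.na.drop(2)

// fillna(df, list(age = 0, name = "unknown")): per-column replacement values
val filled = df.na.fill(Map("age" -> 0, "name" -> "unknown"))

// na.omit(df) is an alias for dropna(df), i.e. df.na.drop()
val omitted = df.na.drop()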

spark git commit: [MINOR] Enable PySpark SQL readerwriter and window tests

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 46576ab30 -> 9126ea4d1


[MINOR] Enable PySpark SQL readerwriter and window tests

PySpark SQL's `readwriter` and `window` doctests weren't being run by our 
test runner script; this patch re-enables them.

Author: Josh Rosen 

Closes #6542 from JoshRosen/enable-more-pyspark-sql-tests and squashes the 
following commits:

9f46ce4 [Josh Rosen] Enable PySpark SQL readerwriter and window tests.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9126ea4d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9126ea4d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9126ea4d

Branch: refs/heads/master
Commit: 9126ea4d1c5c468f3662e76e0231b4d64c7c9699
Parents: 46576ab
Author: Josh Rosen 
Authored: Sun May 31 15:17:05 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 15:17:05 2015 -0700

--
 python/run-tests | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9126ea4d/python/run-tests
--
diff --git a/python/run-tests b/python/run-tests
index fcfb495..17dda3e 100755
--- a/python/run-tests
+++ b/python/run-tests
@@ -76,6 +76,8 @@ function run_sql_tests() {
 run_test "pyspark.sql.dataframe"
 run_test "pyspark.sql.group"
 run_test "pyspark.sql.functions"
+run_test "pyspark.sql.readwriter"
+run_test "pyspark.sql.window"
 run_test "pyspark.sql.tests"
 }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [HOTFIX] Remove trailing whitespace to fix Scalastyle checks

2015-05-31 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 f1d4e7e31 -> df0bf71ee


[HOTFIX] Remove trailing whitespace to fix Scalastyle checks

866652c903d06d1cb4356283e0741119d84dcc21 enabled this check.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df0bf71e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df0bf71e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df0bf71e

Branch: refs/heads/branch-1.4
Commit: df0bf71ee0db6cc67f19df34b9eefb960deaca82
Parents: f1d4e7e
Author: Josh Rosen 
Authored: Sun May 31 16:34:20 2015 -0700
Committer: Josh Rosen 
Committed: Sun May 31 16:34:20 2015 -0700

--
 .../src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala | 6 +++---
 .../apache/spark/sql/catalyst/expressions/complexTypes.scala   | 2 +-
 .../spark/sql/catalyst/expressions/stringOperations.scala  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/df0bf71e/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala 
b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala
index b8e15f3..7b92f61 100644
--- a/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala
+++ b/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala
@@ -60,7 +60,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends 
Serializable {
 
   @deprecated("Use partitions() instead.", "1.1.0")
   def splits: JList[Partition] = new java.util.ArrayList(rdd.partitions.toSeq)
-  
+
   /** Set of partitions in this RDD. */
   def partitions: JList[Partition] = new 
java.util.ArrayList(rdd.partitions.toSeq)
 
@@ -492,9 +492,9 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends 
Serializable {
 new java.util.ArrayList(arr)
   }
 
-  def takeSample(withReplacement: Boolean, num: Int): JList[T] = 
+  def takeSample(withReplacement: Boolean, num: Int): JList[T] =
 takeSample(withReplacement, num, Utils.random.nextLong)
-
+
   def takeSample(withReplacement: Boolean, num: Int, seed: Long): JList[T] = {
 import scala.collection.JavaConversions._
 val arr: java.util.Collection[T] = rdd.takeSample(withReplacement, num, 
seed).toSeq

http://git-wip-us.apache.org/repos/asf/spark/blob/df0bf71e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala
index f4d91d5..98c1c40 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala
@@ -25,7 +25,7 @@ import org.apache.spark.sql.types._
  */
 case class CreateArray(children: Seq[Expression]) extends Expression {
   override type EvaluatedType = Any
-  
+
   override def foldable: Boolean = children.forall(_.foldable)
 
   lazy val childTypes = children.map(_.dataType).distinct

http://git-wip-us.apache.org/repos/asf/spark/blob/df0bf71e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
index 219ed4e..f76e0ca 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
@@ -211,7 +211,7 @@ case class EndsWith(left: Expression, right: Expression)
  */
 case class Substring(str: Expression, pos: Expression, len: Expression)
   extends Expression with ExpectsInputTypes {
-  
+
   type EvaluatedType = Any
 
   override def foldable: Boolean = str.foldable && pos.foldable && len.foldable


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7986] Split scalastyle config into 3 sections.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 9126ea4d1 -> 6f006b5f5


[SPARK-7986] Split scalastyle config into 3 sections.

(1) rules that we enforce.
(2) rules that we would like to enforce, but haven't cleaned up the codebase to
turn on yet (or we need to make the scalastyle rule more configurable).
(3) rules that we don't want to enforce.

Author: Reynold Xin 

Closes #6543 from rxin/scalastyle and squashes the following commits:

beefaab [Reynold Xin] [SPARK-7986] Split scalastyle config into 3 sections.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f006b5f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f006b5f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f006b5f

Branch: refs/heads/master
Commit: 6f006b5f5fca649ac51745212d8fd44b1609b9ae
Parents: 9126ea4
Author: Reynold Xin 
Authored: Sun May 31 18:04:57 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 18:04:57 2015 -0700

--
 scalastyle-config.xml | 290 +++--
 1 file changed, 174 insertions(+), 116 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6f006b5f/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index f52b095..d6f927b 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -14,25 +14,41 @@
   ~ See the License for the specific language governing permissions and
   ~ limitations under the License.
   -->
-
-
-
-
-
-
+
 
 
- Scalastyle standard configuration
- 
- 
- 
- 
- 
- 
- 
-  
-   
-  
- 
+
+  
+
+  
+
+  
 
   
 
- 
- 
- 
- 
-  
-   
-   
-   true
-  
- 
- 
-  
-   
-  
- 
- 
-  
-   
-  
- 
- 
-  
-   
-  
- 
- 
- 
- 
- 
- 
- 
- 
-  
-   
-  
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
+  
+
+  
+  
+  true
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
   
+
   
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
 
   
 
- 
- 
- 
-  
-   
-   
-  
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
- 
+  
+
+  
+
+  
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  

  ARROW, EQUALS, ELSE, TRY, CATCH, FINALLY, 
LARROW, RARROW

- 
+  
+
   
 
  ARROW, EQUALS, COMMA, COLON, IF, ELSE, DO, 
WHILE, FOR, MATCH, TRY, CATCH, FINALLY, LARROW, RARROW
 
   
+
+  
   
+
   
-  
-   
-^FunSuite[A-Za-z]*$
-   
-   Tests must extend org.apache.spark.SparkFunSuite 
instead.
+  
+^FunSuite[A-Za-z]*$
+Tests must extend org.apache.spark.SparkFunSuite 
instead.
+  
+
+  
+  
+  
+
+  
+  
+^println$
+
+  
+
+  
+  
+  
+  
+
+  
+  
+  
+
+
+
+  
+
+  
+  
+
+  
+  
+  
+
+  
+
   
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+800>
+  
+
+  
+  
+30
+  
+
+  
+  
+10
+  
+
+  
+  
+50
+  
+
+  
+  
+  
+
+  
+
+  
+  
+-1,0,1,2,3
+  
+
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 df0bf71ee -> 78a6723e8


[SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

Author: Davies Liu 

Closes #6532 from davies/decimal and squashes the following commits:

c7fcbce [Davies Liu] Update tests.py
1425359 [Davies Liu] DecimalType should not be singleton

(cherry picked from commit 91777a1c3ad3b3ec7b65d5a0413209a9baf6b36a)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78a6723e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78a6723e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78a6723e

Branch: refs/heads/branch-1.4
Commit: 78a6723e8758b429f877166973cc4f1bbfce73c4
Parents: df0bf71
Author: Davies Liu 
Authored: Sun May 31 19:55:57 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 19:56:03 2015 -0700

--
 python/pyspark/sql/_types.py | 18 --
 python/pyspark/sql/tests.py  |  9 +
 2 files changed, 25 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/78a6723e/python/pyspark/sql/_types.py
--
diff --git a/python/pyspark/sql/_types.py b/python/pyspark/sql/_types.py
index 9e7e9f0..b6ec613 100644
--- a/python/pyspark/sql/_types.py
+++ b/python/pyspark/sql/_types.py
@@ -97,8 +97,6 @@ class AtomicType(DataType):
 """An internal type used to represent everything that is not
 null, UDTs, arrays, structs, and maps."""
 
-__metaclass__ = DataTypeSingleton
-
 
 class NumericType(AtomicType):
 """Numeric data types.
@@ -109,6 +107,8 @@ class IntegralType(NumericType):
 """Integral data types.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class FractionalType(NumericType):
 """Fractional data types.
@@ -119,26 +119,36 @@ class StringType(AtomicType):
 """String data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class BinaryType(AtomicType):
 """Binary (byte array) data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class BooleanType(AtomicType):
 """Boolean data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class DateType(AtomicType):
 """Date (datetime.date) data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class TimestampType(AtomicType):
 """Timestamp (datetime.datetime) data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class DecimalType(FractionalType):
 """Decimal (decimal.Decimal) data type.
@@ -172,11 +182,15 @@ class DoubleType(FractionalType):
 """Double data type, representing double precision floats.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class FloatType(FractionalType):
 """Float data type, representing single precision floats.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class ByteType(IntegralType):
 """Byte data type, i.e. a signed integer in a single byte.

http://git-wip-us.apache.org/repos/asf/spark/blob/78a6723e/python/pyspark/sql/tests.py
--
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 5c53c3a..76384d3 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -100,6 +100,15 @@ class DataTypeTests(unittest.TestCase):
 lt2 = pickle.loads(pickle.dumps(LongType()))
 self.assertEquals(lt, lt2)
 
+# regression test for SPARK-7978
+def test_decimal_type(self):
+t1 = DecimalType()
+t2 = DecimalType(10, 2)
+self.assertTrue(t2 is not t1)
+self.assertNotEqual(t1, t2)
+t3 = DecimalType(8)
+self.assertNotEqual(t2, t3)
+
 
 class SQLTests(ReusedPySparkTestCase):
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 6f006b5f5 -> 91777a1c3


[SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

Author: Davies Liu 

Closes #6532 from davies/decimal and squashes the following commits:

c7fcbce [Davies Liu] Update tests.py
1425359 [Davies Liu] DecimalType should not be singleton


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/91777a1c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/91777a1c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/91777a1c

Branch: refs/heads/master
Commit: 91777a1c3ad3b3ec7b65d5a0413209a9baf6b36a
Parents: 6f006b5
Author: Davies Liu 
Authored: Sun May 31 19:55:57 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 19:55:57 2015 -0700

--
 python/pyspark/sql/tests.py |  9 +
 python/pyspark/sql/types.py | 18 --
 2 files changed, 25 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/91777a1c/python/pyspark/sql/tests.py
--
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 5c53c3a..76384d3 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -100,6 +100,15 @@ class DataTypeTests(unittest.TestCase):
 lt2 = pickle.loads(pickle.dumps(LongType()))
 self.assertEquals(lt, lt2)
 
+# regression test for SPARK-7978
+def test_decimal_type(self):
+t1 = DecimalType()
+t2 = DecimalType(10, 2)
+self.assertTrue(t2 is not t1)
+self.assertNotEqual(t1, t2)
+t3 = DecimalType(8)
+self.assertNotEqual(t2, t3)
+
 
 class SQLTests(ReusedPySparkTestCase):
 

http://git-wip-us.apache.org/repos/asf/spark/blob/91777a1c/python/pyspark/sql/types.py
--
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 9e7e9f0..b6ec613 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -97,8 +97,6 @@ class AtomicType(DataType):
 """An internal type used to represent everything that is not
 null, UDTs, arrays, structs, and maps."""
 
-__metaclass__ = DataTypeSingleton
-
 
 class NumericType(AtomicType):
 """Numeric data types.
@@ -109,6 +107,8 @@ class IntegralType(NumericType):
 """Integral data types.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class FractionalType(NumericType):
 """Fractional data types.
@@ -119,26 +119,36 @@ class StringType(AtomicType):
 """String data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class BinaryType(AtomicType):
 """Binary (byte array) data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class BooleanType(AtomicType):
 """Boolean data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class DateType(AtomicType):
 """Date (datetime.date) data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class TimestampType(AtomicType):
 """Timestamp (datetime.datetime) data type.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class DecimalType(FractionalType):
 """Decimal (decimal.Decimal) data type.
@@ -172,11 +182,15 @@ class DoubleType(FractionalType):
 """Double data type, representing double precision floats.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class FloatType(FractionalType):
 """Float data type, representing single precision floats.
 """
 
+__metaclass__ = DataTypeSingleton
+
 
 class ByteType(IntegralType):
 """Byte data type, i.e. a signed integer in a single byte.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-7952][SPARK-7984][SQL] equality check between boolean type and numeric type is broken.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 91777a1c3 -> a0e46a0d2


[SPARK-7952][SPARK-7984][SQL] equality check between boolean type and numeric 
type is broken.

The original code has several problems:
* `true <=> 1` will return false as we didn't set a rule to handle it.
* `true = a` where `a` is not `Literal` and its value is 1, will return false 
as we only handle literal values.

Author: Wenchen Fan 

Closes #6505 from cloud-fan/tmp1 and squashes the following commits:

77f0f39 [Wenchen Fan] minor fix
b6401ba [Wenchen Fan] add type coercion for CaseKeyWhen and address comments
ebc8c61 [Wenchen Fan] use SQLTestUtils and If
625973c [Wenchen Fan] improve
9ba2130 [Wenchen Fan] address comments
fc0d741 [Wenchen Fan] fix style
2846a04 [Wenchen Fan] fix 7952


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0e46a0d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0e46a0d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0e46a0d

Branch: refs/heads/master
Commit: a0e46a0d2ad23ce6a64e6ebdf2ccc776208696b6
Parents: 91777a1
Author: Wenchen Fan 
Authored: Sun May 31 21:01:46 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 21:01:46 2015 -0700

--
 .../catalyst/analysis/HiveTypeCoercion.scala| 101 ++-
 .../sql/catalyst/expressions/predicates.scala   |   5 +-
 .../analysis/HiveTypeCoercionSuite.scala|  55 --
 .../expressions/ExpressionEvaluationSuite.scala |   8 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala|  36 +--
 5 files changed, 158 insertions(+), 47 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a0e46a0d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index 96d7b96..edcc918 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -76,7 +76,7 @@ trait HiveTypeCoercion {
 WidenTypes ::
 PromoteStrings ::
 DecimalPrecision ::
-BooleanComparisons ::
+BooleanEqualization ::
 StringToIntegralCasts ::
 FunctionArgumentConversion ::
 CaseWhenCoercion ::
@@ -119,7 +119,7 @@ trait HiveTypeCoercion {
* the appropriate numeric equivalent.
*/
   object ConvertNaNs extends Rule[LogicalPlan] {
-val stringNaN = Literal("NaN")
+private val stringNaN = Literal("NaN")
 
 def apply(plan: LogicalPlan): LogicalPlan = plan transform {
   case q: LogicalPlan => q transformExpressions {
@@ -349,17 +349,17 @@ trait HiveTypeCoercion {
 import scala.math.{max, min}
 
 // Conversion rules for integer types into fixed-precision decimals
-val intTypeToFixed: Map[DataType, DecimalType] = Map(
+private val intTypeToFixed: Map[DataType, DecimalType] = Map(
   ByteType -> DecimalType(3, 0),
   ShortType -> DecimalType(5, 0),
   IntegerType -> DecimalType(10, 0),
   LongType -> DecimalType(20, 0)
 )
 
-def isFloat(t: DataType): Boolean = t == FloatType || t == DoubleType
+private def isFloat(t: DataType): Boolean = t == FloatType || t == 
DoubleType
 
 // Conversion rules for float and double into fixed-precision decimals
-val floatTypeToFixed: Map[DataType, DecimalType] = Map(
+private val floatTypeToFixed: Map[DataType, DecimalType] = Map(
   FloatType -> DecimalType(7, 7),
   DoubleType -> DecimalType(15, 15)
 )
@@ -482,30 +482,66 @@ trait HiveTypeCoercion {
   }
 
   /**
-   * Changes Boolean values to Bytes so that expressions like true < false can be Evaluated.
+   * Changes numeric values to booleans so that expressions like true = 1 can be evaluated.
*/
-  object BooleanComparisons extends Rule[LogicalPlan] {
-val trueValues = Seq(1, 1L, 1.toByte, 1.toShort, new java.math.BigDecimal(1)).map(Literal(_))
-val falseValues = Seq(0, 0L, 0.toByte, 0.toShort, new java.math.BigDecimal(0)).map(Literal(_))
+  object BooleanEqualization extends Rule[LogicalPlan] {
+private val trueValues = Seq(1.toByte, 1.toShort, 1, 1L, new java.math.BigDecimal(1))
+private val falseValues = Seq(0.toByte, 0.toShort, 0, 0L, new java.math.BigDecimal(0))
+
+private def buildCaseKeyWhen(booleanExpr: Expression, numericExpr: Expression) = {
+  CaseKeyWhen(numericExpr, Seq(
+Literal(trueValues.head), booleanExpr,
+Literal(falseValues.head), Not(booleanExpr),
+Literal(false)))
+}
+
+private def transform(booleanExpr: Expression, 

spark git commit: Update README to include DataFrames and zinc.

2015-05-31 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master a0e46a0d2 -> 3c0156899


Update README to include DataFrames and zinc.

Also cut trailing whitespaces.

Author: Reynold Xin 

Closes #6548 from rxin/readme and squashes the following commits:

630efc3 [Reynold Xin] Update README to include DataFrames and zinc.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3c015689
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3c015689
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3c015689

Branch: refs/heads/master
Commit: 3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf
Parents: a0e46a0
Author: Reynold Xin 
Authored: Sun May 31 23:55:45 2015 -0700
Committer: Reynold Xin 
Committed: Sun May 31 23:55:45 2015 -0700

--
 README.md | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3c015689/README.md
--
diff --git a/README.md b/README.md
index 9c09d40..380422c 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@
 Spark is a fast and general cluster computing system for Big Data. It provides
 high-level APIs in Scala, Java, and Python, and an optimized engine that
 supports general computation graphs for data analysis. It also supports a
-rich set of higher-level tools including Spark SQL for SQL and structured
-data processing, MLlib for machine learning, GraphX for graph processing,
+rich set of higher-level tools including Spark SQL for SQL and DataFrames,
+MLlib for machine learning, GraphX for graph processing,
 and Spark Streaming for stream processing.
 
 
@@ -22,7 +22,7 @@ This README file only contains basic setup instructions.
 Spark is built using [Apache Maven](http://maven.apache.org/).
 To build Spark and its example programs, run:
 
-mvn -DskipTests clean package
+build/mvn -DskipTests clean package
 
 (You do not need to do this if you downloaded a pre-built package.)
 More detailed documentation is available from the project site, at
@@ -43,7 +43,7 @@ Try the following command, which should return 1000:
 Alternatively, if you prefer Python, you can use the Python shell:
 
 ./bin/pyspark
-
+
 And run the following command, which should also return 1000:
 
 >>> sc.parallelize(range(1000)).count()
@@ -58,9 +58,9 @@ To run one of them, use `./bin/run-example <class> [params]`. For example:
 will run the Pi example locally.
 
 You can set the MASTER environment variable when running examples to submit
-examples to a cluster. This can be a mesos:// or spark:// URL, 
-"yarn-cluster" or "yarn-client" to run on YARN, and "local" to run 
-locally with one thread, or "local[N]" to run locally with N threads. You 
+examples to a cluster. This can be a mesos:// or spark:// URL,
+"yarn-cluster" or "yarn-client" to run on YARN, and "local" to run
+locally with one thread, or "local[N]" to run locally with N threads. You
 can also use an abbreviated class name if the class is in the `examples`
 package. For instance:
 
@@ -75,7 +75,7 @@ can be run using:
 
 ./dev/run-tests
 
-Please see the guidance on how to 
+Please see the guidance on how to
 [run tests for a module, or individual tests](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools).
 
 ## A Note About Hadoop Versions

