git commit: [SPARK-1470][SPARK-1842] Use the scala-logging wrapper instead of the slf4j API directly

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 4bc3bb29a -> adc830329


[SPARK-1470][SPARK-1842] Use the scala-logging wrapper instead of the slf4j API directly

Author: GuoQiang Li wi...@qq.com

Closes #1369 from witgo/SPARK-1470_new and squashes the following commits:

66a1641 [GuoQiang Li] IncompatibleResultTypeProblem
73a89ba [GuoQiang Li] Use the scala-logging wrapper instead of the slf4j API directly.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/adc83032
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/adc83032
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/adc83032

Branch: refs/heads/master
Commit: adc8303294e26efb4ed15e5f5ba1062f7988625d
Parents: 4bc3bb2
Author: GuoQiang Li wi...@qq.com
Authored: Fri Aug 1 23:55:11 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Fri Aug 1 23:55:11 2014 -0700

--
 core/pom.xml|  4 +
 .../main/scala/org/apache/spark/Logging.scala   | 39 +
 .../org/apache/spark/util/SignalLogger.scala|  2 +-
 mllib/pom.xml   |  4 +
 pom.xml |  5 ++
 project/MimaExcludes.scala  | 91 +++-
 sql/catalyst/pom.xml|  5 --
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  4 +-
 .../catalyst/analysis/HiveTypeCoercion.scala|  8 +-
 .../catalyst/expressions/BoundAttribute.scala   |  2 +-
 .../expressions/codegen/GenerateOrdering.scala  |  4 +-
 .../org/apache/spark/sql/catalyst/package.scala |  1 -
 .../sql/catalyst/planning/QueryPlanner.scala|  2 +-
 .../spark/sql/catalyst/planning/patterns.scala  |  6 +-
 .../apache/spark/sql/catalyst/rules/Rule.scala  |  2 +-
 .../spark/sql/catalyst/rules/RuleExecutor.scala | 12 +--
 .../spark/sql/catalyst/trees/package.scala  |  8 +-
 .../scala/org/apache/spark/sql/SQLContext.scala |  2 +-
 .../compression/CompressibleColumnBuilder.scala |  5 +-
 .../apache/spark/sql/execution/Exchange.scala   |  2 +-
 .../org/apache/spark/sql/json/JsonRDD.scala |  2 +-
 .../scala/org/apache/spark/sql/package.scala|  2 -
 .../spark/sql/columnar/ColumnTypeSuite.scala|  4 +-
 .../hive/thriftserver/HiveThriftServer2.scala   | 12 +--
 .../hive/thriftserver/SparkSQLCLIDriver.scala   |  2 +-
 .../sql/hive/thriftserver/SparkSQLDriver.scala  |  6 +-
 .../sql/hive/thriftserver/SparkSQLEnv.scala |  6 +-
 .../server/SparkSQLOperationManager.scala   | 13 +--
 .../thriftserver/HiveThriftServer2Suite.scala   |  2 +-
 .../org/apache/spark/sql/hive/HiveContext.scala |  2 +-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   |  3 +-
 .../org/apache/spark/sql/hive/TestHive.scala| 10 +--
 .../org/apache/spark/sql/hive/hiveUdfs.scala|  4 +-
 .../sql/hive/execution/HiveComparisonTest.scala | 22 ++---
 .../sql/hive/execution/HiveQueryFileTest.scala  |  2 +-
 35 files changed, 203 insertions(+), 97 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/adc83032/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 7c60cf1..47766ae 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -99,6 +99,10 @@
       <artifactId>jcl-over-slf4j</artifactId>
     </dependency>
     <dependency>
+      <groupId>com.typesafe.scala-logging</groupId>
+      <artifactId>scala-logging-slf4j_${scala.binary.version}</artifactId>
+    </dependency>
+    <dependency>
       <groupId>log4j</groupId>
       <artifactId>log4j</artifactId>
     </dependency>

http://git-wip-us.apache.org/repos/asf/spark/blob/adc83032/core/src/main/scala/org/apache/spark/Logging.scala
--
diff --git a/core/src/main/scala/org/apache/spark/Logging.scala 
b/core/src/main/scala/org/apache/spark/Logging.scala
index 807ef3e..6e61c00 100644
--- a/core/src/main/scala/org/apache/spark/Logging.scala
+++ b/core/src/main/scala/org/apache/spark/Logging.scala
@@ -18,8 +18,9 @@
 package org.apache.spark
 
 import org.apache.log4j.{LogManager, PropertyConfigurator}
-import org.slf4j.{Logger, LoggerFactory}
+import org.slf4j.LoggerFactory
 import org.slf4j.impl.StaticLoggerBinder
+import com.typesafe.scalalogging.slf4j.Logger
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.util.Utils
@@ -39,61 +40,69 @@ trait Logging {
   // be serialized and used on another machine
   @transient private var log_ : Logger = null
 
+  // Method to get the logger name for this object
+  protected def logName = {
+    var className = this.getClass.getName
+    // Ignore trailing $'s in the class names for Scala objects
+    if (className.endsWith("$")) {
+      className = className.substring(0, className.length - 1)
+    }
+    className
+  }
+
   // Method 
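
For orientation, here is a minimal sketch (simplified, with an assumed trait name, and mirroring the rest of the truncated diff above rather than reproducing the full Spark `Logging` trait) of the pattern this commit introduces: a lazily created scala-logging `Logger` wrapping the slf4j logger named by `logName`:

```scala
import org.slf4j.LoggerFactory
import com.typesafe.scalalogging.slf4j.Logger

trait LoggingSketch {
  // Transient so the logger is rebuilt after deserialization on another machine.
  @transient private var log_ : Logger = null

  protected def logName: String = {
    // Strip the trailing "$" that Scala appends to object class names.
    val className = this.getClass.getName
    if (className.endsWith("$")) className.dropRight(1) else className
  }

  protected def log: Logger = {
    if (log_ == null) {
      log_ = Logger(LoggerFactory.getLogger(logName))
    }
    log_
  }

  protected def logInfo(msg: => String): Unit = log.info(msg)
}
```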

git commit: [SPARK-2316] Avoid O(blocks) operations in listeners

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master dab37966b -> d934801d5


[SPARK-2316] Avoid O(blocks) operations in listeners

The existing code in `StorageUtils` is not the most efficient. Every time we 
want to update an `RDDInfo` we end up iterating through all blocks on all block 
managers just to discard most of them. The symptoms manifest themselves in the 
bountiful UI bugs observed in the wild. Many of these bugs are caused by the 
slow consumption of events in `LiveListenerBus`, which frequently leads to the 
event queue overflowing and `SparkListenerEvent`s being dropped on the floor. 
The changes made in this PR avoid this by first filtering out only the blocks 
relevant to us before computing storage information from them.

It's worth a mention that this corner of the Spark code is also not very 
well-tested at all. The bulk of the changes in this PR (more than 60%) is 
actually test cases for the various logic in `StorageUtils.scala` as well as 
`StorageTab.scala`. These will eventually be extended to cover the various 
listeners that constitute the `SparkUI`.
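
To illustrate the idea (a standalone sketch with assumed class and field names, not the actual `StorageStatus`/`StorageUtils` code): keeping blocks indexed by RDD id means a per-RDD update or query only touches that RDD's blocks, instead of scanning every block on every block manager:

```scala
import scala.collection.mutable

case class RDDBlockId(rddId: Int, splitIndex: Int)
case class BlockStatus(memSize: Long, diskSize: Long)

class StorageStatusSketch {
  // Blocks grouped by RDD id up front, so per-RDD aggregation is proportional
  // to that RDD's block count rather than to all blocks on the block manager.
  private val rddBlocks = mutable.Map[Int, mutable.Map[RDDBlockId, BlockStatus]]()

  def updateBlock(id: RDDBlockId, status: BlockStatus): Unit = {
    val blocks = rddBlocks.getOrElseUpdate(id.rddId, mutable.Map.empty)
    blocks(id) = status
  }

  /** Memory used by a single RDD, without iterating over unrelated blocks. */
  def memUsedByRdd(rddId: Int): Long =
    rddBlocks.get(rddId).map(_.values.map(_.memSize).sum).getOrElse(0L)
}
```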

Author: Andrew Or andrewo...@gmail.com

Closes #1679 from andrewor14/fix-drop-events and squashes the following commits:

f80c1fa [Andrew Or] Rewrite fold and reduceOption as sum
e132d69 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
fix-drop-events
14fa1c3 [Andrew Or] Simplify some code + update a few comments
a91be46 [Andrew Or] Make ExecutorsPage blazingly fast
bf6f09b [Andrew Or] Minor changes
8981de1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
fix-drop-events
af19bc0 [Andrew Or] *UsedByRDD -> *UsedByRdd (minor)
6970bc8 [Andrew Or] Add extensive tests for StorageListener and the new code in 
StorageUtils
e080b9e [Andrew Or] Reduce run time of StorageUtils.updateRddInfo to near 
constant
2c3ef6a [Andrew Or] Actually filter out only the relevant RDDs
6fef86a [Andrew Or] Add extensive tests for new code in StorageStatus
b66b6b0 [Andrew Or] Use more efficient underlying data structures for blocks
6a7b7c0 [Andrew Or] Avoid chained operations on TraversableLike
a9ec384 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
fix-drop-events
b12fcd7 [Andrew Or] Fix tests + simplify sc.getRDDStorageInfo
da8e322 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
fix-drop-events
8e91921 [Andrew Or] Iterate through a filtered set of blocks when updating 
RDDInfo
7b2c4aa [Andrew Or] Rewrite blockLocationsFromStorageStatus + clean up method 
signatures
41fa50d [Andrew Or] Add a legacy constructor for StorageStatus
53af15d [Andrew Or] Refactor StorageStatus + add a bunch of tests


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d934801d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d934801d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d934801d

Branch: refs/heads/master
Commit: d934801d53fc2f1d57d3534ae4e1e9384c7dda99
Parents: dab3796
Author: Andrew Or andrewo...@gmail.com
Authored: Fri Aug 1 23:56:24 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Fri Aug 1 23:56:24 2014 -0700

--
 .../scala/org/apache/spark/SparkContext.scala   |   6 +-
 .../spark/storage/BlockManagerMasterActor.scala |  14 +-
 .../spark/storage/BlockManagerSource.scala  |  14 +-
 .../org/apache/spark/storage/RDDInfo.scala  |   2 +
 .../spark/storage/StorageStatusListener.scala   |  12 +-
 .../org/apache/spark/storage/StorageUtils.scala | 316 -
 .../apache/spark/ui/exec/ExecutorsPage.scala|  12 +-
 .../org/apache/spark/ui/storage/RDDPage.scala   |  17 +-
 .../apache/spark/ui/storage/StorageTab.scala|  13 +-
 .../apache/spark/SparkContextInfoSuite.scala|  22 +-
 .../storage/StorageStatusListenerSuite.scala|  72 ++--
 .../org/apache/spark/storage/StorageSuite.scala | 354 +++
 .../spark/ui/storage/StorageTabSuite.scala  | 165 +
 13 files changed, 843 insertions(+), 176 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d934801d/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 368835a..9ba21cf 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -48,7 +48,7 @@ import org.apache.spark.scheduler._
 import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, 
SparkDeploySchedulerBackend, SimrSchedulerBackend}
 import org.apache.spark.scheduler.cluster.mesos.{CoarseMesosSchedulerBackend, 
MesosSchedulerBackend}
 import org.apache.spark.scheduler.local.LocalBackend
-import 

git commit: [SPARK-2454] Do not ship spark home to Workers

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master d934801d5 -> 148af6082


[SPARK-2454] Do not ship spark home to Workers

When standalone Workers launch executors, they inherit the Spark home set by 
the driver. This means if the worker machines do not share the same directory 
structure as the driver node, the Workers will attempt to run scripts (e.g. 
bin/compute-classpath.sh) that do not exist locally and fail. This is a common 
scenario if the driver is launched from outside of the cluster.

The solution is to simply not pass the driver's Spark home to the Workers. This 
PR further makes an attempt to avoid overloading the usages of `spark.home`, 
which is now only used for setting executor Spark home on Mesos and in python.

This is based on top of #1392 and originally reported by YanTangZhai. Tested on 
standalone cluster.
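
The gist of the change, as a minimal sketch (the class and method names below are assumed for illustration and are not the actual deploy code): the worker resolves the executor's Spark home locally instead of receiving it from the driver:

```scala
import java.io.File

// The application description no longer carries a sparkHome field.
case class AppDescription(name: String, command: Seq[String])

class WorkerSketch(workerDir: File) {
  // Resolve Spark home on the worker machine itself: prefer a locally set
  // SPARK_HOME, otherwise fall back to the worker's own working directory.
  private val localSparkHome: File =
    sys.env.get("SPARK_HOME").map(new File(_)).getOrElse(workerDir)

  def launchExecutor(app: AppDescription): ProcessBuilder =
    new ProcessBuilder(app.command: _*).directory(localSparkHome)
}
```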

Author: Andrew Or andrewo...@gmail.com

Closes #1734 from andrewor14/spark-home-reprise and squashes the following 
commits:

f71f391 [Andrew Or] Revert changes in python
1c2532c [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
spark-home-reprise
188fc5d [Andrew Or] Avoid using spark.home where possible
09272b7 [Andrew Or] Always use Worker's working directory as spark home


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/148af608
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/148af608
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/148af608

Branch: refs/heads/master
Commit: 148af6082cdb44840bbd61c7a4f67a95badad10b
Parents: d934801
Author: Andrew Or andrewo...@gmail.com
Authored: Sat Aug 2 00:45:38 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 00:45:38 2014 -0700

--
 .../org/apache/spark/deploy/ApplicationDescription.scala  | 1 -
 .../src/main/scala/org/apache/spark/deploy/JsonProtocol.scala | 1 -
 .../scala/org/apache/spark/deploy/client/TestClient.scala | 5 ++---
 .../main/scala/org/apache/spark/deploy/worker/Worker.scala| 7 +++
 .../spark/scheduler/cluster/SparkDeploySchedulerBackend.scala | 3 +--
 core/src/test/scala/org/apache/spark/DriverSuite.scala| 2 +-
 .../scala/org/apache/spark/deploy/JsonProtocolSuite.scala | 5 ++---
 .../test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala | 2 +-
 .../org/apache/spark/deploy/worker/ExecutorRunnerTest.scala   | 7 +++
 project/SparkBuild.scala  | 2 +-
 python/pyspark/context.py | 2 +-
 repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala| 3 ---
 .../main/scala/org/apache/spark/streaming/Checkpoint.scala| 1 -
 13 files changed, 15 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/148af608/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala 
b/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
index 86305d2..65a1a8f 100644
--- a/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala
@@ -22,7 +22,6 @@ private[spark] class ApplicationDescription(
 val maxCores: Option[Int],
 val memoryPerSlave: Int,
 val command: Command,
-val sparkHome: Option[String],
 var appUiUrl: String,
 val eventLogDir: Option[String] = None)
   extends Serializable {

http://git-wip-us.apache.org/repos/asf/spark/blob/148af608/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala 
b/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
index c4f5e29..696f32a 100644
--- a/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
@@ -56,7 +56,6 @@ private[spark] object JsonProtocol {
     ("cores" -> obj.maxCores) ~
     ("memoryperslave" -> obj.memoryPerSlave) ~
     ("user" -> obj.user) ~
-    ("sparkhome" -> obj.sparkHome) ~
     ("command" -> obj.command.toString)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/148af608/core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala 
b/core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
index b8ffa9a..88a0862 100644
--- a/core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
@@ 

git commit: [SPARK-1812] sql/catalyst - Provide explicit type information

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 148af6082 -> 08c095b66


[SPARK-1812]  sql/catalyst - Provide explicit type information

For Scala 2.11 compatibility.

Without the explicit type specification, withNullability
return type is inferred to be Attribute, and thus calling
at() on the returned object fails in these tests:

[ERROR] 
/Users/avati/work/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:370:
 value at is not a
[ERROR] val c4_notNull = 'a.boolean.notNull.at(3)
[ERROR] ^
[ERROR] 
/Users/avati/work/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:371:
 value at is not a
[ERROR] val c5_notNull = 'a.boolean.notNull.at(4)
[ERROR] ^
[ERROR] 
/Users/avati/work/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:372:
 value at is not a
[ERROR] val c6_notNull = 'a.boolean.notNull.at(5)
[ERROR] ^
[ERROR] 
/Users/avati/work/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:558:
 value at is not a
[ERROR] val s_notNull = 'a.string.notNull.at(0)
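
A standalone illustration (toy types, not the Catalyst classes) of why the explicit return type matters here: annotating the override with the narrower type keeps subclass-only members such as `at()` usable on the result:

```scala
trait Attribute {
  def withNullability(newNullability: Boolean): Attribute
}

case class AttributeRef(name: String, nullable: Boolean) extends Attribute {
  // The explicit ": AttributeRef" pins the result type; per the commit message,
  // relying on inference under Scala 2.11 left callers with only Attribute,
  // so AttributeRef-specific members like at() were not found.
  override def withNullability(newNullability: Boolean): AttributeRef =
    if (nullable == newNullability) this else copy(nullable = newNullability)

  def at(ordinal: Int): String = s"$name#$ordinal"
}

object ReturnTypeDemo extends App {
  val a = AttributeRef("a", nullable = true)
  println(a.withNullability(false).at(3)) // compiles: the override returns AttributeRef
}
```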

Signed-off-by: Anand Avati avati@redhat.com

Author: Anand Avati av...@redhat.com

Closes #1709 from avati/SPARK-1812-notnull and squashes the following commits:

0470eb3 [Anand Avati] SPARK-1812: sql/catalyst - Provide explicit type 
information


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/08c095b6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/08c095b6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/08c095b6

Branch: refs/heads/master
Commit: 08c095b6647033285e8f6703922bdacecce3fc71
Parents: 148af60
Author: Anand Avati av...@redhat.com
Authored: Sat Aug 2 00:48:17 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 00:48:17 2014 -0700

--
 .../apache/spark/sql/catalyst/expressions/namedExpressions.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/08c095b6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index ed69928..02d0476 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -134,7 +134,7 @@ case class AttributeReference(name: String, dataType: 
DataType, nullable: Boolea
   /**
* Returns a copy of this [[AttributeReference]] with changed nullability.
*/
-  override def withNullability(newNullability: Boolean) = {
+  override def withNullability(newNullability: Boolean): AttributeReference = {
 if (nullable == newNullability) {
   this
 } else {





git commit: HOTFIX: Fixing test error in maven for flume-sink.

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 08c095b66 -> 25cad6adf


HOTFIX: Fixing test error in maven for flume-sink.

We needed to add an explicit dependency on scalatest since this
module will not get it from spark core like others do.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/25cad6ad
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/25cad6ad
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/25cad6ad

Branch: refs/heads/master
Commit: 25cad6adf6479fb00265df06d5f77599f8defd26
Parents: 08c095b
Author: Patrick Wendell pwend...@gmail.com
Authored: Sat Aug 2 00:57:47 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 00:58:33 2014 -0700

--
 external/flume-sink/pom.xml | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/25cad6ad/external/flume-sink/pom.xml
--
diff --git a/external/flume-sink/pom.xml b/external/flume-sink/pom.xml
index d11129c..d0bf1cf 100644
--- a/external/flume-sink/pom.xml
+++ b/external/flume-sink/pom.xml
@@ -67,7 +67,10 @@
     <dependency>
       <groupId>org.scala-lang</groupId>
       <artifactId>scala-library</artifactId>
-      <version>2.10.4</version>
+    </dependency>
+    <dependency>
+      <groupId>org.scalatest</groupId>
+      <artifactId>scalatest_${scala.binary.version}</artifactId>
     </dependency>
   </dependencies>
   <build>





git commit: HOTFIX: Fix concurrency issue in FlumePollingStreamSuite.

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 25cad6adf -> 44460ba59


HOTFIX: Fix concurrency issue in FlumePollingStreamSuite.

This has been failing on master. One possible cause is that the port
gets contended if multiple test runs happen concurrently and they
hit this test at the same time. Since this test takes a long time
(60 seconds) that's very plausible. This patch randomizes the port
used in this test to avoid contention.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/44460ba5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/44460ba5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/44460ba5

Branch: refs/heads/master
Commit: 44460ba594fbfe5a6ee66e5121ead914bf16f9f6
Parents: 25cad6a
Author: Patrick Wendell pwend...@gmail.com
Authored: Sat Aug 2 01:11:03 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 01:16:13 2014 -0700

--
 .../spark/streaming/flume/FlumePollingStreamSuite.scala   | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/44460ba5/external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala
--
diff --git 
a/external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala
 
b/external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala
index 47071d0..27bf2ac 100644
--- 
a/external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala
+++ 
b/external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala
@@ -20,6 +20,7 @@ package org.apache.spark.streaming.flume
 
 import java.net.InetSocketAddress
 import java.util.concurrent.{Callable, ExecutorCompletionService, Executors}
+import java.util.Random
 
 import scala.collection.JavaConversions._
 import scala.collection.mutable.{SynchronizedBuffer, ArrayBuffer}
@@ -37,13 +38,16 @@ import org.apache.spark.streaming.flume.sink._
 
 class FlumePollingStreamSuite extends TestSuiteBase {
 
-  val testPort = 
+  val random = new Random()
+  /** Return a port in the ephemeral range. */
+  def getTestPort = random.nextInt(16382) + 49152
   val batchCount = 5
   val eventsPerBatch = 100
   val totalEventsPerChannel = batchCount * eventsPerBatch
   val channelCapacity = 5000
 
   test("flume polling test") {
+    val testPort = getTestPort
 // Set up the streaming context and input streams
 val ssc = new StreamingContext(conf, batchDuration)
 val flumeStream: ReceiverInputDStream[SparkFlumeEvent] =
@@ -77,6 +81,7 @@ class FlumePollingStreamSuite extends TestSuiteBase {
   }
 
   test("flume polling test multiple hosts") {
+    val testPort = getTestPort
 // Set up the streaming context and input streams
 val ssc = new StreamingContext(conf, batchDuration)
     val addresses = Seq(testPort, testPort + 1).map(new InetSocketAddress("localhost", _))





git commit: MAINTENANCE: Automated closing of pull requests.

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 44460ba59 -> 87738bfa4


MAINTENANCE: Automated closing of pull requests.

This commit exists to close the following pull requests on Github:

Closes #706 (close requested by 'pwendell')
Closes #453 (close requested by 'pwendell')
Closes #557 (close requested by 'tdas')
Closes #495 (close requested by 'tdas')
Closes #1232 (close requested by 'pwendell')
Closes #82 (close requested by 'pwendell')
Closes #600 (close requested by 'pwendell')
Closes #473 (close requested by 'pwendell')
Closes #351 (close requested by 'pwendell')


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87738bfa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87738bfa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87738bfa

Branch: refs/heads/master
Commit: 87738bfa4051771ddfb8c4a4c1eb142fd77e3a46
Parents: 44460ba
Author: Patrick Wendell pwend...@gmail.com
Authored: Sat Aug 2 01:26:16 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 01:26:16 2014 -0700

--

--






git commit: [HOTFIX] Do not throw NPE if spark.test.home is not set

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 87738bfa4 -> e09e18b31


[HOTFIX] Do not throw NPE if spark.test.home is not set

`spark.test.home` was introduced in #1734. This is fine for SBT but is failing 
maven tests. Either way it shouldn't throw an NPE.

Author: Andrew Or andrewo...@gmail.com

Closes #1739 from andrewor14/fix-spark-test-home and squashes the following 
commits:

ce2624c [Andrew Or] Do not throw NPE if spark.test.home is not set


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e09e18b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e09e18b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e09e18b3

Branch: refs/heads/master
Commit: e09e18b3123c20e9b9497cf606473da500349d4d
Parents: 87738bf
Author: Andrew Or andrewo...@gmail.com
Authored: Sat Aug 2 12:11:50 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 12:12:56 2014 -0700

--
 .../main/scala/org/apache/spark/deploy/worker/Worker.scala  | 9 +++--
 core/src/test/scala/org/apache/spark/DriverSuite.scala  | 2 +-
 .../scala/org/apache/spark/deploy/SparkSubmitSuite.scala| 2 +-
 .../org/apache/spark/deploy/worker/ExecutorRunnerTest.scala | 2 +-
 pom.xml | 8 
 5 files changed, 14 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e09e18b3/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
index c6ea42f..458d994 100755
--- a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
@@ -71,7 +71,7 @@ private[spark] class Worker(
   // TTL for app folders/data;  after TTL expires it will be cleaned up
   val APP_DATA_RETENTION_SECS = conf.getLong("spark.worker.cleanup.appDataTtl", 7 * 24 * 3600)
 
-
+  val testing: Boolean = sys.props.contains("spark.testing")
   val masterLock: Object = new Object()
   var master: ActorSelection = null
   var masterAddress: Address = null
@@ -82,7 +82,12 @@ private[spark] class Worker(
   @volatile var connected = false
   val workerId = generateWorkerId()
   val sparkHome =
-    new File(sys.props.get("spark.test.home").orElse(sys.env.get("SPARK_HOME")).getOrElse("."))
+    if (testing) {
+      assert(sys.props.contains("spark.test.home"), "spark.test.home is not set!")
+      new File(sys.props("spark.test.home"))
+    } else {
+      new File(sys.env.get("SPARK_HOME").getOrElse("."))
+    }
   var workDir: File = null
   val executors = new HashMap[String, ExecutorRunner]
   val finishedExecutors = new HashMap[String, ExecutorRunner]

http://git-wip-us.apache.org/repos/asf/spark/blob/e09e18b3/core/src/test/scala/org/apache/spark/DriverSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/DriverSuite.scala 
b/core/src/test/scala/org/apache/spark/DriverSuite.scala
index e36902e..a73e1ef 100644
--- a/core/src/test/scala/org/apache/spark/DriverSuite.scala
+++ b/core/src/test/scala/org/apache/spark/DriverSuite.scala
@@ -34,7 +34,7 @@ import scala.language.postfixOps
 class DriverSuite extends FunSuite with Timeouts {
 
   test("driver should exit after finishing") {
-    val sparkHome = sys.props("spark.test.home")
+    val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
     // Regression test for SPARK-530: Spark driver process doesn't exit after finishing
     val masters = Table(("master"), ("local"), ("local-cluster[2,1,512]"))
     forAll(masters) { (master: String) =>

http://git-wip-us.apache.org/repos/asf/spark/blob/e09e18b3/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
index 8126ef1..a5cdcfb 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@@ -295,7 +295,7 @@ class SparkSubmitSuite extends FunSuite with Matchers {
 
   // NOTE: This is an expensive operation in terms of time (10 seconds+). Use sparingly.
   def runSparkSubmit(args: Seq[String]): String = {
-    val sparkHome = sys.props("spark.test.home")
+    val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
     Utils.executeAndGetOutput(
       Seq("./bin/spark-submit") ++ args,
       new File(sparkHome),


git commit: [HOTFIX] Do not throw NPE if spark.test.home is not set

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 87738bfa4 -> e22110879


[HOTFIX] Do not throw NPE if spark.test.home is not set

`spark.test.home` was introduced in #1734. This is fine for SBT but is failing 
maven tests. Either way it shouldn't throw an NPE.

Author: Andrew Or andrewo...@gmail.com

Closes #1739 from andrewor14/fix-spark-test-home and squashes the following 
commits:

ce2624c [Andrew Or] Do not throw NPE if spark.test.home is not set


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e2211087
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e2211087
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2211087

Branch: refs/heads/branch-1.1
Commit: e22110879cd149e94c9a5ca7466f787033572b15
Parents: 87738bf
Author: Andrew Or andrewo...@gmail.com
Authored: Sat Aug 2 12:11:50 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 12:46:52 2014 -0700

--
 .../main/scala/org/apache/spark/deploy/worker/Worker.scala  | 9 +++--
 core/src/test/scala/org/apache/spark/DriverSuite.scala  | 2 +-
 .../scala/org/apache/spark/deploy/SparkSubmitSuite.scala| 2 +-
 .../org/apache/spark/deploy/worker/ExecutorRunnerTest.scala | 2 +-
 pom.xml | 8 
 5 files changed, 14 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e2211087/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
index c6ea42f..458d994 100755
--- a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
@@ -71,7 +71,7 @@ private[spark] class Worker(
   // TTL for app folders/data;  after TTL expires it will be cleaned up
   val APP_DATA_RETENTION_SECS = conf.getLong("spark.worker.cleanup.appDataTtl", 7 * 24 * 3600)
 
-
+  val testing: Boolean = sys.props.contains("spark.testing")
   val masterLock: Object = new Object()
   var master: ActorSelection = null
   var masterAddress: Address = null
@@ -82,7 +82,12 @@ private[spark] class Worker(
   @volatile var connected = false
   val workerId = generateWorkerId()
   val sparkHome =
-    new File(sys.props.get("spark.test.home").orElse(sys.env.get("SPARK_HOME")).getOrElse("."))
+    if (testing) {
+      assert(sys.props.contains("spark.test.home"), "spark.test.home is not set!")
+      new File(sys.props("spark.test.home"))
+    } else {
+      new File(sys.env.get("SPARK_HOME").getOrElse("."))
+    }
   var workDir: File = null
   val executors = new HashMap[String, ExecutorRunner]
   val finishedExecutors = new HashMap[String, ExecutorRunner]

http://git-wip-us.apache.org/repos/asf/spark/blob/e2211087/core/src/test/scala/org/apache/spark/DriverSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/DriverSuite.scala 
b/core/src/test/scala/org/apache/spark/DriverSuite.scala
index e36902e..a73e1ef 100644
--- a/core/src/test/scala/org/apache/spark/DriverSuite.scala
+++ b/core/src/test/scala/org/apache/spark/DriverSuite.scala
@@ -34,7 +34,7 @@ import scala.language.postfixOps
 class DriverSuite extends FunSuite with Timeouts {
 
   test("driver should exit after finishing") {
-    val sparkHome = sys.props("spark.test.home")
+    val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
     // Regression test for SPARK-530: Spark driver process doesn't exit after finishing
     val masters = Table(("master"), ("local"), ("local-cluster[2,1,512]"))
     forAll(masters) { (master: String) =>

http://git-wip-us.apache.org/repos/asf/spark/blob/e2211087/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
index 8126ef1..a5cdcfb 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@@ -295,7 +295,7 @@ class SparkSubmitSuite extends FunSuite with Matchers {
 
   // NOTE: This is an expensive operation in terms of time (10 seconds+). Use sparingly.
   def runSparkSubmit(args: Seq[String]): String = {
-    val sparkHome = sys.props("spark.test.home")
+    val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
     Utils.executeAndGetOutput(
       Seq("./bin/spark-submit") ++ args,
       new File(sparkHome),


git commit: [SPARK-2478] [mllib] DecisionTree Python API

2014-08-02 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master e09e18b31 -> 3f67382e7


[SPARK-2478] [mllib] DecisionTree Python API

Added experimental Python API for Decision Trees.

API:
* class DecisionTreeModel
** predict() for single examples and RDDs, taking both feature vectors and 
LabeledPoints
** numNodes()
** depth()
** __str__()
* class DecisionTree
** trainClassifier()
** trainRegressor()
** train()

Examples and testing:
* Added example testing classification and regression with batch prediction: 
examples/src/main/python/mllib/tree.py
* Have also tested example usage in doc of python/pyspark/mllib/tree.py which 
tests single-example prediction with dense and sparse vectors

Also: Small bug fix in python/pyspark/mllib/_common.py: In 
_linear_predictor_typecheck, changed check for RDD to use isinstance() instead 
of type() in order to catch RDD subclasses.

CC mengxr manishamde

Author: Joseph K. Bradley joseph.kurata.brad...@gmail.com

Closes #1727 from jkbradley/decisiontree-python-new and squashes the following 
commits:

3744488 [Joseph K. Bradley] Renamed test tree.py to decision_tree_runner.py 
Small updates based on github review.
6b86a9d [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
affceb9 [Joseph K. Bradley] * Fixed bug in doc tests in pyspark/mllib/util.py 
caused by change in loadLibSVMFile behavior.  (It used to threshold labels at 0 
to make them 0/1, but it now leaves them as they are.) * Fixed small bug in 
loadLibSVMFile: If a data file had no features, then loadLibSVMFile would 
create a single all-zero feature.
67a29bc [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
cf46ad7 [Joseph K. Bradley] Python DecisionTreeModel * predict(empty RDD) 
returns an empty RDD instead of an error. * Removed support for calling 
predict() on LabeledPoint and RDD[LabeledPoint] * predict() does not cache 
serialized RDD any more.
aa29873 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
bf21be4 [Joseph K. Bradley] removed old run() func from DecisionTree
fa10ea7 [Joseph K. Bradley] Small style update
7968692 [Joseph K. Bradley] small braces typo fix
e34c263 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
4801b40 [Joseph K. Bradley] Small style update to DecisionTreeSuite
db0eab2 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix2' into 
decisiontree-python-new
6873fa9 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
225822f [Joseph K. Bradley] Bug: In DecisionTree, the method 
sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins 
from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is 
the bound for unordered categorical features, not ordered ones. The upper bound 
should be the arity (i.e., max value) of the feature.
93953f1 [Joseph K. Bradley] Likely done with Python API.
6df89a9 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
4562c08 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
665ba78 [Joseph K. Bradley] Small updates towards Python DecisionTree API
188cb0d [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into 
decisiontree-python-new
6622247 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
b8fac57 [Joseph K. Bradley] Finished Python DecisionTree API and example but 
need to test a bit more.
2b20c61 [Joseph K. Bradley] Small doc and style updates
1b29c13 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into 
decisiontree-python-new
584449a [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
dab0b67 [Joseph K. Bradley] Added documentation for DecisionTree internals
8bb8aa0 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-bugfix
978cfcf [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-bugfix
6eed482 [Joseph K. Bradley] In DecisionTree: Changed from using procedural 
syntax for functions returning Unit to explicitly writing Unit return type.
376dca2 [Joseph K. Bradley] Updated meaning of maxDepth by 1 to fit 
scikit-learn and rpart. * In code, replaced usages of maxDepth -- maxDepth + 1 
* In params, replace settings of maxDepth -- maxDepth - 1
e06e423 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into 
decisiontree-python-new
bab3f19 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
59750f8 [Joseph K. Bradley] * Updated Strategy to check 
numClassesForClassification only if algo=Classification. * Updates based on 
comments: ** DecisionTreeRunner *** Made dataFormat arg default to libsvm ** 
Small cleanups ** tree.Node: Made recursive helper methods private, and renamed 
them.

git commit: [SPARK-2478] [mllib] DecisionTree Python API

2014-08-02 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 e22110879 -> 8d6ac2b95


[SPARK-2478] [mllib] DecisionTree Python API

Added experimental Python API for Decision Trees.

API:
* class DecisionTreeModel
** predict() for single examples and RDDs, taking both feature vectors and 
LabeledPoints
** numNodes()
** depth()
** __str__()
* class DecisionTree
** trainClassifier()
** trainRegressor()
** train()

Examples and testing:
* Added example testing classification and regression with batch prediction: 
examples/src/main/python/mllib/tree.py
* Have also tested example usage in doc of python/pyspark/mllib/tree.py which 
tests single-example prediction with dense and sparse vectors

Also: Small bug fix in python/pyspark/mllib/_common.py: In 
_linear_predictor_typecheck, changed check for RDD to use isinstance() instead 
of type() in order to catch RDD subclasses.

CC mengxr manishamde

Author: Joseph K. Bradley joseph.kurata.brad...@gmail.com

Closes #1727 from jkbradley/decisiontree-python-new and squashes the following 
commits:

3744488 [Joseph K. Bradley] Renamed test tree.py to decision_tree_runner.py 
Small updates based on github review.
6b86a9d [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
affceb9 [Joseph K. Bradley] * Fixed bug in doc tests in pyspark/mllib/util.py 
caused by change in loadLibSVMFile behavior.  (It used to threshold labels at 0 
to make them 0/1, but it now leaves them as they are.) * Fixed small bug in 
loadLibSVMFile: If a data file had no features, then loadLibSVMFile would 
create a single all-zero feature.
67a29bc [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
cf46ad7 [Joseph K. Bradley] Python DecisionTreeModel * predict(empty RDD) 
returns an empty RDD instead of an error. * Removed support for calling 
predict() on LabeledPoint and RDD[LabeledPoint] * predict() does not cache 
serialized RDD any more.
aa29873 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
bf21be4 [Joseph K. Bradley] removed old run() func from DecisionTree
fa10ea7 [Joseph K. Bradley] Small style update
7968692 [Joseph K. Bradley] small braces typo fix
e34c263 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
4801b40 [Joseph K. Bradley] Small style update to DecisionTreeSuite
db0eab2 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix2' into 
decisiontree-python-new
6873fa9 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
225822f [Joseph K. Bradley] Bug: In DecisionTree, the method 
sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins 
from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is 
the bound for unordered categorical features, not ordered ones. The upper bound 
should be the arity (i.e., max value) of the feature.
93953f1 [Joseph K. Bradley] Likely done with Python API.
6df89a9 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
4562c08 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
665ba78 [Joseph K. Bradley] Small updates towards Python DecisionTree API
188cb0d [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into 
decisiontree-python-new
6622247 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
b8fac57 [Joseph K. Bradley] Finished Python DecisionTree API and example but 
need to test a bit more.
2b20c61 [Joseph K. Bradley] Small doc and style updates
1b29c13 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into 
decisiontree-python-new
584449a [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
dab0b67 [Joseph K. Bradley] Added documentation for DecisionTree internals
8bb8aa0 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-bugfix
978cfcf [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-bugfix
6eed482 [Joseph K. Bradley] In DecisionTree: Changed from using procedural 
syntax for functions returning Unit to explicitly writing Unit return type.
376dca2 [Joseph K. Bradley] Updated meaning of maxDepth by 1 to fit 
scikit-learn and rpart. * In code, replaced usages of maxDepth -- maxDepth + 1 
* In params, replace settings of maxDepth -- maxDepth - 1
e06e423 [Joseph K. Bradley] Merge branch 'decisiontree-bugfix' into 
decisiontree-python-new
bab3f19 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into 
decisiontree-python-new
59750f8 [Joseph K. Bradley] * Updated Strategy to check 
numClassesForClassification only if algo=Classification. * Updates based on 
comments: ** DecisionTreeRunner *** Made dataFormat arg default to libsvm ** 
Small cleanups ** tree.Node: Made recursive helper methods private, and renamed 
them.

git commit: [SQL] Set outputPartitioning of BroadcastHashJoin correctly.

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 3f67382e7 -> 67bd8e3c2


[SQL] Set outputPartitioning of BroadcastHashJoin correctly.

I think we will not generate the plan triggering this bug at this moment. But, 
let me explain it...

Right now, we are using `left.outputPartitioning` as the `outputPartitioning` 
of a `BroadcastHashJoin`. We may have a wrong physical plan for cases like...
```sql
SELECT l.key, count(*)
FROM (SELECT key, count(*) as cnt
  FROM src
  GROUP BY key) l // This is buildPlan
JOIN r // This is the streamedPlan
ON (l.cnt = r.value)
GROUP BY l.key
```
Let's say we have a `BroadcastHashJoin` on `l` and `r`. For this case, we will 
pick `l`'s `outputPartitioning` for the `outputPartitioning` of the
`BroadcastHashJoin` on `l` and `r`. Also, because the last `GROUP BY` is using 
`l.key` as the key, we will not introduce an `Exchange` for this aggregation. 
However, `r`'s outputPartitioning may not match the required distribution of 
the last `GROUP BY` and we fail to group data correctly.
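
A minimal sketch (assumed names, not Catalyst's actual planner) of why over-claiming an output partitioning is harmful: an Exchange (shuffle) is only inserted when the child's claimed partitioning does not satisfy the required distribution, so reporting the build side's partitioning can silently skip a shuffle the streamed side still needs:

```scala
sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
case class ClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Partitioning {
  def satisfies(required: Distribution): Boolean
}
case class HashPartitioning(keys: Seq[String]) extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    case UnspecifiedDistribution        => true
    case ClusteredDistribution(reqKeys) => keys == reqKeys
  }
}

// The planner only adds a shuffle when the claimed partitioning is insufficient,
// so a join that reports the wrong child's partitioning can skip a needed Exchange.
def ensureDistribution(claimed: Partitioning, required: Distribution): String =
  if (claimed.satisfies(required)) "reuse existing partitioning" else "insert Exchange (shuffle)"
```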

JIRA is being reindexed. I will create a JIRA ticket once it is back online.

Author: Yin Huai h...@cse.ohio-state.edu

Closes #1735 from yhuai/BroadcastHashJoin and squashes the following commits:

96d9cb3 [Yin Huai] Set outputPartitioning correctly.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/67bd8e3c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/67bd8e3c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/67bd8e3c

Branch: refs/heads/master
Commit: 67bd8e3c217a80c3117a6e3853aa60fe13d08c91
Parents: 3f67382
Author: Yin Huai h...@cse.ohio-state.edu
Authored: Sat Aug 2 13:16:41 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 13:16:41 2014 -0700

--
 .../src/main/scala/org/apache/spark/sql/execution/joins.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/67bd8e3c/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
index cc138c7..51bb615 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
@@ -405,8 +405,7 @@ case class BroadcastHashJoin(
  left: SparkPlan,
  right: SparkPlan) extends BinaryNode with HashJoin {
 
-
-  override def outputPartitioning: Partitioning = left.outputPartitioning
+  override def outputPartitioning: Partitioning = 
streamedPlan.outputPartitioning
 
   override def requiredChildDistribution =
 UnspecifiedDistribution :: UnspecifiedDistribution :: Nil





git commit: [SQL] Set outputPartitioning of BroadcastHashJoin correctly.

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 8d6ac2b95 -> 91de0dc16


[SQL] Set outputPartitioning of BroadcastHashJoin correctly.

I think we will not generate the plan triggering this bug at this moment. But, 
let me explain it...

Right now, we are using `left.outputPartitioning` as the `outputPartitioning` 
of a `BroadcastHashJoin`. We may have a wrong physical plan for cases like...
```sql
SELECT l.key, count(*)
FROM (SELECT key, count(*) as cnt
  FROM src
  GROUP BY key) l // This is buildPlan
JOIN r // This is the streamedPlan
ON (l.cnt = r.value)
GROUP BY l.key
```
Let's say we have a `BroadcastHashJoin` on `l` and `r`. For this case, we will 
pick `l`'s `outputPartitioning` for the `outputPartitioning` of the
`BroadcastHashJoin` on `l` and `r`. Also, because the last `GROUP BY` is using 
`l.key` as the key, we will not introduce an `Exchange` for this aggregation. 
However, `r`'s outputPartitioning may not match the required distribution of 
the last `GROUP BY` and we fail to group data correctly.

JIRA is being reindexed. I will create a JIRA ticket once it is back online.

Author: Yin Huai h...@cse.ohio-state.edu

Closes #1735 from yhuai/BroadcastHashJoin and squashes the following commits:

96d9cb3 [Yin Huai] Set outputPartitioning correctly.

(cherry picked from commit 67bd8e3c217a80c3117a6e3853aa60fe13d08c91)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/91de0dc1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/91de0dc1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/91de0dc1

Branch: refs/heads/branch-1.1
Commit: 91de0dc1654d609dc1ff8fa9a07ba18043ad61c6
Parents: 8d6ac2b
Author: Yin Huai h...@cse.ohio-state.edu
Authored: Sat Aug 2 13:16:41 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 13:16:53 2014 -0700

--
 .../src/main/scala/org/apache/spark/sql/execution/joins.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/91de0dc1/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
index cc138c7..51bb615 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
@@ -405,8 +405,7 @@ case class BroadcastHashJoin(
  left: SparkPlan,
  right: SparkPlan) extends BinaryNode with HashJoin {
 
-
-  override def outputPartitioning: Partitioning = left.outputPartitioning
+  override def outputPartitioning: Partitioning = 
streamedPlan.outputPartitioning
 
   override def requiredChildDistribution =
 UnspecifiedDistribution :: UnspecifiedDistribution :: Nil





[1/2] [SPARK-1981] Add AWS Kinesis streaming support

2014-08-02 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/master 67bd8e3c2 -> 91f9504e6


http://git-wip-us.apache.org/repos/asf/spark/blob/91f9504e/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
--
diff --git 
a/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
 
b/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
new file mode 100644
index 000..41dbd64
--- /dev/null
+++ 
b/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.nio.ByteBuffer
+
+import scala.collection.JavaConversions.seqAsJavaList
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming.Milliseconds
+import org.apache.spark.streaming.Seconds
+import org.apache.spark.streaming.StreamingContext
+import org.apache.spark.streaming.TestSuiteBase
+import org.apache.spark.streaming.util.Clock
+import org.apache.spark.streaming.util.ManualClock
+import org.scalatest.BeforeAndAfter
+import org.scalatest.Matchers
+import org.scalatest.mock.EasyMockSugar
+
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.InvalidStateException
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.KinesisClientLibDependencyException
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.ShutdownException
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.ThrottlingException
+import 
com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownReason
+import com.amazonaws.services.kinesis.model.Record
+
+/**
+ *  Suite of Kinesis streaming receiver tests focusing mostly on the 
KinesisRecordProcessor 
+ */
+class KinesisReceiverSuite extends TestSuiteBase with Matchers with 
BeforeAndAfter
+with EasyMockSugar {
+
+  val app = "TestKinesisReceiver"
+  val stream = "mySparkStream"
+  val endpoint = "endpoint-url"
+  val workerId = "dummyWorkerId"
+  val shardId = "dummyShardId"
+
+  val record1 = new Record()
+  record1.setData(ByteBuffer.wrap("Spark In Action".getBytes()))
+  val record2 = new Record()
+  record2.setData(ByteBuffer.wrap("Learning Spark".getBytes()))
+  val batch = List[Record](record1, record2)
+
+  var receiverMock: KinesisReceiver = _
+  var checkpointerMock: IRecordProcessorCheckpointer = _
+  var checkpointClockMock: ManualClock = _
+  var checkpointStateMock: KinesisCheckpointState = _
+  var currentClockMock: Clock = _
+
+  override def beforeFunction() = {
+    receiverMock = mock[KinesisReceiver]
+    checkpointerMock = mock[IRecordProcessorCheckpointer]
+    checkpointClockMock = mock[ManualClock]
+    checkpointStateMock = mock[KinesisCheckpointState]
+    currentClockMock = mock[Clock]
+  }
+
+  test("kinesis utils api") {
+    val ssc = new StreamingContext(master, framework, batchDuration)
+    // Tests the API, does not actually test data receiving
+    val kinesisStream = KinesisUtils.createStream(ssc, "mySparkStream",
+      "https://kinesis.us-west-2.amazonaws.com", Seconds(2),
+      InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
+    ssc.stop()
+  }
+
+  test("process records including store and checkpoint") {
+    val expectedCheckpointIntervalMillis = 10
+    expecting {
+      receiverMock.isStopped().andReturn(false).once()
+      receiverMock.store(record1.getData().array()).once()
+      receiverMock.store(record2.getData().array()).once()
+      checkpointStateMock.shouldCheckpoint().andReturn(true).once()
+      checkpointerMock.checkpoint().once()
+      checkpointStateMock.advanceCheckpoint().once()
+    }
+    whenExecuting(receiverMock, checkpointerMock, checkpointStateMock) {
+      val recordProcessor = new KinesisRecordProcessor(receiverMock, workerId,
+        checkpointStateMock)
+  

[2/2] git commit: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-02 Thread tdas
[SPARK-1981] Add AWS Kinesis streaming support

Author: Chris Fregly ch...@fregly.com

Closes #1434 from cfregly/master and squashes the following commits:

4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more 
clear, removed retries around store() method
0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into 
extras/kinesis-asl
691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with 
JavaKinesisWordCount during union of streams
0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated 
docs
e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into 
the examples/ dir
d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the 
KinesisUtils api
912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail 
class
db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and 
kinesis client
338997e [Chris Fregly] improve build docs for kinesis
828f8ae [Chris Fregly] more cleanup
e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
cd68c0d [Chris Fregly] fixed typos and backward compatibility
d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/91f9504e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/91f9504e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/91f9504e

Branch: refs/heads/master
Commit: 91f9504e6086fac05b40545099f9818949c24bca
Parents: 67bd8e3
Author: Chris Fregly ch...@fregly.com
Authored: Sat Aug 2 13:35:35 2014 -0700
Committer: Tathagata Das tathagata.das1...@gmail.com
Committed: Sat Aug 2 13:35:35 2014 -0700

--
 bin/run-example |   3 +-
 bin/run-example2.cmd|   3 +-
 dev/audit-release/audit_release.py  |   4 +-
 .../sbt_app_core/src/main/scala/SparkApp.scala  |   7 +
 dev/audit-release/sbt_app_kinesis/build.sbt |  28 ++
 .../src/main/scala/SparkApp.scala   |  33 +++
 dev/create-release/create-release.sh|   4 +-
 dev/run-tests   |   3 +
 docs/streaming-custom-receivers.md  |   4 +-
 docs/streaming-kinesis.md   |  58 
 docs/streaming-programming-guide.md |  12 +-
 examples/pom.xml|  13 +
 extras/kinesis-asl/pom.xml  |  96 +++
 .../streaming/JavaKinesisWordCountASL.java  | 180 
 .../src/main/resources/log4j.properties |  37 +++
 .../streaming/KinesisWordCountASL.scala | 251 +
 .../kinesis/KinesisCheckpointState.scala|  56 
 .../streaming/kinesis/KinesisReceiver.scala | 149 ++
 .../kinesis/KinesisRecordProcessor.scala| 212 ++
 .../spark/streaming/kinesis/KinesisUtils.scala  |  96 +++
 .../kinesis/JavaKinesisStreamSuite.java |  41 +++
 .../src/test/resources/log4j.properties |  26 ++
 .../kinesis/KinesisReceiverSuite.scala  | 275 +++
 pom.xml |  10 +
 project/SparkBuild.scala|   6 +-
 25 files changed, 1592 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/91f9504e/bin/run-example
--
diff --git a/bin/run-example b/bin/run-example
index 942706d..68a3570 100755
--- a/bin/run-example
+++ b/bin/run-example
@@ -29,7 +29,8 @@ if [ -n "$1" ]; then
 else
   echo "Usage: ./bin/run-example <example-class> [example-args]" 1>&2
   echo "  - set MASTER=XX to use a specific master" 1>&2
-  echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)" 1>&2
+  echo "  - can use abbreviated example class name relative to com.apache.spark.examples" 1>&2
+  echo "     (e.g. SparkPi, mllib.LinearRegression, streaming.KinesisWordCountASL)" 1>&2
   exit 1
 fi
 

http://git-wip-us.apache.org/repos/asf/spark/blob/91f9504e/bin/run-example2.cmd
--
diff --git a/bin/run-example2.cmd b/bin/run-example2.cmd
index eadedd7..b29bf90 100644
--- a/bin/run-example2.cmd
+++ b/bin/run-example2.cmd
@@ -32,7 +32,8 @@ rem Test that an argument was given
 if not "x%1"=="x" goto arg_given
   echo Usage: run-example ^<example-class^> [example-args]
   echo   - 

[2/2] git commit: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-02 Thread tdas
[SPARK-1981] Add AWS Kinesis streaming support

Author: Chris Fregly ch...@fregly.com

Closes #1434 from cfregly/master and squashes the following commits:

4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more 
clear, removed retries around store() method
0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into 
extras/kinesis-asl
691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with 
JavaKinesisWordCount during union of streams
0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated 
docs
e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into 
the examples/ dir
d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the 
KinesisUtils api
912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail 
class
db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and 
kinesis client
338997e [Chris Fregly] improve build docs for kinesis
828f8ae [Chris Fregly] more cleanup
e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
cd68c0d [Chris Fregly] fixed typos and backward compatibility
d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support

(cherry picked from commit 91f9504e6086fac05b40545099f9818949c24bca)
Signed-off-by: Tathagata Das tathagata.das1...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bb0ac6d7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bb0ac6d7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bb0ac6d7

Branch: refs/heads/branch-1.1
Commit: bb0ac6d7c91c491a99c252e6cb4aea40efe9b190
Parents: 91de0dc
Author: Chris Fregly ch...@fregly.com
Authored: Sat Aug 2 13:35:35 2014 -0700
Committer: Tathagata Das tathagata.das1...@gmail.com
Committed: Sat Aug 2 13:35:57 2014 -0700

--
 bin/run-example |   3 +-
 bin/run-example2.cmd|   3 +-
 dev/audit-release/audit_release.py  |   4 +-
 .../sbt_app_core/src/main/scala/SparkApp.scala  |   7 +
 dev/audit-release/sbt_app_kinesis/build.sbt |  28 ++
 .../src/main/scala/SparkApp.scala   |  33 +++
 dev/create-release/create-release.sh|   4 +-
 dev/run-tests   |   3 +
 docs/streaming-custom-receivers.md  |   4 +-
 docs/streaming-kinesis.md   |  58 
 docs/streaming-programming-guide.md |  12 +-
 examples/pom.xml|  13 +
 extras/kinesis-asl/pom.xml  |  96 +++
 .../streaming/JavaKinesisWordCountASL.java  | 180 
 .../src/main/resources/log4j.properties |  37 +++
 .../streaming/KinesisWordCountASL.scala | 251 +
 .../kinesis/KinesisCheckpointState.scala|  56 
 .../streaming/kinesis/KinesisReceiver.scala | 149 ++
 .../kinesis/KinesisRecordProcessor.scala| 212 ++
 .../spark/streaming/kinesis/KinesisUtils.scala  |  96 +++
 .../kinesis/JavaKinesisStreamSuite.java |  41 +++
 .../src/test/resources/log4j.properties |  26 ++
 .../kinesis/KinesisReceiverSuite.scala  | 275 +++
 pom.xml |  10 +
 project/SparkBuild.scala|   6 +-
 25 files changed, 1592 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bb0ac6d7/bin/run-example
--
diff --git a/bin/run-example b/bin/run-example
index 942706d..68a3570 100755
--- a/bin/run-example
+++ b/bin/run-example
@@ -29,7 +29,8 @@ if [ -n "$1" ]; then
 else
   echo "Usage: ./bin/run-example <example-class> [example-args]" 1>&2
   echo "  - set MASTER=XX to use a specific master" 1>&2
-  echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)" 1>&2
+  echo "  - can use abbreviated example class name relative to com.apache.spark.examples" 1>&2
+  echo "     (e.g. SparkPi, mllib.LinearRegression, streaming.KinesisWordCountASL)" 1>&2
   exit 1
 fi
 

http://git-wip-us.apache.org/repos/asf/spark/blob/bb0ac6d7/bin/run-example2.cmd
--
diff --git a/bin/run-example2.cmd b/bin/run-example2.cmd
index eadedd7..b29bf90 100644
--- a/bin/run-example2.cmd
+++ b/bin/run-example2.cmd
@@ -32,7 +32,8 @@ rem 

[1/2] [SPARK-1981] Add AWS Kinesis streaming support

2014-08-02 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 91de0dc16 - bb0ac6d7c


http://git-wip-us.apache.org/repos/asf/spark/blob/bb0ac6d7/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
--
diff --git 
a/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
 
b/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
new file mode 100644
index 000..41dbd64
--- /dev/null
+++ 
b/extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.nio.ByteBuffer
+
+import scala.collection.JavaConversions.seqAsJavaList
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming.Milliseconds
+import org.apache.spark.streaming.Seconds
+import org.apache.spark.streaming.StreamingContext
+import org.apache.spark.streaming.TestSuiteBase
+import org.apache.spark.streaming.util.Clock
+import org.apache.spark.streaming.util.ManualClock
+import org.scalatest.BeforeAndAfter
+import org.scalatest.Matchers
+import org.scalatest.mock.EasyMockSugar
+
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.InvalidStateException
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.KinesisClientLibDependencyException
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.ShutdownException
+import 
com.amazonaws.services.kinesis.clientlibrary.exceptions.ThrottlingException
+import 
com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownReason
+import com.amazonaws.services.kinesis.model.Record
+
+/**
+ *  Suite of Kinesis streaming receiver tests focusing mostly on the 
KinesisRecordProcessor 
+ */
+class KinesisReceiverSuite extends TestSuiteBase with Matchers with 
BeforeAndAfter
+with EasyMockSugar {
+
+  val app = "TestKinesisReceiver"
+  val stream = "mySparkStream"
+  val endpoint = "endpoint-url"
+  val workerId = "dummyWorkerId"
+  val shardId = "dummyShardId"
+
+  val record1 = new Record()
+  record1.setData(ByteBuffer.wrap("Spark In Action".getBytes()))
+  val record2 = new Record()
+  record2.setData(ByteBuffer.wrap("Learning Spark".getBytes()))
+  val batch = List[Record](record1, record2)
+
+  var receiverMock: KinesisReceiver = _
+  var checkpointerMock: IRecordProcessorCheckpointer = _
+  var checkpointClockMock: ManualClock = _
+  var checkpointStateMock: KinesisCheckpointState = _
+  var currentClockMock: Clock = _
+
+  override def beforeFunction() = {
+    receiverMock = mock[KinesisReceiver]
+    checkpointerMock = mock[IRecordProcessorCheckpointer]
+    checkpointClockMock = mock[ManualClock]
+    checkpointStateMock = mock[KinesisCheckpointState]
+    currentClockMock = mock[Clock]
+  }
+
+  test("kinesis utils api") {
+    val ssc = new StreamingContext(master, framework, batchDuration)
+    // Tests the API, does not actually test data receiving
+    val kinesisStream = KinesisUtils.createStream(ssc, "mySparkStream",
+      "https://kinesis.us-west-2.amazonaws.com", Seconds(2),
+      InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2);
+    ssc.stop()
+  }
+
+  test("process records including store and checkpoint") {
+    val expectedCheckpointIntervalMillis = 10
+    expecting {
+      receiverMock.isStopped().andReturn(false).once()
+      receiverMock.store(record1.getData().array()).once()
+      receiverMock.store(record2.getData().array()).once()
+      checkpointStateMock.shouldCheckpoint().andReturn(true).once()
+      checkpointerMock.checkpoint().once()
+      checkpointStateMock.advanceCheckpoint().once()
+    }
+    whenExecuting(receiverMock, checkpointerMock, checkpointStateMock) {
+      val recordProcessor = new KinesisRecordProcessor(receiverMock, workerId,
+          checkpointStateMock)
+  
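For orientation, a short hedged sketch (not part of the patch) of how an application might use the KinesisUtils API exercised in the `kinesis utils api` test above; the master, stream name, endpoint and batch interval are placeholders, and each record arrives as a raw Array[Byte] that the caller must decode:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

object KinesisWordCountSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext("local[2]", "KinesisWordCountSketch", Seconds(2))
    // Same argument shape as in the test: stream name, endpoint, checkpoint
    // interval, initial position, and storage level.
    val stream = KinesisUtils.createStream(ssc, "mySparkStream",
      "https://kinesis.us-west-2.amazonaws.com", Seconds(2),
      InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
    // Decode the raw bytes and count words per batch.
    stream.map(bytes => new String(bytes))
      .flatMap(_.split(" "))
      .countByValue()
      .print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```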

git commit: SPARK-2804: Remove scalalogging-slf4j dependency

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 91f9504e6 - 4c477117b


SPARK-2804: Remove scalalogging-slf4j dependency

This also Closes #1701.

Author: GuoQiang Li wi...@qq.com

Closes #1208 from witgo/SPARK-1470 and squashes the following commits:

422646b [GuoQiang Li] Remove scalalogging-slf4j dependency


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c477117
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c477117
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c477117

Branch: refs/heads/master
Commit: 4c477117bb1ffef463776c86f925d35036f96b7a
Parents: 91f9504
Author: GuoQiang Li wi...@qq.com
Authored: Sat Aug 2 13:55:28 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 13:59:58 2014 -0700

--
 .../main/scala/org/apache/spark/Logging.scala   | 10 ++---
 sql/catalyst/pom.xml|  5 -
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  4 ++--
 .../catalyst/analysis/HiveTypeCoercion.scala|  8 +++
 .../catalyst/expressions/BoundAttribute.scala   |  2 +-
 .../expressions/codegen/GenerateOrdering.scala  |  4 ++--
 .../org/apache/spark/sql/catalyst/package.scala |  1 -
 .../sql/catalyst/planning/QueryPlanner.scala|  2 +-
 .../spark/sql/catalyst/planning/patterns.scala  |  6 +++---
 .../apache/spark/sql/catalyst/rules/Rule.scala  |  2 +-
 .../spark/sql/catalyst/rules/RuleExecutor.scala | 12 +--
 .../spark/sql/catalyst/trees/package.scala  |  8 ---
 .../scala/org/apache/spark/sql/SQLContext.scala |  2 +-
 .../compression/CompressibleColumnBuilder.scala |  5 +++--
 .../apache/spark/sql/execution/Exchange.scala   |  2 +-
 .../org/apache/spark/sql/json/JsonRDD.scala |  2 +-
 .../scala/org/apache/spark/sql/package.scala|  2 --
 .../spark/sql/columnar/ColumnTypeSuite.scala|  4 ++--
 .../hive/thriftserver/HiveThriftServer2.scala   | 12 +--
 .../hive/thriftserver/SparkSQLCLIDriver.scala   |  2 +-
 .../sql/hive/thriftserver/SparkSQLDriver.scala  |  6 +++---
 .../sql/hive/thriftserver/SparkSQLEnv.scala |  6 +++---
 .../server/SparkSQLOperationManager.scala   | 13 ++--
 .../thriftserver/HiveThriftServer2Suite.scala   |  2 +-
 .../org/apache/spark/sql/hive/HiveContext.scala |  2 +-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   |  3 ++-
 .../org/apache/spark/sql/hive/TestHive.scala| 10 -
 .../org/apache/spark/sql/hive/hiveUdfs.scala|  4 ++--
 .../sql/hive/execution/HiveComparisonTest.scala | 22 ++--
 .../sql/hive/execution/HiveQueryFileTest.scala  |  2 +-
 30 files changed, 83 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4c477117/core/src/main/scala/org/apache/spark/Logging.scala
--
diff --git a/core/src/main/scala/org/apache/spark/Logging.scala 
b/core/src/main/scala/org/apache/spark/Logging.scala
index 807ef3e..d4f2624 100644
--- a/core/src/main/scala/org/apache/spark/Logging.scala
+++ b/core/src/main/scala/org/apache/spark/Logging.scala
@@ -39,13 +39,17 @@ trait Logging {
   // be serialized and used on another machine
   @transient private var log_ : Logger = null
 
+  // Method to get the logger name for this object
+  protected def logName = {
+    // Ignore trailing $'s in the class names for Scala objects
+    this.getClass.getName.stripSuffix("$")
+  }
+
   // Method to get or create the logger for this object
   protected def log: Logger = {
     if (log_ == null) {
       initializeIfNecessary()
-      var className = this.getClass.getName
-      // Ignore trailing $'s in the class names for Scala objects
-      log_ = LoggerFactory.getLogger(className.stripSuffix("$"))
+      log_ = LoggerFactory.getLogger(logName)
     }
     log_
   }
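For orientation, a small hedged sketch (not from this commit) of how the refactored trait is typically mixed in; the object name is made up, and logInfo is assumed to be the usual helper the trait exposes:

```scala
import org.apache.spark.Logging

// For a Scala object the runtime class is "MyJob$", so the new logName
// strips the trailing "$" before the slf4j Logger is created.
object MyJob extends Logging {
  def run(): Unit = {
    logInfo("starting job")  // logger name resolves to the cleaned class name
  }
}
```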

http://git-wip-us.apache.org/repos/asf/spark/blob/4c477117/sql/catalyst/pom.xml
--
diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml
index 54fa96b..58d44e7 100644
--- a/sql/catalyst/pom.xml
+++ b/sql/catalyst/pom.xml
@@ -55,11 +55,6 @@
       <version>${project.version}</version>
     </dependency>
     <dependency>
-      <groupId>com.typesafe</groupId>
-      <artifactId>scalalogging-slf4j_${scala.binary.version}</artifactId>
-      <version>1.0.1</version>
-    </dependency>
-    <dependency>
       <groupId>org.scalatest</groupId>
       <artifactId>scalatest_${scala.binary.version}</artifactId>
       <scope>test</scope>

http://git-wip-us.apache.org/repos/asf/spark/blob/4c477117/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 

[1/2] [SPARK-2097][SQL] UDF Support

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 4c477117b - 158ad0bba


http://git-wip-us.apache.org/repos/asf/spark/blob/158ad0bb/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala
new file mode 100644
index 000..158f26e
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala
@@ -0,0 +1,252 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.api.java
+
+import org.apache.spark.sql.catalyst.expressions.{Expression, ScalaUdf}
+import org.apache.spark.sql.types.util.DataTypeConversions._
+
+/**
+ * A collection of functions that allow Java users to register UDFs.  In order 
to handle functions
+ * of varying airities with minimal boilerplate for our users, we generate 
classes and functions
+ * for each airity up to 22.  The code for this generation can be found in 
comments in this trait.
+ */
+private[java] trait UDFRegistration {
+  self: JavaSQLContext =>
+
+  /* The following functions and required interfaces are generated with these 
code fragments:
+
+   (1 to 22).foreach { i =
+ val extTypeArgs = (1 to i).map(_ = _).mkString(, )
+ val anyTypeArgs = (1 to i).map(_ = Any).mkString(, )
+ val anyCast = s.asInstanceOf[UDF$i[$anyTypeArgs, Any]]
+ val anyParams = (1 to i).map(_ = _: Any).mkString(, )
+ println(s
+ |def registerFunction(
+ |name: String, f: UDF$i[$extTypeArgs, _], @transient dataType: 
DataType) = {
+ |  val scalaType = asScalaDataType(dataType)
+ |  sqlContext.functionRegistry.registerFunction(
+ |name,
+ |(e: Seq[Expression]) = ScalaUdf(f$anyCast.call($anyParams), 
scalaType, e))
+ |}
+   .stripMargin)
+   }
+
+  import java.io.File
+  import org.apache.spark.sql.catalyst.util.stringToFile
+  val directory = new 
File(sql/core/src/main/java/org/apache/spark/sql/api/java/)
+  (1 to 22).foreach { i =
+val typeArgs = (1 to i).map(i = sT$i).mkString(, )
+val args = (1 to i).map(i = sT$i t$i).mkString(, )
+
+val contents =
+  s/*
+ | * Licensed to the Apache Software Foundation (ASF) under one or more
+ | * contributor license agreements.  See the NOTICE file distributed 
with
+ | * this work for additional information regarding copyright 
ownership.
+ | * The ASF licenses this file to You under the Apache License, 
Version 2.0
+ | * (the License); you may not use this file except in compliance 
with
+ | * the License.  You may obtain a copy of the License at
+ | *
+ | *http://www.apache.org/licenses/LICENSE-2.0
+ | *
+ | * Unless required by applicable law or agreed to in writing, 
software
+ | * distributed under the License is distributed on an AS IS BASIS,
+ | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ | * See the License for the specific language governing permissions 
and
+ | * limitations under the License.
+ | */
+ |
+ |package org.apache.spark.sql.api.java;
+ |
+ |import java.io.Serializable;
+ |
+ |// **
+ |// THIS FILE IS AUTOGENERATED BY CODE IN
+ |// org.apache.spark.sql.api.java.FunctionRegistration
+ |// **
+ |
+ |/**
+ | * A Spark SQL UDF that has $i arguments.
+ | */
+ |public interface UDF$i$typeArgs, R extends Serializable {
+ |  public R call($args) throws Exception;
+ |}
+ |.stripMargin
+
+  stringToFile(new File(directory, sUDF$i.java), contents)
+  }
+
+  */
+
+  // scalastyle:off
+  def registerFunction(name: String, f: UDF1[_, _], dataType: DataType) = {
+val scalaType = asScalaDataType(dataType)
+sqlContext.functionRegistry.registerFunction(
+  name,
+  (e: Seq[Expression]) = 

[1/2] [SPARK-2097][SQL] UDF Support

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 7924d72cf - 3b9f25f42


http://git-wip-us.apache.org/repos/asf/spark/blob/3b9f25f4/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala
new file mode 100644
index 000..158f26e
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala
@@ -0,0 +1,252 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.api.java
+
+import org.apache.spark.sql.catalyst.expressions.{Expression, ScalaUdf}
+import org.apache.spark.sql.types.util.DataTypeConversions._
+
+/**
+ * A collection of functions that allow Java users to register UDFs.  In order 
to handle functions
+ * of varying airities with minimal boilerplate for our users, we generate 
classes and functions
+ * for each airity up to 22.  The code for this generation can be found in 
comments in this trait.
+ */
+private[java] trait UDFRegistration {
+  self: JavaSQLContext =>
+
+  /* The following functions and required interfaces are generated with these 
code fragments:
+
+   (1 to 22).foreach { i =
+ val extTypeArgs = (1 to i).map(_ = _).mkString(, )
+ val anyTypeArgs = (1 to i).map(_ = Any).mkString(, )
+ val anyCast = s.asInstanceOf[UDF$i[$anyTypeArgs, Any]]
+ val anyParams = (1 to i).map(_ = _: Any).mkString(, )
+ println(s
+ |def registerFunction(
+ |name: String, f: UDF$i[$extTypeArgs, _], @transient dataType: 
DataType) = {
+ |  val scalaType = asScalaDataType(dataType)
+ |  sqlContext.functionRegistry.registerFunction(
+ |name,
+ |(e: Seq[Expression]) = ScalaUdf(f$anyCast.call($anyParams), 
scalaType, e))
+ |}
+   .stripMargin)
+   }
+
+  import java.io.File
+  import org.apache.spark.sql.catalyst.util.stringToFile
+  val directory = new 
File(sql/core/src/main/java/org/apache/spark/sql/api/java/)
+  (1 to 22).foreach { i =
+val typeArgs = (1 to i).map(i = sT$i).mkString(, )
+val args = (1 to i).map(i = sT$i t$i).mkString(, )
+
+val contents =
+  s/*
+ | * Licensed to the Apache Software Foundation (ASF) under one or more
+ | * contributor license agreements.  See the NOTICE file distributed 
with
+ | * this work for additional information regarding copyright 
ownership.
+ | * The ASF licenses this file to You under the Apache License, 
Version 2.0
+ | * (the License); you may not use this file except in compliance 
with
+ | * the License.  You may obtain a copy of the License at
+ | *
+ | *http://www.apache.org/licenses/LICENSE-2.0
+ | *
+ | * Unless required by applicable law or agreed to in writing, 
software
+ | * distributed under the License is distributed on an AS IS BASIS,
+ | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ | * See the License for the specific language governing permissions 
and
+ | * limitations under the License.
+ | */
+ |
+ |package org.apache.spark.sql.api.java;
+ |
+ |import java.io.Serializable;
+ |
+ |// **
+ |// THIS FILE IS AUTOGENERATED BY CODE IN
+ |// org.apache.spark.sql.api.java.FunctionRegistration
+ |// **
+ |
+ |/**
+ | * A Spark SQL UDF that has $i arguments.
+ | */
+ |public interface UDF$i$typeArgs, R extends Serializable {
+ |  public R call($args) throws Exception;
+ |}
+ |.stripMargin
+
+  stringToFile(new File(directory, sUDF$i.java), contents)
+  }
+
+  */
+
+  // scalastyle:off
+  def registerFunction(name: String, f: UDF1[_, _], dataType: DataType) = {
+val scalaType = asScalaDataType(dataType)
+sqlContext.functionRegistry.registerFunction(
+  name,
+  (e: Seq[Expression]) = 

[2/2] git commit: [SPARK-2097][SQL] UDF Support

2014-08-02 Thread marmbrus
[SPARK-2097][SQL] UDF Support

This patch adds the ability to register lambda functions written in Python, 
Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```
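
As a small follow-up sketch (not from the patch), a registered Scala UDF can then be used against an existing table; this assumes a `people` table with a string `name` column and `import sqlContext._` in scope:

```scala
// Assumes: import sqlContext._ and a previously registered "people" table.
registerFunction("strLen", (_: String).length)
sql("SELECT name, strLen(name) FROM people").collect().foreach(println)
```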

Author: Michael Armbrust mich...@databricks.com

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into 
udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into 
udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be 
serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, 
address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into 
udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into 
udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for 
Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/158ad0bb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/158ad0bb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/158ad0bb

Branch: refs/heads/master
Commit: 158ad0bba9382fd494b4789b5628a9cec00cfa19
Parents: 4c47711
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 16:33:48 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 16:33:48 2014 -0700

--
 python/pyspark/sql.py   |  39 ++-
 .../catalyst/analysis/FunctionRegistry.scala|  32 ++
 .../sql/catalyst/expressions/ScalaUdf.scala | 307 +++
 .../org/apache/spark/sql/api/java/UDF1.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF10.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF11.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF12.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF13.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF14.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF15.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF16.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF17.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF18.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF19.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF2.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF20.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF21.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF22.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF3.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF4.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF5.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF6.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF7.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF8.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF9.java |  32 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  11 +-
 .../org/apache/spark/sql/UdfRegistration.scala  | 196 
 .../spark/sql/api/java/JavaSQLContext.scala |   5 +-
 .../spark/sql/api/java/UDFRegistration.scala| 252 +++
 .../spark/sql/execution/SparkStrategies.scala   |   2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 177 +++
 .../apache/spark/sql/api/java/JavaAPISuite.java |  90 ++
 .../org/apache/spark/sql/InsertIntoSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/UDFSuite.scala   |  36 +++
 .../org/apache/spark/sql/hive/HiveContext.scala |  13 +-
 .../org/apache/spark/sql/hive/TestHive.scala|   4 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala|   6 +-
 .../scala/org/apache/spark/sql/QueryTest.scala  |   4 +-
 38 files changed, 1861 insertions(+), 19 deletions(-)
--



[2/2] git commit: [SPARK-2097][SQL] UDF Support

2014-08-02 Thread marmbrus
[SPARK-2097][SQL] UDF Support

This patch adds the ability to register lambda functions written in Python, 
Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```

Author: Michael Armbrust mich...@databricks.com

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into 
udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into 
udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be 
serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, 
address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into 
udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into 
udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for 
Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.

(cherry picked from commit 158ad0bba9382fd494b4789b5628a9cec00cfa19)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b9f25f4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b9f25f4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b9f25f4

Branch: refs/heads/branch-1.1
Commit: 3b9f25f4259b254f3faa2a7d61e547089a69c259
Parents: 7924d72
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 16:33:48 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 16:34:00 2014 -0700

--
 python/pyspark/sql.py   |  39 ++-
 .../catalyst/analysis/FunctionRegistry.scala|  32 ++
 .../sql/catalyst/expressions/ScalaUdf.scala | 307 +++
 .../org/apache/spark/sql/api/java/UDF1.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF10.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF11.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF12.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF13.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF14.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF15.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF16.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF17.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF18.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF19.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF2.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF20.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF21.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF22.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF3.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF4.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF5.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF6.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF7.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF8.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF9.java |  32 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  11 +-
 .../org/apache/spark/sql/UdfRegistration.scala  | 196 
 .../spark/sql/api/java/JavaSQLContext.scala |   5 +-
 .../spark/sql/api/java/UDFRegistration.scala| 252 +++
 .../spark/sql/execution/SparkStrategies.scala   |   2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 177 +++
 .../apache/spark/sql/api/java/JavaAPISuite.java |  90 ++
 .../org/apache/spark/sql/InsertIntoSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/UDFSuite.scala   |  36 +++
 .../org/apache/spark/sql/hive/HiveContext.scala |  13 +-
 .../org/apache/spark/sql/hive/TestHive.scala|   4 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala|   6 +-
 .../scala/org/apache/spark/sql/QueryTest.scala  |   4 +-
 38 files changed, 1861 insertions(+), 19 deletions(-)

git commit: [SPARK-2785][SQL] Remove assertions that throw when users try unsupported Hive commands.

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 158ad0bba - 198df11f1


[SPARK-2785][SQL] Remove assertions that throw when users try unsupported Hive 
commands.
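
For background, a hedged sketch of the behavioural difference this change leans on (plain Scala, not the HiveQl code itself): Predef.assert throws AssertionError and can be disabled at compile time (e.g. with scalac's -Xdisable-assertions), while sys.error always raises a RuntimeException carrying a user-readable message.

```scala
object ErrorStyleSketch extends App {
  // Internal invariant: may be elided by the compiler, message aimed at developers.
  def internalCheck(ok: Boolean): Unit = assert(ok, "internal invariant violated")

  // User-facing failure: always thrown, message aimed at the person writing the query.
  def rejectUnsupported(clause: String): Nothing =
    sys.error(s"Unhandled clause: $clause. You are likely trying to use an unsupported Hive feature.")

  internalCheck(ok = true)
  // rejectUnsupported("CREATE INDEX")  // would throw a RuntimeException with a readable message
}
```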

Author: Michael Armbrust mich...@databricks.com

Closes #1742 from marmbrus/asserts and squashes the following commits:

5182d54 [Michael Armbrust] Remove assertions that throw when users try 
unsupported Hive commands.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/198df11f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/198df11f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/198df11f

Branch: refs/heads/master
Commit: 198df11f1a9f419f820f47eba0e9f2ab371a824b
Parents: 158ad0b
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 16:48:07 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 16:48:07 2014 -0700

--
 .../main/scala/org/apache/spark/sql/hive/HiveQl.scala  | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/198df11f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
index 3d2eb1e..bc2fefa 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
@@ -297,8 +297,11 @@ private[hive] object HiveQl {
      matches.headOption
    }
 
-    assert(remainingNodes.isEmpty,
-      s"Unhandled clauses: ${remainingNodes.map(dumpTree(_)).mkString("\n")}")
+    if (remainingNodes.nonEmpty) {
+      sys.error(
+        s"""Unhandled clauses: ${remainingNodes.map(dumpTree(_)).mkString("\n")}.
+           |You are likely trying to use an unsupported Hive feature.""".stripMargin)
+    }
     clauses
   }
 
@@ -748,7 +751,10 @@ private[hive] object HiveQl {
     case Token(allJoinTokens(joinToken),
            relation1 ::
            relation2 :: other) =>
-      assert(other.size <= 1, s"Unhandled join child $other")
+      if (!(other.size <= 1)) {
+        sys.error(s"Unsupported join operation: $other")
+      }
+
       val joinType = joinToken match {
         case "TOK_JOIN" => Inner
         case "TOK_RIGHTOUTERJOIN" => RightOuter
@@ -756,7 +762,6 @@ private[hive] object HiveQl {
         case "TOK_FULLOUTERJOIN" => FullOuter
         case "TOK_LEFTSEMIJOIN" => LeftSemi
       }
-      assert(other.size <= 1, "Unhandled join clauses.")
       Join(nodeToRelation(relation1),
         nodeToRelation(relation2),
         joinType,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2785][SQL] Remove assertions that throw when users try unsupported Hive commands.

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 3b9f25f42 - 4230df4e1


[SPARK-2785][SQL] Remove assertions that throw when users try unsupported Hive 
commands.

Author: Michael Armbrust mich...@databricks.com

Closes #1742 from marmbrus/asserts and squashes the following commits:

5182d54 [Michael Armbrust] Remove assertions that throw when users try 
unsupported Hive commands.

(cherry picked from commit 198df11f1a9f419f820f47eba0e9f2ab371a824b)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4230df4e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4230df4e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4230df4e

Branch: refs/heads/branch-1.1
Commit: 4230df4e1d6c59dc3405f46f5edf18c3825a5447
Parents: 3b9f25f
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 16:48:07 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 16:48:17 2014 -0700

--
 .../main/scala/org/apache/spark/sql/hive/HiveQl.scala  | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4230df4e/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
index 3d2eb1e..bc2fefa 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
@@ -297,8 +297,11 @@ private[hive] object HiveQl {
      matches.headOption
    }
 
-    assert(remainingNodes.isEmpty,
-      s"Unhandled clauses: ${remainingNodes.map(dumpTree(_)).mkString("\n")}")
+    if (remainingNodes.nonEmpty) {
+      sys.error(
+        s"""Unhandled clauses: ${remainingNodes.map(dumpTree(_)).mkString("\n")}.
+           |You are likely trying to use an unsupported Hive feature.""".stripMargin)
+    }
     clauses
   }
 
@@ -748,7 +751,10 @@ private[hive] object HiveQl {
     case Token(allJoinTokens(joinToken),
            relation1 ::
            relation2 :: other) =>
-      assert(other.size <= 1, s"Unhandled join child $other")
+      if (!(other.size <= 1)) {
+        sys.error(s"Unsupported join operation: $other")
+      }
+
       val joinType = joinToken match {
         case "TOK_JOIN" => Inner
         case "TOK_RIGHTOUTERJOIN" => RightOuter
@@ -756,7 +762,6 @@ private[hive] object HiveQl {
         case "TOK_FULLOUTERJOIN" => FullOuter
         case "TOK_LEFTSEMIJOIN" => LeftSemi
       }
-      assert(other.size <= 1, "Unhandled join clauses.")
       Join(nodeToRelation(relation1),
         nodeToRelation(relation2),
         joinType,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2729][SQL] Added test case for SPARK-2729

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 4230df4e1 - 460fad817


[SPARK-2729][SQL] Added test case for SPARK-2729

This is a follow up of #1636.

Author: Cheng Lian lian.cs@gmail.com

Closes #1738 from liancheng/test-for-spark-2729 and squashes the following 
commits:

b13692a [Cheng Lian] Added test case for SPARK-2729

(cherry picked from commit 866cf1f822cfda22294054be026ef2d96307eb75)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/460fad81
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/460fad81
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/460fad81

Branch: refs/heads/branch-1.1
Commit: 460fad817da1fb6619d2456f637c1b7c7f5e8c7c
Parents: 4230df4
Author: Cheng Lian lian.cs@gmail.com
Authored: Sat Aug 2 17:12:49 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 17:12:59 2014 -0700

--
 .../src/test/scala/org/apache/spark/sql/TestData.scala  | 12 ++--
 .../spark/sql/columnar/InMemoryColumnarQuerySuite.scala | 12 
 2 files changed, 22 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/460fad81/sql/core/src/test/scala/org/apache/spark/sql/TestData.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/TestData.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/TestData.scala
index 58cee21..088e6e3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/TestData.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/TestData.scala
@@ -17,11 +17,13 @@
 
 package org.apache.spark.sql
 
+import java.sql.Timestamp
+
 import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.test._
 
 /* Implicits */
-import TestSQLContext._
+import org.apache.spark.sql.test.TestSQLContext._
 
 case class TestData(key: Int, value: String)
 
@@ -40,7 +42,7 @@ object TestData {
       LargeAndSmallInts(2147483646, 1) ::
       LargeAndSmallInts(3, 2) :: Nil)
   largeAndSmallInts.registerAsTable("largeAndSmallInts")
-  
+
   case class TestData2(a: Int, b: Int)
   val testData2: SchemaRDD =
     TestSQLContext.sparkContext.parallelize(
@@ -143,4 +145,10 @@ object TestData {
       "2, B2, false, null" ::
       "3, C3, true, null" ::
       "4, D4, true, 2147483644" :: Nil)
+
+  case class TimestampField(time: Timestamp)
+  val timestamps = TestSQLContext.sparkContext.parallelize((1 to 3).map { i =>
+    TimestampField(new Timestamp(i))
+  })
+  timestamps.registerAsTable("timestamps")
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/460fad81/sql/core/src/test/scala/org/apache/spark/sql/columnar/InMemoryColumnarQuerySuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/columnar/InMemoryColumnarQuerySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/columnar/InMemoryColumnarQuerySuite.scala
index 86727b9..b561b44 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/columnar/InMemoryColumnarQuerySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/columnar/InMemoryColumnarQuerySuite.scala
@@ -73,4 +73,16 @@ class InMemoryColumnarQuerySuite extends QueryTest {
       sql("SELECT * FROM nullableRepeatedData"),
       nullableRepeatedData.collect().toSeq)
   }
+
+  test("SPARK-2729 regression: timestamp data type") {
+    checkAnswer(
+      sql("SELECT time FROM timestamps"),
+      timestamps.collect().toSeq)
+
+    TestSQLContext.cacheTable("timestamps")
+
+    checkAnswer(
+      sql("SELECT time FROM timestamps"),
+      timestamps.collect().toSeq)
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2797] [SQL] SchemaRDDs don't support unpersist()

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 866cf1f82 - d210022e9


[SPARK-2797] [SQL] SchemaRDDs don't support unpersist()

The cause is explained in https://issues.apache.org/jira/browse/SPARK-2797.
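
For context, a hedged sketch of the underlying issue (illustrative names, not Spark's actual classes): the Scala API declares unpersist with a default `blocking` argument, but calls routed through Py4J do not see Scala default values, so the PySpark wrapper has to pass the argument explicitly.

```scala
// Illustrative only: mirrors the shape of the Scala API, not Spark's real class.
class CachedThing {
  private var cached = true
  def unpersist(blocking: Boolean = true): this.type = { cached = false; this }
}

object DefaultArgDemo extends App {
  val t = new CachedThing
  t.unpersist()       // fine from Scala: the default value kicks in
  t.unpersist(false)  // what a Py4J-style caller must do: supply the argument explicitly
}
```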

Author: Yin Huai h...@cse.ohio-state.edu

Closes #1745 from yhuai/SPARK-2797 and squashes the following commits:

7b1627d [Yin Huai] The unpersist method of the Scala RDD cannot be called 
without the input parameter (blocking) from PySpark.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d210022e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d210022e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d210022e

Branch: refs/heads/master
Commit: d210022e96804e59e42ab902e53637e50884a9ab
Parents: 866cf1f
Author: Yin Huai h...@cse.ohio-state.edu
Authored: Sat Aug 2 17:55:22 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 17:55:22 2014 -0700

--
 python/pyspark/sql.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d210022e/python/pyspark/sql.py
--
diff --git a/python/pyspark/sql.py b/python/pyspark/sql.py
index e7c35ac..36e50e4 100644
--- a/python/pyspark/sql.py
+++ b/python/pyspark/sql.py
@@ -1589,9 +1589,9 @@ class SchemaRDD(RDD):
         self._jschema_rdd.persist(javaStorageLevel)
         return self
 
-    def unpersist(self):
+    def unpersist(self, blocking=True):
         self.is_cached = False
-        self._jschema_rdd.unpersist()
+        self._jschema_rdd.unpersist(blocking)
         return self
 
     def checkpoint(self):


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2739][SQL] Rename registerAsTable to registerTempTable

2014-08-02 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master d210022e9 - 1a8043739


[SPARK-2739][SQL] Rename registerAsTable to registerTempTable

There have been user complaints that the difference between `registerAsTable` 
and `saveAsTable` is too subtle.  This PR addresses this by renaming 
`registerAsTable` to `registerTempTable`, which more clearly reflects what is 
happening.  `registerAsTable` remains, but will cause a deprecation warning.
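
A hedged sketch of the rename from the caller's side, written against the Spark 1.1-era SQL API (the data and names below are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object RegisterTempTableSketch extends App {
  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sketch"))
  val sqlContext = new SQLContext(sc)
  import sqlContext.createSchemaRDD

  val people = sc.parallelize(Seq(Person("Alice", 15), Person("Bob", 30)))

  // people.registerAsTable("people")  // still compiles, but now emits a deprecation warning
  people.registerTempTable("people")   // new name: the table is scoped to this SQLContext
  sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    .collect().foreach(println)
}
```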

Author: Michael Armbrust mich...@databricks.com

Closes #1743 from marmbrus/registerTempTable and squashes the following commits:

d031348 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into 
registerTempTable
4dff086 [Michael Armbrust] Fix .java files too
89a2f12 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into 
registerTempTable
0b7b71e [Michael Armbrust] Rename registerAsTable to registerTempTable


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1a804373
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1a804373
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1a804373

Branch: refs/heads/master
Commit: 1a8043739dc1d9435def6ea3c6341498ba52b708
Parents: d210022
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 18:27:04 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 18:27:04 2014 -0700

--
 .../sbt_app_sql/src/main/scala/SqlApp.scala |  2 +-
 docs/sql-programming-guide.md   | 18 ++---
 .../apache/spark/examples/sql/JavaSparkSQL.java |  8 +++---
 .../apache/spark/examples/sql/RDDRelation.scala |  4 +--
 .../spark/examples/sql/hive/HiveFromSpark.scala |  2 +-
 python/pyspark/sql.py   | 12 ++---
 .../scala/org/apache/spark/sql/SQLContext.scala |  4 +--
 .../scala/org/apache/spark/sql/SchemaRDD.scala  |  2 +-
 .../org/apache/spark/sql/SchemaRDDLike.scala|  5 +++-
 .../spark/sql/api/java/JavaSQLContext.scala |  2 +-
 .../sql/api/java/JavaApplySchemaSuite.java  |  6 ++---
 .../org/apache/spark/sql/CachedTableSuite.scala |  2 +-
 .../org/apache/spark/sql/InsertIntoSuite.scala  |  4 +--
 .../scala/org/apache/spark/sql/JoinSuite.scala  |  4 +--
 .../org/apache/spark/sql/SQLQuerySuite.scala|  6 ++---
 .../sql/ScalaReflectionRelationSuite.scala  |  8 +++---
 .../scala/org/apache/spark/sql/TestData.scala   | 28 ++--
 .../spark/sql/api/java/JavaSQLSuite.scala   | 10 +++
 .../org/apache/spark/sql/json/JsonSuite.scala   | 22 +++
 .../spark/sql/parquet/ParquetQuerySuite.scala   | 26 +-
 .../sql/hive/InsertIntoHiveTableSuite.scala |  2 +-
 .../sql/hive/api/java/JavaHiveQLSuite.scala |  4 +--
 .../sql/hive/execution/HiveQuerySuite.scala |  6 ++---
 .../hive/execution/HiveResolutionSuite.scala|  4 +--
 .../spark/sql/parquet/HiveParquetSuite.scala|  8 +++---
 25 files changed, 103 insertions(+), 96 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1a804373/dev/audit-release/sbt_app_sql/src/main/scala/SqlApp.scala
--
diff --git a/dev/audit-release/sbt_app_sql/src/main/scala/SqlApp.scala 
b/dev/audit-release/sbt_app_sql/src/main/scala/SqlApp.scala
index 50af90c..d888de9 100644
--- a/dev/audit-release/sbt_app_sql/src/main/scala/SqlApp.scala
+++ b/dev/audit-release/sbt_app_sql/src/main/scala/SqlApp.scala
@@ -38,7 +38,7 @@ object SparkSqlExample {
 
     import sqlContext._
     val people = sc.makeRDD(1 to 100, 10).map(x => Person(s"Name$x", x))
-    people.registerAsTable("people")
+    people.registerTempTable("people")
     val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
     val teenagerNames = teenagers.map(t => "Name: " + t(0)).collect()
     teenagerNames.foreach(println)

http://git-wip-us.apache.org/repos/asf/spark/blob/1a804373/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 7261bad..0465468 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -142,7 +142,7 @@ case class Person(name: String, age: Int)
 
 // Create an RDD of Person objects and register it as a table.
 val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
-people.registerAsTable("people")
+people.registerTempTable("people")
 
 // SQL statements can be run by using the sql methods provided by sqlContext.
 val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
@@ -210,7 +210,7 @@ JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").m
 
 // Apply a schema to an RDD of JavaBeans and register 

git commit: SPARK-2602 [BUILD] Tests steal focus under Java 6

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 5b30e0018 - 0d47bb642


SPARK-2602 [BUILD] Tests steal focus under Java 6

As per https://issues.apache.org/jira/browse/SPARK-2602, this may be resolved for 
Java 6 by setting the java.awt.headless system property, which never hurt anyone 
running a command-line app. I tested it, and it seemed to get rid of the focus stealing.
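
For illustration, a tiny hedged sketch of what the flag does at the JVM level (generic JVM behaviour, independent of the Maven change below):

```scala
object HeadlessSketch extends App {
  // Set before any AWT class is initialised; GraphicsEnvironment simply reads the property,
  // so test JVMs started this way never try to grab window focus.
  System.setProperty("java.awt.headless", "true")
  println(java.awt.GraphicsEnvironment.isHeadless())  // prints: true
}
```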

Author: Sean Owen sro...@gmail.com

Closes #1747 from srowen/SPARK-2602 and squashes the following commits:

b141018 [Sean Owen] Set java.awt.headless during tests
(cherry picked from commit 33f167d762483b55d5d874dcc1e3075f661d4375)

Signed-off-by: Patrick Wendell pwend...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d47bb64
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d47bb64
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d47bb64

Branch: refs/heads/branch-1.1
Commit: 0d47bb642f645c3c8663f4bdf869b5337ef9cb35
Parents: 5b30e00
Author: Sean Owen sro...@gmail.com
Authored: Sat Aug 2 21:44:19 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 21:44:33 2014 -0700

--
 pom.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0d47bb64/pom.xml
--
diff --git a/pom.xml b/pom.xml
index a427591..cc9377c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -871,6 +871,7 @@
           <argLine>-Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=512m</argLine>
           <stderr/>
           <systemProperties>
+            <java.awt.headless>true</java.awt.headless>
             <spark.test.home>${session.executionRootDirectory}</spark.test.home>
             <spark.testing>1</spark.testing>
           </systemProperties>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: SPARK-2602 [BUILD] Tests steal focus under Java 6

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 1a8043739 - 33f167d76


SPARK-2602 [BUILD] Tests steal focus under Java 6

As per https://issues.apache.org/jira/browse/SPARK-2602, this may be resolved for 
Java 6 by setting the java.awt.headless system property, which never hurt anyone 
running a command-line app. I tested it, and it seemed to get rid of the focus stealing.

Author: Sean Owen sro...@gmail.com

Closes #1747 from srowen/SPARK-2602 and squashes the following commits:

b141018 [Sean Owen] Set java.awt.headless during tests


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33f167d7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33f167d7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33f167d7

Branch: refs/heads/master
Commit: 33f167d762483b55d5d874dcc1e3075f661d4375
Parents: 1a80437
Author: Sean Owen sro...@gmail.com
Authored: Sat Aug 2 21:44:19 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 21:44:19 2014 -0700

--
 pom.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/33f167d7/pom.xml
--
diff --git a/pom.xml b/pom.xml
index a427591..cc9377c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -871,6 +871,7 @@
           <argLine>-Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=512m</argLine>
           <stderr/>
           <systemProperties>
+            <java.awt.headless>true</java.awt.headless>
             <spark.test.home>${session.executionRootDirectory}</spark.test.home>
             <spark.testing>1</spark.testing>
           </systemProperties>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: SPARK-2414 [BUILD] Add LICENSE entry for jquery

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 0d47bb642 - c137928cb


SPARK-2414 [BUILD] Add LICENSE entry for jquery

The JIRA concerned removing jquery, and this does not remove jquery. While it 
is distributed by Spark it should have an accompanying line in LICENSE, very 
technically, as per http://www.apache.org/dev/licensing-howto.html

Author: Sean Owen sro...@gmail.com

Closes #1748 from srowen/SPARK-2414 and squashes the following commits:

2fdb03c [Sean Owen] Add LICENSE entry for jquery
(cherry picked from commit 9cf429aaf529e91f619910c33cfe46bf33a66982)

Signed-off-by: Patrick Wendell pwend...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c137928c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c137928c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c137928c

Branch: refs/heads/branch-1.1
Commit: c137928cbe74446254fdbd656c50c1a1c8930094
Parents: 0d47bb6
Author: Sean Owen sro...@gmail.com
Authored: Sat Aug 2 21:55:56 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 21:56:29 2014 -0700

--
 LICENSE | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c137928c/LICENSE
--
diff --git a/LICENSE b/LICENSE
index 76a3601..e9a1153 100644
--- a/LICENSE
+++ b/LICENSE
@@ -549,3 +549,4 @@ The following components are provided under the MIT 
License. See project link fo
  (MIT License) pyrolite (org.spark-project:pyrolite:2.0.1 - 
http://pythonhosted.org/Pyro4/)
  (MIT License) scopt (com.github.scopt:scopt_2.10:3.2.0 - 
https://github.com/scopt/scopt)
  (The MIT License) Mockito (org.mockito:mockito-all:1.8.5 - 
http://www.mockito.org)
+ (MIT License) jquery (https://jquery.org/license/)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [Minor] Fixes on top of #1679

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 9cf429aaf - 3dc55fdf4


[Minor] Fixes on top of #1679

Minor fixes on top of #1679.

Author: Andrew Or andrewo...@gmail.com

Closes #1736 from andrewor14/amend-#1679 and squashes the following commits:

3b46f5e [Andrew Or] Minor fixes


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3dc55fdf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3dc55fdf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3dc55fdf

Branch: refs/heads/master
Commit: 3dc55fdf450b4237f7c592fce56d1467fd206366
Parents: 9cf429a
Author: Andrew Or andrewo...@gmail.com
Authored: Sat Aug 2 22:00:46 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 22:00:46 2014 -0700

--
 .../org/apache/spark/storage/BlockManagerSource.scala|  5 ++---
 .../scala/org/apache/spark/storage/StorageUtils.scala| 11 ---
 2 files changed, 6 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3dc55fdf/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
index e939318..3f14c40 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
@@ -46,9 +46,8 @@ private[spark] class BlockManagerSource(val blockManager: BlockManager, sc: Spar
   metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new Gauge[Long] {
     override def getValue: Long = {
       val storageStatusList = blockManager.master.getStorageStatus
-      val maxMem = storageStatusList.map(_.maxMem).sum
-      val remainingMem = storageStatusList.map(_.memRemaining).sum
-      (maxMem - remainingMem) / 1024 / 1024
+      val memUsed = storageStatusList.map(_.memUsed).sum
+      memUsed / 1024 / 1024
     }
   })
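
For reference, the gauge change above is a pure simplification: since StorageStatus defines 
memRemaining as maxMem - memUsed, summing memUsed directly yields the same value as summing 
maxMem and subtracting the summed remaining memory. A minimal, self-contained Scala sketch 
(using a hypothetical, cut-down stand-in for StorageStatus rather than the real Spark class) 
illustrates the equivalence:

// Sketch only: SimpleStorageStatus is a hypothetical stand-in that models just the
// fields the metric uses; it is not the real org.apache.spark.storage.StorageStatus.
object MemUsedGaugeSketch {
  final case class SimpleStorageStatus(maxMem: Long, memUsed: Long) {
    def memRemaining: Long = maxMem - memUsed
  }

  def main(args: Array[String]): Unit = {
    val statuses = Seq(
      SimpleStorageStatus(maxMem = 4096L * 1024 * 1024, memUsed = 512L * 1024 * 1024),
      SimpleStorageStatus(maxMem = 2048L * 1024 * 1024, memUsed = 256L * 1024 * 1024))

    // Old gauge: total capacity minus total remaining memory, converted to MB.
    val maxMem = statuses.map(_.maxMem).sum
    val remainingMem = statuses.map(_.memRemaining).sum
    val oldGauge = (maxMem - remainingMem) / 1024 / 1024

    // New gauge: sum the used memory directly, converted to MB.
    val newGauge = statuses.map(_.memUsed).sum / 1024 / 1024

    assert(oldGauge == newGauge) // equal because memRemaining = maxMem - memUsed
    println(s"memUsed_MB = $newGauge") // 768
  }
}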
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3dc55fdf/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
b/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
index 0a0a448..2bd6b74 100644
--- a/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
+++ b/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
@@ -172,16 +172,13 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) {
   def memRemaining: Long = maxMem - memUsed
 
   /** Return the memory used by this block manager. */
-  def memUsed: Long =
-    _nonRddStorageInfo._1 + _rddBlocks.keys.toSeq.map(memUsedByRdd).sum
+  def memUsed: Long = _nonRddStorageInfo._1 + _rddBlocks.keys.toSeq.map(memUsedByRdd).sum
 
   /** Return the disk space used by this block manager. */
-  def diskUsed: Long =
-    _nonRddStorageInfo._2 + _rddBlocks.keys.toSeq.map(diskUsedByRdd).sum
+  def diskUsed: Long = _nonRddStorageInfo._2 + _rddBlocks.keys.toSeq.map(diskUsedByRdd).sum
 
   /** Return the off-heap space used by this block manager. */
-  def offHeapUsed: Long =
-    _nonRddStorageInfo._3 + _rddBlocks.keys.toSeq.map(offHeapUsedByRdd).sum
+  def offHeapUsed: Long = _nonRddStorageInfo._3 + _rddBlocks.keys.toSeq.map(offHeapUsedByRdd).sum
 
   /** Return the memory used by the given RDD in this block manager in O(1) time. */
   def memUsedByRdd(rddId: Int): Long = _rddStorageInfo.get(rddId).map(_._1).getOrElse(0L)
@@ -246,7 +243,7 @@ private[spark] object StorageUtils {
       val rddId = rddInfo.id
       // Assume all blocks belonging to the same RDD have the same storage level
       val storageLevel = statuses
-        .map(_.rddStorageLevel(rddId)).flatMap(s => s).headOption.getOrElse(StorageLevel.NONE)
+        .flatMap(_.rddStorageLevel(rddId)).headOption.getOrElse(StorageLevel.NONE)
       val numCachedPartitions = statuses.map(_.numRddBlocksById(rddId)).sum
       val memSize = statuses.map(_.memUsedByRdd(rddId)).sum
       val diskSize = statuses.map(_.diskUsedByRdd(rddId)).sum
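
The second hunk above also rewrites .map(_.rddStorageLevel(rddId)).flatMap(s => s) as a single 
.flatMap(_.rddStorageLevel(rddId)). Assuming rddStorageLevel returns an Option, the two forms 
produce the same sequence; the rewrite simply avoids building the intermediate Seq of Options. 
A small sketch (with a hypothetical lookup standing in for the real method) shows the equivalence:

// Sketch only: rddStorageLevel here is a hypothetical helper returning the level a
// status reports for an RDD, if any; it is not the real Spark API.
object FlatMapSimplificationSketch {
  def rddStorageLevel(status: Map[Int, String], rddId: Int): Option[String] =
    status.get(rddId)

  def main(args: Array[String]): Unit = {
    val statuses: Seq[Map[Int, String]] =
      Seq(Map.empty, Map(1 -> "MEMORY_ONLY"), Map(1 -> "MEMORY_ONLY", 2 -> "DISK_ONLY"))

    // Before: map each status to an Option, then flatten the Options.
    val before = statuses.map(rddStorageLevel(_, 1)).flatMap(s => s)
      .headOption.getOrElse("NONE")

    // After: flatMap the Option-returning lookup directly; same result in one pass.
    val after = statuses.flatMap(rddStorageLevel(_, 1))
      .headOption.getOrElse("NONE")

    assert(before == after)
    println(after) // MEMORY_ONLY
  }
}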


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [Minor] Fixes on top of #1679

2014-08-02 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 c137928cb -> fb2a2079f


[Minor] Fixes on top of #1679

Minor fixes on top of #1679.

Author: Andrew Or andrewo...@gmail.com

Closes #1736 from andrewor14/amend-#1679 and squashes the following commits:

3b46f5e [Andrew Or] Minor fixes
(cherry picked from commit 3dc55fdf450b4237f7c592fce56d1467fd206366)

Signed-off-by: Patrick Wendell pwend...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fb2a2079
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fb2a2079
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fb2a2079

Branch: refs/heads/branch-1.1
Commit: fb2a2079fa10ea8f338d68945a94238dda9fbd66
Parents: c137928
Author: Andrew Or andrewo...@gmail.com
Authored: Sat Aug 2 22:00:46 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sat Aug 2 22:01:01 2014 -0700

--
 .../org/apache/spark/storage/BlockManagerSource.scala|  5 ++---
 .../scala/org/apache/spark/storage/StorageUtils.scala| 11 ---
 2 files changed, 6 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fb2a2079/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
index e939318..3f14c40 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala
@@ -46,9 +46,8 @@ private[spark] class BlockManagerSource(val blockManager: BlockManager, sc: Spar
   metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new Gauge[Long] {
     override def getValue: Long = {
       val storageStatusList = blockManager.master.getStorageStatus
-      val maxMem = storageStatusList.map(_.maxMem).sum
-      val remainingMem = storageStatusList.map(_.memRemaining).sum
-      (maxMem - remainingMem) / 1024 / 1024
+      val memUsed = storageStatusList.map(_.memUsed).sum
+      memUsed / 1024 / 1024
     }
   })
 

http://git-wip-us.apache.org/repos/asf/spark/blob/fb2a2079/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
b/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
index 0a0a448..2bd6b74 100644
--- a/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
+++ b/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
@@ -172,16 +172,13 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) {
   def memRemaining: Long = maxMem - memUsed
 
   /** Return the memory used by this block manager. */
-  def memUsed: Long =
-    _nonRddStorageInfo._1 + _rddBlocks.keys.toSeq.map(memUsedByRdd).sum
+  def memUsed: Long = _nonRddStorageInfo._1 + _rddBlocks.keys.toSeq.map(memUsedByRdd).sum
 
   /** Return the disk space used by this block manager. */
-  def diskUsed: Long =
-    _nonRddStorageInfo._2 + _rddBlocks.keys.toSeq.map(diskUsedByRdd).sum
+  def diskUsed: Long = _nonRddStorageInfo._2 + _rddBlocks.keys.toSeq.map(diskUsedByRdd).sum
 
   /** Return the off-heap space used by this block manager. */
-  def offHeapUsed: Long =
-    _nonRddStorageInfo._3 + _rddBlocks.keys.toSeq.map(offHeapUsedByRdd).sum
+  def offHeapUsed: Long = _nonRddStorageInfo._3 + _rddBlocks.keys.toSeq.map(offHeapUsedByRdd).sum
 
   /** Return the memory used by the given RDD in this block manager in O(1) time. */
   def memUsedByRdd(rddId: Int): Long = _rddStorageInfo.get(rddId).map(_._1).getOrElse(0L)
@@ -246,7 +243,7 @@ private[spark] object StorageUtils {
       val rddId = rddInfo.id
       // Assume all blocks belonging to the same RDD have the same storage level
      val storageLevel = statuses
-        .map(_.rddStorageLevel(rddId)).flatMap(s => s).headOption.getOrElse(StorageLevel.NONE)
+        .flatMap(_.rddStorageLevel(rddId)).headOption.getOrElse(StorageLevel.NONE)
       val numCachedPartitions = statuses.map(_.numRddBlocksById(rddId)).sum
       val memSize = statuses.map(_.memUsedByRdd(rddId)).sum
       val diskSize = statuses.map(_.diskUsedByRdd(rddId)).sum


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org