spark git commit: [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

2014-11-07 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 7f86c350c - d6262fa05


[SPARK-4187] [Core] Switch to binary protocol for external shuffle service 
messages

This PR eliminates the network package's usage of the Java serializer and
replaces it with Encodable, which is a lightweight binary protocol. Each
message is preceded by a type id, which will allow us to change messages (by
only adding new ones), or to change the format entirely by switching to a
special id (such as -1).

This protocol has the advantage over Java serialization that we can guarantee
that messages will remain compatible across compiled versions and JVMs, though
it does not provide a clean way to do schema migration. In the future, it may
be good to use a more heavyweight serialization format like Protobuf, Thrift,
or Avro, but these all add several dependencies which are unnecessary at the
present time.

Additionally this unifies the RPC messages of NettyBlockTransferService and 
ExternalShuffleClient.
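
As an illustration of the type-id scheme described above, here is a minimal,
self-contained Scala sketch. It is not the actual Spark code: the message name,
field layout, and the trait's signature are assumptions made for this example,
standing in for Encodable and the shuffle protocol messages.

import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Illustrative trait: every message knows its type id, its encoded size, and how
// to write itself into a buffer. This mirrors the idea, not the exact API.
sealed trait EncodableSketch {
  def typeId: Byte
  def encodedLength: Int
  def encode(buf: ByteBuffer): Unit
}

// Hypothetical message carrying an executor id; the string is length-prefixed UTF-8.
case class RegisterExecutorSketch(execId: String) extends EncodableSketch {
  private val bytes = execId.getBytes(StandardCharsets.UTF_8)
  override val typeId: Byte = 1
  override def encodedLength: Int = 1 + 4 + bytes.length
  override def encode(buf: ByteBuffer): Unit = {
    buf.put(typeId)          // type id goes first, so receivers can dispatch on it
    buf.putInt(bytes.length) // then the length-prefixed payload
    buf.put(bytes)
  }
}

// Usage: size the buffer exactly and write the message.
val msg = RegisterExecutorSketch("exec-42")
val buf = ByteBuffer.allocate(msg.encodedLength)
msg.encode(buf)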

Author: Aaron Davidson aa...@databricks.com

Closes #3146 from aarondav/free and squashes the following commits:

ed1102a [Aaron Davidson] Remove some unused imports
b8e2a49 [Aaron Davidson] Add appId to test
538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for 
external shuffle service messages

(cherry picked from commit d4fa04e50d299e9cad349b3781772956453a696b)
Signed-off-by: Reynold Xin r...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6262fa0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6262fa0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6262fa0

Branch: refs/heads/branch-1.2
Commit: d6262fa05b9b7ffde00e6659810a3436e53df6b8
Parents: 7f86c35
Author: Aaron Davidson aa...@databricks.com
Authored: Fri Nov 7 09:42:21 2014 -0800
Committer: Reynold Xin r...@databricks.com
Committed: Fri Nov 7 09:42:32 2014 -0800

--
 .../spark/network/BlockTransferService.scala|   4 +-
 .../network/netty/NettyBlockRpcServer.scala |  31 ++---
 .../netty/NettyBlockTransferService.scala   |  15 ++-
 .../network/nio/NioBlockTransferService.scala   |   1 +
 .../org/apache/spark/storage/BlockManager.scala |   5 +-
 .../netty/NettyBlockTransferSecuritySuite.scala |   4 +-
 .../network/protocol/ChunkFetchFailure.java |  12 +-
 .../apache/spark/network/protocol/Encoders.java |  93 +++
 .../spark/network/protocol/RpcFailure.java  |  12 +-
 .../spark/network/protocol/RpcRequest.java  |   9 +-
 .../spark/network/protocol/RpcResponse.java |   9 +-
 .../apache/spark/network/util/JavaUtils.java|  27 -
 .../apache/spark/network/sasl/SaslMessage.java  |  24 ++--
 .../network/shuffle/ExecutorShuffleInfo.java|  64 ---
 .../shuffle/ExternalShuffleBlockHandler.java|  21 ++--
 .../shuffle/ExternalShuffleBlockManager.java|   1 +
 .../network/shuffle/ExternalShuffleClient.java  |  12 +-
 .../shuffle/ExternalShuffleMessages.java| 106 -
 .../network/shuffle/OneForOneBlockFetcher.java  |  17 ++-
 .../network/shuffle/ShuffleStreamHandle.java|  60 --
 .../shuffle/protocol/BlockTransferMessage.java  |  76 +
 .../shuffle/protocol/ExecutorShuffleInfo.java   |  88 +++
 .../network/shuffle/protocol/OpenBlocks.java|  87 ++
 .../shuffle/protocol/RegisterExecutor.java  |  91 +++
 .../network/shuffle/protocol/StreamHandle.java  |  80 +
 .../network/shuffle/protocol/UploadBlock.java   | 113 +++
 .../shuffle/BlockTransferMessagesSuite.java |  44 
 .../ExternalShuffleBlockHandlerSuite.java   |  29 ++---
 .../ExternalShuffleIntegrationSuite.java|   1 +
 .../shuffle/ExternalShuffleSecuritySuite.java   |   1 +
 .../shuffle/OneForOneBlockFetcherSuite.java |  18 +--
 .../network/shuffle/ShuffleMessagesSuite.java   |  51 -
 .../network/shuffle/TestShuffleDataContext.java |   2 +
 33 files changed, 782 insertions(+), 426 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d6262fa0/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala 
b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
index 210a581..dcbda5a 100644
--- a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
+++ b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
@@ -73,6 +73,7 @@ abstract class BlockTransferService extends ShuffleClient 
with Closeable with Lo
   def uploadBlock(
   hostname: String,
   port: Int,
+  execId: String,
   blockId: BlockId,
   blockData: 

spark git commit: [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

2014-11-07 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 3abdb1b24 - d4fa04e50


[SPARK-4187] [Core] Switch to binary protocol for external shuffle service 
messages

This PR eliminates the network package's usage of the Java serializer and
replaces it with Encodable, which is a lightweight binary protocol. Each
message is preceded by a type id, which will allow us to change messages (by
only adding new ones), or to change the format entirely by switching to a
special id (such as -1).

This protocol has the advantage over Java serialization that we can guarantee
that messages will remain compatible across compiled versions and JVMs, though
it does not provide a clean way to do schema migration. In the future, it may
be good to use a more heavyweight serialization format like Protobuf, Thrift,
or Avro, but these all add several dependencies which are unnecessary at the
present time.

Additionally this unifies the RPC messages of NettyBlockTransferService and 
ExternalShuffleClient.
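
Complementing the encoding sketch shown with the branch-1.2 copy of this message
above, here is a hedged sketch of the receive side: the first byte selects the
message type, which is what lets new message types be added without breaking
older ones. The id value and payload layout below are assumptions for this
sketch, not Spark's actual decoder.

import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Illustrative decoder: read the leading type id and dispatch on it. Unknown ids
// are rejected, which is also how a future format switch (e.g. a special id such
// as -1) could be detected.
def decodeSketch(buf: ByteBuffer): String = {
  buf.get().toInt match {
    case 1 =>                       // hypothetical "register executor" message
      val len = buf.getInt()
      val bytes = new Array[Byte](len)
      buf.get(bytes)
      new String(bytes, StandardCharsets.UTF_8)
    case other =>
      throw new IllegalArgumentException(s"Unknown message type id: $other")
  }
}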

Author: Aaron Davidson aa...@databricks.com

Closes #3146 from aarondav/free and squashes the following commits:

ed1102a [Aaron Davidson] Remove some unused imports
b8e2a49 [Aaron Davidson] Add appId to test
538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for 
external shuffle service messages


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4fa04e5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4fa04e5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4fa04e5

Branch: refs/heads/master
Commit: d4fa04e50d299e9cad349b3781772956453a696b
Parents: 3abdb1b
Author: Aaron Davidson aa...@databricks.com
Authored: Fri Nov 7 09:42:21 2014 -0800
Committer: Reynold Xin r...@databricks.com
Committed: Fri Nov 7 09:42:21 2014 -0800

--
 .../spark/network/BlockTransferService.scala|   4 +-
 .../network/netty/NettyBlockRpcServer.scala |  31 ++---
 .../netty/NettyBlockTransferService.scala   |  15 ++-
 .../network/nio/NioBlockTransferService.scala   |   1 +
 .../org/apache/spark/storage/BlockManager.scala |   5 +-
 .../netty/NettyBlockTransferSecuritySuite.scala |   4 +-
 .../network/protocol/ChunkFetchFailure.java |  12 +-
 .../apache/spark/network/protocol/Encoders.java |  93 +++
 .../spark/network/protocol/RpcFailure.java  |  12 +-
 .../spark/network/protocol/RpcRequest.java  |   9 +-
 .../spark/network/protocol/RpcResponse.java |   9 +-
 .../apache/spark/network/util/JavaUtils.java|  27 -
 .../apache/spark/network/sasl/SaslMessage.java  |  24 ++--
 .../network/shuffle/ExecutorShuffleInfo.java|  64 ---
 .../shuffle/ExternalShuffleBlockHandler.java|  21 ++--
 .../shuffle/ExternalShuffleBlockManager.java|   1 +
 .../network/shuffle/ExternalShuffleClient.java  |  12 +-
 .../shuffle/ExternalShuffleMessages.java| 106 -
 .../network/shuffle/OneForOneBlockFetcher.java  |  17 ++-
 .../network/shuffle/ShuffleStreamHandle.java|  60 --
 .../shuffle/protocol/BlockTransferMessage.java  |  76 +
 .../shuffle/protocol/ExecutorShuffleInfo.java   |  88 +++
 .../network/shuffle/protocol/OpenBlocks.java|  87 ++
 .../shuffle/protocol/RegisterExecutor.java  |  91 +++
 .../network/shuffle/protocol/StreamHandle.java  |  80 +
 .../network/shuffle/protocol/UploadBlock.java   | 113 +++
 .../shuffle/BlockTransferMessagesSuite.java |  44 
 .../ExternalShuffleBlockHandlerSuite.java   |  29 ++---
 .../ExternalShuffleIntegrationSuite.java|   1 +
 .../shuffle/ExternalShuffleSecuritySuite.java   |   1 +
 .../shuffle/OneForOneBlockFetcherSuite.java |  18 +--
 .../network/shuffle/ShuffleMessagesSuite.java   |  51 -
 .../network/shuffle/TestShuffleDataContext.java |   2 +
 33 files changed, 782 insertions(+), 426 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d4fa04e5/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala 
b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
index 210a581..dcbda5a 100644
--- a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
+++ b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
@@ -73,6 +73,7 @@ abstract class BlockTransferService extends ShuffleClient 
with Closeable with Lo
   def uploadBlock(
   hostname: String,
   port: Int,
+  execId: String,
   blockId: BlockId,
   blockData: ManagedBuffer,
   level: StorageLevel): Future[Unit]
@@ -110,9 +111,10 @@ abstract class BlockTransferService extends 

[06/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
--
diff --git 
a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala 
b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
new file mode 100644
index 000..f966f25
--- /dev/null
+++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.repl
+
+import java.io._
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.Await
+import scala.concurrent.duration._
+import scala.tools.nsc.interpreter.SparkILoop
+
+import com.google.common.io.Files
+import org.scalatest.FunSuite
+import org.apache.commons.lang3.StringEscapeUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.util.Utils
+
+
+
+class ReplSuite extends FunSuite {
+
+  def runInterpreter(master: String, input: String): String = {
+    val CONF_EXECUTOR_CLASSPATH = "spark.executor.extraClassPath"
+
+    val in = new BufferedReader(new StringReader(input + "\n"))
+    val out = new StringWriter()
+    val cl = getClass.getClassLoader
+    var paths = new ArrayBuffer[String]
+    if (cl.isInstanceOf[URLClassLoader]) {
+      val urlLoader = cl.asInstanceOf[URLClassLoader]
+      for (url <- urlLoader.getURLs) {
+        if (url.getProtocol == "file") {
+          paths += url.getFile
+        }
+      }
+    }
+    val classpath = paths.mkString(File.pathSeparator)
+
+    val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH)
+    System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath)
+
+    System.setProperty("spark.master", master)
+    val interp = {
+      new SparkILoop(in, new PrintWriter(out))
+    }
+    org.apache.spark.repl.Main.interp = interp
+    Main.s.processArguments(List("-classpath", classpath), true)
+    Main.main(Array()) // call main
+    org.apache.spark.repl.Main.interp = null
+
+    if (oldExecutorClasspath != null) {
+      System.setProperty(CONF_EXECUTOR_CLASSPATH, oldExecutorClasspath)
+    } else {
+      System.clearProperty(CONF_EXECUTOR_CLASSPATH)
+    }
+    return out.toString
+  }
+
+  def assertContains(message: String, output: String) {
+    val isContain = output.contains(message)
+    assert(isContain,
+      "Interpreter output did not contain '" + message + "':\n" + output)
+  }
+
+  def assertDoesNotContain(message: String, output: String) {
+    val isContain = output.contains(message)
+    assert(!isContain,
+      "Interpreter output contained '" + message + "':\n" + output)
+  }
+
+  test("propagation of local properties") {
+    // A mock ILoop that doesn't install the SIGINT handler.
+    class ILoop(out: PrintWriter) extends SparkILoop(None, out) {
+      settings = new scala.tools.nsc.Settings
+      settings.usejavacp.value = true
+      org.apache.spark.repl.Main.interp = this
+      override def createInterpreter() {
+        intp = new SparkILoopInterpreter
+        intp.setContextClassLoader()
+      }
+    }
+
+    val out = new StringWriter()
+    Main.interp = new ILoop(new PrintWriter(out))
+    Main.sparkContext = new SparkContext("local", "repl-test")
+    Main.interp.createInterpreter()
+
+    Main.sparkContext.setLocalProperty("someKey", "someValue")
+
+    // Make sure the value we set in the caller to interpret is propagated in the thread that
+    // interprets the command.
+    Main.interp.interpret("org.apache.spark.repl.Main.sparkContext.getLocalProperty(\"someKey\")")
+    assert(out.toString.contains("someValue"))
+
+    Main.sparkContext.stop()
+    System.clearProperty("spark.driver.port")
+  }
+
+  test("simple foreach with accumulator") {
+    val output = runInterpreter("local",
+      """
+        |val accum = sc.accumulator(0)
+        |sc.parallelize(1 to 10).foreach(x => accum += x)
+        |accum.value
+      """.stripMargin)
+    assertDoesNotContain("error:", output)
+    assertDoesNotContain("Exception", output)
+    assertContains("res1: Int = 55", output)
+  }
+
+  test("external vars") {
+    val output = runInterpreter("local",
+      """
+        |var v = 7

[16/20] spark git commit: small correction

2014-11-07 Thread pwendell
small correction


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/899fc3c0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/899fc3c0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/899fc3c0

Branch: refs/heads/scala-2.11-prashant
Commit: 899fc3c0d295111eaeacbaad18aac413e9c2d7ff
Parents: d7c35e2
Author: Prashant Sharma prashan...@imaginea.com
Authored: Wed Nov 5 18:20:03 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 core/pom.xml | 1 -
 1 file changed, 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/899fc3c0/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 37970c9..492eddd 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -329,7 +329,6 @@
   plugin
 groupIdorg.scalatest/groupId
 artifactIdscalatest-maven-plugin/artifactId
-version1.0/version
 executions
   execution
 idtest/id





Git Push Summary

2014-11-07 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/scala-2.11-prashant [deleted] fd849b0b4




[17/20] spark git commit: Run against scala 2.11 on jenkins.

2014-11-07 Thread pwendell
Run against scala 2.11 on jenkins.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d7c35e2f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d7c35e2f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d7c35e2f

Branch: refs/heads/scala-2.11-prashant
Commit: d7c35e2ffc2379ec4ce61a48f7048d3c8cc7c29c
Parents: ed4f646
Author: Prashant Sharma prashan...@imaginea.com
Authored: Wed Nov 5 18:18:21 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 dev/run-tests | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d7c35e2f/dev/run-tests
--
diff --git a/dev/run-tests b/dev/run-tests
index de607e4..74b0ba5 100755
--- a/dev/run-tests
+++ b/dev/run-tests
@@ -53,7 +53,7 @@ function handle_error () {
   fi
 }
 
-export SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl"
+export SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl -Pscala-2.11"
 
 # Determine Java path and version.
 {





[15/20] spark git commit: Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10

2014-11-07 Thread pwendell
Fixed build after rebasing with master. We should use ${scala.binary.version} 
instead of just 2.10
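
For comparison only (this commit touches the Maven poms, not the sbt build): the
same cross-version concern is what sbt's %% operator handles, appending the
active Scala binary version to the artifact id just as ${scala.binary.version}
does in the pom changes below. A hedged sbt-style sketch, with illustrative
version numbers:

// build.sbt sketch: "%%" resolves to spark-network-common_2.10 or _2.11
// depending on scalaVersion, mirroring artifactId ..._${scala.binary.version}.
scalaVersion := "2.11.2"
libraryDependencies += "org.apache.spark" %% "spark-network-common" % "1.2.0"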


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eeed1a68
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eeed1a68
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eeed1a68

Branch: refs/heads/scala-2.11-prashant
Commit: eeed1a68f01538abfcc1007f9e25f8e34b76e554
Parents: c696f39
Author: Prashant Sharma prashan...@imaginea.com
Authored: Wed Nov 5 15:57:30 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 core/pom.xml| 4 ++--
 network/shuffle/pom.xml | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/eeed1a68/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index afa8c8c..d71f265 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -74,12 +74,12 @@
 /dependency
 dependency
   groupIdorg.apache.spark/groupId
-  artifactIdspark-network-common_2.10/artifactId
+  artifactIdspark-network-common_${scala.binary.version}/artifactId
   version${project.version}/version
 /dependency
 dependency
   groupIdorg.apache.spark/groupId
-  artifactIdspark-network-shuffle_2.10/artifactId
+  artifactIdspark-network-shuffle_${scala.binary.version}/artifactId
   version${project.version}/version
 /dependency
 dependency

http://git-wip-us.apache.org/repos/asf/spark/blob/eeed1a68/network/shuffle/pom.xml
--
diff --git a/network/shuffle/pom.xml b/network/shuffle/pom.xml
index fe5681d..662212f 100644
--- a/network/shuffle/pom.xml
+++ b/network/shuffle/pom.xml
@@ -39,7 +39,7 @@
 !-- Core dependencies --
 dependency
   groupIdorg.apache.spark/groupId
-  artifactIdspark-network-common_2.10/artifactId
+  artifactIdspark-network-common_${scala.binary.version}/artifactId
   version${project.version}/version
 /dependency
 dependency
@@ -58,7 +58,7 @@
 !-- Test dependencies --
 dependency
   groupIdorg.apache.spark/groupId
-  artifactIdspark-network-common_2.10/artifactId
+  artifactIdspark-network-common_${scala.binary.version}/artifactId
   version${project.version}/version
   typetest-jar/type
   scopetest/scope





[10/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
--
diff --git 
a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala 
b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
new file mode 100644
index 000..e56b74e
--- /dev/null
+++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
@@ -0,0 +1,1091 @@
+// scalastyle:off
+
+/* NSC -- new Scala compiler
+ * Copyright 2005-2013 LAMP/EPFL
+ * @author Alexander Spoon
+ */
+
+package org.apache.spark.repl
+
+
+import java.net.URL
+
+import scala.reflect.io.AbstractFile
+import scala.tools.nsc._
+import scala.tools.nsc.backend.JavaPlatform
+import scala.tools.nsc.interpreter._
+
+import scala.tools.nsc.interpreter.{Results = IR}
+import Predef.{println = _, _}
+import java.io.{BufferedReader, FileReader}
+import java.net.URI
+import java.util.concurrent.locks.ReentrantLock
+import scala.sys.process.Process
+import scala.tools.nsc.interpreter.session._
+import scala.util.Properties.{jdkHome, javaVersion}
+import scala.tools.util.{Javap}
+import scala.annotation.tailrec
+import scala.collection.mutable.ListBuffer
+import scala.concurrent.ops
+import scala.tools.nsc.util._
+import scala.tools.nsc.interpreter._
+import scala.tools.nsc.io.{File, Directory}
+import scala.reflect.NameTransformer._
+import scala.tools.nsc.util.ScalaClassLoader._
+import scala.tools.util._
+import scala.language.{implicitConversions, existentials, postfixOps}
+import scala.reflect.{ClassTag, classTag}
+import scala.tools.reflect.StdRuntimeTags._
+
+import java.lang.{Class = jClass}
+import scala.reflect.api.{Mirror, TypeCreator, Universe = ApiUniverse}
+
+import org.apache.spark.Logging
+import org.apache.spark.SparkConf
+import org.apache.spark.SparkContext
+import org.apache.spark.util.Utils
+
+/** The Scala interactive shell.  It provides a read-eval-print loop
+ *  around the Interpreter class.
+ *  After instantiation, clients should call the main() method.
+ *
+ *  If no in0 is specified, then input will come from the console, and
+ *  the class will attempt to provide input editing feature such as
+ *  input history.
+ *
+ *  @author Moez A. Abdel-Gawad
+ *  @author  Lex Spoon
+ *  @version 1.2
+ */
+class SparkILoop(in0: Option[BufferedReader], protected val out: JPrintWriter,
+   val master: Option[String])
+extends AnyRef
+   with LoopCommands
+   with SparkILoopInit
+   with Logging
+{
+  def this(in0: BufferedReader, out: JPrintWriter, master: String) = 
this(Some(in0), out, Some(master))
+  def this(in0: BufferedReader, out: JPrintWriter) = this(Some(in0), out, None)
+  def this() = this(None, new JPrintWriter(Console.out, true), None)
+
+  var in: InteractiveReader = _   // the input stream from which commands come
+  var settings: Settings = _
+  var intp: SparkIMain = _
+
+  @deprecated(Use `intp` instead., 2.9.0) def interpreter = intp
+  @deprecated(Use `intp` instead., 2.9.0) def interpreter_= (i: 
SparkIMain): Unit = intp = i
+
+  /** Having inherited the difficult var-ness of the repl instance,
+   *  I'm trying to work around it by moving operations into a class from
+   *  which it will appear a stable prefix.
+   */
+  private def onIntp[T](f: SparkIMain = T): T = f(intp)
+
+  class IMainOps[T : SparkIMain](val intp: T) {
+import intp._
+import global._
+
+def printAfterTyper(msg: = String) =
+  intp.reporter printMessage afterTyper(msg)
+
+/** Strip NullaryMethodType artifacts. */
+private def replInfo(sym: Symbol) = {
+  sym.info match {
+case NullaryMethodType(restpe) if sym.isAccessor  = restpe
+case info = info
+  }
+}
+def echoTypeStructure(sym: Symbol) =
+  printAfterTyper( + deconstruct.show(replInfo(sym)))
+
+def echoTypeSignature(sym: Symbol, verbose: Boolean) = {
+  if (verbose) SparkILoop.this.echo(// Type signature)
+  printAfterTyper( + replInfo(sym))
+
+  if (verbose) {
+SparkILoop.this.echo(\n// Internal Type structure)
+echoTypeStructure(sym)
+  }
+}
+  }
+  implicit def stabilizeIMain(intp: SparkIMain) = new IMainOps[intp.type](intp)
+
+  /** TODO -
+   *  -n normalize
+   *  -l label with case class parameter names
+   *  -c complete - leave nothing out
+   */
+  private def typeCommandInternal(expr: String, verbose: Boolean): Result = {
+onIntp { intp =
+  val sym = intp.symbolOfLine(expr)
+  if (sym.exists) intp.echoTypeSignature(sym, verbose)
+  else 
+}
+  }
+
+  var sparkContext: SparkContext = _
+
+  override def echoCommandMessage(msg: String) {
+intp.reporter printMessage msg
+  }
+
+  // def isAsync = !settings.Yreplsync.value
+  def isAsync = false
+  // lazy val power = new Power(intp, new 

[05/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala
--
diff --git a/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala 
b/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala
deleted file mode 100644
index 646c68e..000
--- a/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala
+++ /dev/null
@@ -1,1445 +0,0 @@
-// scalastyle:off
-
-/* NSC -- new Scala compiler
- * Copyright 2005-2013 LAMP/EPFL
- * @author  Martin Odersky
- */
-
-package org.apache.spark.repl
-
-import java.io.File
-
-import scala.tools.nsc._
-import scala.tools.nsc.backend.JavaPlatform
-import scala.tools.nsc.interpreter._
-
-import Predef.{ println = _, _ }
-import scala.tools.nsc.util.{MergedClassPath, stringFromWriter, 
ScalaClassLoader, stackTraceString}
-import scala.reflect.internal.util._
-import java.net.URL
-import scala.sys.BooleanProp
-import io.{AbstractFile, PlainFile, VirtualDirectory}
-
-import reporters._
-import symtab.Flags
-import scala.reflect.internal.Names
-import scala.tools.util.PathResolver
-import ScalaClassLoader.URLClassLoader
-import scala.tools.nsc.util.Exceptional.unwrap
-import scala.collection.{ mutable, immutable }
-import scala.util.control.Exception.{ ultimately }
-import SparkIMain._
-import java.util.concurrent.Future
-import typechecker.Analyzer
-import scala.language.implicitConversions
-import scala.reflect.runtime.{ universe = ru }
-import scala.reflect.{ ClassTag, classTag }
-import scala.tools.reflect.StdRuntimeTags._
-import scala.util.control.ControlThrowable
-
-import org.apache.spark.{Logging, HttpServer, SecurityManager, SparkConf}
-import org.apache.spark.util.Utils
-
-// /** directory to save .class files to */
-// private class ReplVirtualDirectory(out: JPrintWriter) extends 
VirtualDirectory(((memory)), None) {
-//   private def pp(root: AbstractFile, indentLevel: Int) {
-// val spaces =  * indentLevel
-// out.println(spaces + root.name)
-// if (root.isDirectory)
-//   root.toList sortBy (_.name) foreach (x = pp(x, indentLevel + 1))
-//   }
-//   // print the contents hierarchically
-//   def show() = pp(this, 0)
-// }
-
-  /** An interpreter for Scala code.
-   *
-   *  The main public entry points are compile(), interpret(), and bind().
-   *  The compile() method loads a complete Scala file.  The interpret() method
-   *  executes one line of Scala code at the request of the user.  The bind()
-   *  method binds an object to a variable that can then be used by later
-   *  interpreted code.
-   *
-   *  The overall approach is based on compiling the requested code and then
-   *  using a Java classloader and Java reflection to run the code
-   *  and access its results.
-   *
-   *  In more detail, a single compiler instance is used
-   *  to accumulate all successfully compiled or interpreted Scala code.  To
-   *  interpret a line of code, the compiler generates a fresh object that
-   *  includes the line of code and which has public member(s) to export
-   *  all variables defined by that code.  To extract the result of an
-   *  interpreted line to show the user, a second result object is created
-   *  which imports the variables exported by the above object and then
-   *  exports members called $eval and $print. To accomodate user 
expressions
-   *  that read from variables or methods defined in previous statements, 
import
-   *  statements are used.
-   *
-   *  This interpreter shares the strengths and weaknesses of using the
-   *  full compiler-to-Java.  The main strength is that interpreted code
-   *  behaves exactly as does compiled code, including running at full speed.
-   *  The main weakness is that redefining classes and methods is not handled
-   *  properly, because rebinding at the Java level is technically difficult.
-   *
-   *  @author Moez A. Abdel-Gawad
-   *  @author Lex Spoon
-   */
-  class SparkIMain(
-  initialSettings: Settings,
-  val out: JPrintWriter,
-  propagateExceptions: Boolean = false)
-extends SparkImports with Logging { imain =
-
-val conf = new SparkConf()
-
-val SPARK_DEBUG_REPL: Boolean = (System.getenv(SPARK_DEBUG_REPL) == 1)
-/** Local directory to save .class files too */
-lazy val outputDir = {
-  val tmp = System.getProperty(java.io.tmpdir)
-  val rootDir = conf.get(spark.repl.classdir,  tmp)
-  Utils.createTempDir(rootDir)
-}
-if (SPARK_DEBUG_REPL) {
-  echo(Output directory:  + outputDir)
-}
-
-val virtualDirectory  = new 
PlainFile(outputDir) // directory for classfiles
-/** Jetty server that will serve our classes to worker nodes */
-val classServerPort   = 
conf.getInt(spark.replClassServer.port, 0)
-val classServer   = new 
HttpServer(outputDir, new SecurityManager(conf), classServerPort, 

[01/20] spark git commit: Changed scripts to ignore target.

2014-11-07 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/scala-2.11-prashant [created] fd849b0b4


Changed scripts to ignore target.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3065ecd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3065ecd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3065ecd

Branch: refs/heads/scala-2.11-prashant
Commit: d3065ecda919a2fc929d5cbd838b8ed3210c61d4
Parents: 899fc3c
Author: Prashant Sharma prashan...@imaginea.com
Authored: Thu Nov 6 08:30:51 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 dev/change-version-to-2.10.sh | 2 +-
 dev/change-version-to-2.11.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d3065ecd/dev/change-version-to-2.10.sh
--
diff --git a/dev/change-version-to-2.10.sh b/dev/change-version-to-2.10.sh
index ca48e6f..152df60 100755
--- a/dev/change-version-to-2.10.sh
+++ b/dev/change-version-to-2.10.sh
@@ -17,4 +17,4 @@
 # limitations under the License.
 #
 
-find -name 'pom.xml' -exec sed -i 's|\(artifactId.*\)_2.11|\1_2.10|g' {}  \;
+find \( -name 'pom.xml' -a -not -path 'target' \) -exec sed -i 
's|\(artifactId.*\)_2.11|\1_2.10|g' {}  \;

http://git-wip-us.apache.org/repos/asf/spark/blob/d3065ecd/dev/change-version-to-2.11.sh
--
diff --git a/dev/change-version-to-2.11.sh b/dev/change-version-to-2.11.sh
index 07056b1..b52d5f7 100755
--- a/dev/change-version-to-2.11.sh
+++ b/dev/change-version-to-2.11.sh
@@ -17,4 +17,4 @@
 # limitations under the License.
 #
 
-find -name 'pom.xml' -exec sed -i 's|\(artifactId.*\)_2.10|\1_2.11|g' {}  \;
+find \( -name 'pom.xml' -a -not -path 'target' \) -exec sed -i 
's|\(artifactId.*\)_2.10|\1_2.11|g' {}  \;





[12/20] spark git commit: Code review

2014-11-07 Thread pwendell
Code review


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b836d80
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b836d80
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b836d80

Branch: refs/heads/scala-2.11-prashant
Commit: 1b836d80a0d429a1518da41c034f2a511a6545a2
Parents: 4af9de7
Author: Prashant Sharma prashan...@imaginea.com
Authored: Fri Oct 24 09:47:48 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 bin/compute-classpath.sh   | 14 --
 conf/spark-env.sh.template |  3 ---
 core/pom.xml   |  4 
 examples/pom.xml   |  2 ++
 project/SparkBuild.scala   |  2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/bin/compute-classpath.sh
--
diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh
index 993d260..86dd4b2 100755
--- a/bin/compute-classpath.sh
+++ b/bin/compute-classpath.sh
@@ -20,8 +20,6 @@
 # This script computes Spark's classpath and prints it to stdout; it's used by 
both the run
 # script and the ExecutorRunner in standalone cluster mode.
 
-SCALA_VERSION=${SCALA_VERSION:-2.10}
-
 # Figure out where Spark is installed
 FWDIR=$(cd `dirname $0`/..; pwd)
 
@@ -36,6 +34,18 @@ else
   CLASSPATH=$CLASSPATH:$FWDIR/conf
 fi
 
+if [ -z $SCALA_VERSION ]; then
+
+ASSEMBLY_DIR2=$FWDIR/assembly/target/scala-2.11
+# if scala-2.11 directory for assembly exists,  we use that. Otherwise we 
default to 
+# scala 2.10.
+if [ -d $ASSEMBLY_DIR2 ]; then
+SCALA_VERSION=2.11
+else
+SCALA_VERSION=2.10
+fi
+fi
+
 ASSEMBLY_DIR=$FWDIR/assembly/target/scala-$SCALA_VERSION
 
 if [ -n $JAVA_HOME ]; then

http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/conf/spark-env.sh.template
--
diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index 6a5622e..f8ffbf6 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -3,9 +3,6 @@
 # This file is sourced when running various Spark programs.
 # Copy it as spark-env.sh and edit that to configure Spark for your site.
 
-# Uncomment this if you plan to use scala 2.11
-# SCALA_VERSION=2.11
-
 # Options read when launching programs locally with 
 # ./bin/run-example or ./bin/spark-submit
 # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files

http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index 624aa96..afa8c8c 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -297,10 +297,6 @@
   scopetest/scope
 /dependency
 dependency
-  groupIdcom.twitter/groupId
-  artifactIdchill-java/artifactId
-/dependency
-dependency
   groupIdasm/groupId
   artifactIdasm/artifactId
   scopetest/scope

http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index e80c637..0cc15e5 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -282,6 +282,8 @@
   /dependencies
 /profile
 profile
+  !-- We add source directories specific to Scala 2.10 and 2.11 since 
some examples
+   work only in one and not the other --
   idscala-2.10/id
   activation
 activeByDefaulttrue/activeByDefault

http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/project/SparkBuild.scala
--
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 349cc27..0d8adcb 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -100,7 +100,7 @@ object SparkBuild extends PomBuild {
   conjunction with environment variable.)
   v.split((\\s+|,)).filterNot(_.isEmpty).map(_.trim.replaceAll(-P, 
)).toSeq
 }
-if(profiles.exists(_.contains(scala))) {
+if(profiles.exists(_.contains(scala-))) {
   profiles
 } else {
   println(Enabled default scala profile)





[02/20] spark git commit: Maven equivalent of setting spark.executor.extraClasspath during tests.

2014-11-07 Thread pwendell
Maven equivalent of setting spark.executor.extraClasspath during tests.
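
The pom changes below dump the test classpath to a file with
maven-dependency-plugin, read it back into a property with gmaven (the groovy
source itself is truncated in this digest), and pass it to the tests as
spark.executor.extraClassPath. As a hedged illustration of what that
combination accomplishes, a short Scala sketch (the file name mirrors
${test_classpath_file}; the code is not part of the commit):

// Hedged sketch: read the classpath dumped during test-compile and expose it to
// executors launched by the tests, as the scalatest systemProperties block does.
val testClasspath = scala.io.Source.fromFile("target/spark-test-classpath.txt").mkString.trim
System.setProperty("spark.executor.extraClassPath", testClasspath)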


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed4f6463
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed4f6463
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed4f6463

Branch: refs/heads/scala-2.11-prashant
Commit: ed4f646313b8f7224775de7072aaf4ee6c32d243
Parents: 4bcf66f
Author: Prashant Sharma prashan...@imaginea.com
Authored: Wed Nov 5 18:17:34 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 core/pom.xml | 17 ++---
 examples/pom.xml | 10 ++
 pom.xml  | 46 +-
 3 files changed, 65 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ed4f6463/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index d71f265..37970c9 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -329,14 +329,17 @@
   plugin
 groupIdorg.scalatest/groupId
 artifactIdscalatest-maven-plugin/artifactId
-configuration
-  environmentVariables
-SPARK_HOME${basedir}/../SPARK_HOME
-SPARK_TESTING1/SPARK_TESTING
-SPARK_CLASSPATH${spark.classpath}/SPARK_CLASSPATH
-  /environmentVariables
-/configuration
+version1.0/version
+executions
+  execution
+idtest/id
+goals
+  goaltest/goal
+/goals
+  /execution
+/executions
   /plugin
+
   !-- Unzip py4j so we can include its files in the jar --
   plugin
 groupIdorg.apache.maven.plugins/groupId

http://git-wip-us.apache.org/repos/asf/spark/blob/ed4f6463/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 5f9d0b5..027745a 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -185,6 +185,16 @@
 
testOutputDirectorytarget/scala-${scala.binary.version}/test-classes/testOutputDirectory
 plugins
   plugin
+groupIdorg.codehaus.gmaven/groupId
+artifactIdgmaven-plugin/artifactId
+version1.4/version
+executions
+  execution
+phasenone/phase
+  /execution
+/executions
+  /plugin
+  plugin
 groupIdorg.apache.maven.plugins/groupId
 artifactIdmaven-deploy-plugin/artifactId
 configuration

http://git-wip-us.apache.org/repos/asf/spark/blob/ed4f6463/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 1bd4970..b239356 100644
--- a/pom.xml
+++ b/pom.xml
@@ -142,9 +142,13 @@
 aws.kinesis.client.version1.1.0/aws.kinesis.client.version
 commons.httpclient.version4.2.6/commons.httpclient.version
 commons.math3.version3.1.1/commons.math3.version
-
+
test_classpath_file${project.build.directory}/spark-test-classpath.txt/test_classpath_file
 PermGen64m/PermGen
 MaxPermGen512m/MaxPermGen
+scala.version2.10.4/scala.version
+scala.binary.version2.10/scala.binary.version
+jline.version${scala.version}/jline.version
+jline.groupidorg.scala-lang/jline.groupid
   /properties
 
   repositories
@@ -961,6 +965,7 @@
   
spark.test.home${session.executionRootDirectory}/spark.test.home
   spark.testing1/spark.testing
   spark.ui.enabledfalse/spark.ui.enabled
+  
spark.executor.extraClassPath${test_classpath}/spark.executor.extraClassPath
 /systemProperties
   /configuration
   executions
@@ -1022,6 +1027,45 @@
 /pluginManagement
 
 plugins
+  !-- This plugin dumps the test classpath into a file --
+  plugin
+groupIdorg.apache.maven.plugins/groupId
+artifactIdmaven-dependency-plugin/artifactId
+version2.9/version
+executions
+  execution
+phasetest-compile/phase
+goals
+  goalbuild-classpath/goal
+/goals
+configuration
+  includeScopetest/includeScope
+  outputFile${test_classpath_file}/outputFile
+/configuration
+  /execution
+/executions
+  /plugin
+
+  !-- This plugin reads a file into maven property. And it lets us write 
groovy !! --
+  plugin
+groupIdorg.codehaus.gmaven/groupId
+artifactIdgmaven-plugin/artifactId
+version1.4/version
+executions
+  execution
+phaseprocess-test-classes/phase
+goals
+  goalexecute/goal
+/goals
+configuration
+  source
+

[08/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala
--
diff --git 
a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala
 
b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala
new file mode 100644
index 000..13cd2b7
--- /dev/null
+++ 
b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala
@@ -0,0 +1,232 @@
+// scalastyle:off
+
+/* NSC -- new Scala compiler
+ * Copyright 2005-2013 LAMP/EPFL
+ * @author  Martin Odersky
+ */
+
+package org.apache.spark.repl
+
+import scala.tools.nsc._
+import scala.tools.nsc.interpreter._
+
+import scala.collection.{ mutable, immutable }
+import scala.PartialFunction.cond
+import scala.reflect.internal.Chars
+import scala.reflect.internal.Flags._
+import scala.language.implicitConversions
+
+trait SparkMemberHandlers {
+  val intp: SparkIMain
+
+  import intp.{ Request, global, naming }
+  import global._
+  import naming._
+
+  private def codegenln(leadingPlus: Boolean, xs: String*): String = 
codegen(leadingPlus, (xs ++ Array(\n)): _*)
+  private def codegenln(xs: String*): String = codegenln(true, xs: _*)
+
+  private def codegen(xs: String*): String = codegen(true, xs: _*)
+  private def codegen(leadingPlus: Boolean, xs: String*): String = {
+val front = if (leadingPlus) +  else 
+front + (xs map string2codeQuoted mkString  + )
+  }
+  private implicit def name2string(name: Name) = name.toString
+
+  /** A traverser that finds all mentioned identifiers, i.e. things
+   *  that need to be imported.  It might return extra names.
+   */
+  private class ImportVarsTraverser extends Traverser {
+val importVars = new mutable.HashSet[Name]()
+
+override def traverse(ast: Tree) = ast match {
+  case Ident(name) =
+// XXX this is obviously inadequate but it's going to require some 
effort
+// to get right.
+if (name.toString startsWith x$) ()
+else importVars += name
+  case _= super.traverse(ast)
+}
+  }
+  private object ImportVarsTraverser {
+def apply(member: Tree) = {
+  val ivt = new ImportVarsTraverser()
+  ivt traverse member
+  ivt.importVars.toList
+}
+  }
+
+  def chooseHandler(member: Tree): MemberHandler = member match {
+case member: DefDef= new DefHandler(member)
+case member: ValDef= new ValHandler(member)
+case member: Assign= new AssignHandler(member)
+case member: ModuleDef = new ModuleHandler(member)
+case member: ClassDef  = new ClassHandler(member)
+case member: TypeDef   = new TypeAliasHandler(member)
+case member: Import= new ImportHandler(member)
+case DocDef(_, documented) = chooseHandler(documented)
+case member= new GenericHandler(member)
+  }
+
+  sealed abstract class MemberDefHandler(override val member: MemberDef) 
extends MemberHandler(member) {
+def symbol  = if (member.symbol eq null) NoSymbol else 
member.symbol
+def name: Name  = member.name
+def mods: Modifiers = member.mods
+def keyword = member.keyword
+def prettyName  = name.decode
+
+override def definesImplicit = member.mods.isImplicit
+override def definesTerm: Option[TermName] = Some(name.toTermName) filter 
(_ = name.isTermName)
+override def definesType: Option[TypeName] = Some(name.toTypeName) filter 
(_ = name.isTypeName)
+override def definedSymbols = if (symbol eq NoSymbol) Nil else List(symbol)
+  }
+
+  /** Class to handle one member among all the members included
+   *  in a single interpreter request.
+   */
+  sealed abstract class MemberHandler(val member: Tree) {
+def definesImplicit = false
+def definesValue= false
+def isLegalTopLevel = false
+
+def definesTerm = Option.empty[TermName]
+def definesType = Option.empty[TypeName]
+
+lazy val referencedNames = ImportVarsTraverser(member)
+def importedNames= List[Name]()
+def definedNames = definesTerm.toList ++ definesType.toList
+def definedOrImported= definedNames ++ importedNames
+def definedSymbols   = List[Symbol]()
+
+def extraCodeToEvaluate(req: Request): String = 
+def resultExtractionCode(req: Request): String = 
+
+private def shortName = this.getClass.toString split '.' last
+override def toString = shortName + referencedNames.mkString( (refs: , 
, , ))
+  }
+
+  class GenericHandler(member: Tree) extends MemberHandler(member)
+
+  class ValHandler(member: ValDef) extends MemberDefHandler(member) {
+val maxStringElements = 1000  // no need to mkString billions of elements
+override def definesValue = true
+
+override def resultExtractionCode(req: Request): String = {
+  val isInternal = isUserVarName(name)  req.lookupTypeOf(name) == 

[11/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
Scala 2.11 support with repl and all build changes.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4af9de7d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4af9de7d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4af9de7d

Branch: refs/heads/scala-2.11-prashant
Commit: 4af9de7dc72a809e10f4a287b509dec4ca12ae53
Parents: 48a19a6
Author: Prashant Sharma prashan...@imaginea.com
Authored: Mon Oct 20 17:43:38 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 .rat-excludes   |1 +
 assembly/pom.xml|8 +-
 bin/compute-classpath.sh|8 +-
 conf/spark-env.sh.template  |3 +
 core/pom.xml|   41 +-
 dev/change-version-to-2.10.sh   |   20 +
 dev/change-version-to-2.11.sh   |   20 +
 examples/pom.xml|  149 +-
 .../examples/streaming/JavaKafkaWordCount.java  |  113 ++
 .../examples/streaming/KafkaWordCount.scala |  102 ++
 .../examples/streaming/TwitterAlgebirdCMS.scala |  114 ++
 .../examples/streaming/TwitterAlgebirdHLL.scala |   92 ++
 .../examples/streaming/JavaKafkaWordCount.java  |  113 --
 .../examples/streaming/KafkaWordCount.scala |  102 --
 .../examples/streaming/TwitterAlgebirdCMS.scala |  114 --
 .../examples/streaming/TwitterAlgebirdHLL.scala |   92 --
 external/mqtt/pom.xml   |5 -
 pom.xml |  114 +-
 project/SparkBuild.scala|   12 +-
 project/project/SparkPluginBuild.scala  |2 +-
 repl/pom.xml|   90 +-
 .../main/scala/org/apache/spark/repl/Main.scala |   33 +
 .../apache/spark/repl/SparkCommandLine.scala|   37 +
 .../org/apache/spark/repl/SparkExprTyper.scala  |  114 ++
 .../org/apache/spark/repl/SparkHelper.scala |   22 +
 .../org/apache/spark/repl/SparkILoop.scala  | 1091 +
 .../org/apache/spark/repl/SparkILoopInit.scala  |  147 ++
 .../org/apache/spark/repl/SparkIMain.scala  | 1445 ++
 .../org/apache/spark/repl/SparkImports.scala|  238 +++
 .../spark/repl/SparkJLineCompletion.scala   |  377 +
 .../apache/spark/repl/SparkJLineReader.scala|   90 ++
 .../apache/spark/repl/SparkMemberHandlers.scala |  232 +++
 .../apache/spark/repl/SparkRunnerSettings.scala |   32 +
 .../scala/org/apache/spark/repl/ReplSuite.scala |  318 
 .../main/scala/org/apache/spark/repl/Main.scala |   85 ++
 .../org/apache/spark/repl/SparkExprTyper.scala  |   86 ++
 .../org/apache/spark/repl/SparkILoop.scala  |  966 
 .../org/apache/spark/repl/SparkIMain.scala  | 1319 
 .../org/apache/spark/repl/SparkImports.scala|  201 +++
 .../spark/repl/SparkJLineCompletion.scala   |  350 +
 .../apache/spark/repl/SparkMemberHandlers.scala |  221 +++
 .../apache/spark/repl/SparkReplReporter.scala   |   53 +
 .../scala/org/apache/spark/repl/ReplSuite.scala |  326 
 .../main/scala/org/apache/spark/repl/Main.scala |   33 -
 .../apache/spark/repl/SparkCommandLine.scala|   37 -
 .../org/apache/spark/repl/SparkExprTyper.scala  |  114 --
 .../org/apache/spark/repl/SparkHelper.scala |   22 -
 .../org/apache/spark/repl/SparkILoop.scala  | 1091 -
 .../org/apache/spark/repl/SparkILoopInit.scala  |  147 --
 .../org/apache/spark/repl/SparkIMain.scala  | 1445 --
 .../org/apache/spark/repl/SparkImports.scala|  238 ---
 .../spark/repl/SparkJLineCompletion.scala   |  377 -
 .../apache/spark/repl/SparkJLineReader.scala|   90 --
 .../apache/spark/repl/SparkMemberHandlers.scala |  232 ---
 .../apache/spark/repl/SparkRunnerSettings.scala |   32 -
 .../scala/org/apache/spark/repl/ReplSuite.scala |  318 
 sql/catalyst/pom.xml|   29 +-
 57 files changed, 8602 insertions(+), 4701 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/.rat-excludes
--
diff --git a/.rat-excludes b/.rat-excludes
index 20e3372..d8bee1f 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -44,6 +44,7 @@ SparkImports.scala
 SparkJLineCompletion.scala
 SparkJLineReader.scala
 SparkMemberHandlers.scala
+SparkReplReporter.scala
 sbt
 sbt-launch-lib.bash
 plugins.sbt

http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 31a01e4..e592220 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -66,22 +66,22 @@
 /dependency
   

[18/20] spark git commit: Fixed Python Runner suite. null check should be first case in scala 2.11.

2014-11-07 Thread pwendell
Fixed Python Runner suite. null check should be first case in scala 2.11.
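
A minimal sketch of the reordering (the regex and method below are illustrative
stand-ins, not the actual Utils.windowsDrive): matching null before any
extractor pattern keeps the extractor from ever being applied to a null scheme,
which is the ordering this commit enforces for Scala 2.11.

// Stand-in for Utils.windowsDrive: matches a single drive letter such as "C".
val WindowsDrive = "([a-zA-Z])".r

def stripScheme(scheme: String, path: String): String = scheme match {
  case null => path                 // handle null first (required under Scala 2.11)
  case WindowsDrive(_) => path      // the extractor now only sees non-null input
  case _ => new java.net.URI(path).getPath
}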


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/29cc4d9f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/29cc4d9f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/29cc4d9f

Branch: refs/heads/scala-2.11-prashant
Commit: 29cc4d9f8d441a3fee48856433e5cbc131f1e3a1
Parents: eeed1a6
Author: Prashant Sharma prashan...@imaginea.com
Authored: Wed Nov 5 15:57:48 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/29cc4d9f/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala 
b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
index af94b05..039c871 100644
--- a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
@@ -87,8 +87,8 @@ object PythonRunner {
     // Strip the URI scheme from the path
     formattedPath =
       new URI(formattedPath).getScheme match {
-        case Utils.windowsDrive(d) if windows => formattedPath
         case null => formattedPath
+        case Utils.windowsDrive(d) if windows => formattedPath
         case _ => new URI(formattedPath).getPath
       }
 





[14/20] spark git commit: Print an error if build for 2.10 and 2.11 is spotted.

2014-11-07 Thread pwendell
Print an error if build for 2.10 and 2.11 is spotted.
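
For readers skimming past the shell, a hedged Scala paraphrase of the detection
logic this commit adds to compute-classpath.sh below (the paths and error text
mirror the script; the code itself is illustration only, not part of Spark):

import java.io.File

// Choose the Scala version from whichever assembly directory exists, and refuse
// to guess if assemblies for both 2.10 and 2.11 are present.
def detectScalaVersion(sparkHome: String): String = {
  val dir210 = new File(s"$sparkHome/assembly/target/scala-2.10")
  val dir211 = new File(s"$sparkHome/assembly/target/scala-2.11")
  if (dir210.isDirectory && dir211.isDirectory) {
    sys.error("Presence of build for both scala versions detected; " +
      "either clean one of them or export SPARK_SCALA_VERSION=2.11 in spark-env.sh.")
  } else if (dir211.isDirectory) "2.11" else "2.10"
}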


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c696f394
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c696f394
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c696f394

Branch: refs/heads/scala-2.11-prashant
Commit: c696f394ce45d6899d5facabafcfa59488a728da
Parents: 99a0df1
Author: Prashant Sharma prashan...@imaginea.com
Authored: Mon Oct 27 13:59:13 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 bin/compute-classpath.sh | 10 --
 examples/pom.xml |  1 -
 2 files changed, 8 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c696f394/bin/compute-classpath.sh
--
diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh
index dea1592..108c9af 100755
--- a/bin/compute-classpath.sh
+++ b/bin/compute-classpath.sh
@@ -37,8 +37,14 @@ fi
 if [ -z $SPARK_SCALA_VERSION ]; then
 
 ASSEMBLY_DIR2=$FWDIR/assembly/target/scala-2.11
-# if scala-2.11 directory for assembly exists,  we use that. Otherwise we 
default to 
-# scala 2.10.
+ASSEMBLY_DIR1=$FWDIR/assembly/target/scala-2.10
+
+if [[ -d $ASSEMBLY_DIR2 && -d $ASSEMBLY_DIR1 ]]; then
+    echo -e "Presence of build for both scala versions(SCALA 2.10 and SCALA 2.11) detected." 1>&2
+    echo -e 'Either clean one of them or, export SPARK_SCALA_VERSION=2.11 in spark-env.sh.' 1>&2
+    exit 1
+fi
+
 if [ -d $ASSEMBLY_DIR2 ]; then
 SPARK_SCALA_VERSION=2.11
 else

http://git-wip-us.apache.org/repos/asf/spark/blob/c696f394/examples/pom.xml
--
diff --git a/examples/pom.xml b/examples/pom.xml
index 0cc15e5..5f9d0b5 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -129,7 +129,6 @@
   artifactIdjetty-server/artifactId
 /dependency
 dependency
-dependency
   groupIdorg.apache.commons/groupId
   artifactIdcommons-math3/artifactId
 /dependency





[09/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala
--
diff --git 
a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala 
b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala
new file mode 100644
index 000..646c68e
--- /dev/null
+++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala
@@ -0,0 +1,1445 @@
+// scalastyle:off
+
+/* NSC -- new Scala compiler
+ * Copyright 2005-2013 LAMP/EPFL
+ * @author  Martin Odersky
+ */
+
+package org.apache.spark.repl
+
+import java.io.File
+
+import scala.tools.nsc._
+import scala.tools.nsc.backend.JavaPlatform
+import scala.tools.nsc.interpreter._
+
+import Predef.{ println = _, _ }
+import scala.tools.nsc.util.{MergedClassPath, stringFromWriter, 
ScalaClassLoader, stackTraceString}
+import scala.reflect.internal.util._
+import java.net.URL
+import scala.sys.BooleanProp
+import io.{AbstractFile, PlainFile, VirtualDirectory}
+
+import reporters._
+import symtab.Flags
+import scala.reflect.internal.Names
+import scala.tools.util.PathResolver
+import ScalaClassLoader.URLClassLoader
+import scala.tools.nsc.util.Exceptional.unwrap
+import scala.collection.{ mutable, immutable }
+import scala.util.control.Exception.{ ultimately }
+import SparkIMain._
+import java.util.concurrent.Future
+import typechecker.Analyzer
+import scala.language.implicitConversions
+import scala.reflect.runtime.{ universe = ru }
+import scala.reflect.{ ClassTag, classTag }
+import scala.tools.reflect.StdRuntimeTags._
+import scala.util.control.ControlThrowable
+
+import org.apache.spark.{Logging, HttpServer, SecurityManager, SparkConf}
+import org.apache.spark.util.Utils
+
+// /** directory to save .class files to */
+// private class ReplVirtualDirectory(out: JPrintWriter) extends 
VirtualDirectory(((memory)), None) {
+//   private def pp(root: AbstractFile, indentLevel: Int) {
+// val spaces =  * indentLevel
+// out.println(spaces + root.name)
+// if (root.isDirectory)
+//   root.toList sortBy (_.name) foreach (x = pp(x, indentLevel + 1))
+//   }
+//   // print the contents hierarchically
+//   def show() = pp(this, 0)
+// }
+
+  /** An interpreter for Scala code.
+   *
+   *  The main public entry points are compile(), interpret(), and bind().
+   *  The compile() method loads a complete Scala file.  The interpret() method
+   *  executes one line of Scala code at the request of the user.  The bind()
+   *  method binds an object to a variable that can then be used by later
+   *  interpreted code.
+   *
+   *  The overall approach is based on compiling the requested code and then
+   *  using a Java classloader and Java reflection to run the code
+   *  and access its results.
+   *
+   *  In more detail, a single compiler instance is used
+   *  to accumulate all successfully compiled or interpreted Scala code.  To
+   *  interpret a line of code, the compiler generates a fresh object that
+   *  includes the line of code and which has public member(s) to export
+   *  all variables defined by that code.  To extract the result of an
+   *  interpreted line to show the user, a second result object is created
+   *  which imports the variables exported by the above object and then
+   *  exports members called $eval and $print. To accomodate user 
expressions
+   *  that read from variables or methods defined in previous statements, 
import
+   *  statements are used.
+   *
+   *  This interpreter shares the strengths and weaknesses of using the
+   *  full compiler-to-Java.  The main strength is that interpreted code
+   *  behaves exactly as does compiled code, including running at full speed.
+   *  The main weakness is that redefining classes and methods is not handled
+   *  properly, because rebinding at the Java level is technically difficult.
+   *
+   *  @author Moez A. Abdel-Gawad
+   *  @author Lex Spoon
+   */
+  class SparkIMain(
+  initialSettings: Settings,
+  val out: JPrintWriter,
+  propagateExceptions: Boolean = false)
+extends SparkImports with Logging { imain =
+
+val conf = new SparkConf()
+
+val SPARK_DEBUG_REPL: Boolean = (System.getenv(SPARK_DEBUG_REPL) == 1)
+/** Local directory to save .class files too */
+lazy val outputDir = {
+  val tmp = System.getProperty(java.io.tmpdir)
+  val rootDir = conf.get(spark.repl.classdir,  tmp)
+  Utils.createTempDir(rootDir)
+}
+if (SPARK_DEBUG_REPL) {
+  echo(Output directory:  + outputDir)
+}
+
+val virtualDirectory  = new 
PlainFile(outputDir) // directory for classfiles
+/** Jetty server that will serve our classes to worker nodes */
+val classServerPort   = 
conf.getInt(spark.replClassServer.port, 0)
+val classServer   = new 
HttpServer(outputDir, new 

[07/20] spark git commit: Scala 2.11 support with repl and all build changes.

2014-11-07 Thread pwendell
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
--
diff --git 
a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala 
b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
new file mode 100644
index 000..1bb62c8
--- /dev/null
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala
@@ -0,0 +1,1319 @@
+/* NSC -- new Scala compiler
+ * Copyright 2005-2013 LAMP/EPFL
+ * @author  Martin Odersky
+ */
+
+package scala
+package tools.nsc
+package interpreter
+
+import PartialFunction.cond
+import scala.language.implicitConversions
+import scala.beans.BeanProperty
+import scala.collection.mutable
+import scala.concurrent.{ Future, ExecutionContext }
+import scala.reflect.runtime.{ universe => ru }
+import scala.reflect.{ ClassTag, classTag }
+import scala.reflect.internal.util.{ BatchSourceFile, SourceFile }
+import scala.tools.util.PathResolver
+import scala.tools.nsc.io.AbstractFile
+import scala.tools.nsc.typechecker.{ TypeStrings, StructuredTypeStrings }
+import scala.tools.nsc.util.{ ScalaClassLoader, stringFromReader, 
stringFromWriter, StackTraceOps }
+import scala.tools.nsc.util.Exceptional.unwrap
+import javax.script.{AbstractScriptEngine, Bindings, ScriptContext, 
ScriptEngine, ScriptEngineFactory, ScriptException, CompiledScript, Compilable}
+
+/** An interpreter for Scala code.
+  *
+  *  The main public entry points are compile(), interpret(), and bind().
+  *  The compile() method loads a complete Scala file.  The interpret() method
+  *  executes one line of Scala code at the request of the user.  The bind()
+  *  method binds an object to a variable that can then be used by later
+  *  interpreted code.
+  *
+  *  The overall approach is based on compiling the requested code and then
+  *  using a Java classloader and Java reflection to run the code
+  *  and access its results.
+  *
+  *  In more detail, a single compiler instance is used
+  *  to accumulate all successfully compiled or interpreted Scala code.  To
+  *  interpret a line of code, the compiler generates a fresh object that
+  *  includes the line of code and which has public member(s) to export
+  *  all variables defined by that code.  To extract the result of an
+  *  interpreted line to show the user, a second result object is created
+  *  which imports the variables exported by the above object and then
+  *  exports members called $eval and $print. To accommodate user expressions
+  *  that read from variables or methods defined in previous statements, import
+  *  statements are used.
+  *
+  *  This interpreter shares the strengths and weaknesses of using the
+  *  full compiler-to-Java.  The main strength is that interpreted code
+  *  behaves exactly as does compiled code, including running at full speed.
+  *  The main weakness is that redefining classes and methods is not handled
+  *  properly, because rebinding at the Java level is technically difficult.
+  *
+  *  @author Moez A. Abdel-Gawad
+  *  @author Lex Spoon
+  */
+class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, 
initialSettings: Settings,
+  protected val out: JPrintWriter) extends AbstractScriptEngine with 
Compilable with SparkImports {
+  imain =>
+
+  setBindings(createBindings, ScriptContext.ENGINE_SCOPE)
+  object replOutput extends ReplOutput(settings.Yreploutdir) { }
+
+  @deprecated("Use replOutput.dir instead", "2.11.0")
+  def virtualDirectory = replOutput.dir
+  // Used in a test case.
+  def showDirectory() = replOutput.show(out)
+
+  private[nsc] var printResults               = true   // whether to print result lines
+  private[nsc] var totalSilence               = false  // whether to print anything
+  private var _initializeComplete             = false  // compiler is initialized
+  private var _isInitialized: Future[Boolean] = null   // set up initialization future
+  private var bindExceptions                  = true   // whether to bind the lastException variable
+  private var _executionWrapper               = ""     // code to be wrapped around all lines
+
+  /** We're going to go to some trouble to initialize the compiler 
asynchronously.
+*  It's critical that nothing call into it until it's been initialized or 
we will
+*  run into unrecoverable issues, but the perceived repl startup time goes
+*  through the roof if we wait for it.  So we initialize it with a future 
and
+*  use a lazy val to ensure that any attempt to use the compiler object 
waits
+*  on the future.
+*/
+  private var _classLoader: util.AbstractFileClassLoader = null   // active classloader
+  private val _compiler: ReplGlobal = newCompiler(settings, reporter)   // our private compiler
+
+  def compilerClasspath: Seq[java.net.URL] = (
+if 
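The doc comment above explains the interpreter's core trick: each user line is compiled into a fresh wrapper object whose public members export the line's values, plus a second object that imports those members and exposes $eval/$print. Below is a rough, self-contained Scala sketch of that wrapping idea only; every name in it (LineWrapperSketch, wrap, the generated object names) is illustrative and does not reflect the real SparkIMain/IMain internals.

// Rough sketch of the line-wrapping scheme described in the doc comment above.
// Names are illustrative only; the real SparkIMain/IMain machinery differs.
object LineWrapperSketch {
  // For the user input `val x = 1 + 1`, the interpreter conceptually emits two
  // synthetic objects: one holding the code, one re-exporting its results.
  def wrap(userLine: String, readName: String): String =
    s"""
       |object $readName {
       |  $userLine            // user code; its vals become public members
       |}
       |object ${readName}Result {
       |  import $readName._   // import the members exported above
       |  def $$print = x.toString  // member the REPL calls to render the result
       |}
     """.stripMargin

  def main(args: Array[String]): Unit =
    println(wrap("val x = 1 + 1", "$read"))
}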

[03/20] spark git commit: Setting test jars on executor classpath during tests from sbt.

2014-11-07 Thread pwendell
Setting test jars on executor classpath during tests from sbt.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4bcf66f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4bcf66f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4bcf66f2

Branch: refs/heads/scala-2.11-prashant
Commit: 4bcf66f2c9421f0c50e5a4578a80812aea4b8b10
Parents: 29cc4d9
Author: Prashant Sharma prashan...@imaginea.com
Authored: Wed Nov 5 16:51:09 2014 +0530
Committer: Prashant Sharma prashan...@imaginea.com
Committed: Fri Nov 7 11:22:47 2014 +0530

--
 bin/compute-classpath.sh | 6 --
 project/SparkBuild.scala | 8 +---
 2 files changed, 5 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4bcf66f2/bin/compute-classpath.sh
--
diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh
index 108c9af..14e972f 100755
--- a/bin/compute-classpath.sh
+++ b/bin/compute-classpath.sh
@@ -137,9 +137,6 @@ if [ -n "$datanucleus_jars" ]; then
   fi
 fi
 
-test_jars=$(find "$FWDIR"/lib_managed/test \( -name '*jar' -a -type f \) 2>/dev/null | \
-    tr "\n" ":" | sed s/:$//g)
-
 # Add test classes if we're running from SBT or Maven with SPARK_TESTING set 
to 1
 if [[ $SPARK_TESTING == 1 ]]; then
   
CLASSPATH=$CLASSPATH:$FWDIR/core/target/scala-$SPARK_SCALA_VERSION/test-classes
@@ -151,9 +148,6 @@ if [[ $SPARK_TESTING == 1 ]]; then
   
CLASSPATH=$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SPARK_SCALA_VERSION/test-classes
   
CLASSPATH=$CLASSPATH:$FWDIR/sql/core/target/scala-$SPARK_SCALA_VERSION/test-classes
   
CLASSPATH=$CLASSPATH:$FWDIR/sql/hive/target/scala-$SPARK_SCALA_VERSION/test-classes
-  if [[ $SPARK_SCALA_VERSION == 2.11 ]]; then
-  CLASSPATH=$CLASSPATH:$test_jars
-  fi
 fi
 
 # Add hadoop conf dir if given -- otherwise FileSystem.*, etc fail !

http://git-wip-us.apache.org/repos/asf/spark/blob/4bcf66f2/project/SparkBuild.scala
--
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 0d8adcb..9e46a11 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -31,8 +31,8 @@ object BuildCommons {
   private val buildLocation = file(".").getAbsoluteFile.getParentFile
 
   val allProjects@Seq(bagel, catalyst, core, graphx, hive, hiveThriftServer, 
mllib, repl,
-  sql, networkCommon, networkShuffle, streaming, streamingFlumeSink, 
streamingFlume, streamingKafka,
-  streamingMqtt, streamingTwitter, streamingZeromq) =
+sql, networkCommon, networkShuffle, streaming, streamingFlumeSink, 
streamingFlume, streamingKafka,
+streamingMqtt, streamingTwitter, streamingZeromq) =
     Seq("bagel", "catalyst", "core", "graphx", "hive", "hive-thriftserver", "mllib", "repl",
       "sql", "network-common", "network-shuffle", "streaming", "streaming-flume-sink",
       "streaming-flume", "streaming-kafka", "streaming-mqtt", "streaming-twitter",
@@ -361,8 +361,10 @@ object TestSettings {
       .map { case (k,v) => s"-D$k=$v" }.toSeq,
     javaOptions in Test ++= "-Xmx3g -XX:PermSize=128M -XX:MaxNewSize=256m -XX:MaxPermSize=1g"
       .split(" ").toSeq,
+    javaOptions in Test +=
+      "-Dspark.executor.extraClassPath=" + (fullClasspath in Test).value.files.
+      map(_.getAbsolutePath).mkString(":").stripSuffix(":"),
     javaOptions += "-Xmx3g",
-    retrievePattern := "[conf]/[artifact](-[revision]).[ext]",
     // Show full stack trace and duration in test cases.
     testOptions in Test += Tests.Argument("-oDF"),
     testOptions += Tests.Argument(TestFrameworks.JUnit, "-v", "-a"),


-
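For context, here is a sketch of the sbt 0.13-style setting this commit adds: the sbt test classpath is handed to executors through a system property. It mirrors the SparkBuild.scala change above; spark.executor.extraClassPath is the real Spark configuration key, while the object and value names are illustrative.

import sbt._
import sbt.Keys._

// Sketch of the setting added above (sbt 0.13 DSL): expose the sbt test
// classpath to Spark executors via -Dspark.executor.extraClassPath=...
object TestClasspathSketch {
  lazy val settings: Seq[Setting[_]] = Seq(
    javaOptions in Test +=
      "-Dspark.executor.extraClassPath=" +
        (fullClasspath in Test).value.files
          .map(_.getAbsolutePath)
          .mkString(":")
          .stripSuffix(":")
  )
}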



[19/20] spark git commit: Testing new Hive version with shaded jline

2014-11-07 Thread pwendell
Testing new Hive version with shaded jline


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37d972c3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37d972c3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37d972c3

Branch: refs/heads/scala-2.11-prashant
Commit: 37d972c35b4752f76be473ee82a0f1ac995bd798
Parents: d3065ec
Author: Patrick Wendell pwend...@gmail.com
Authored: Fri Nov 7 11:09:52 2014 -0800
Committer: Patrick Wendell pwend...@gmail.com
Committed: Fri Nov 7 11:13:34 2014 -0800

--
 pom.xml | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/37d972c3/pom.xml
--
diff --git a/pom.xml b/pom.xml
index b239356..3053e7c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -126,7 +126,7 @@
     <flume.version>1.4.0</flume.version>
     <zookeeper.version>3.4.5</zookeeper.version>
     <!-- Version used in Maven Hive dependency -->
-    <hive.version>0.13.1a</hive.version>
+    <hive.version>0.13.1b</hive.version>
     <!-- Version used for internal directory structure -->
     <hive.version.short>0.13.1</hive.version.short>
     <derby.version>10.10.1.1</derby.version>
@@ -231,22 +231,10 @@
       </snapshots>
     </repository>
     <repository>
-      <!-- This is temporarily included to fix issues with Hive 0.12 -->
       <id>spark-staging</id>
       <name>Spring Staging Repository</name>
-      <url>https://oss.sonatype.org/content/repositories/orgspark-project-1085</url>
-      <releases>
-        <enabled>true</enabled>
-      </releases>
-      <snapshots>
-        <enabled>false</enabled>
-      </snapshots>
-    </repository>
-    <repository>
-      <!-- This is temporarily included to fix issues with Hive 0.13 -->
-      <id>spark-staging-hive13</id>
-      <name>Spring Staging Repository Hive 13</name>
-      <url>https://oss.sonatype.org/content/repositories/orgspark-project-1089/</url>
+      <url>https://oss.sonatype.org/content/repositories/orgspark-project-1090</url>
       <releases>
         <enabled>true</enabled>
       </releases>
   /releases
@@ -1401,7 +1389,7 @@
 activeByDefaultfalse/activeByDefault
   /activation
   properties
-hive.version0.13.1a/hive.version
+hive.version0.13.1b/hive.version
 hive.version.short0.13.1/hive.version.short
 derby.version10.10.1.1/derby.version
   /properties


-



spark git commit: [SQL][DOC][Minor] Spark SQL Hive now support dynamic partitioning

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 d6262fa05 - e5b8cea7e


[SQL][DOC][Minor] Spark SQL Hive now support dynamic partitioning

Author: wangfei wangf...@huawei.com

Closes #3127 from scwf/patch-9 and squashes the following commits:

e39a560 [wangfei] now support dynamic partitioning

(cherry picked from commit 636d7bcc96b912f5b5caa91110cd55b55fa38ad8)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e5b8cea7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e5b8cea7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e5b8cea7

Branch: refs/heads/branch-1.2
Commit: e5b8cea7ef219be33df1db77a0921885833a4254
Parents: d6262fa
Author: wangfei wangf...@huawei.com
Authored: Fri Nov 7 11:43:35 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:43:49 2014 -0800

--
 docs/sql-programming-guide.md | 1 -
 1 file changed, 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e5b8cea7/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index e399fec..ffcce2c 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1059,7 +1059,6 @@ in Hive deployments.
 
 **Major Hive Features**
 
-* Spark SQL does not currently support inserting to tables using dynamic 
partitioning.
 * Tables with buckets: bucket is the hash partitioning within a Hive table 
partition. Spark SQL
   doesn't support buckets yet.
 


-



spark git commit: [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 636d7bcc9 - 86e9eaa3f


[SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version

This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in 
the assembly jar to inspect Spark version. Currently, when built with Maven, 
the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 
MANIFEST.MF, probably because of the assembly/shading tricks.

Another related PR is #3103, which tries to fix the MANIFEST issue.

Author: Cheng Lian l...@databricks.com

Closes #3105 from liancheng/spark-4225 and squashes the following commits:

d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86e9eaa3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86e9eaa3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86e9eaa3

Branch: refs/heads/master
Commit: 86e9eaa3f0ec23cb38bce67585adb2d5f484f4ee
Parents: 636d7bc
Author: Cheng Lian l...@databricks.com
Authored: Fri Nov 7 11:45:25 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:45:25 2014 -0800

--
 .../scala/org/apache/spark/util/Utils.scala | 24 ++--
 .../hive/thriftserver/SparkSQLCLIService.scala  | 12 --
 2 files changed, 12 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/86e9eaa3/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index a14d612..6b85c03 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -21,10 +21,8 @@ import java.io._
 import java.lang.management.ManagementFactory
 import java.net._
 import java.nio.ByteBuffer
-import java.util.jar.Attributes.Name
-import java.util.{Properties, Locale, Random, UUID}
-import java.util.concurrent.{ThreadFactory, ConcurrentHashMap, Executors, 
ThreadPoolExecutor}
-import java.util.jar.{Manifest => JarManifest}
+import java.util.concurrent.{ConcurrentHashMap, Executors, ThreadFactory, 
ThreadPoolExecutor}
+import java.util.{Locale, Properties, Random, UUID}
 
 import scala.collection.JavaConversions._
 import scala.collection.Map
@@ -38,11 +36,11 @@ import com.google.common.io.{ByteStreams, Files}
 import com.google.common.util.concurrent.ThreadFactoryBuilder
 import org.apache.commons.lang3.SystemUtils
 import org.apache.hadoop.conf.Configuration
-import org.apache.log4j.PropertyConfigurator
 import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
+import org.apache.log4j.PropertyConfigurator
 import org.eclipse.jetty.util.MultiException
 import org.json4s._
-import tachyon.client.{TachyonFile,TachyonFS}
+import tachyon.client.{TachyonFS, TachyonFile}
 
 import org.apache.spark._
 import org.apache.spark.deploy.SparkHadoopUtil
@@ -352,8 +350,8 @@ private[spark] object Utils extends Logging {
* Download a file to target directory. Supports fetching the file in a 
variety of ways,
* including HTTP, HDFS and files on a standard filesystem, based on the URL 
parameter.
*
-   * If `useCache` is true, first attempts to fetch the file to a local cache 
that's shared 
-   * across executors running the same application. `useCache` is used mainly 
for 
+   * If `useCache` is true, first attempts to fetch the file to a local cache 
that's shared
+   * across executors running the same application. `useCache` is used mainly 
for
* the executors, and not in local mode.
*
* Throws SparkException if the target file already exists and has different 
contents than
@@ -400,7 +398,7 @@ private[spark] object Utils extends Logging {
 } else {
   doFetchFile(url, targetDir, fileName, conf, securityMgr, hadoopConf)
 }
-
+
 // Decompress the file if it's a .tar or .tar.gz
     if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
       logInfo("Untarring " + fileName)
@@ -1776,13 +1774,6 @@ private[spark] object Utils extends Logging {
     s"$libraryPathEnvName=$libraryPath$ampersand"
   }
 
-  lazy val sparkVersion =
-    SparkContext.jarOfObject(this).map { path =>
-      val manifestUrl = new URL(s"jar:file:$path!/META-INF/MANIFEST.MF")
-      val manifest = new JarManifest(manifestUrl.openStream())
-      manifest.getMainAttributes.getValue(Name.IMPLEMENTATION_VERSION)
-    }.getOrElse("Unknown")
-
   /**
* Return the value of a config either through the SparkConf or the Hadoop 
configuration
* if this is Yarn mode. In the latter case, this defaults to the value set 
through SparkConf
@@ -1796,7 +1787,6 @@ private[spark] 
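A sketch contrasting the two version-lookup approaches this change is about. The manifest-reading helper mirrors the code removed in the diff above and breaks when shading rewrites META-INF/MANIFEST.MF; the replacement simply asks the running SparkContext. The jarPath parameter and object name are placeholders for illustration, not Spark API.

import java.net.URL
import java.util.jar.Attributes.Name
import java.util.jar.{Manifest => JarManifest}
import org.apache.spark.SparkContext

// Sketch of the two approaches discussed above. The manifest lookup mirrors the
// removed code; jarPath stands in for SparkContext.jarOfObject(...).
object VersionLookupSketch {
  def versionFromManifest(jarPath: String): String = {
    val manifestUrl = new URL(s"jar:file:$jarPath!/META-INF/MANIFEST.MF")
    val manifest = new JarManifest(manifestUrl.openStream())
    Option(manifest.getMainAttributes.getValue(Name.IMPLEMENTATION_VERSION)).getOrElse("Unknown")
  }

  // The replacement: ask the running context directly.
  def versionFromContext(sc: SparkContext): String = sc.version
}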

spark git commit: [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 e5b8cea7e - 2cd8e3e2b


[SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version

This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in 
the assembly jar to inspect Spark version. Currently, when built with Maven, 
the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 
MANIFEST.MF, probably because of the assembly/shading tricks.

Another related PR is #3103, which tries to fix the MANIFEST issue.

Author: Cheng Lian l...@databricks.com

Closes #3105 from liancheng/spark-4225 and squashes the following commits:

d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version

(cherry picked from commit 86e9eaa3f0ec23cb38bce67585adb2d5f484f4ee)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cd8e3e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cd8e3e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cd8e3e2

Branch: refs/heads/branch-1.2
Commit: 2cd8e3e2b00c6191bccfb70743df7a4c9ffd98b2
Parents: e5b8cea
Author: Cheng Lian l...@databricks.com
Authored: Fri Nov 7 11:45:25 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:46:27 2014 -0800

--
 .../scala/org/apache/spark/util/Utils.scala | 24 ++--
 .../hive/thriftserver/SparkSQLCLIService.scala  | 12 --
 2 files changed, 12 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2cd8e3e2/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index a14d612..6b85c03 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -21,10 +21,8 @@ import java.io._
 import java.lang.management.ManagementFactory
 import java.net._
 import java.nio.ByteBuffer
-import java.util.jar.Attributes.Name
-import java.util.{Properties, Locale, Random, UUID}
-import java.util.concurrent.{ThreadFactory, ConcurrentHashMap, Executors, 
ThreadPoolExecutor}
-import java.util.jar.{Manifest => JarManifest}
+import java.util.concurrent.{ConcurrentHashMap, Executors, ThreadFactory, 
ThreadPoolExecutor}
+import java.util.{Locale, Properties, Random, UUID}
 
 import scala.collection.JavaConversions._
 import scala.collection.Map
@@ -38,11 +36,11 @@ import com.google.common.io.{ByteStreams, Files}
 import com.google.common.util.concurrent.ThreadFactoryBuilder
 import org.apache.commons.lang3.SystemUtils
 import org.apache.hadoop.conf.Configuration
-import org.apache.log4j.PropertyConfigurator
 import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
+import org.apache.log4j.PropertyConfigurator
 import org.eclipse.jetty.util.MultiException
 import org.json4s._
-import tachyon.client.{TachyonFile,TachyonFS}
+import tachyon.client.{TachyonFS, TachyonFile}
 
 import org.apache.spark._
 import org.apache.spark.deploy.SparkHadoopUtil
@@ -352,8 +350,8 @@ private[spark] object Utils extends Logging {
* Download a file to target directory. Supports fetching the file in a 
variety of ways,
* including HTTP, HDFS and files on a standard filesystem, based on the URL 
parameter.
*
-   * If `useCache` is true, first attempts to fetch the file to a local cache 
that's shared 
-   * across executors running the same application. `useCache` is used mainly 
for 
+   * If `useCache` is true, first attempts to fetch the file to a local cache 
that's shared
+   * across executors running the same application. `useCache` is used mainly 
for
* the executors, and not in local mode.
*
* Throws SparkException if the target file already exists and has different 
contents than
@@ -400,7 +398,7 @@ private[spark] object Utils extends Logging {
 } else {
   doFetchFile(url, targetDir, fileName, conf, securityMgr, hadoopConf)
 }
-
+
 // Decompress the file if it's a .tar or .tar.gz
     if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
       logInfo("Untarring " + fileName)
@@ -1776,13 +1774,6 @@ private[spark] object Utils extends Logging {
     s"$libraryPathEnvName=$libraryPath$ampersand"
   }
 
-  lazy val sparkVersion =
-    SparkContext.jarOfObject(this).map { path =>
-      val manifestUrl = new URL(s"jar:file:$path!/META-INF/MANIFEST.MF")
-      val manifest = new JarManifest(manifestUrl.openStream())
-      manifest.getMainAttributes.getValue(Name.IMPLEMENTATION_VERSION)
-    }.getOrElse("Unknown")
-
   /**
* Return the value of a config either through the SparkConf or the Hadoop 
configuration

spark git commit: [SQL] Support ScalaReflection of schema in different universes

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 86e9eaa3f - 8154ed7df


[SQL] Support ScalaReflection of schema in different universes

Author: Michael Armbrust mich...@databricks.com

Closes #3096 from marmbrus/reflectionContext and squashes the following commits:

adc221f [Michael Armbrust] Support ScalaReflection of schema in different 
universes


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8154ed7d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8154ed7d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8154ed7d

Branch: refs/heads/master
Commit: 8154ed7df6c5407e638f465d3bd86b43f36216ef
Parents: 86e9eaa
Author: Michael Armbrust mich...@databricks.com
Authored: Fri Nov 7 11:51:20 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:51:20 2014 -0800

--
 .../spark/sql/catalyst/ScalaReflection.scala  | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8154ed7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
index 9cda373..71034c2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -26,14 +26,26 @@ import 
org.apache.spark.sql.catalyst.plans.logical.LocalRelation
 import org.apache.spark.sql.catalyst.types._
 import org.apache.spark.sql.catalyst.types.decimal.Decimal
 
+
 /**
- * Provides experimental support for generating catalyst schemas for scala 
objects.
+ * A default version of ScalaReflection that uses the runtime universe.
  */
-object ScalaReflection {
+object ScalaReflection extends ScalaReflection {
+  val universe: scala.reflect.runtime.universe.type = 
scala.reflect.runtime.universe
+}
+
+/**
+ * Support for generating catalyst schemas for scala objects.
+ */
+trait ScalaReflection {
+  /** The universe we work in (runtime or macro) */
+  val universe: scala.reflect.api.Universe
+
+  import universe._
+
   // The Predef.Map is scala.collection.immutable.Map.
   // Since the map values can be mutable, we explicitly import 
scala.collection.Map at here.
   import scala.collection.Map
-  import scala.reflect.runtime.universe._
 
   case class Schema(dataType: DataType, nullable: Boolean)
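A small sketch of the pattern this refactoring enables: bind the ScalaReflection trait to a reflection universe of your choosing. Binding to the runtime universe, as below, is exactly what the default object now does; inside a macro one could instead bind universe to the macro context's c.universe. The schemaFor call assumes the helper of that name in this snapshot's ScalaReflection; the case class and object names are illustrative.

import org.apache.spark.sql.catalyst.ScalaReflection

// Sketch: bind the trait to the JVM runtime universe (what the default object does).
object RuntimeReflectionSketch extends ScalaReflection {
  val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe
}

case class Person(name: String, age: Int)

object ReflectionDemo {
  def main(args: Array[String]): Unit = {
    // schemaFor derives a catalyst Schema (dataType plus nullability) via the bound universe.
    println(RuntimeReflectionSketch.schemaFor[Person])
  }
}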
 


-



spark git commit: [SQL] Support ScalaReflection of schema in different universes

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 2cd8e3e2b - f1f1ae418


[SQL] Support ScalaReflection of schema in different universes

Author: Michael Armbrust mich...@databricks.com

Closes #3096 from marmbrus/reflectionContext and squashes the following commits:

adc221f [Michael Armbrust] Support ScalaReflection of schema in different 
universes

(cherry picked from commit 8154ed7df6c5407e638f465d3bd86b43f36216ef)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f1f1ae41
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f1f1ae41
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f1f1ae41

Branch: refs/heads/branch-1.2
Commit: f1f1ae418031957256e7dac896e29d64c81bf1a4
Parents: 2cd8e3e
Author: Michael Armbrust mich...@databricks.com
Authored: Fri Nov 7 11:51:20 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:51:33 2014 -0800

--
 .../spark/sql/catalyst/ScalaReflection.scala  | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f1f1ae41/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
index 9cda373..71034c2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -26,14 +26,26 @@ import 
org.apache.spark.sql.catalyst.plans.logical.LocalRelation
 import org.apache.spark.sql.catalyst.types._
 import org.apache.spark.sql.catalyst.types.decimal.Decimal
 
+
 /**
- * Provides experimental support for generating catalyst schemas for scala 
objects.
+ * A default version of ScalaReflection that uses the runtime universe.
  */
-object ScalaReflection {
+object ScalaReflection extends ScalaReflection {
+  val universe: scala.reflect.runtime.universe.type = 
scala.reflect.runtime.universe
+}
+
+/**
+ * Support for generating catalyst schemas for scala objects.
+ */
+trait ScalaReflection {
+  /** The universe we work in (runtime or macro) */
+  val universe: scala.reflect.api.Universe
+
+  import universe._
+
   // The Predef.Map is scala.collection.immutable.Map.
   // Since the map values can be mutable, we explicitly import 
scala.collection.Map at here.
   import scala.collection.Map
-  import scala.reflect.runtime.universe._
 
   case class Schema(dataType: DataType, nullable: Boolean)
 


-



spark git commit: [SQL] Modify keyword val location according to ordering

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 8154ed7df - 68609c51a


[SQL] Modify keyword val location according to ordering

'DOUBLE' should be moved before 'ELSE' according to the ordering convention

Author: Jacky Li jacky.li...@gmail.com

Closes #3080 from jackylk/patch-5 and squashes the following commits:

3c11df7 [Jacky Li] [SQL] Modify keyword val location according to ordering


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68609c51
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68609c51
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68609c51

Branch: refs/heads/master
Commit: 68609c51ad1ab2def302df3c4a1c0bc1ec6e1075
Parents: 8154ed7
Author: Jacky Li jacky.li...@gmail.com
Authored: Fri Nov 7 11:52:08 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:52:08 2014 -0800

--
 .../src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/68609c51/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
index 5e613e0..affef27 100755
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
@@ -55,10 +55,10 @@ class SqlParser extends AbstractSparkSQLParser {
   protected val DECIMAL = Keyword("DECIMAL")
   protected val DESC = Keyword("DESC")
   protected val DISTINCT = Keyword("DISTINCT")
+  protected val DOUBLE = Keyword("DOUBLE")
   protected val ELSE = Keyword("ELSE")
   protected val END = Keyword("END")
   protected val EXCEPT = Keyword("EXCEPT")
-  protected val DOUBLE = Keyword("DOUBLE")
   protected val FALSE = Keyword("FALSE")
   protected val FIRST = Keyword("FIRST")
   protected val FROM = Keyword("FROM")


-



spark git commit: [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 68609c51a - 14c54f187


[SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators

Following description is quoted from JIRA:

When I issue a hql query against a HiveContext where my predicate uses a column 
of string type with one of LT, LTE, GT, or GTE operator, I get the following 
error:
scala.MatchError: StringType (of class 
org.apache.spark.sql.catalyst.types.StringType$)
Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType 
is absent from the corresponding functions for creating these filters.
To reproduce, in a Hive 0.13.1 shell, I created the following table (at a 
specified DB):

create table sparkbug (
id int,
event string
) stored as parquet;

Insert some sample data:

insert into table sparkbug select 1, '2011-06-18' from some table limit 1;
insert into table sparkbug select 2, '2012-01-01' from some table limit 1;

Launch a spark shell and create a HiveContext to the metastore where the table 
above is located.

import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
hc.setConf("spark.sql.shuffle.partitions", "10")
hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
hc.setConf("spark.sql.parquet.compression.codec", "snappy")
import hc._
hc.hql("select * from db.sparkbug where event >= '2011-12-01'")

A scala.MatchError will appear in the output.

Author: Kousuke Saruta saru...@oss.nttdata.co.jp

Closes #3083 from sarutak/SPARK-4213 and squashes the following commits:

4ab6e56 [Kousuke Saruta] WIP
b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark 
into SPARK-4213
9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/14c54f18
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/14c54f18
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/14c54f18

Branch: refs/heads/master
Commit: 14c54f1876fcf91b5c10e80be2df5421c7328557
Parents: 68609c5
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Authored: Fri Nov 7 11:56:40 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:56:40 2014 -0800

--
 .../spark/sql/parquet/ParquetFilters.scala  | 335 ++-
 .../spark/sql/parquet/ParquetQuerySuite.scala   |  40 +++
 2 files changed, 364 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/14c54f18/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
index 517a5cf..1e67799 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
@@ -18,13 +18,15 @@
 package org.apache.spark.sql.parquet
 
 import java.nio.ByteBuffer
+import java.sql.{Date, Timestamp}
 
 import org.apache.hadoop.conf.Configuration
 
+import parquet.common.schema.ColumnPath
 import parquet.filter2.compat.FilterCompat
 import parquet.filter2.compat.FilterCompat._
-import parquet.filter2.predicate.FilterPredicate
-import parquet.filter2.predicate.FilterApi
+import parquet.filter2.predicate.Operators.{Column, SupportsLtGt}
+import parquet.filter2.predicate.{FilterApi, FilterPredicate}
 import parquet.filter2.predicate.FilterApi._
 import parquet.io.api.Binary
 import parquet.column.ColumnReader
@@ -33,9 +35,11 @@ import com.google.common.io.BaseEncoding
 
 import org.apache.spark.SparkEnv
 import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.catalyst.types.decimal.Decimal
 import org.apache.spark.sql.catalyst.expressions.{Predicate => CatalystPredicate}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.execution.SparkSqlSerializer
+import org.apache.spark.sql.parquet.ParquetColumns._
 
 private[sql] object ParquetFilters {
   val PARQUET_FILTER_DATA = "org.apache.spark.sql.parquet.row.filter"
@@ -50,15 +54,25 @@ private[sql] object ParquetFilters {
     if (filters.length > 0) FilterCompat.get(filters.reduce(FilterApi.and)) else null
   }
 
-  def createFilter(expression: Expression): Option[CatalystFilter] ={
+  def createFilter(expression: Expression): Option[CatalystFilter] = {
     def createEqualityFilter(
         name: String,
         literal: Literal,
         predicate: CatalystPredicate) = literal.dataType match {
       case BooleanType =>
-        ComparisonFilter.createBooleanFilter(
+
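A sketch of the kind of Parquet filter2 predicate the fix above has to build for a StringType column compared with >=. It uses the same parquet packages the diff itself imports; the column name event and the date literal come from the JIRA reproduction, and this is an illustration of the predicate shape rather than Spark's actual createFilter code.

import parquet.filter2.compat.FilterCompat
import parquet.filter2.predicate.FilterApi
import parquet.io.api.Binary

// Sketch: strings are pushed down to Parquet as Binary values.
object StringPushdownSketch {
  def main(args: Array[String]): Unit = {
    val eventCol  = FilterApi.binaryColumn("event")
    val predicate = FilterApi.gtEq(eventCol, Binary.fromString("2011-12-01"))
    val filter    = FilterCompat.get(predicate) // what gets handed to the Parquet reader
    println(filter)
  }
}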

spark git commit: [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 51ef8ab8e - d530c3952


[SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators

Following description is quoted from JIRA:

When I issue a hql query against a HiveContext where my predicate uses a column 
of string type with one of LT, LTE, GT, or GTE operator, I get the following 
error:
scala.MatchError: StringType (of class 
org.apache.spark.sql.catalyst.types.StringType$)
Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType 
is absent from the corresponding functions for creating these filters.
To reproduce, in a Hive 0.13.1 shell, I created the following table (at a 
specified DB):

create table sparkbug (
id int,
event string
) stored as parquet;

Insert some sample data:

insert into table sparkbug select 1, '2011-06-18' from some table limit 1;
insert into table sparkbug select 2, '2012-01-01' from some table limit 1;

Launch a spark shell and create a HiveContext to the metastore where the table 
above is located.

import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
hc.setConf("spark.sql.shuffle.partitions", "10")
hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
hc.setConf("spark.sql.parquet.compression.codec", "snappy")
import hc._
hc.hql("select * from db.sparkbug where event >= '2011-12-01'")

A scala.MatchError will appear in the output.

Author: Kousuke Saruta saru...@oss.nttdata.co.jp

Closes #3083 from sarutak/SPARK-4213 and squashes the following commits:

4ab6e56 [Kousuke Saruta] WIP
b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark 
into SPARK-4213
9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings

(cherry picked from commit 14c54f1876fcf91b5c10e80be2df5421c7328557)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d530c395
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d530c395
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d530c395

Branch: refs/heads/branch-1.2
Commit: d530c3952131b29fd4d7a3e54496bfe634517af1
Parents: 51ef8ab
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Authored: Fri Nov 7 11:56:40 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 11:57:12 2014 -0800

--
 .../spark/sql/parquet/ParquetFilters.scala  | 335 ++-
 .../spark/sql/parquet/ParquetQuerySuite.scala   |  40 +++
 2 files changed, 364 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d530c395/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
index 517a5cf..1e67799 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
@@ -18,13 +18,15 @@
 package org.apache.spark.sql.parquet
 
 import java.nio.ByteBuffer
+import java.sql.{Date, Timestamp}
 
 import org.apache.hadoop.conf.Configuration
 
+import parquet.common.schema.ColumnPath
 import parquet.filter2.compat.FilterCompat
 import parquet.filter2.compat.FilterCompat._
-import parquet.filter2.predicate.FilterPredicate
-import parquet.filter2.predicate.FilterApi
+import parquet.filter2.predicate.Operators.{Column, SupportsLtGt}
+import parquet.filter2.predicate.{FilterApi, FilterPredicate}
 import parquet.filter2.predicate.FilterApi._
 import parquet.io.api.Binary
 import parquet.column.ColumnReader
@@ -33,9 +35,11 @@ import com.google.common.io.BaseEncoding
 
 import org.apache.spark.SparkEnv
 import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.catalyst.types.decimal.Decimal
 import org.apache.spark.sql.catalyst.expressions.{Predicate => CatalystPredicate}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.execution.SparkSqlSerializer
+import org.apache.spark.sql.parquet.ParquetColumns._
 
 private[sql] object ParquetFilters {
   val PARQUET_FILTER_DATA = "org.apache.spark.sql.parquet.row.filter"
@@ -50,15 +54,25 @@ private[sql] object ParquetFilters {
     if (filters.length > 0) FilterCompat.get(filters.reduce(FilterApi.and)) else null
   }
 
-  def createFilter(expression: Expression): Option[CatalystFilter] ={
+  def createFilter(expression: Expression): Option[CatalystFilter] = {
     def createEqualityFilter(
         name: String,
         literal: Literal,
         predicate: 

spark git commit: [SPARK-4272] [SQL] Add more unwrapper functions for primitive type in TableReader

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 14c54f187 - 60ab80f50


[SPARK-4272] [SQL] Add more unwrapper functions for primitive type in 
TableReader

Currently, data unwrapping only supports a couple of primitive types, not all of them. This 
does not cause exceptions, but adding the missing unwrappers may gain some performance in 
table scanning for types like binary, date, timestamp, decimal, etc.

Author: Cheng Hao hao.ch...@intel.com

Closes #3136 from chenghao-intel/table_reader and squashes the following 
commits:

fffb729 [Cheng Hao] fix bug for retrieving the timestamp object
e9c97a4 [Cheng Hao] Add more unwrapper functions for primitive type in 
TableReader


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60ab80f5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60ab80f5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60ab80f5

Branch: refs/heads/master
Commit: 60ab80f501b8384ddf48a9ac0ba0c2b9eb548b28
Parents: 14c54f1
Author: Cheng Hao hao.ch...@intel.com
Authored: Fri Nov 7 12:15:53 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:15:53 2014 -0800

--
 .../org/apache/spark/sql/hive/HiveInspectors.scala   |  4 
 .../org/apache/spark/sql/hive/TableReader.scala  | 15 +++
 2 files changed, 15 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/60ab80f5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
index 58815da..bdc7e1d 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
@@ -119,10 +119,6 @@ private[hive] trait HiveInspectors {
* Wraps with Hive types based on object inspector.
* TODO: Consolidate all hive OI/data interface code.
*/
-  /**
-   * Wraps with Hive types based on object inspector.
-   * TODO: Consolidate all hive OI/data interface code.
-   */
   protected def wrapperFor(oi: ObjectInspector): Any => Any = oi match {
     case _: JavaHiveVarcharObjectInspector =>
       (o: Any) => new HiveVarchar(o.asInstanceOf[String], o.asInstanceOf[String].size)

http://git-wip-us.apache.org/repos/asf/spark/blob/60ab80f5/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
index e49f095..f60bc37 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -290,6 +290,21 @@ private[hive] object HadoopTableReader extends HiveInspectors {
         (value: Any, row: MutableRow, ordinal: Int) => row.setFloat(ordinal, oi.get(value))
       case oi: DoubleObjectInspector =>
         (value: Any, row: MutableRow, ordinal: Int) => row.setDouble(ordinal, oi.get(value))
+      case oi: HiveVarcharObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.setString(ordinal, oi.getPrimitiveJavaObject(value).getValue)
+      case oi: HiveDecimalObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, HiveShim.toCatalystDecimal(oi, value))
+      case oi: TimestampObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, oi.getPrimitiveJavaObject(value).clone())
+      case oi: DateObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, oi.getPrimitiveJavaObject(value))
+      case oi: BinaryObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, oi.getPrimitiveJavaObject(value))
       case oi =>
         (value: Any, row: MutableRow, ordinal: Int) => row(ordinal) = unwrap(value, oi)
   }
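A toy, self-contained sketch of the pattern this commit extends: the table reader builds one "unwrapper" closure per column from its inspector up front, and the patch adds direct cases for more types so they no longer fall through to the generic per-value unwrap. The types and names below are stand-ins for illustration, not Hive or Spark APIs.

// Toy sketch of the "one unwrapper closure per column" pattern.
object UnwrapperPatternSketch {
  type Row       = Array[Any]
  type Unwrapper = (Any, Row, Int) => Unit

  // Decide once per column how a raw value is copied into the row...
  def unwrapperFor(columnType: String): Unwrapper = columnType match {
    case "int"    => (raw, row, i) => row(i) = raw.asInstanceOf[Int]
    case "string" => (raw, row, i) => row(i) = raw.toString
    case _        => (raw, row, i) => row(i) = raw // generic fallback, like unwrap(value, oi)
  }

  def main(args: Array[String]): Unit = {
    // ...then reuse the prebuilt closures for every row of the scan.
    val unwrappers = Array(unwrapperFor("int"), unwrapperFor("string"))
    val row: Row   = new Array[Any](2)
    val rawValues  = Array[Any](42, "2014-11-07")
    rawValues.indices.foreach(i => unwrappers(i)(rawValues(i), row, i))
    println(row.mkString(", ")) // 42, 2014-11-07
  }
}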


-



spark git commit: [SPARK-4272] [SQL] Add more unwrapper functions for primitive type in TableReader

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 d530c3952 - ff1a08256


[SPARK-4272] [SQL] Add more unwrapper functions for primitive type in 
TableReader

Currently, data unwrapping only supports a couple of primitive types, not all of them. This 
does not cause exceptions, but adding the missing unwrappers may gain some performance in 
table scanning for types like binary, date, timestamp, decimal, etc.

Author: Cheng Hao hao.ch...@intel.com

Closes #3136 from chenghao-intel/table_reader and squashes the following 
commits:

fffb729 [Cheng Hao] fix bug for retrieving the timestamp object
e9c97a4 [Cheng Hao] Add more unwrapper functions for primitive type in 
TableReader

(cherry picked from commit 60ab80f501b8384ddf48a9ac0ba0c2b9eb548b28)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff1a0825
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff1a0825
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff1a0825

Branch: refs/heads/branch-1.2
Commit: ff1a0825637690b3fce780d4dcaad68dce382fb9
Parents: d530c39
Author: Cheng Hao hao.ch...@intel.com
Authored: Fri Nov 7 12:15:53 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:16:18 2014 -0800

--
 .../org/apache/spark/sql/hive/HiveInspectors.scala   |  4 
 .../org/apache/spark/sql/hive/TableReader.scala  | 15 +++
 2 files changed, 15 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ff1a0825/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
index 58815da..bdc7e1d 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
@@ -119,10 +119,6 @@ private[hive] trait HiveInspectors {
* Wraps with Hive types based on object inspector.
* TODO: Consolidate all hive OI/data interface code.
*/
-  /**
-   * Wraps with Hive types based on object inspector.
-   * TODO: Consolidate all hive OI/data interface code.
-   */
   protected def wrapperFor(oi: ObjectInspector): Any => Any = oi match {
     case _: JavaHiveVarcharObjectInspector =>
       (o: Any) => new HiveVarchar(o.asInstanceOf[String], o.asInstanceOf[String].size)

http://git-wip-us.apache.org/repos/asf/spark/blob/ff1a0825/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
index e49f095..f60bc37 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -290,6 +290,21 @@ private[hive] object HadoopTableReader extends HiveInspectors {
         (value: Any, row: MutableRow, ordinal: Int) => row.setFloat(ordinal, oi.get(value))
       case oi: DoubleObjectInspector =>
         (value: Any, row: MutableRow, ordinal: Int) => row.setDouble(ordinal, oi.get(value))
+      case oi: HiveVarcharObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.setString(ordinal, oi.getPrimitiveJavaObject(value).getValue)
+      case oi: HiveDecimalObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, HiveShim.toCatalystDecimal(oi, value))
+      case oi: TimestampObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, oi.getPrimitiveJavaObject(value).clone())
+      case oi: DateObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, oi.getPrimitiveJavaObject(value))
+      case oi: BinaryObjectInspector =>
+        (value: Any, row: MutableRow, ordinal: Int) =>
+          row.update(ordinal, oi.getPrimitiveJavaObject(value))
       case oi =>
         (value: Any, row: MutableRow, ordinal: Int) => row(ordinal) = unwrap(value, oi)
   }


-



spark git commit: [SPARK-4270][SQL] Fix Cast from DateType to DecimalType.

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 60ab80f50 - a6405c5dd


[SPARK-4270][SQL] Fix Cast from DateType to DecimalType.

`Cast` from `DateType` to `DecimalType` throws `NullPointerException`.

Author: Takuya UESHIN ues...@happy-camper.st

Closes #3134 from ueshin/issues/SPARK-4270 and squashes the following commits:

7394e4b [Takuya UESHIN] Fix Cast from DateType to DecimalType.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6405c5d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6405c5d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6405c5d

Branch: refs/heads/master
Commit: a6405c5ddcda112f8efd7d50d8e5f44f78a0fa41
Parents: 60ab80f
Author: Takuya UESHIN ues...@happy-camper.st
Authored: Fri Nov 7 12:30:47 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:30:47 2014 -0800

--
 .../scala/org/apache/spark/sql/catalyst/expressions/Cast.scala | 2 +-
 .../spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a6405c5d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 2200966..55319e7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -281,7 +281,7 @@ case class Cast(child: Expression, dataType: DataType) 
extends UnaryExpression w
     case BooleanType =>
       buildCast[Boolean](_, b => changePrecision(if (b) Decimal(1) else Decimal(0), target))
     case DateType =>
-      buildCast[Date](_, d => changePrecision(null, target)) // date can't cast to decimal in Hive
+      buildCast[Date](_, d => null) // date can't cast to decimal in Hive
     case TimestampType =>
       // Note that we lose precision here.
       buildCast[Timestamp](_, t => changePrecision(Decimal(timestampToDouble(t)), target))

http://git-wip-us.apache.org/repos/asf/spark/blob/a6405c5d/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala
index 6bfa0db..918996f 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala
@@ -412,6 +412,8 @@ class ExpressionEvaluationSuite extends FunSuite {
 checkEvaluation(Cast(d, LongType), null)
 checkEvaluation(Cast(d, FloatType), null)
 checkEvaluation(Cast(d, DoubleType), null)
+checkEvaluation(Cast(d, DecimalType.Unlimited), null)
+checkEvaluation(Cast(d, DecimalType(10, 2)), null)
     checkEvaluation(Cast(d, StringType), "1970-01-01")
     checkEvaluation(Cast(Cast(d, TimestampType), StringType), "1970-01-01 00:00:00")
   }
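A sketch of the behaviour this patch establishes, using the catalyst classes visible in the diff: casting a DateType literal to a DecimalType now evaluates to null (as Hive does) instead of throwing a NullPointerException. The explicit Literal construction and the eval(null) call follow this snapshot's catalyst API only as far as the diff shows it, so treat the snippet as illustrative.

import java.sql.Date
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.catalyst.types.{DateType, DecimalType}

// Sketch: a DateType value cast to decimal evaluates to null rather than failing.
object DateToDecimalSketch {
  def main(args: Array[String]): Unit = {
    val d = Literal(Date.valueOf("1970-01-01"), DateType)
    println(Cast(d, DecimalType.Unlimited).eval(null)) // expected: null
    println(Cast(d, DecimalType(10, 2)).eval(null))    // expected: null
  }
}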


-



spark git commit: [SPARK-4203][SQL] Partition directories in random order when inserting into hive table

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master a6405c5dd - ac70c972a


[SPARK-4203][SQL] Partition directories in random order when inserting into 
hive table

When doing an insert into a Hive table with partitions, the folders written to the file 
system are in a random order instead of the order defined in table creation. It seems that 
the loadPartition method in Hive.java has a Map<String,String> parameter but expects to be 
called with a map that has a defined ordering, such as LinkedHashMap. Working on a test but 
having IntelliJ problems.

Author: Matthew Taylor matthe...@tbfe.net

Closes #3076 from tbfenet/partition_dir_order_problem and squashes the 
following commits:

f1b9a52 [Matthew Taylor] Comment format fix
bca709f [Matthew Taylor] review changes
0e50f6b [Matthew Taylor] test fix
99f1a31 [Matthew Taylor] partition ordering fix
369e618 [Matthew Taylor] partition ordering fix


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac70c972
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac70c972
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac70c972

Branch: refs/heads/master
Commit: ac70c972a51952f801fd02dd5962c0a0c1aba8f8
Parents: a6405c5
Author: Matthew Taylor matthe...@tbfe.net
Authored: Fri Nov 7 12:53:08 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:53:08 2014 -0800

--
 .../hive/execution/InsertIntoHiveTable.scala| 13 ++--
 .../sql/hive/InsertIntoHiveTableSuite.scala | 34 ++--
 2 files changed, 43 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ac70c972/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
index 74b4e7a..81390f6 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.hive.execution
 
+import java.util
+
 import scala.collection.JavaConversions._
 
 import org.apache.hadoop.hive.common.`type`.HiveVarchar
@@ -203,6 +205,13 @@ case class InsertIntoHiveTable(
 // holdDDLTime will be true when TOK_HOLD_DDLTIME presents in the query as 
a hint.
 val holdDDLTime = false
 if (partition.nonEmpty) {
+
+      // loadPartition call orders directories created on the iteration order of this map
+      val orderedPartitionSpec = new util.LinkedHashMap[String, String]()
+      table.hiveQlTable.getPartCols().foreach { entry =>
+        orderedPartitionSpec.put(entry.getName, partitionSpec.get(entry.getName).getOrElse(""))
+      }
   val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, 
partitionSpec)
   db.validatePartitionNameCharacters(partVals)
   // inheritTableSpecs is set to true. It should be set to false for a 
IMPORT query
@@ -214,7 +223,7 @@ case class InsertIntoHiveTable(
 db.loadDynamicPartitions(
   outputPath,
   qualifiedTableName,
-  partitionSpec,
+  orderedPartitionSpec,
   overwrite,
   numDynamicPartitions,
   holdDDLTime,
@@ -224,7 +233,7 @@ case class InsertIntoHiveTable(
 db.loadPartition(
   outputPath,
   qualifiedTableName,
-  partitionSpec,
+  orderedPartitionSpec,
   overwrite,
   holdDDLTime,
   inheritTableSpecs,

http://git-wip-us.apache.org/repos/asf/spark/blob/ac70c972/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
index 18dc937..5dbfb92 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
@@ -17,8 +17,10 @@
 
 package org.apache.spark.sql.hive
 
-import org.apache.spark.sql.QueryTest
-import org.apache.spark.sql._
+import java.io.File
+
+import com.google.common.io.Files
+import org.apache.spark.sql.{QueryTest, _}
 import org.apache.spark.sql.hive.test.TestHive
 
 /* Implicits */
@@ -91,4 +93,32 @@ class InsertIntoHiveTableSuite extends QueryTest {
 
     sql("DROP TABLE hiveTableWithMapValue")
   }
+
+  test("SPARK-4203:random partition directory order") {
+
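A small sketch of the property the fix relies on: java.util.LinkedHashMap iterates in insertion order, so building the partition spec in table-definition order keeps the partition directories ordered, while a plain HashMap gives no such guarantee. The partition column names below are made up for illustration.

import java.util.{HashMap => JHashMap, LinkedHashMap => JLinkedHashMap}
import scala.collection.JavaConversions._

// Sketch: LinkedHashMap preserves insertion order, HashMap does not.
object PartitionOrderSketch {
  def main(args: Array[String]): Unit = {
    val partCols = Seq("year", "month", "day")

    val unordered = new JHashMap[String, String]()
    val ordered   = new JLinkedHashMap[String, String]()
    partCols.foreach { c => unordered.put(c, ""); ordered.put(c, "") }

    println(unordered.keySet.toSeq) // iteration order is not guaranteed
    println(ordered.keySet.toSeq)   // year, month, day, in insertion order
  }
}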

spark git commit: [SPARK-4203][SQL] Partition directories in random order when inserting into hive table

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 684d1f0ec - c96da3676


[SPARK-4203][SQL] Partition directories in random order when inserting into hive table

When doing an insert into a Hive table with partitions, the folders written to the
file system are in a random order instead of the order defined at table creation.
It seems that the loadPartition method in Hive.java takes a Map<String, String>
parameter but expects to be called with a map that has a defined ordering, such as
a LinkedHashMap. Working on a test but having IntelliJ problems.

Author: Matthew Taylor matthe...@tbfe.net

Closes #3076 from tbfenet/partition_dir_order_problem and squashes the 
following commits:

f1b9a52 [Matthew Taylor] Comment format fix
bca709f [Matthew Taylor] review changes
0e50f6b [Matthew Taylor] test fix
99f1a31 [Matthew Taylor] partition ordering fix
369e618 [Matthew Taylor] partition ordering fix

(cherry picked from commit ac70c972a51952f801fd02dd5962c0a0c1aba8f8)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c96da367
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c96da367
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c96da367

Branch: refs/heads/branch-1.2
Commit: c96da3676c32579d0f97347d35d95353b1d2ef07
Parents: 684d1f0
Author: Matthew Taylor matthe...@tbfe.net
Authored: Fri Nov 7 12:53:08 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:53:32 2014 -0800

--
 .../hive/execution/InsertIntoHiveTable.scala| 13 ++--
 .../sql/hive/InsertIntoHiveTableSuite.scala | 34 ++--
 2 files changed, 43 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c96da367/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
index 74b4e7a..81390f6 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.hive.execution
 
+import java.util
+
 import scala.collection.JavaConversions._
 
 import org.apache.hadoop.hive.common.`type`.HiveVarchar
@@ -203,6 +205,13 @@ case class InsertIntoHiveTable(
 // holdDDLTime will be true when TOK_HOLD_DDLTIME presents in the query as 
a hint.
 val holdDDLTime = false
 if (partition.nonEmpty) {
+
+      // loadPartition call orders directories created on the iteration order of the this map
+      val orderedPartitionSpec = new util.LinkedHashMap[String,String]()
+      table.hiveQlTable.getPartCols().foreach{
+        entry=>
+          orderedPartitionSpec.put(entry.getName,partitionSpec.get(entry.getName).getOrElse(""))
+      }
   val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, 
partitionSpec)
   db.validatePartitionNameCharacters(partVals)
   // inheritTableSpecs is set to true. It should be set to false for a 
IMPORT query
@@ -214,7 +223,7 @@ case class InsertIntoHiveTable(
 db.loadDynamicPartitions(
   outputPath,
   qualifiedTableName,
-  partitionSpec,
+  orderedPartitionSpec,
   overwrite,
   numDynamicPartitions,
   holdDDLTime,
@@ -224,7 +233,7 @@ case class InsertIntoHiveTable(
 db.loadPartition(
   outputPath,
   qualifiedTableName,
-  partitionSpec,
+  orderedPartitionSpec,
   overwrite,
   holdDDLTime,
   inheritTableSpecs,

http://git-wip-us.apache.org/repos/asf/spark/blob/c96da367/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
index 18dc937..5dbfb92 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
@@ -17,8 +17,10 @@
 
 package org.apache.spark.sql.hive
 
-import org.apache.spark.sql.QueryTest
-import org.apache.spark.sql._
+import java.io.File
+
+import com.google.common.io.Files
+import org.apache.spark.sql.{QueryTest, _}
 import org.apache.spark.sql.hive.test.TestHive
 
 /* Implicits */
@@ -91,4 +93,32 @@ class InsertIntoHiveTableSuite extends QueryTest {
 

spark git commit: [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 c96da3676 - 47bd8f302


[SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC

Running `select * from src` returns the wrong result set, as follows:
```
...
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
...

```
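The fix below replaces collecting result.queryExecution.toRdd with collecting the
SchemaRDD itself. One plausible explanation, offered here as an assumption rather
than anything stated in the commit message, is that the lower-level RDD reuses
mutable row objects, so materializing it without copying leaves every slot
pointing at the last row produced. The following plain-Scala sketch with made-up
data illustrates that general pitfall; it does not use Spark itself:

```
object MutableRowReuseExample {
  // A producer that, like some low-level row iterators, reuses one mutable
  // buffer for every element it yields.
  def reusingRows(): Iterator[Array[String]] = {
    val buffer = new Array[String](2)
    Iterator(("238", "val_238"), ("86", "val_86"), ("311", "val_311")).map { case (k, v) =>
      buffer(0) = k
      buffer(1) = v
      buffer  // every call returns the same array instance
    }
  }

  def main(args: Array[String]): Unit = {
    // Materializing without copying keeps three references to one buffer, so
    // every "row" shows the last values written: 311, val_311 three times.
    reusingRows().toArray.foreach(row => println(row.mkString(", ")))

    // Copying each row before materializing preserves the distinct values.
    reusingRows().map(_.clone).toArray.foreach(row => println(row.mkString(", ")))
  }
}
```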

Author: wangfei wangf...@huawei.com

Closes #3149 from scwf/SPARK-4292 and squashes the following commits:

1574a43 [wangfei] using result.collect
8b2d845 [wangfei] adding test
f64eddf [wangfei] result set iter bug

(cherry picked from commit d6e55524437026c0c76addeba8f99249a8316716)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47bd8f30
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47bd8f30
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47bd8f30

Branch: refs/heads/branch-1.2
Commit: 47bd8f3020149a009f605e8390c2c28f3f835191
Parents: c96da36
Author: wangfei wangf...@huawei.com
Authored: Fri Nov 7 12:55:11 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:55:40 2014 -0800

--
 .../thriftserver/HiveThriftServer2Suite.scala   | 23 
 .../spark/sql/hive/thriftserver/Shim12.scala|  5 ++---
 .../spark/sql/hive/thriftserver/Shim13.scala|  5 ++---
 3 files changed, 27 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/47bd8f30/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
--
diff --git 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
index 65d910a..bba29b2 100644
--- 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
+++ 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
@@ -267,4 +267,27 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
       assert(resultSet.getString(1) === s"spark.sql.hive.version=${HiveShim.version}")
     }
   }
+
+  test("SPARK-4292 regression: result set iterator issue") {
+    withJdbcStatement() { statement =>
+      val dataFilePath =
+        Thread.currentThread().getContextClassLoader.getResource("data/files/small_kv.txt")
+
+      val queries = Seq(
+        "DROP TABLE IF EXISTS test_4292",
+        "CREATE TABLE test_4292(key INT, val STRING)",
+        s"LOAD DATA LOCAL INPATH '$dataFilePath' OVERWRITE INTO TABLE test_4292")
+
+      queries.foreach(statement.execute)
+
+      val resultSet = statement.executeQuery("SELECT key FROM test_4292")
+
+      Seq(238, 86, 311, 27, 165).foreach { key =>
+        resultSet.next()
+        assert(resultSet.getInt(1) == key)
+      }
+
+      statement.executeQuery("DROP TABLE IF EXISTS test_4292")
+    }
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/47bd8f30/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
--
diff --git 
a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
 
b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
index 8077d0e..e3ba991 100644
--- 
a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
+++ 
b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
@@ -202,13 +202,12 @@ private[hive] class SparkExecuteStatementOperation(
         hiveContext.sparkContext.setLocalProperty("spark.scheduler.pool", pool)
       }
       iter = {
-        val resultRdd = result.queryExecution.toRdd
         val useIncrementalCollect =
           hiveContext.getConf("spark.sql.thriftServer.incrementalCollect", "false").toBoolean
         if (useIncrementalCollect) {
-          resultRdd.toLocalIterator
+          result.toLocalIterator
         } else {
-          resultRdd.collect().iterator
+          result.collect().iterator
         }
       }
       dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray


spark git commit: [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC

2014-11-07 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master ac70c972a - d6e555244


[SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC

Running `select * from src` returns the wrong result set, as follows:
```
...
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 309  | val_309  |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
| 97   | val_97   |
...

```

Author: wangfei wangf...@huawei.com

Closes #3149 from scwf/SPARK-4292 and squashes the following commits:

1574a43 [wangfei] using result.collect
8b2d845 [wangfei] adding test
f64eddf [wangfei] result set iter bug


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6e55524
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6e55524
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6e55524

Branch: refs/heads/master
Commit: d6e55524437026c0c76addeba8f99249a8316716
Parents: ac70c97
Author: wangfei wangf...@huawei.com
Authored: Fri Nov 7 12:55:11 2014 -0800
Committer: Michael Armbrust mich...@databricks.com
Committed: Fri Nov 7 12:55:11 2014 -0800

--
 .../thriftserver/HiveThriftServer2Suite.scala   | 23 
 .../spark/sql/hive/thriftserver/Shim12.scala|  5 ++---
 .../spark/sql/hive/thriftserver/Shim13.scala|  5 ++---
 3 files changed, 27 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d6e55524/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
--
diff --git 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
index 65d910a..bba29b2 100644
--- 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
+++ 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
@@ -267,4 +267,27 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
       assert(resultSet.getString(1) === s"spark.sql.hive.version=${HiveShim.version}")
     }
   }
+
+  test("SPARK-4292 regression: result set iterator issue") {
+    withJdbcStatement() { statement =>
+      val dataFilePath =
+        Thread.currentThread().getContextClassLoader.getResource("data/files/small_kv.txt")
+
+      val queries = Seq(
+        "DROP TABLE IF EXISTS test_4292",
+        "CREATE TABLE test_4292(key INT, val STRING)",
+        s"LOAD DATA LOCAL INPATH '$dataFilePath' OVERWRITE INTO TABLE test_4292")
+
+      queries.foreach(statement.execute)
+
+      val resultSet = statement.executeQuery("SELECT key FROM test_4292")
+
+      Seq(238, 86, 311, 27, 165).foreach { key =>
+        resultSet.next()
+        assert(resultSet.getInt(1) == key)
+      }
+
+      statement.executeQuery("DROP TABLE IF EXISTS test_4292")
+    }
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/d6e55524/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
--
diff --git 
a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
 
b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
index 8077d0e..e3ba991 100644
--- 
a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
+++ 
b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala
@@ -202,13 +202,12 @@ private[hive] class SparkExecuteStatementOperation(
         hiveContext.sparkContext.setLocalProperty("spark.scheduler.pool", pool)
       }
       iter = {
-        val resultRdd = result.queryExecution.toRdd
         val useIncrementalCollect =
           hiveContext.getConf("spark.sql.thriftServer.incrementalCollect", "false").toBoolean
         if (useIncrementalCollect) {
-          resultRdd.toLocalIterator
+          result.toLocalIterator
        } else {
-          resultRdd.collect().iterator
+          result.collect().iterator
         }
       }
       dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray

http://git-wip-us.apache.org/repos/asf/spark/blob/d6e55524/sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala
--
diff --git 

spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/master d6e555244 - 7c9ec529a


Update JavaCustomReceiver.java

Array index out of bounds.

Author: xiao321 1042460...@qq.com

Closes #3153 from xiao321/patch-1 and squashes the following commits:

0ed17b5 [xiao321] Update JavaCustomReceiver.java


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c9ec529
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c9ec529
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c9ec529

Branch: refs/heads/master
Commit: 7c9ec529a3483fab48f728481dd1d3663369e50a
Parents: d6e5552
Author: xiao321 1042460...@qq.com
Authored: Fri Nov 7 12:56:49 2014 -0800
Committer: Tathagata Das tathagata.das1...@gmail.com
Committed: Fri Nov 7 12:56:49 2014 -0800

--
 .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7c9ec529/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
index 981bc4f..99df259 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
@@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> {
     // Create a input stream with the custom receiver on target ip:port and count the
     // words in input stream of \n delimited text (eg. generated by 'nc')
     JavaReceiverInputDStream<String> lines = ssc.receiverStream(
-      new JavaCustomReceiver(args[1], Integer.parseInt(args[2])));
+      new JavaCustomReceiver(args[0], Integer.parseInt(args[1])));
     JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
       @Override
       public Iterable<String> call(String x) {





spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 47bd8f302 - 8cefb63c1


Update JavaCustomReceiver.java

Array index out of bounds.

Author: xiao321 1042460...@qq.com

Closes #3153 from xiao321/patch-1 and squashes the following commits:

0ed17b5 [xiao321] Update JavaCustomReceiver.java

(cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a)
Signed-off-by: Tathagata Das tathagata.das1...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cefb63c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cefb63c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cefb63c

Branch: refs/heads/branch-1.2
Commit: 8cefb63c122e7c7cf4af959f9606f4491148d9f4
Parents: 47bd8f3
Author: xiao321 1042460...@qq.com
Authored: Fri Nov 7 12:56:49 2014 -0800
Committer: Tathagata Das tathagata.das1...@gmail.com
Committed: Fri Nov 7 12:57:17 2014 -0800

--
 .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8cefb63c/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
index 981bc4f..99df259 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
@@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> {
     // Create a input stream with the custom receiver on target ip:port and count the
     // words in input stream of \n delimited text (eg. generated by 'nc')
    JavaReceiverInputDStream<String> lines = ssc.receiverStream(
-      new JavaCustomReceiver(args[1], Integer.parseInt(args[2])));
+      new JavaCustomReceiver(args[0], Integer.parseInt(args[1])));
     JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
       @Override
       public Iterable<String> call(String x) {





spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 0a40eac25 - 4fb26df87


Update JavaCustomReceiver.java

Array index out of bounds.

Author: xiao321 1042460...@qq.com

Closes #3153 from xiao321/patch-1 and squashes the following commits:

0ed17b5 [xiao321] Update JavaCustomReceiver.java

(cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a)
Signed-off-by: Tathagata Das tathagata.das1...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4fb26df8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4fb26df8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4fb26df8

Branch: refs/heads/branch-1.1
Commit: 4fb26df8748ea7dda11db8c2b99f4b08da25bb4e
Parents: 0a40eac
Author: xiao321 1042460...@qq.com
Authored: Fri Nov 7 12:56:49 2014 -0800
Committer: Tathagata Das tathagata.das1...@gmail.com
Committed: Fri Nov 7 12:57:38 2014 -0800

--
 .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4fb26df8/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
index 5622df5..f92615d 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
@@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> {
     // Create a input stream with the custom receiver on target ip:port and count the
     // words in input stream of \n delimited text (eg. generated by 'nc')
     JavaReceiverInputDStream<String> lines = ssc.receiverStream(
-      new JavaCustomReceiver(args[1], Integer.parseInt(args[2])));
+      new JavaCustomReceiver(args[0], Integer.parseInt(args[1])));
     JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
       @Override
       public Iterable<String> call(String x) {





spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/branch-1.0 76c20cac9 - 18c8c3833


Update JavaCustomReceiver.java

Array index out of bounds.

Author: xiao321 1042460...@qq.com

Closes #3153 from xiao321/patch-1 and squashes the following commits:

0ed17b5 [xiao321] Update JavaCustomReceiver.java

(cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a)
Signed-off-by: Tathagata Das tathagata.das1...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/18c8c383
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/18c8c383
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/18c8c383

Branch: refs/heads/branch-1.0
Commit: 18c8c3833d8508ff1ac1cf2c02060c41e46908c1
Parents: 76c20ca
Author: xiao321 1042460...@qq.com
Authored: Fri Nov 7 12:56:49 2014 -0800
Committer: Tathagata Das tathagata.das1...@gmail.com
Committed: Fri Nov 7 12:58:02 2014 -0800

--
 .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/18c8c383/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
index 5622df5..f92615d 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
@@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> {
     // Create a input stream with the custom receiver on target ip:port and count the
     // words in input stream of \n delimited text (eg. generated by 'nc')
     JavaReceiverInputDStream<String> lines = ssc.receiverStream(
-      new JavaCustomReceiver(args[1], Integer.parseInt(args[2])));
+      new JavaCustomReceiver(args[0], Integer.parseInt(args[1])));
     JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
       @Override
       public Iterable<String> call(String x) {





spark git commit: MAINTENANCE: Automated closing of pull requests.

2014-11-07 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 7c9ec529a - 5923dd986


MAINTENANCE: Automated closing of pull requests.

This commit exists to close the following pull requests on Github:

Closes #3016 (close requested by 'andrewor14')
Closes #2798 (close requested by 'andrewor14')
Closes #2864 (close requested by 'andrewor14')
Closes #3154 (close requested by 'JoshRosen')
Closes #3156 (close requested by 'JoshRosen')
Closes #214 (close requested by 'kayousterhout')
Closes #2584 (close requested by 'andrewor14')


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5923dd98
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5923dd98
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5923dd98

Branch: refs/heads/master
Commit: 5923dd986ba26d0fcc8707dd8d16863f1c1005cb
Parents: 7c9ec52
Author: Patrick Wendell pwend...@gmail.com
Authored: Fri Nov 7 13:08:25 2014 -0800
Committer: Patrick Wendell pwend...@gmail.com
Committed: Fri Nov 7 13:08:25 2014 -0800

--

--






spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD

2014-11-07 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master 5923dd986 - 777910979


[SPARK-4304] [PySpark] Fix sort on empty RDD

This PR fixes sortBy()/sortByKey() on an empty RDD.

This should be backported into 1.1/1.2.

Author: Davies Liu dav...@databricks.com

Closes #3162 from davies/fix_sort and squashes the following commits:

84f64b7 [Davies Liu] add tests
52995b5 [Davies Liu] fix sortByKey() on empty RDD


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/77791097
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/77791097
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/77791097

Branch: refs/heads/master
Commit: 7779109796c90d789464ab0be35917f963bbe867
Parents: 5923dd9
Author: Davies Liu dav...@databricks.com
Authored: Fri Nov 7 20:53:03 2014 -0800
Committer: Josh Rosen joshro...@databricks.com
Committed: Fri Nov 7 20:53:03 2014 -0800

--
 python/pyspark/rdd.py   | 2 ++
 python/pyspark/tests.py | 3 +++
 2 files changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/77791097/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 879655d..08d0474 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -521,6 +521,8 @@ class RDD(object):
         # the key-space into bins such that the bins have roughly the same
         # number of (key, value) pairs falling into them
         rddSize = self.count()
+        if not rddSize:
+            return self  # empty RDD
         maxSampleSize = numPartitions * 20.0  # constant from Spark's RangePartitioner
         fraction = min(maxSampleSize / max(rddSize, 1), 1.0)
         samples = self.sample(False, fraction, 1).map(lambda (k, v): k).collect()

http://git-wip-us.apache.org/repos/asf/spark/blob/77791097/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 9f625c5..491e445 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -649,6 +649,9 @@ class RDDTests(ReusedPySparkTestCase):
         self.assertEquals(result.getNumPartitions(), 5)
         self.assertEquals(result.count(), 3)
 
+    def test_sort_on_empty_rdd(self):
+        self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect())
+
     def test_sample(self):
         rdd = self.sc.parallelize(range(0, 100), 4)
         wo = rdd.sample(False, 0.1, 2).collect()





spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD

2014-11-07 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 8cefb63c1 - 3b07c483a


[SPARK-4304] [PySpark] Fix sort on empty RDD

This PR fixes sortBy()/sortByKey() on an empty RDD.

This should be backported into 1.1/1.2.

Author: Davies Liu dav...@databricks.com

Closes #3162 from davies/fix_sort and squashes the following commits:

84f64b7 [Davies Liu] add tests
52995b5 [Davies Liu] fix sortByKey() on empty RDD

(cherry picked from commit 7779109796c90d789464ab0be35917f963bbe867)
Signed-off-by: Josh Rosen joshro...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b07c483
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b07c483
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b07c483

Branch: refs/heads/branch-1.2
Commit: 3b07c483aa98965ac9dc8fdcc40e593e4edb97fd
Parents: 8cefb63
Author: Davies Liu dav...@databricks.com
Authored: Fri Nov 7 20:53:03 2014 -0800
Committer: Josh Rosen joshro...@databricks.com
Committed: Fri Nov 7 20:53:34 2014 -0800

--
 python/pyspark/rdd.py   | 2 ++
 python/pyspark/tests.py | 3 +++
 2 files changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3b07c483/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 879655d..08d0474 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -521,6 +521,8 @@ class RDD(object):
         # the key-space into bins such that the bins have roughly the same
         # number of (key, value) pairs falling into them
         rddSize = self.count()
+        if not rddSize:
+            return self  # empty RDD
         maxSampleSize = numPartitions * 20.0  # constant from Spark's RangePartitioner
         fraction = min(maxSampleSize / max(rddSize, 1), 1.0)
         samples = self.sample(False, fraction, 1).map(lambda (k, v): k).collect()

http://git-wip-us.apache.org/repos/asf/spark/blob/3b07c483/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 9f625c5..491e445 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -649,6 +649,9 @@ class RDDTests(ReusedPySparkTestCase):
         self.assertEquals(result.getNumPartitions(), 5)
         self.assertEquals(result.count(), 3)
 
+    def test_sort_on_empty_rdd(self):
+        self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect())
+
     def test_sample(self):
         rdd = self.sc.parallelize(range(0, 100), 4)
         wo = rdd.sample(False, 0.1, 2).collect()





spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD

2014-11-07 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 4fb26df87 - 4895f6544


[SPARK-4304] [PySpark] Fix sort on empty RDD

This PR fixes sortBy()/sortByKey() on an empty RDD.

This should be backported into 1.1/1.2.

Author: Davies Liu dav...@databricks.com

Closes #3162 from davies/fix_sort and squashes the following commits:

84f64b7 [Davies Liu] add tests
52995b5 [Davies Liu] fix sortByKey() on empty RDD

(cherry picked from commit 7779109796c90d789464ab0be35917f963bbe867)
Signed-off-by: Josh Rosen joshro...@databricks.com

Conflicts:
python/pyspark/tests.py


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4895f654
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4895f654
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4895f654

Branch: refs/heads/branch-1.1
Commit: 4895f65447aa2338729fccb5200efa29a9d62163
Parents: 4fb26df
Author: Davies Liu dav...@databricks.com
Authored: Fri Nov 7 20:53:03 2014 -0800
Committer: Josh Rosen joshro...@databricks.com
Committed: Fri Nov 7 20:55:12 2014 -0800

--
 python/pyspark/rdd.py   | 2 ++
 python/pyspark/tests.py | 3 +++
 2 files changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4895f654/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 3f81550..ac8ceff 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -598,6 +598,8 @@ class RDD(object):
         # the key-space into bins such that the bins have roughly the same
         # number of (key, value) pairs falling into them
         rddSize = self.count()
+        if not rddSize:
+            return self  # empty RDD
         maxSampleSize = numPartitions * 20.0  # constant from Spark's RangePartitioner
         fraction = min(maxSampleSize / max(rddSize, 1), 1.0)
         samples = self.sample(False, fraction, 1).map(lambda (k, v): k).collect()

http://git-wip-us.apache.org/repos/asf/spark/blob/4895f654/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 5cea1b0..b4a9c59 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -470,6 +470,9 @@ class TestRDDFunctions(PySparkTestCase):
         self.assertEquals(([1, "b"], [5]), rdd.histogram(1))
         self.assertRaises(TypeError, lambda: rdd.histogram(2))
 
+    def test_sort_on_empty_rdd(self):
+        self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect())
+
     def test_sample(self):
         rdd = self.sc.parallelize(range(0, 100), 4)
         wo = rdd.sample(False, 0.1, 2).collect()





spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD (1.0 branch)

2014-11-07 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.0 18c8c3833 - d4aed266d


[SPARK-4304] [PySpark] Fix sort on empty RDD (1.0 branch)

This PR fixes sortBy()/sortByKey() on an empty RDD.

This should be backported into 1.0.

Author: Davies Liu dav...@databricks.com

Closes #3163 from davies/fix_sort_1.0 and squashes the following commits:

9be984f [Davies Liu] fix sort on empty RDD


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4aed266
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4aed266
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4aed266

Branch: refs/heads/branch-1.0
Commit: d4aed266d3db3cb3aea711f30aa058c74bfe60a5
Parents: 18c8c38
Author: Davies Liu dav...@databricks.com
Authored: Fri Nov 7 20:57:56 2014 -0800
Committer: Josh Rosen joshro...@databricks.com
Committed: Fri Nov 7 20:57:56 2014 -0800

--
 python/pyspark/rdd.py   | 2 ++
 python/pyspark/tests.py | 3 +++
 2 files changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d4aed266/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 368ab50..57c2cd7 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -496,6 +496,8 @@ class RDD(object):
         # number of (key, value) pairs falling into them
         if numPartitions > 1:
             rddSize = self.count()
+            if not rddSize:
+                return self
             maxSampleSize = numPartitions * 20.0 # constant from Spark's RangePartitioner
             fraction = min(maxSampleSize / max(rddSize, 1), 1.0)
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d4aed266/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 45284ee..8f5b48d 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -198,6 +198,9 @@ class TestRDDFunctions(PySparkTestCase):
         os.unlink(tempFile.name)
         self.assertRaises(Exception, lambda: filtered_data.count())
 
+    def test_sort_on_empty_rdd(self):
+        self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect())
+
     def test_itemgetter(self):
         rdd = self.sc.parallelize([range(10)])
         from operator import itemgetter





spark git commit: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API

2014-11-07 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 777910979 - 7e9d97567


[MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API

SPARK-1553 added alternating nonnegative least squares to MLlib; however, it's
not possible to access it via the Python API. This pull request resolves that.
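
For context, the Scala side of this change configures ALS through setters instead
of the static train methods. Below is a minimal sketch of that setter pattern,
assuming a local SparkContext and toy ratings; the rank, iteration, lambda, and
seed values are illustrative and not taken from this PR:

```
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object NonnegativeAlsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "NonnegativeAlsExample")
    // Toy (user, product, rating) triples, purely for illustration.
    val ratings = sc.parallelize(Seq(Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)))

    // Setter-based configuration; the numeric values are arbitrary example settings.
    val als = new ALS()
      .setRank(1)
      .setIterations(10)
      .setLambda(0.01)
      .setBlocks(-1)          // -1 lets ALS pick the number of blocks
      .setNonnegative(true)   // the option this PR exposes to Python
      .setSeed(10L)

    val model = als.run(ratings)
    println(model.predict(2, 2))
    sc.stop()
  }
}
```

The diff below shows the wrapper doing the same thing, with the seed applied only
when one is supplied from Python.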

Author: Michelangelo D'Agostino mdagost...@civisanalytics.com

Closes #3095 from mdagost/python_nmf and squashes the following commits:

a6743ad [Michelangelo D'Agostino] Use setters instead of static methods in 
PythonMLLibAPI.  Remove the new static methods I added.  Set seed in tests.  
Change ratings to ratingsRDD in both train and trainImplicit for consistency.
7cffd39 [Michelangelo D'Agostino] Swapped nonnegative and seed in a few more 
places.
3fdc851 [Michelangelo D'Agostino] Moved seed to the end of the python parameter 
list.
bdcc154 [Michelangelo D'Agostino] Change seed type to java.lang.Long so that it 
can handle null.
cedf043 [Michelangelo D'Agostino] Added in ability to set the seed from python 
and made that play nice with the nonnegative changes.  Also made the python ALS 
tests more exact.
a72fdc9 [Michelangelo D'Agostino] Expose nonnegative ALS in the python API.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e9d9756
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e9d9756
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e9d9756

Branch: refs/heads/master
Commit: 7e9d975676d56ace0e84c2200137e4cd4eba074a
Parents: 7779109
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Authored: Fri Nov 7 22:53:01 2014 -0800
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Nov 7 22:53:01 2014 -0800

--
 .../spark/mllib/api/python/PythonMLLibAPI.scala | 39 ---
 python/pyspark/mllib/recommendation.py  | 40 
 2 files changed, 58 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7e9d9756/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
index d832ae3..70d7138 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -275,12 +275,25 @@ class PythonMLLibAPI extends Serializable {
* the Py4J documentation.
*/
   def trainALSModel(
-  ratings: JavaRDD[Rating],
+  ratingsJRDD: JavaRDD[Rating],
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, 
iterations, lambda, blocks))
+  blocks: Int,
+  nonnegative: Boolean,
+  seed: java.lang.Long): MatrixFactorizationModel = {
+
+val als = new ALS()
+  .setRank(rank)
+  .setIterations(iterations)
+  .setLambda(lambda)
+  .setBlocks(blocks)
+  .setNonnegative(nonnegative)
+
+if (seed != null) als.setSeed(seed)
+
+val model =  als.run(ratingsJRDD.rdd)
+new MatrixFactorizationModelWrapper(model)
   }
 
   /**
@@ -295,9 +308,23 @@ class PythonMLLibAPI extends Serializable {
   iterations: Int,
   lambda: Double,
   blocks: Int,
-  alpha: Double): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(
-  ALS.trainImplicit(ratingsJRDD.rdd, rank, iterations, lambda, blocks, 
alpha))
+  alpha: Double,
+  nonnegative: Boolean,
+  seed: java.lang.Long): MatrixFactorizationModel = {
+
+val als = new ALS()
+  .setImplicitPrefs(true)
+  .setRank(rank)
+  .setIterations(iterations)
+  .setLambda(lambda)
+  .setBlocks(blocks)
+  .setAlpha(alpha)
+  .setNonnegative(nonnegative)
+
+if (seed != null) als.setSeed(seed)
+
+val model =  als.run(ratingsJRDD.rdd)
+new MatrixFactorizationModelWrapper(model)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/7e9d9756/python/pyspark/mllib/recommendation.py
--
diff --git a/python/pyspark/mllib/recommendation.py 
b/python/pyspark/mllib/recommendation.py
index e8b9984..e26b152 100644
--- a/python/pyspark/mllib/recommendation.py
+++ b/python/pyspark/mllib/recommendation.py
@@ -44,31 +44,39 @@ class MatrixFactorizationModel(JavaModelWrapper):
     >>> r2 = (1, 2, 2.0)
     >>> r3 = (2, 1, 2.0)
     >>> ratings = sc.parallelize([r1, r2, r3])
-    >>> model = ALS.trainImplicit(ratings, 1)
-    >>> model.predict(2,2) is not None
-    True
+    >>> model = ALS.trainImplicit(ratings, 1, seed=10)
+ 

spark git commit: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API

2014-11-07 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 3b07c483a - 427d7911f


[MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API

SPARK-1553 added alternating nonnegative least squares to MLlib; however, it's
not possible to access it via the Python API. This pull request resolves that.

Author: Michelangelo D'Agostino mdagost...@civisanalytics.com

Closes #3095 from mdagost/python_nmf and squashes the following commits:

a6743ad [Michelangelo D'Agostino] Use setters instead of static methods in 
PythonMLLibAPI.  Remove the new static methods I added.  Set seed in tests.  
Change ratings to ratingsRDD in both train and trainImplicit for consistency.
7cffd39 [Michelangelo D'Agostino] Swapped nonnegative and seed in a few more 
places.
3fdc851 [Michelangelo D'Agostino] Moved seed to the end of the python parameter 
list.
bdcc154 [Michelangelo D'Agostino] Change seed type to java.lang.Long so that it 
can handle null.
cedf043 [Michelangelo D'Agostino] Added in ability to set the seed from python 
and made that play nice with the nonnegative changes.  Also made the python ALS 
tests more exact.
a72fdc9 [Michelangelo D'Agostino] Expose nonnegative ALS in the python API.

(cherry picked from commit 7e9d975676d56ace0e84c2200137e4cd4eba074a)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/427d7911
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/427d7911
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/427d7911

Branch: refs/heads/branch-1.2
Commit: 427d7911f527e00e75dec0498b4bbdbe164db7ca
Parents: 3b07c48
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Authored: Fri Nov 7 22:53:01 2014 -0800
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Nov 7 22:53:22 2014 -0800

--
 .../spark/mllib/api/python/PythonMLLibAPI.scala | 39 ---
 python/pyspark/mllib/recommendation.py  | 40 
 2 files changed, 58 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/427d7911/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
index d832ae3..70d7138 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -275,12 +275,25 @@ class PythonMLLibAPI extends Serializable {
* the Py4J documentation.
*/
   def trainALSModel(
-  ratings: JavaRDD[Rating],
+  ratingsJRDD: JavaRDD[Rating],
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, 
iterations, lambda, blocks))
+  blocks: Int,
+  nonnegative: Boolean,
+  seed: java.lang.Long): MatrixFactorizationModel = {
+
+val als = new ALS()
+  .setRank(rank)
+  .setIterations(iterations)
+  .setLambda(lambda)
+  .setBlocks(blocks)
+  .setNonnegative(nonnegative)
+
+if (seed != null) als.setSeed(seed)
+
+val model =  als.run(ratingsJRDD.rdd)
+new MatrixFactorizationModelWrapper(model)
   }
 
   /**
@@ -295,9 +308,23 @@ class PythonMLLibAPI extends Serializable {
   iterations: Int,
   lambda: Double,
   blocks: Int,
-  alpha: Double): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(
-  ALS.trainImplicit(ratingsJRDD.rdd, rank, iterations, lambda, blocks, 
alpha))
+  alpha: Double,
+  nonnegative: Boolean,
+  seed: java.lang.Long): MatrixFactorizationModel = {
+
+val als = new ALS()
+  .setImplicitPrefs(true)
+  .setRank(rank)
+  .setIterations(iterations)
+  .setLambda(lambda)
+  .setBlocks(blocks)
+  .setAlpha(alpha)
+  .setNonnegative(nonnegative)
+
+if (seed != null) als.setSeed(seed)
+
+val model =  als.run(ratingsJRDD.rdd)
+new MatrixFactorizationModelWrapper(model)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/427d7911/python/pyspark/mllib/recommendation.py
--
diff --git a/python/pyspark/mllib/recommendation.py 
b/python/pyspark/mllib/recommendation.py
index e8b9984..e26b152 100644
--- a/python/pyspark/mllib/recommendation.py
+++ b/python/pyspark/mllib/recommendation.py
@@ -44,31 +44,39 @@ class MatrixFactorizationModel(JavaModelWrapper):
    >>> r2 = (1, 2, 2.0)
    >>> r3 = (2, 1, 2.0)
    >>> ratings = sc.parallelize([r1, r2, r3])
- model = 

spark git commit: [SPARK-4291][Build] Rename network module projects

2014-11-07 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 427d7911f - fc51de339


[SPARK-4291][Build] Rename network module projects

The names of the recently introduced network modules are inconsistent with
those of the other modules in the project. We should just drop the "Code"
suffix since it doesn't sacrifice any meaning, especially before they get into
an official release.

```
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Project Common Network Code
[INFO] Spark Project Shuffle Streaming Service Code
[INFO] Spark Project Core
[INFO] Spark Project Bagel
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project YARN Parent POM
[INFO] Spark Project YARN Stable API
[INFO] Spark Project Assembly
[INFO] Spark Project External Twitter
[INFO] Spark Project External Kafka
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External ZeroMQ
[INFO] Spark Project External MQTT
[INFO] Spark Project Examples
[INFO] Spark Project Yarn Shuffle Service Code
```

Author: Andrew Or and...@databricks.com

Closes #3148 from andrewor14/build-drop-code and squashes the following commits:

eac839b [Andrew Or] Network - Networking
d01ad47 [Andrew Or] Rename network module project names

(cherry picked from commit 7afc8564f33eb2868f458f85046f59a51b516ed6)
Signed-off-by: Patrick Wendell pwend...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc51de33
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc51de33
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc51de33

Branch: refs/heads/branch-1.2
Commit: fc51de3395f25983052ae9d3c5c17891f6e6b8a7
Parents: 427d791
Author: Andrew Or and...@databricks.com
Authored: Fri Nov 7 23:16:13 2014 -0800
Committer: Patrick Wendell pwend...@gmail.com
Committed: Fri Nov 7 23:16:38 2014 -0800

--
 network/common/pom.xml  | 2 +-
 network/shuffle/pom.xml | 2 +-
 network/yarn/pom.xml| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fc51de33/network/common/pom.xml
--
diff --git a/network/common/pom.xml b/network/common/pom.xml
index 6144548..8b24ebf 100644
--- a/network/common/pom.xml
+++ b/network/common/pom.xml
@@ -29,7 +29,7 @@
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-network-common_2.10</artifactId>
   <packaging>jar</packaging>
-  <name>Spark Project Common Network Code</name>
+  <name>Spark Project Networking</name>
   <url>http://spark.apache.org/</url>
   <properties>
     <sbt.project.name>network-common</sbt.project.name>

http://git-wip-us.apache.org/repos/asf/spark/blob/fc51de33/network/shuffle/pom.xml
--
diff --git a/network/shuffle/pom.xml b/network/shuffle/pom.xml
index fe5681d..27c8467 100644
--- a/network/shuffle/pom.xml
+++ b/network/shuffle/pom.xml
@@ -29,7 +29,7 @@
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-network-shuffle_2.10</artifactId>
   <packaging>jar</packaging>
-  <name>Spark Project Shuffle Streaming Service Code</name>
+  <name>Spark Project Shuffle Streaming Service</name>
   <url>http://spark.apache.org/</url>
   <properties>
     <sbt.project.name>network-shuffle</sbt.project.name>

http://git-wip-us.apache.org/repos/asf/spark/blob/fc51de33/network/yarn/pom.xml
--
diff --git a/network/yarn/pom.xml b/network/yarn/pom.xml
index e60d8c1..6e6f6f3 100644
--- a/network/yarn/pom.xml
+++ b/network/yarn/pom.xml
@@ -29,7 +29,7 @@
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-network-yarn_2.10</artifactId>
   <packaging>jar</packaging>
-  <name>Spark Project Yarn Shuffle Service Code</name>
+  <name>Spark Project YARN Shuffle Service</name>
   <url>http://spark.apache.org/</url>
   <properties>
     <sbt.project.name>network-yarn</sbt.project.name>

